Checkpointing strategies to protect parallel jobs from non-memoryless fail-stop errors
Source code used for the performance evaluation.