Checkpointing strategies to protect parallel jobs from non-memoryless fail-stop errors