
Retries should not start from scratch
Retries are normal. A job times out, a publish step breaks, a review gets kicked back, or a dependency fails at the wrong moment. Teams can live with that. What slows everything down is when the retry returns like the first attempt never happened.
That is where the second cost shows up. Someone has to reconstruct what already cleared, what failed, what changed since the last pass, and what should be skipped this time. The original failure may have taken two minutes. The rebuild around it takes twenty.
Good operations make retries cumulative. If four steps already cleared, the fifth step should be the one that needs attention. If a known failure point exists, it should be visible before the next run starts. If the fix is already known, the retry should carry that context with it instead of making the team rediscover it.
A retry should not erase the last attempt. It should build on it. When every rerun starts from zero, the system is multiplying the cost of failure instead of containing it.


