forked from apache/flink
[FLINK-9788] Fix ExecutionGraph inconsistency for global failures when restarting

The problem was that a concurrent global failure could start a second restart operation without terminating the previous one. Terminating the previous restart operation means cancelling all current Executions and waiting for the cancellation to complete. Because this wait was missing, Executions that had already been reset could be reset a second time. This violates a sanity check and would lead to a restart loop.

The problem is fixed by no longer distinguishing between a failure that happens in state JobStatus.RESTARTING and one that happens in any other state. As a result, we always cancel all existing Executions and only trigger the restart after all Executions have reached a terminal state.
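The ordering described above (cancel everything, wait for every Execution to reach a terminal state, only then restart) can be illustrated with a minimal sketch. This is not the code changed by this commit: SketchExecution, SketchGraph, failGlobal, and the version counter are hypothetical names invented for the illustration, and the sketch is single-threaded for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for an Execution; not Flink's actual class. */
class SketchExecution {
    private final CompletableFuture<Void> terminalFuture = new CompletableFuture<>();
    private boolean cancelRequested;

    void cancel() { cancelRequested = true; }
    boolean isCancelRequested() { return cancelRequested; }

    /** Simulates the execution reaching a terminal state. */
    void markTerminal() { terminalFuture.complete(null); }
    CompletableFuture<Void> getTerminalFuture() { return terminalFuture; }
}

/** Hypothetical stand-in for the graph; models only the fail-then-restart ordering. */
class SketchGraph {
    private final List<SketchExecution> executions = new ArrayList<>();
    private long version;      // assumed guard: bumped on every global failure
    private int restartCount;

    SketchExecution addExecution() {
        SketchExecution e = new SketchExecution();
        executions.add(e);
        return e;
    }

    /**
     * Takes the same path regardless of the current job state:
     * cancel all executions, then restart once all are terminal.
     */
    CompletableFuture<Void> failGlobal() {
        final long myVersion = ++version;
        List<CompletableFuture<Void>> terminals = new ArrayList<>();
        for (SketchExecution e : executions) {
            e.cancel();
            terminals.add(e.getTerminalFuture());
        }
        return CompletableFuture
            .allOf(terminals.toArray(new CompletableFuture[0]))
            .thenRun(() -> {
                // Skip the restart if a later global failure superseded this one.
                if (version == myVersion) {
                    restartCount++;
                }
            });
    }

    int getRestartCount() { return restartCount; }
}
```

In this sketch, two overlapping calls to failGlobal both wait on the same terminal futures, and only the most recent one performs the restart, so no execution is reset before it has finished cancelling.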
1 parent 0b0c261, commit 5ae68f8
Showing 2 changed files with 98 additions and 31 deletions.