New Feature: provide failFast flag, allow a DAG to run all branches of the DAG (either success or failure) #1443

Merged 5 commits on Jul 1, 2019

Move failFast flag to DAG template spec
xianlubird authored and xianlu committed Jun 26, 2019
commit fb7f06cb900096afcaddb103b03b58eaa21faec8
8 changes: 4 additions & 4 deletions api/openapi-spec/swagger.json
@@ -235,6 +235,10 @@
        "tasks"
      ],
      "properties": {
        "failFast": {
          "description": "This flag is for DAG logic. The DAG logic has a built-in \"fail fast\" feature to stop scheduling new steps, as soon as it detects that one of the DAG nodes is failed. Then it waits until all DAG nodes are completed before failing the DAG itself. The FailFast flag default is true, if set to false, it will allow a DAG to run all branches of the DAG to completion (either success or failure), regardless of the failed outcomes of branches in the DAG. More info and example about this feature at https://github.com/argoproj/argo/issues/1442",
          "type": "boolean"
        },
        "target": {
          "description": "Target are one or more names of targets to execute in a DAG",
          "type": "string"
@@ -1160,10 +1164,6 @@
          "description": "Entrypoint is a template reference to the starting point of the workflow",
          "type": "string"
        },
        "failFast": {
          "description": "This flag is for DAG logic. The DAG logic has a built-in \"fail fast\" feature to stop scheduling new steps, as soon as it detects that one of the DAG nodes is failed. Then it waits until all DAG nodes are completed before failing the DAG itself. The FailFast flag default is true, if set to false, it will allow a DAG to run all branches of the DAG to completion (either success or failure), regardless of the failed outcomes of branches in the DAG. More info and example about this feature at https://github.com/argoproj/argo/issues/1442",
          "type": "boolean"
        },
        "hostAliases": {
          "description": "HostAliases is an optional list of hosts and IPs that will be injected into the pod spec",
          "type": "array",
14 changes: 7 additions & 7 deletions pkg/apis/workflow/v1alpha1/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 8 additions & 8 deletions pkg/apis/workflow/v1alpha1/types.go
@@ -167,14 +167,6 @@ type WorkflowSpec struct {

	// HostAliases is an optional list of hosts and IPs that will be injected into the pod spec
	HostAliases []apiv1.HostAlias `json:"hostAliases,omitempty"`

	// This flag is for DAG logic. The DAG logic has a built-in "fail fast" feature to stop scheduling new steps,
	// as soon as it detects that one of the DAG nodes is failed. Then it waits until all DAG nodes are completed
	// before failing the DAG itself.
	// The FailFast flag default is true, if set to false, it will allow a DAG to run all branches of the DAG to
	// completion (either success or failure), regardless of the failed outcomes of branches in the DAG.
	// More info and example about this feature at https://github.com/argoproj/argo/issues/1442
	FailFast *bool `json:"failFast,omitempty"`
}

// Template is a reusable and composable unit of execution in a workflow
Expand Down Expand Up @@ -899,6 +891,14 @@ type DAGTemplate struct {

// Tasks are a list of DAG tasks
Tasks []DAGTask `json:"tasks"`

// This flag is for DAG logic. The DAG logic has a built-in "fail fast" feature to stop scheduling new steps,
// as soon as it detects that one of the DAG nodes is failed. Then it waits until all DAG nodes are completed
// before failing the DAG itself.
// The FailFast flag default is true, if set to false, it will allow a DAG to run all branches of the DAG to
// completion (either success or failure), regardless of the failed outcomes of branches in the DAG.
// More info and example about this feature at https://github.com/argoproj/argo/issues/1442
FailFast *bool `json:"failFast,omitempty"`
}

// DAGTask represents a node in the graph during DAG execution
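
Declaring `FailFast` as `*bool` lets the controller tell an omitted field apart from an explicit `false`, which is how the documented default of true can be preserved. A minimal sketch of that reading, assuming a trimmed-down copy of the struct (`dagTemplate`) and two hypothetical helpers (`failFastEnabled`, `parse`), none of which are part of this commit:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dagTemplate is a trimmed-down stand-in for DAGTemplate (assumption: only the
// field under discussion is kept).
type dagTemplate struct {
	FailFast *bool `json:"failFast,omitempty"`
}

// failFastEnabled is a hypothetical helper: a nil pointer means the field was
// omitted, so the documented default of true applies.
func failFastEnabled(d dagTemplate) bool {
	return d.FailFast == nil || *d.FailFast
}

// parse decodes a JSON fragment into the stand-in struct.
func parse(raw string) (dagTemplate, error) {
	var d dagTemplate
	err := json.Unmarshal([]byte(raw), &d)
	return d, err
}

func main() {
	for _, raw := range []string{`{}`, `{"failFast": true}`, `{"failFast": false}`} {
		d, err := parse(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-22s fail fast enabled: %v\n", raw, failFastEnabled(d))
	}
}
```

With a plain `bool` field, the first and third inputs would be indistinguishable after decoding, so a default of true could not be expressed.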
10 changes: 5 additions & 5 deletions pkg/apis/workflow/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion test/e2e/functional/dag-disable-failFast.yaml
@@ -3,7 +3,6 @@ kind: Workflow
metadata:
  generateName: dag-primay-branch-
spec:
  failFast: false
  entrypoint: statis
  templates:
  - name: a
Expand Down Expand Up @@ -32,6 +31,7 @@ spec:
args: ["hello world"]
- name: statis
dag:
failFast: false
tasks:
- name: A
template: a
Expand Down
34 changes: 18 additions & 16 deletions workflow/controller/dag.go
@@ -89,25 +89,27 @@ func (d *dagContext) assessDAGPhase(targetTasks []string, nodes map[string]wfv1.
}

if unsuccessfulPhase != "" {
// If failFast set to false, we should return Running to continue this workflow for other DAG branch
if d.wf.Spec.FailFast != nil && !*d.wf.Spec.FailFast {
tmpOverAllFinished := true
// If all the nodes have finished, we should mark the failed node to finish overall workflow
// So we should check all the targetTasks have finished
for _, tmpDepName := range targetTasks {
tmpDepNode := d.getTaskNode(tmpDepName)
if tmpDepNode == nil {
tmpOverAllFinished = false
break
if d.tmpl != nil && d.tmpl.DAG != nil {
Member

We don't need this check. DAG contexts are always instantiated with a template of type DAG

Member Author

OK

// If failFast set to false, we should return Running to continue this workflow for other DAG branch
if d.tmpl.DAG.FailFast != nil && !*d.tmpl.DAG.FailFast {
tmpOverAllFinished := true
// If all the nodes have finished, we should mark the failed node to finish overall workflow
// So we should check all the targetTasks have finished
for _, tmpDepName := range targetTasks {
tmpDepNode := d.getTaskNode(tmpDepName)
if tmpDepNode == nil {
tmpOverAllFinished = false
break
}
if tmpDepNode.Type == wfv1.NodeTypeRetry && hasMoreRetries(tmpDepNode, d.wf) {
tmpOverAllFinished = false
break
}
}
if tmpDepNode.Type == wfv1.NodeTypeRetry && hasMoreRetries(tmpDepNode, d.wf) {
tmpOverAllFinished = false
break
if !tmpOverAllFinished {
return wfv1.NodeRunning
}
}
if !tmpOverAllFinished {
return wfv1.NodeRunning
}
}
Member

Do you mind writing a unit test for this?

Member Author

sure


// if we were unsuccessful, we can return *only* if all retry nodes have ben exhausted.
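
The failFast branch in this hunk can be read as the following standalone sketch. It assumes simplified stand-in types (`taskState` in place of the controller's node status) and a hypothetical function name, and it covers only the new check, not the retry handling that follows in `assessDAGPhase`:

```go
package main

import "fmt"

// taskState is a simplified stand-in for a target task's node (assumption: the
// real controller inspects node status objects instead).
type taskState struct {
	Scheduled   bool // false corresponds to getTaskNode returning nil
	RetriesLeft bool // true corresponds to a Retry node that can still retry
}

// failFastOffKeepsRunning reports whether the failFast=false branch would
// return Running for a DAG that already contains an unsuccessful node.
func failFastOffKeepsRunning(failFast *bool, targets []taskState) bool {
	if failFast == nil || *failFast {
		// Flag unset or true: the branch is skipped entirely.
		return false
	}
	for _, t := range targets {
		// A target with no node yet, or with retries remaining, means other
		// branches still have work to do.
		if !t.Scheduled || t.RetriesLeft {
			return true
		}
	}
	return false
}

func main() {
	off := false
	pending := []taskState{{Scheduled: true}, {Scheduled: false}}
	done := []taskState{{Scheduled: true}, {Scheduled: true}}

	fmt.Println(failFastOffKeepsRunning(&off, pending))
	fmt.Println(failFastOffKeepsRunning(&off, done))
	fmt.Println(failFastOffKeepsRunning(nil, pending))
}
```

In this sketch the DAG stays in Running only while the flag is explicitly false and at least one target task is unscheduled or still retrying; once every target has a node and no retries remain, control falls through to the existing failure path.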
8 changes: 8 additions & 0 deletions workflow/controller/dag_test.go
@@ -32,3 +32,11 @@ func TestDagRetryExhaustedXfail(t *testing.T) {
	woc.operate()
	assert.Equal(t, string(wfv1.NodeFailed), string(woc.wf.Status.Phase))
}

// TestDagDisableFailFast test disable fail fast function
func TestDagDisableFailFast(t *testing.T) {
	wf := test.LoadTestWorkflow("testdata/dag-disable-fail-fast.yaml")
	woc := newWoc(*wf)
	woc.operate()
	assert.Equal(t, string(wfv1.NodeFailed), string(woc.wf.Status.Phase))
}
211 changes: 211 additions & 0 deletions workflow/controller/testdata/dag-disable-fail-fast.yaml
@@ -0,0 +1,211 @@
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  creationTimestamp: 2019-06-26T09:11:58Z
  generateName: dag-disable-fail-fast-
  generation: 1
  labels:
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/phase: Failed
  name: dag-disable-fail-fast-r6xdc
  namespace: default
  resourceVersion: "15772210"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/default/workflows/dag-disable-fail-fast-r6xdc
  uid: 734516b5-97f2-11e9-9fea-00163e00cf4e
spec:
  arguments: {}
  entrypoint: statis
  templates:
  - container:
      args:
      - hello world
      command:
      - cowsay
      image: docker/whalesay:latest
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: a
    outputs: {}
  - container:
      args:
      - sleep 30; echo haha
      command:
      - sh
      - -c
      image: alpine:latest
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: b
    outputs: {}
    retryStrategy:
      limit: 2
  - container:
      args:
      - echo intentional failure; exit 2
      command:
      - sh
      - -c
      image: alpine:latest
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: c
    outputs: {}
    retryStrategy:
      limit: 1
  - container:
      args:
      - hello world
      command:
      - cowsay
      image: docker/whalesay:latest
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: d
    outputs: {}
  - dag:
      failFast: false
      tasks:
      - arguments: {}
        name: A
        template: a
      - arguments: {}
        dependencies:
        - A
        name: B
        template: b
      - arguments: {}
        dependencies:
        - A
        name: C
        template: c
      - arguments: {}
        dependencies:
        - B
        name: D
        template: d
      - arguments: {}
        dependencies:
        - D
        name: E
        template: d
    inputs: {}
    metadata: {}
    name: statis
    outputs: {}
status:
  finishedAt: 2019-06-26T09:12:46Z
  nodes:
    dag-disable-fail-fast-r6xdc:
      children:
      - dag-disable-fail-fast-r6xdc-3928436299
      displayName: dag-disable-fail-fast-r6xdc
      finishedAt: 2019-06-26T09:12:46Z
      id: dag-disable-fail-fast-r6xdc
      name: dag-disable-fail-fast-r6xdc
      phase: Failed
      startedAt: 2019-06-26T09:11:58Z
      templateName: statis
      type: DAG
    dag-disable-fail-fast-r6xdc-3256495944:
      boundaryID: dag-disable-fail-fast-r6xdc
      displayName: C(0)
      finishedAt: 2019-06-26T09:12:08Z
      id: dag-disable-fail-fast-r6xdc-3256495944
      message: failed with exit code 2
      name: dag-disable-fail-fast-r6xdc.C(0)
      phase: Failed
      startedAt: 2019-06-26T09:12:03Z
      templateName: c
      type: Pod
    dag-disable-fail-fast-r6xdc-3457680277:
      boundaryID: dag-disable-fail-fast-r6xdc
      displayName: C(1)
      finishedAt: 2019-06-26T09:12:12Z
      id: dag-disable-fail-fast-r6xdc-3457680277
      message: failed with exit code 2
      name: dag-disable-fail-fast-r6xdc.C(1)
      phase: Failed
      startedAt: 2019-06-26T09:12:09Z
      templateName: c
      type: Pod
    dag-disable-fail-fast-r6xdc-3928436299:
      boundaryID: dag-disable-fail-fast-r6xdc
      children:
      - dag-disable-fail-fast-r6xdc-3945213918
      - dag-disable-fail-fast-r6xdc-3961991537
      displayName: A
      finishedAt: 2019-06-26T09:12:02Z
      id: dag-disable-fail-fast-r6xdc-3928436299
      name: dag-disable-fail-fast-r6xdc.A
      phase: Succeeded
      startedAt: 2019-06-26T09:11:58Z
      templateName: a
      type: Pod
    dag-disable-fail-fast-r6xdc-3945213918:
      boundaryID: dag-disable-fail-fast-r6xdc
      children:
      - dag-disable-fail-fast-r6xdc-4286504589
      displayName: B
      finishedAt: 2019-06-26T09:12:36Z
      id: dag-disable-fail-fast-r6xdc-3945213918
      name: dag-disable-fail-fast-r6xdc.B
      phase: Succeeded
      startedAt: 2019-06-26T09:12:03Z
      type: Retry
    dag-disable-fail-fast-r6xdc-3961991537:
      boundaryID: dag-disable-fail-fast-r6xdc
      children:
      - dag-disable-fail-fast-r6xdc-3256495944
      - dag-disable-fail-fast-r6xdc-3457680277
      displayName: C
      finishedAt: 2019-06-26T09:12:13Z
      id: dag-disable-fail-fast-r6xdc-3961991537
      message: No more retries left
      name: dag-disable-fail-fast-r6xdc.C
      phase: Failed
      startedAt: 2019-06-26T09:12:03Z
      type: Retry
    dag-disable-fail-fast-r6xdc-3978769156:
      boundaryID: dag-disable-fail-fast-r6xdc
      children:
      - dag-disable-fail-fast-r6xdc-3995546775
      displayName: D
      finishedAt: 2019-06-26T09:12:41Z
      id: dag-disable-fail-fast-r6xdc-3978769156
      name: dag-disable-fail-fast-r6xdc.D
      phase: Succeeded
      startedAt: 2019-06-26T09:12:37Z
      templateName: d
      type: Pod
    dag-disable-fail-fast-r6xdc-3995546775:
      boundaryID: dag-disable-fail-fast-r6xdc
      displayName: E
      finishedAt: 2019-06-26T09:12:45Z
      id: dag-disable-fail-fast-r6xdc-3995546775
      name: dag-disable-fail-fast-r6xdc.E
      phase: Succeeded
      startedAt: 2019-06-26T09:12:42Z
      templateName: d
      type: Pod
    dag-disable-fail-fast-r6xdc-4286504589:
      boundaryID: dag-disable-fail-fast-r6xdc
      children:
      - dag-disable-fail-fast-r6xdc-3978769156
      displayName: B(0)
      finishedAt: 2019-06-26T09:12:36Z
      id: dag-disable-fail-fast-r6xdc-4286504589
      name: dag-disable-fail-fast-r6xdc.B(0)
      phase: Succeeded
      startedAt: 2019-06-26T09:12:03Z
      templateName: b
      type: Pod
  phase: Failed
  startedAt: 2019-06-26T09:11:58Z