Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Consider expanded tasks in getTaskFromNode #2756

Merged
merged 2 commits into from
Apr 20, 2020

Conversation

simster7
Copy link
Member

@simster7 simster7 commented Apr 19, 2020

Fixes: #2743

workflow/controller/dag.go Outdated Show resolved Hide resolved
@@ -591,7 +602,7 @@ func findLeafTaskNames(tasks []wfv1.DAGTask) []string {
}

// expandTask expands a single DAG task containing withItems, withParams, withSequence into multiple parallel tasks
func (woc *wfOperationCtx) expandTask(task wfv1.DAGTask) ([]wfv1.DAGTask, error) {
func expandTask(task wfv1.DAGTask) ([]wfv1.DAGTask, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The woc receiver was not used.

@nenadzaric
Copy link

nenadzaric commented Apr 20, 2020

Hi @simster7,

First, thank you for considering this fix, I really appreciate it!

Since I depend a lot on this feature, because of the scalability of my workflows, I wanted to test it for myself.

It looks like it is not working yet, and I am not sure if you finished working on this, but just in case I wanted to share what is happening.

kind: Workflow
metadata:
  generateName: parameter-aggregation-one-will-fail2-
spec:
  entrypoint: parameter-aggregation-one-will-fail2
  templates:
  - name: parameter-aggregation-one-will-fail2
    dag:
      tasks:

      - name: generate
        template: gen-number-list
        continueOn:
            failed: true
        
      - name: one-will-fail
        template: one-will-fail
        dependencies:
            - generate
        arguments:
          parameters:
          - name: num
            value: "{{item}}"
        withParam: "{{tasks.generate.outputs.result}}"
        continueOn:
          failed: true

      - name: whalesay
        template: whalesay
        dependencies:
          - one-will-fail
        continueOn:
          failed: true

  - name: one-will-fail
    inputs:
      parameters:
      - name: num
    container:
      image: alpine:latest
      command: [sh, -xc]
      args:
      - |
        if [ $(({{inputs.parameters.num}})) == 1 ]; then
          exit 1;
        else
          echo {{inputs.parameters.num}}
        fi

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]

  - name: gen-number-list
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import json
        import sys
        json.dump([i for i in range(0, 2)], sys.stdout)

So if parameters are passed through the parameter aggregation JSON list, with withParam: "{{tasks.generate.outputs.result}}" the workflow will fail.

Name:                parameter-aggregation-one-will-fail2-kv5dk
Namespace:           default
ServiceAccount:      default
Status:              Error
Message:             could not expand task one-will-fail while looking for task corresponding to node parameter-aggregation-one-will-fail2-kv5dk.one-will-fail(1:1): withParam value could not be parsed as a JSON list: {{tasks.generate.outputs.result}}
Created:             Mon Apr 20 16:24:53 +0200 (15 seconds ago)
Started:             Mon Apr 20 16:24:53 +0200 (15 seconds ago)
Finished:            Mon Apr 20 16:25:08 +0200 (now)
Duration:            15 seconds

STEP                                           TEMPLATE                              PODNAME                                                DURATION  MESSAGE
 ⚠ parameter-aggregation-one-will-fail2-kv5dk  parameter-aggregation-one-will-fail2                                                                   Workflow operation error
 ├-✔ generate                                  gen-number-list                       parameter-aggregation-one-will-fail2-kv5dk-2730234955  6s
 ├-✔ one-will-fail(0:0)                        one-will-fail                         parameter-aggregation-one-will-fail2-kv5dk-1938682683  5s
 └-✖ one-will-fail(1:1)                        one-will-fail                         parameter-aggregation-one-will-fail2-kv5dk-528872991   3s        failed with exit code 1

With an error

could not expand task one-will-fail while looking for task corresponding to node parameter-aggregation-one-will-fail2-kv5dk.one-will-fail(1:1): withParam value could not be parsed as a JSON list: {{tasks.generate.outputs.result}}

By the way, the workflow mentioned in the previous issue is working as intended.

Thanks!

@sonarcloud
Copy link

sonarcloud bot commented Apr 20, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 0 Code Smells

87.5% 87.5% Coverage
0.0% 0.0% Duplication

// nodeTaskName formulates the corresponding task name for a dag node. Note that this is not simply the inverse of
// taskNodeName. A task name might be from an expanded task, in which case it will not have an explicit task defined for it.
// When that is the case, we formulate the name of the original expanded task by removing the fields after "("
func (d *dagContext) taskNameFromNodeName(nodeName string) string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After much thinking, I've found this to be the most robust way to get a task name from a node name: expanding all tasks when to match them to the node is costly and would require a code refactor, storing a node name to task name map is futile because tasks are not expanded on every operation of the Workflow, adding a field to NodeStatus is undesirable because it exposes implementation details to the API and the field is only used by DAG Nodes.

This function is safe because node names generated from task names are deterministic, so we should always be able to recover the node name from the task name.

If the naming convention of node names changes, this function will also have to change. To ensure that this happens I have created tests that cover this and have also added assert statements that ensure a node name can be reversed to the correct task name every time one is created.

@@ -336,6 +343,10 @@ func (woc *wfOperationCtx) executeDAGTask(dagCtx *dagContext, taskName string) {
dependenciesCompleted := true
dependenciesSuccessful := true
nodeName := dagCtx.taskNodeName(taskName)
// Ensure that the generated taskNodeName can be reversed into the original (not expanded) task name
if dagCtx.taskNameFromNodeName(nodeName) != task.Name {
panic("unreachable: node name cannot be reversed into task name; please file a bug on GitHub")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made sure that this code is unreachable now, but if the naming convention is changed in the future this is meant to catch the discrepancy before the code can be shipped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should panic if there is a nasty bug - we log and report on panics

@simster7
Copy link
Member Author

Thanks for the catch @nenadzaric!

Copy link
Contributor

@alexec alexec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the tests

@simster7 simster7 merged commit 7975952 into argoproj:master Apr 20, 2020
@simster7
Copy link
Member Author

Back-ported to 2.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Continue on fail not working with parameter agregation
3 participants