RFC: `across` step #29

vito · 2019-05-21T20:51:09Z

Rendered

Related to, but not dependent on #24.

Signed-off-by: Alex Suraci <[email protected]>

029-across-step/proposal.md

siennathesane · 2020-02-12T19:19:22Z

How would across work with a large build matrix?

I could see the value here of across having some functional limiter, like this:

across: [1.8, 1.9, 1.10, 1.11, 1.12, 1.13, 1.8-win, 1.9-win, 1.10-win, 1.11-win, 1.12-win, 1.13-win, 1.8-macos, 1.9-macos, 1.10-macos, 1.11-macos, 1.12-macos, 1.13-macos, etc.]
max_in_flight: 4
as: go-version
do:
- task: unit
  vars: {go_version: ((go_version))}

That way it limits the parallelism of the build matrix a bit, without stealing too much from the global and job max_in_flight and without preventing the pipeline from moving forward on other tasks in the same pipeline/job.
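To make the limiter idea concrete, here is a minimal Python sketch (not Concourse code) of running a step across a list of values with a cap on concurrency. The names `run_across` and `unit` are hypothetical, and a thread pool is only an assumed stand-in for however Concourse would schedule the steps.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_across(values, step, max_in_flight):
    """Run step(value) for every value, at most max_in_flight at a time."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map returns results in the same order as `values`.
        return list(pool.map(step, values))

# Bookkeeping used only to observe how many steps ran at once.
lock = threading.Lock()
running = 0
peak = 0

def unit(go_version):
    """Hypothetical stand-in for the `unit` task."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    try:
        return f"unit ok on {go_version}"
    finally:
        with lock:
            running -= 1

versions = ["1.8", "1.9", "1.10", "1.11", "1.12", "1.13"]
results = run_across(versions, unit, max_in_flight=4)
print(results)
print("peak concurrency:", peak)
```

The pool size is the only knob here, so the rest of the job is unaffected by how large the matrix grows.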

JohannesRudolph · 2020-04-14T20:43:48Z

Is it planned that across also adds a notion for "skipping" a job for certain resource combinations? Say we have a build matrix but one particular job may not apply to a certain combination of vars, would it be possible to skip running the job? Or should that rather be implemented inside the job itself (i.e. NOP-ing all tasks inside)? I'm asking because it has proven very difficult so far to implement pipeline control flow, see e.g. https://discuss.concourse-ci.org/t/trigger-jobs-conditionally/306/13

I know that pipeline control flow is a controversial topic and that having "idempotent" pipelines that always do the same thing is a good design goal for Concourse. However, there's already try, on_success and on_failure, and they don't support all real-world use cases, especially when it comes to optimising pipelines. I'd rather have Concourse support some sort of declarative notion (like job.if in GitHub Actions) than abuse task scripts or other mechanisms.

aoldershaw · 2020-08-06T12:13:32Z

029-across-step/proposal.md

+ instance_vars: {pr_number: ((.:pr.number))}
+ across:
+ - var: pr
+ source: booklit-prs


Have you thought about allowing the across step to iterate over objects from an arbitrary prototype (i.e. not just constrained to var_source prototypes and their list message)? That could make the object streams returned by prototypes more useful outside of the special prototype interfaces that Concourse will interact with natively (resources, var_sources, notifications, etc.)

In this comment concourse/concourse#5936 (comment), you suggested that the run step could possibly support a set_var field to store the returned object as a local variable. Perhaps an alternative to set_var would be to use the across step in its place, e.g.

- across:
  - var: some-var
    run: eval
    type: gval
    params: {fields: {some-field: hello}}
  run: echo
  type: debug
  params: {message: ((.:some-var.some-field))}

Kind of weird in this case since we're iterating over a single value always - but I guess for set_var to work in general, it would also have to set the var to a list of objects (even if a single object is emitted)

This also makes me realize that the across step can be used for conditional portions of build plans, by having the list of values be either 0 or 1 element long 🤔

It's also making me wonder about dynamic build plans in the context of the across step (i.e. the list of values to iterate over can be determined at runtime) - what are your thoughts on that? I think it could greatly extend the use cases that across can support, but it could possibly make pipelines difficult to understand since you could introduce arbitrarily complicated control flow

I have, and your comments are pretty much the same thought process I went through. 😄 It seems like a really cool idea, but yeah, the difficulty is that it seems to require some sort of dynamic build plan construct.

I'm running into another area that would benefit from dynamic build plans: the way file: and image_resource: interact on a task step. I'm working on a refactor that pulls image fetching out into explicit check and get steps in the build plan, but the existence of file: means I can't know whether they're needed (Darwin and Windows have no images, for example), or what image to even check and get, until the build is running. In this case I can probably just preemptively configure steps which may just do no-ops, and they can probably get their config in a similar way to how the Get plan uses VersionFrom to get some config from a prior step, but it all feels kind of held together with duct tape. (There's also a bit of complexity in that we may need to check + get a custom type that the image_resource: uses, but I think I have a way to get around that.)

I still don't see a super obvious way to implement dynamic build plans though. 🤔 But I haven't really tried, either, I guess.

The only thought I had was for there to be some sort of build event that says "inject this build plan under this plan ID" or "replace this plan ID with this build plan". So, the plan would initially look something like:

{
  "schema": "exec.v2",
  "plan": {
    "id": "across-id",
    "across": {}
  }
}

and there could be a build event like:

{
  "event": "dynamic-build-plan",
  "version": "1.0",
  "data": {
    "id": "across-id",
    "across": {
      "some": "data"
    }
  }
}

which the UI would interpret to construct a final build plan of:

{
  "schema": "exec.v2",
  "plan": {
    "id": "across-id",
    "across": {
      "some": "data"
    }
  }
}

I'm sure there are some technical challenges I haven't considered, though (/it may not work at all!)
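The "replace this plan ID with this build plan" idea can be sketched in a few lines of Python. This is only an illustration of how a consumer such as the UI might fold the event into the plan it already has; the function name `apply_dynamic_plan` is hypothetical, and the event shape is taken from the example above rather than from any existing Concourse API.

```python
import copy

def apply_dynamic_plan(plan, event):
    """Return plan with the node whose id matches the event's data replaced.

    The input plan is left unmodified.
    """
    if event["event"] != "dynamic-build-plan":
        return plan
    replacement = event["data"]

    def walk(node):
        if isinstance(node, dict):
            if node.get("id") == replacement["id"]:
                # Swap in the dynamically generated subtree.
                return copy.deepcopy(replacement)
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(plan)

initial = {"schema": "exec.v2", "plan": {"id": "across-id", "across": {}}}
event = {
    "event": "dynamic-build-plan",
    "version": "1.0",
    "data": {"id": "across-id", "across": {"some": "data"}},
}
final = apply_dynamic_plan(initial, event)
print(final)
```

Because the walk recurses through nested dicts and lists, the same approach would cover an across step buried deeper in a plan, assuming plan IDs are unique.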

That seems close to what I'm proposing - whatever the step results in gets used as the value. I think it'd just look like this with the syntax I had before:

var: foo
do:
- get: data
  trigger: true
- task: filter-list
  file: ci/tasks/filter-list.yml
  input_mapping: {input: data}
  output_mapping: {output: filtered}
- load_var: filtered
  file: filtered/data.json

...but this is just a different syntax. values_from is probably clearer. I think we'd still need to handle values: ((.:foo)), but that could probably be handled by the across step too instead of a separate values: step.

In either case, I'm not sure how the across step would access the result in the case of a do. 🤔 Currently results are stored under step IDs stored in the RunState, but this is almost like a return value from an expression. With a single step it's obvious what step ID to use, but with a do it's not so easy. It's almost like we need (exec.Step).Run to be able to return a value on its own, and do would return the last value. But that changes the interface.

Maybe this whole storing-results-in-RunState pattern could/should change? @clarafu @chenbh Has your recent experience in this area led to a similar line of thinking?

Oh, also re: get_vars vs list_vars, yeah depending on what the resulting value is list_vars might be more accurate. I think either get_vars would actually fetch all the var values individually, using a list call under the hood for the scheduling/triggering, or we would have list_vars instead and allow the user to run a separate get_var step for each var value if that's desired. Don't know which; haven't put much thought into it yet.

What I was proposing is a bit more complicated than just a multi-step values_from (where the value comes from the last step), but I didn't have the proper framing. Now I realize the proposal was actually a combination of two ideas:

A new way to communicate between jobs by emitting/receiving data

Anonymous jobs (run nested in other jobs)

I'll flesh out my original example a bit more:

jobs:
- name: parent
  plan:
  - get: ci
  - across:
    - var: foo
      values_from:
        do: # <----------------------------- A
        - get: data
          trigger: true # <------------------ B
        - task: filter-list
          file: ci/tasks/filter-list.yml
          input_mapping: {input: data}
          output_mapping: {output: filtered}
        - load_var: filtered # <------------- C
          file: filtered/data.json
    ...

Here, A is an "anonymous job". When the trigger on B gets a new version, the parent job doesn't trigger - only the anonymous job A. The load_var C implicitly emits the value it loads, and parent implicitly receives and triggers on changes to this value.

The implicit emitting and receiving is very non-obvious, though, and the anonymous job idea is a bit odd (either we make it fully self-contained and need to duplicate the get: ci, or it magically can borrow artifacts from the parent job). What if instead we made it explicit and gave the filtering its own job:

jobs:
- name: filter-list
  plan:
  - get: ci
  - get: data
    trigger: true
  - task: filter-list
    file: ci/tasks/filter-list.yml
    input_mapping: {input: data}
    output_mapping: {output: filtered}
  - load_var: filtered
    file: filtered/data.json
    emit: data # <--------------------- A
- name: parent
  plan:
  - get: ci
  - receive: data # <------------------ B
    job: filter-list
    trigger: true
  - across:
    - var: foo
      values: ((.:data))
    ...

A emits a new value and stores it as data. B triggers on changes to that emitted value and stores the result in a local variable.

This mechanism could even be used for {get_var: ..., trigger: true} - the builds we run periodically to fetch the var could emit the value to the DB, and the job would receive that changed value from the build.

EDIT:

This could even possibly be used for {get: ..., trigger: true} - the periodic check builds could emit the latest version, and the generated plan could effectively be:

- receive: version # from the check build, somehow
  trigger: true
- get: version

Wouldn't work with version: every or pinned/disabled versions, though, so doesn't make much sense

In either case, I'm not sure how the across step would access the result in the case of a do. 🤔 Currently results are stored under step IDs stored in the RunState, but this is almost like a return value from an expression. With a single step it's obvious what step ID to use, but with a do it's not so easy. It's almost like we need (exec.Step).Run to be able to return a value on its own, and do would return the last value. But that changes the interface.

I thought of it like every type of step would have a single output (e.g. check is version, get is image spec, load_var and get_var are var), in do's case maybe we just need to figure out the contract for what it should return (is it time to introduce a result step/modifier)?

It would then be up to across to figure out what to do with this info (which might be wild cause that implies we can run across a list of versions from a check).
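As a rough model of the "every step has a single output, and do returns the last value" contract, consider the Python sketch below. It is not how Concourse's exec.Step interface looks today; the `Step`, `Do`, and `across` names and their signatures are hypothetical and exist only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Step:
    """A single step whose run() returns exactly one value."""
    name: str
    action: Callable[[], Any]

    def run(self) -> Any:
        return self.action()

@dataclass
class Do:
    """Runs steps in order and returns the value of the last one."""
    steps: List[Step]

    def run(self) -> Any:
        result = None
        for step in self.steps:
            result = step.run()
        return result

def across(values_from, body):
    """Iterate over whatever values_from.run() returns."""
    return [body(value) for value in values_from.run()]

plan = Do([
    Step("get", lambda: {"ref": "abc123"}),       # would yield a version
    Step("task", lambda: None),                   # no meaningful output
    Step("load_var", lambda: ["a", "b", "c"]),    # yields the loaded var
])
outputs = across(plan, lambda value: f"ran with {value}")
print(outputs)
```

With this contract the across step never needs to know a step ID; it only consumes a return value, which is the interface change discussed above.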

I am not sure if I am adding anything useful to the conversation, but when I saw the 7.0.0 release notes I immediately saw a good use case for one of our pipelines and I tried to build a working prototype using pipeline instances and the across step.

- name: set-pipeline-instances
  serial: true
  plan:
  - in_parallel:
    - get: ci
    - get: src-branches
      trigger: true
  ...
  - load_var: branches
    file: branches-json/branches.json
  - set_pipeline: {{ (datasource "config").name }}
    file: ci/ci/build.yml
    instance_vars: {branch: ((.:b))}
    across:
    - var: b
      values: ((.:branches))

I am keen for the normal resource type to trigger the job, then use load_var to load a variable that is a list, and then just use that as the argument to values.

vito · 2020-08-11T15:35:05Z

029-across-step/proposal.md

+
+## Open Questions
+
+* n/a


@JohannesRudolph Sorry, completely forgot to address your point! (Raising this as a review comment so the replies can be threaded.)

Would it be sufficient to allow a static list of combinations to skip to be provided?

across:
- var: go
  values: [1.16, 1.15, 1.8]
- var: platform
  values: [darwin, linux, windows]
except:
- platform: windows
  go: 1.18

This would work similarly to excluding jobs in Travis CI, i.e. a subset of vars may be specified, and all variations with those vars will be skipped.

Related thought: it would be kind of cool if we could except combinations based on a relationship between variables. e.g. if you want to test various upgrade paths, you only want to test combinations of from and to where to > from. I have no idea how this would work - it might just be a use case for var sources rather than except
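Both ideas (a static except list where a subset of vars may be given, and exclusion based on a relationship between vars) can be modelled in Python as below. This is a sketch of the semantics only: the `expand` function, its `excepts` and `keep` parameters, and the sample values are hypothetical, not proposed syntax.

```python
from itertools import product

def expand(matrix, excepts=(), keep=lambda combo: True):
    """Expand a var matrix into combinations, dropping excluded ones.

    excepts: partial var assignments; a combination is skipped when it
             matches every var named in any one rule.
    keep:    predicate over the full combination, for relational filters.
    """
    names = list(matrix)
    combos = [dict(zip(names, values)) for values in product(*matrix.values())]

    def excluded(combo):
        return any(
            all(combo.get(name) == value for name, value in rule.items())
            for rule in excepts
        )

    return [combo for combo in combos if not excluded(combo) and keep(combo)]

matrix = {"go": ["1.16", "1.15"], "platform": ["darwin", "linux", "windows"]}

# Skip one specific cell of the matrix.
one_cell = expand(matrix, excepts=[{"platform": "windows", "go": "1.15"}])

# A rule naming a subset of vars skips every variation that matches it.
no_windows = expand(matrix, excepts=[{"platform": "windows"}])

# Relational filter: only test upgrade paths where to > from.
upgrades = expand({"from": [1, 2, 3], "to": [1, 2, 3]},
                  keep=lambda combo: combo["to"] > combo["from"])
print(len(one_cell), len(no_windows), upgrades)
```

The relational case needs an arbitrary predicate, which is hard to express declaratively in YAML; that is one reason it may fit var sources better than except.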

drunkirishcoder · 2021-04-07T20:27:09Z

would there be a way to rerun only the failed combinations without having to rerun all the combinations again?

aoldershaw · 2021-04-07T21:06:50Z

would there be a way to rerun only the failed combinations without having to rerun all the combinations again?

@drunkirishcoder that's discussed here. It's just a matter of adding an attempts modifier to the step:

task: unit
timeout: 1h # interrupt the task after 1 hour
attempts: 3 # attempt the task 3 times
across:
- var: go_version
  values: [1.12, 1.13]
on_failure: # do something after all steps complete and at least one failed
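The binding of the modifiers in that snippet (attempts applies to each combination, on_failure applies once to the whole matrix) can be illustrated with a small Python sketch. The helper names `run_with_attempts`, `run_across`, and `flaky_unit` are hypothetical, and the steps run sequentially here for simplicity.

```python
from itertools import product

def run_with_attempts(step, combo, attempts):
    """Retry a single combination up to `attempts` times."""
    for _ in range(attempts):
        if step(combo):
            return True
    return False

def run_across(matrix, step, attempts, on_failure):
    """Run every combination, then fire on_failure once if any failed."""
    names = list(matrix)
    results = {}
    for values in product(*matrix.values()):
        combo = dict(zip(names, values))
        results[values] = run_with_attempts(step, combo, attempts)
    if not all(results.values()):
        on_failure(results)
    return results

calls = {}

def flaky_unit(combo):
    """Stand-in task: 1.12 passes on its second try, 1.13 never passes."""
    version = combo["go_version"]
    calls[version] = calls.get(version, 0) + 1
    return version == "1.12" and calls[version] >= 2

failures = []
results = run_across(
    {"go_version": ["1.12", "1.13"]},
    flaky_unit,
    attempts=3,
    on_failure=lambda r: failures.append(sorted(k for k, ok in r.items() if not ok)),
)
print(results, calls, failures)
```

Note that a retry only re-runs the failing combination, while the failure hook sees the outcome of the full matrix.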

drunkirishcoder · 2021-04-07T21:13:03Z

@aoldershaw cool. that can auto retry x attempts. but would there be a manual way? like in the concourse UI rerun the task with the same settings but only retry the failed combo?

the use case I'm trying to design for is I want to use across to deploy to multiple clusters/environments. and if one of the env failed, I wouldn't want to rerun the deployment to the successful ones again. so ideally if I can rerun the same matrix, but only for the failed ones again.

aoldershaw · 2021-04-08T13:54:09Z

029-across-step/proposal.md

+`ensure` and `on_*` bind to the `across` step so that they may be run after the
+full matrix completes.
+
+`attempts` binds to the inner step because it doesn't seem to make a whole lot


@drunkirishcoder going to move this discussion to a review comment thread to avoid cluttering the top-level.

that can auto retry x attempts. but would there be a manual way? like in the concourse UI rerun the task with the same settings but only retry the failed combo?

the use case I'm trying to design for is I want to use across to deploy to multiple clusters/environments. and if one of the env failed, I wouldn't want to rerun the deployment to the successful ones again. so ideally if I can rerun the same matrix, but only for the failed ones again.

Ah, so that's what you meant. There are no plans for this currently, and it would be a bit challenging to get right - there's no precedent for partial reruns of a build. In general, we'd still need to run everything before the across step again, since those steps may produce artifacts that the across step relies on (e.g. gets or task outputs). Plus, if the step that's run in a build matrix emits build outputs (via get/put steps), we'd still need to emit the build outputs for the previously successful combinations (in order for build scheduling to work properly).

I think the ideal solution would be for the deploy steps to be idempotent - i.e. given the same inputs, if the deployment already succeeded the step would no-op, so re-running the whole matrix wouldn't be an issue. That's perhaps easier said than done, though.
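As a toy illustration of what idempotent means here, the Python sketch below skips work when the target already has the requested version. The `deploy` function and the `state` dictionary are hypothetical; a real deployment would have to query the actual environment instead of an in-memory record.

```python
def deploy(env, version, state):
    """Deploy version to env unless state shows it is already deployed."""
    if state.get(env) == version:
        return "no-op"
    state[env] = version  # stand-in for the real deployment work
    return "deployed"

# Suppose an earlier build deployed v2 to staging but failed on prod.
state = {"staging": "v2"}
envs = ["staging", "prod"]

first_rerun = {env: deploy(env, "v2", state) for env in envs}
second_rerun = {env: deploy(env, "v2", state) for env in envs}
print(first_rerun)
print(second_rerun)
```

Re-running the whole matrix then only does real work for the environments that previously failed.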

Is there a reason why manual retries are preferable to using attempts for automated retries for your use case?

@aoldershaw ok, thank you for the detailed response. that makes a lot of sense. as to why prefer manual over automated retries. I'm just thinking sometimes a deployment would fail due to circumstances that may take longer to troubleshoot and resolve, like someone forgot to apply a firewall rule. but yeah I understand why it would be difficult to do in concourse.

would be nice if there's a similar feature that will represent each one as an independent pipeline level task instead, that can be restarted. just thinking out loud.

One thing you could do is configure a pipeline for each cell in the matrix using the set_pipeline step, rather than setting up all the environments within the single build. That way you could retrigger the failed job in each pipeline.

aoldershaw · 2021-04-25T19:20:20Z

029-across-step/proposal.md

+
+## Open Questions
+
+* n/a


Something that didn't really come up in the proposal is how we handle steps that emit artifacts (e.g. get, task.outputs, prototype outputs). Currently, the implementation of the static across step runs each iteration in its own local artifact scope. This means that any artifacts go out of scope past the across step and cannot be used.

It'd be cool if there was a way to "fan-in" on artifacts emitted within an across step. For instance, suppose there was a prototype (or just a task) for compiling a Go binary that outputs an artifact called binary. You might want to compile it for multiple platforms/architectures, which could make use of the across step:

across:
- var: platform
  values: [linux, darwin, windows]
- var: arch
  values: [amd64, arm64]
run: compile
type: go
params:
  package: repo/cmd/my-program
  platform: ((.:platform))
  arch: ((.:arch))
outputs: [binary]

(side-note: it might make more sense from a performance PoV for a go prototype to support compiling multiple platforms/architectures in a single run step where possible, but bear with me for the example)

The trouble is - under the current implementation, there's no way to access binary outside of the across step. Even if we used a shared artifact scope for each iteration, it'd just be "last write wins" (so, with no parallelism, binary would be the result for (windows, arm64) only) - since they all share a name, the results would clobber each other. This could be avoided by encoding more information in the output name via an output_mapping (e.g. name the artifacts binary-linux-amd64, binary-linux-arm64, ...) - but that only works for simple values like strings/numbers.

So, what if we let artifacts in an across step share a name, but be uniquely identified by the set of vars that were used to produce that artifact (in the same way that instanced pipelines share a name and are uniquely identified by their instance_vars). A possible syntax for referencing the binary created for (linux, amd64) could be binary{platform: linux, arch: amd64}

Here, binary really refers to a matrix of outputs (that mirrors the across matrix). What if we also provided a way to filter down that matrix to get at a subset of the outputs. e.g. binary{platform: windows} would give the (2) outputs built for windows, and binary{arch: arm64} would give the (3) outputs built for arm64.
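The "artifacts share a name but are identified by their vars" idea maps naturally onto a dictionary keyed by the var set. The Python sketch below models that, with `select` playing the role of the binary{...} filter; the class `ArtifactMatrix` and its methods are hypothetical, and plain strings stand in for real artifacts.

```python
from itertools import product

class ArtifactMatrix:
    """Artifacts sharing one name, each identified by the vars that built it."""

    def __init__(self):
        self._items = {}

    def put(self, vars, artifact):
        # A frozenset of (name, value) pairs makes the var set hashable.
        self._items[frozenset(vars.items())] = artifact

    def select(self, **filters):
        """Return artifacts whose vars include every given filter."""
        wanted = set(filters.items())
        return [artifact for key, artifact in self._items.items() if wanted <= key]

binary = ArtifactMatrix()
for platform, arch in product(["linux", "darwin", "windows"], ["amd64", "arm64"]):
    binary.put({"platform": platform, "arch": arch}, f"my-program-{platform}-{arch}")

print(binary.select(platform="linux", arch="amd64"))
print(binary.select(platform="windows"))
print(binary.select(arch="arm64"))
print(len(binary.select()))
```

Since each cell has its own key, iterations no longer clobber each other, which addresses the "last write wins" problem of a shared artifact scope.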

So, suppose you wanted to upload a github release containing all 6 binaries produced in the matrix. You could do something like:

put: release
params:
  globs: [binary/*] # or maybe [binary{*}/*] to make it clear it's a matrix

Suppose you wanted to bundle the different architectures separately - then, you could do:

across:
- var: arch
  values: [amd64, arm64]
put: ((.:arch))-bundle
params:
  globs: [binary{arch: ((.:arch))}/*]

This may warrant a separate proposal but figured I'd bring it up here since it's definitely a limitation of the across step as it stands

One idea: maybe we could re-use the concept of output_mapping. Similar to how a task defined in a file can have its outputs mapped by an output_mapping configuration in the pipeline referencing that task, an across step could have its outputs made available via a similar mapping.

As an example:

across:
- var: arch
  values: [amd64, arm64]
- var: version
  values: [1.0, 1.1]
task: create-artifact
output_mapping:
  output: task-artifact
across_output_mapping:
- vars: # match vars from an execution of the task
    arch: amd64
    version: 1.0
  outputs: # map an output from the modified step to an output of the across step
    task-artifact: amd64-bundle-v1.0

This would be fairly verbose if you wanted to map every output from the matrix of steps:

# [...]
across_output_mapping:
- vars: {arch: amd64, version: 1.0}
  outputs: {task-artifact: amd64-bundle-v1.0}
- vars: {arch: arm64, version: 1.0}
  outputs: {task-artifact: arm64-bundle-v1.0}
- vars: {arch: amd64, version: 1.1}
  outputs: {task-artifact: amd64-bundle-v1.1}
- vars: {arch: arm64, version: 1.1}
  outputs: {task-artifact: arm64-bundle-v1.1}

Though it would possibly be simpler to implement, without relying on prototypes or other Concourse functionality.

When, or if, this limitation on using vars in get: and put: steps (eg. get: bundle-((.:arch))-((.:version))) is lifted, the across_output_mapping could be updated to allow for the use of vars as well, simplifying the configuration to something like...

# [...]
across_output_mapping:
- outputs:
    task-artifact: ((.:arch))-bundle-v((.:version))
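To show what such a templated mapping would expand to, here is a Python sketch that interpolates ((.:name)) placeholders once per combination. The `render` and `map_outputs` helpers are hypothetical, the regular expression only handles simple local var references, and versions are kept as strings so that 1.0 is not collapsed to 1.

```python
import re
from itertools import product

def render(template, vars):
    """Replace each ((.:name)) placeholder with the matching var value."""
    return re.sub(r"\(\(\.:([\w-]+)\)\)",
                  lambda match: str(vars[match.group(1)]),
                  template)

def map_outputs(matrix, mapping):
    """Produce one rendered output mapping per combination in the matrix."""
    names = list(matrix)
    rendered = []
    for values in product(*matrix.values()):
        combo = dict(zip(names, values))
        rendered.append({output: render(template, combo)
                         for output, template in mapping.items()})
    return rendered

mapped = map_outputs(
    {"arch": ["amd64", "arm64"], "version": ["1.0", "1.1"]},
    {"task-artifact": "((.:arch))-bundle-v((.:version))"},
)
for entry in mapped:
    print(entry)
```

A single templated entry yields the same four artifact names as the verbose form, just without listing each combination by hand.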

dumez-k · 2022-02-08T20:51:56Z

Hi @taylorsilva have there been any updates to this epic recently? Is this still under active development?

Also a little confusing from looking through the initial spatial resources RFC but was support ever implemented for this in any release version? Or is all of this work just prototype unreleased stuff?

Thanks!

initial spatial resources rfc

724ff71

Signed-off-by: Alex Suraci <[email protected]>

vito mentioned this pull request May 22, 2019

RFC: generalize resource interface #24

Closed

reframe proposal around across step

ce2481d

Signed-off-by: Alex Suraci <[email protected]>

This was referenced Jul 30, 2019

Epic: spatial resource flows concourse/concourse#1707

Closed

version: every will skip versions if a parallel upstream job's latest build finishes before an older one concourse/concourse#736

Open

RFC: set_pipeline step #31

Merged

vito added 2 commits December 22, 2019 15:04

simplify across step by building on var_sources

c136fdf

Signed-off-by: Alex Suraci <[email protected]>

expand on open questions and new implications

efe6eff

Signed-off-by: Alex Suraci <[email protected]>

vito changed the title ~~RFC: spatial resources~~ RFC: across step Dec 24, 2019

vito marked this pull request as ready for review December 24, 2019 23:27

siennathesane reviewed Feb 12, 2020

vito mentioned this pull request Mar 24, 2020

Removing dependency on a web browser for fly to access anything other than local authentication concourse/concourse#3208

Closed

vito mentioned this pull request May 7, 2020

RFC: instanced pipelines #34

Merged

jamieklassen mentioned this pull request May 12, 2020

Trigger job with custom parameters concourse/concourse#783

Closed

vito mentioned this pull request May 20, 2020

RFC: Prototypes #37

Merged

matthewpereira mentioned this pull request May 25, 2020

Spatial Automation (across step) concourse/concourse#5656

Closed

vito mentioned this pull request Jun 8, 2020

Gitea support concourse/concourse#4681

Closed

across: revise syntax, add max_in_flight

2643e55

...and resolve open questions Signed-off-by: Alex Suraci <[email protected]>

aoldershaw reviewed Jul 8, 2020

vito added 3 commits July 9, 2020 09:56

across: cover modifier precedence, add fail_fast

82b84e9

Signed-off-by: Alex Suraci <[email protected]>

across: add across/timeout precedence reasoning

4c2ea25

Signed-off-by: Alex Suraci <[email protected]>

across: continue running unless fail_fast given

ba81898

Signed-off-by: Alex Suraci <[email protected]>

aoldershaw reviewed Jul 9, 2020

aoldershaw reviewed Jul 10, 2020

vito mentioned this pull request Jul 14, 2020

atc: ignore unknown step fields when reading from DB concourse/concourse#5878

Merged

aoldershaw mentioned this pull request Jul 16, 2020

Add experimental across step for running build plans across a matrix of values concourse/concourse#5887

Merged

across: cover var scoping and shadowing

cfec275

Signed-off-by: Alex Suraci <[email protected]>

vito force-pushed the spatial-resources branch from 015de30 to cfec275 Compare July 21, 2020 20:40

aoldershaw reviewed Aug 6, 2020

vito commented Aug 11, 2020

aoldershaw reviewed Apr 8, 2021

vito assigned aoldershaw Apr 15, 2021

aoldershaw reviewed Apr 25, 2021

aoldershaw mentioned this pull request May 4, 2021

Figure out best approach for providing inputs/outputs to prototypes concourse/concourse#6980

Closed

vito mentioned this pull request May 6, 2021

RFC: Simple pipeline merging and templating #19

Closed

Zhou Yu and others added 2 commits August 17, 2021 11:35

Fix typo

a8f6046

Merge pull request #4 from jutkko/patch-1

a89bb2c

Fix typo

aoldershaw removed their assignment Aug 27, 2021

taylorsilva self-assigned this Sep 13, 2021

This was referenced Oct 5, 2021

Outputs inside across task not working concourse/concourse#7577

Open

Don't check identifiers for across step if they contain a var concourse/concourse#7660

Closed

iomarcovalente mentioned this pull request Jun 23, 2022

var_sources not accessible when inside an across task concourse/concourse#8191

Open
