-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: self-reporting workflow progress. Fixes #1658, #4245 #6714
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my original PoC has some issues:
- It requires tailing the pod logs, and that is expensive. Instead we should build on the pattern that the Emissary uses. The process should write a file to a shared volume (
e.g. /var/run/argo/progress
) to indicate its progress. I know this is not as user friendly as printing to log files, but I think logs files is too expensive. - We want to move away from requiring
pod patch
permission, to some other mechanism. However, that's a big piece of work, not something to do in this PR. I think we can still run progress on a ticker, but the polling period should be configurable (using env var) maybe starting with1m
rather than30s
. It should only patch if the progress has changed.
Thanks for the comment. I did wonder about what your thoughts were so that clears things up! |
a0bb1e5
to
b13a971
Compare
Some additional thoughts on this:
|
when doing file based progress watching the question is a bit how to do that for other executors than emissary. Should I mount an emptyDir similar to what emissary does when creating the pod (if not emissary executor) and specify using e.g. an env variable what file to watch for progress? |
Yes. It only gets mounted for the emissary today, but we can mount it for all pods. |
as I'm continuing working on this I came accross the Would it make sense to move the updating of progress to this function (within Reason why I came across this: due to #6647 I need to change the way I access the pods via the informer because they have different keys in the index. Just querying the |
e708e5e
to
c31bbcd
Compare
re-requested a review to see if that goes into the right direction. Several things still need to be done:
|
another TODO (mostly for me, but if anyone has inputs on those todos, please chime in!)
|
9b6f526
to
64987e3
Compare
634bbad
to
2accf4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is looking great! I'm going to keep the comments coming
784bd4c
to
63a2da0
Compare
Codecov Report
@@ Coverage Diff @@
## master #6714 +/- ##
==========================================
+ Coverage 48.58% 48.75% +0.16%
==========================================
Files 267 265 -2
Lines 19457 19395 -62
==========================================
+ Hits 9454 9456 +2
+ Misses 8946 8881 -65
- Partials 1057 1058 +1
Continue to review full report at Codecov.
|
To be able to test the progress feature in an e2e test I need to adjust the workflow-controller-configmap for that specific test. I couldn't find an example how to do that, is this not yet a case covered? |
aa58fe2
to
e89c7a2
Compare
94a412a
to
11a7e34
Compare
I would just use a script to do this rather then force a new argosay image. Argosay does contain a shell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to push you to simplify.
Try syncing with master to fix your builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look great btw!
30f918e
to
d2c7f65
Compare
51fee1b
to
be986ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- One comment on what looks like a bug.
- Please sync with master.
- Please dismiss this review when done.
83dc731
to
966e677
Compare
2ccc723
to
747e2e8
Compare
original code from: https://github.com/argoproj/argo-workflows/pull/4015/files closes argoproj#1658, argoproj#4245 Signed-off-by: Michael Weibel <[email protected]>
747e2e8
to
98b8d56
Compare
@@ -77,6 +79,12 @@ type WorkflowExecutor struct { | |||
// list of errors that occurred during execution. | |||
// the first of these is used as the overall message of the node | |||
errors []error | |||
|
|||
// current progress which is synced every `annotationPatchTickDuration` to the pods annotations. | |||
progress string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor - could be Progress
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make a PR to change this, probably tomorrow 👍
This is pretty cool! 🚀 |
I think this is a neat feature, as esp useful for CI use cases, or long running tasks. |
amazing, thanks for merging and the great reviews @alexec! We started using this already and it works well so far (though we used a to version 3.2 backported branch of it). Looking forward to the release :) |
port of argo-workflow pod progress code to current codebase and extended to handle EOF streams.
most of the code is by @alexec from: https://github.com/argoproj/argo-workflows/pull/4015/files
closes #1658, #4245
I just started to test this code in my cluster, therefore I'd like to wait with considering this PR ready until I'm more confident.
An initial review would be great though!
I didn't implement the part with reporting also a progress message as the whole pod status reporting got changed meanwhile and I'm not sure if we need it anyway.
Thanks :)