[Feature] Telemetry Reporting to Metrics Server and/or Prometheus #896

Closed
rrichardson opened this issue Jun 28, 2018 · 1 comment


@rrichardson

Feature Request

For analytics and batch workflows, precise telemetry is really important. We need to analyze run time as well as memory and CPU usage so that we can further tune job scheduling.

What would be really nice to have is a hierarchy of stats posted to metrics-server, the Prometheus Pushgateway, or some configurable endpoint. All of the standard pod stats that metrics-server reports would be great.

I'm not entirely clear on the mechanics of metrics-server -> Prometheus. I'm assuming there is some logic that tells it to report on pods only if they were spun up by Deployments or similar. Maybe the Argo operator could issue something when it creates new workflows to ensure they are captured by Prometheus.

I am happy to help implement this, but I have no idea where to hook in for stat collection and reporting.

jessesuen pushed a commit that referenced this issue Aug 13, 2018
* Prometheus metrics server
* Use unstructured informer
* Fix linter errors
* Use a dedicated informer for metrics
* Pass context to RunServer. Close the http server
* Check the return value in metrics defer Close
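The commit above adds a Prometheus metrics server, that is, an HTTP endpoint Prometheus scrapes (the pull model) rather than a push to a gateway, and passes a context to `RunServer` so the HTTP server is closed on shutdown. The actual change is in Go. The Python sketch below only illustrates that pattern under stated assumptions: `run_server`, the `collect` callback and the use of a `threading.Event` in place of Go's context are all hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(collect):
    # collect() returns the current metrics as Prometheus text-format output.
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = collect().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # silence per-request logging

    return MetricsHandler


def run_server(stop, collect, host="127.0.0.1", port=8000):
    # port=8000 is an arbitrary default; pass port=0 to let the OS pick one.
    server = ThreadingHTTPServer((host, port), make_handler(collect))
    threading.Thread(target=server.serve_forever, daemon=True).start()

    def close_when_stopped():
        stop.wait()            # analogous to waiting on ctx.Done() in Go
        server.shutdown()      # end the serve_forever loop
        server.server_close()  # release the listening socket

    closer = threading.Thread(target=close_when_stopped, daemon=True)
    closer.start()
    return server, closer
```

Setting the event plays the role of cancelling the context: the closer thread stops the serving loop and then closes the socket, so nothing is left listening after shutdown.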
@jessesuen jessesuen added this to the v2.2 milestone Aug 13, 2018
@jessesuen
Member

Implemented in b9cffe9
