[feature request] I would like to monitor the kubedl_jobs_failed metric, but the label only supports kind and does not allow retrieving the jobName. The experience with exposed metrics is very unsatisfactory. #309

Open
13241308289 opened this issue Feb 27, 2024 · 3 comments · May be fixed by #313

@13241308289
Contributor

I would like to monitor the kubedl_jobs_failed metric, but the label only supports kind and does not allow retrieving the jobName. The experience with exposed metrics is very unsatisfactory.

e.g.:

kubedl_jobs_failed{endpoint="metrics", instance="", job="kubedl", kind="marsjob", namespace="kubedl-system", pod="", service="kubedl"}


@SimonCqk
Collaborator

@13241308289 Hi, thanks for the feedback! The reason we didn't initially include this label was due to the limited capacity of Prometheus's data backend, which doesn't actively purge data that's been stored for an extended period. We assessed that it might not be well-suited for job scenarios. However, it seems user experience is also quite significant, so let's go ahead and add it. Would you be interested in contributing to this?

@13241308289
Contributor Author

I took another look at the code, and it turns out that this metric is initialized at the controller layer, which is why it's not possible to expose the jobName label. I believe that if we want to expose specific labels like jobName, we should adopt an implementation similar to kubedl_jobs_first_pod_launch_delay_seconds. Of course, I would be very happy to implement this, as my business also has this requirement.

@SimonCqk
Collaborator

> I took another look at the code, and it turns out that this metric is initialized at the controller layer, which is why it's not possible to expose the jobName label. I believe that if we want to expose specific labels like jobName, we should adopt an implementation similar to kubedl_jobs_first_pod_launch_delay_seconds. Of course, I would be very happy to implement this, as my business also has this requirement.

Thanks for your contribution!
