feat(metrics): add Vector Throughput & health (via prometheus) #5030

Open
Tracked by #5029
lsampras opened this issue Jun 18, 2024 · 4 comments
Labels
A-infra Area: Infrastructure C-feature Category: Feature request or enhancement good first issue Good for newcomers

Comments

@lsampras
Member

lsampras commented Jun 18, 2024

Add a dashboard to monitor Vector throughput and log loss.
The dashboard should show throughput for the following pipes:

Throughput

  1. stdout -> loki
  2. stdout -> opensearch
  3. kafka -> loki
  4. kafka -> transform -> opensearch

Each of these flows should show incoming, outgoing, and dropped events as a time series chart.
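
To make the three series per flow concrete, here is a rough sketch that generates PromQL expressions for each pipe. It assumes Vector's internal metrics are exported with the default `vector_` prefix and a `component_id` label; the component IDs (`stdout_in`, `kafka_in`, `remap`, `loki_out`, `opensearch_out`) are made up and need to be swapped for the IDs in our actual Vector config.

```python
# Hypothetical component IDs per pipe, ordered source -> ... -> sink.
# Replace these with the IDs used in the real Vector configuration.
PIPES = {
    "stdout -> loki": ["stdout_in", "loki_out"],
    "stdout -> opensearch": ["stdout_in", "opensearch_out"],
    "kafka -> loki": ["kafka_in", "loki_out"],
    "kafka -> transform -> opensearch": ["kafka_in", "remap", "opensearch_out"],
}


def rate_query(metric, matcher, window="5m"):
    """Build a per-second rate expression summed over all matching series."""
    return f"sum(rate({metric}{{{matcher}}}[{window}]))"


def throughput_queries(components):
    """Return incoming/outgoing/dropped expressions for one pipe.

    Incoming is measured at the first component, outgoing at the last,
    and dropped events are summed over every component in the pipe.
    Metric names are assumed, not verified against our deployment.
    """
    source, sink = components[0], components[-1]
    every = "|".join(components)
    return {
        "incoming": rate_query(
            "vector_component_received_events_total", f'component_id="{source}"'
        ),
        "outgoing": rate_query(
            "vector_component_sent_events_total", f'component_id="{sink}"'
        ),
        "dropped": rate_query(
            "vector_component_discarded_events_total", f'component_id=~"{every}"'
        ),
    }


if __name__ == "__main__":
    for name, components in PIPES.items():
        print(name)
        for series, expr in throughput_queries(components).items():
            print(f"  {series}: {expr}")
```

Note that when two pipes share a source (both stdout pipes here), the incoming series is identical for both, so the per-pipe difference only shows up in the outgoing and dropped series.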

The Kafka source flows should include consumer lag metrics as well.

Health (this would be primarily powered by these metrics)

  • CPU usage of vector
  • Memory usage of vector
  • buffer size
  • errors happening in transforms
  • utilization of each component
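
As a starting point for the health panels, the sketch below maps each item in the list to a candidate PromQL expression. All metric names are assumptions: the `vector_*` names assume the internal metrics are exported with the default prefix, and the CPU and memory queries assume Vector runs as a container named `vector` that is scraped through cAdvisor. Adjust these to whatever our setup actually exposes.

```python
# Candidate queries for the health section. Every metric name and label
# value here is an assumption and should be checked against the real
# Prometheus instance before being used in a panel.
HEALTH_QUERIES = {
    "CPU usage": 'sum(rate(container_cpu_usage_seconds_total{container="vector"}[5m]))',
    "Memory usage": 'sum(container_memory_working_set_bytes{container="vector"})',
    "Buffer size": "sum by (component_id) (vector_buffer_events)",
    "Transform errors": (
        "sum by (component_id) "
        '(rate(vector_component_errors_total{component_kind="transform"}[5m]))'
    ),
    "Component utilization": "avg by (component_id) (vector_utilization)",
}


def panel_targets(queries):
    """Convert title -> expression pairs into Grafana-style query targets."""
    return [
        {"refId": chr(ord("A") + i), "expr": expr, "legendFormat": title}
        for i, (title, expr) in enumerate(queries.items())
    ]


if __name__ == "__main__":
    for target in panel_targets(HEALTH_QUERIES):
        print(target["refId"], target["legendFormat"], "=>", target["expr"])
```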

Ideally, we can take most of the components from an openly available source and modify some of them to fit our setup.

lsampras added the A-infra Area: Infrastructure and C-feature Category: Feature request or enhancement labels Jun 18, 2024
lsampras changed the title from "Vector Throughput (via prometheus)" to "feat(metrics): add Vector Throughput & health (via prometheus)" Jun 18, 2024
lsampras added the good first issue Good for newcomers label Jun 19, 2024
@Prashant-dot1

@lsampras I am interested in working on this task

@lsampras
Member Author

Hey @Prashant-dot1,
Thanks for your interest; this issue is available for contribution.

Since this is a fairly open-ended issue without fixed specifications, we'd like a few details about the implementation first:

  • is there any existing dashboard that you would be using entirely or as a reference?
  • do you plan to create your own dashboard for this?

@Prashant-dot1

@lsampras I am thinking of building on these openly available dashboards (they would need modification to fit this task):

Health metrics or system-level metrics, tracking how well the Vector instance is handling all the event pipes together - https://grafana.com/grafana/dashboards/19649-vector-monitoring/

https://grafana.com/grafana/dashboards/721-kafka/

The dashboard structure could be something like this:
Row 1: Four panels (one for each pipeline) that show throughput metrics: incoming, outgoing, and dropped events.
Row 2: Kafka metrics, specifically consumer lag for the Kafka-related pipelines.
Row 3: General health metrics like CPU usage, memory usage, buffer utilization, and error tracking for the overall system.
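
A minimal sketch of that three-row layout as a Grafana dashboard skeleton, generated from Python. The panel titles are placeholders taken from the rows above, the `targets` lists are left empty on purpose, and the dashboard title is made up; only the basic `panels` / `gridPos` shape of Grafana's dashboard JSON is relied on here.

```python
import json

GRID_WIDTH = 24  # Grafana lays panels out on a 24-column grid


def build_row(title, panel_titles, y, height=8):
    """Return a row header followed by equally wide time series panels."""
    panels = [
        {
            "type": "row",
            "title": title,
            "collapsed": False,
            "gridPos": {"h": 1, "w": GRID_WIDTH, "x": 0, "y": y},
        }
    ]
    width = GRID_WIDTH // len(panel_titles)
    for i, name in enumerate(panel_titles):
        panels.append(
            {
                "type": "timeseries",
                "title": name,
                "gridPos": {"h": height, "w": width, "x": i * width, "y": y + 1},
                "targets": [],  # PromQL queries to be filled in later
            }
        )
    return panels


def build_dashboard():
    """Assemble the proposed three-row layout (titles are placeholders)."""
    rows = [
        ("Throughput", ["stdout -> loki", "stdout -> opensearch",
                        "kafka -> loki", "kafka -> transform -> opensearch"]),
        ("Kafka consumer lag", ["kafka -> loki", "kafka -> transform -> opensearch"]),
        ("Health", ["CPU usage", "Memory usage", "Buffer utilization", "Errors"]),
    ]
    panels, y = [], 0
    for title, names in rows:
        panels.extend(build_row(title, names, y))
        y += 9  # one grid unit for the row header plus eight for its panels
    return {"title": "Vector throughput & health", "panels": panels}


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

The printed JSON is only a skeleton to discuss the layout; the real panels would mostly be adapted from the two dashboards linked above.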

@lsampras
Member Author

@Prashant-dot1 the shared design looks good...
I'll assign this

Development

No branches or pull requests

2 participants