POC for rocksdb memory usage metrics #3604

Open
wants to merge 1 commit into
base: dev

@dacox dacox commented Feb 13, 2024

Related to payload memory issues discussed in #3501 and tackled in #3557 and #3565

This PR is a PoC for exposing get_memory_usage_stats from rust-rocksdb for tracking memory usage in rocksdb.

This PR adds segment level stats to /telemetry and aggregated stats to /metrics.
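As a rough illustration of the segment-level versus aggregated split described above, here is a minimal sketch. The struct and function names are hypothetical and not taken from this PR; the four fields are assumed to mirror the counters that rust-rocksdb reports from get_memory_usage_stats (only mem_table_readers_total is named in this thread).

```rust
/// Hypothetical per-segment snapshot of RocksDB memory counters, in bytes.
/// Field names are an assumption modelled on rust-rocksdb's memory usage stats.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct RocksDbMemoryStats {
    pub mem_table_total: u64,
    pub mem_table_unflushed: u64,
    pub mem_table_readers_total: u64,
    pub cache_total: u64,
}

impl RocksDbMemoryStats {
    /// Add two snapshots field by field, saturating instead of overflowing.
    pub fn merge(self, other: Self) -> Self {
        Self {
            mem_table_total: self.mem_table_total.saturating_add(other.mem_table_total),
            mem_table_unflushed: self
                .mem_table_unflushed
                .saturating_add(other.mem_table_unflushed),
            mem_table_readers_total: self
                .mem_table_readers_total
                .saturating_add(other.mem_table_readers_total),
            cache_total: self.cache_total.saturating_add(other.cache_total),
        }
    }
}

/// Sum the per-segment snapshots (as a /telemetry-style view might list them)
/// into one total (as a /metrics-style view might report it).
pub fn aggregate(segments: &[RocksDbMemoryStats]) -> RocksDbMemoryStats {
    segments
        .iter()
        .copied()
        .fold(RocksDbMemoryStats::default(), RocksDbMemoryStats::merge)
}
```

In this sketch the per-segment values stay available for detailed inspection, while the aggregate gives a single series per counter that is cheap to scrape and plot.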

As far as I can tell, rust-rocksdb does not make stats available for column families, only for databases.

However, on my local test bench I saw one trend that correlated with the extreme RAM usage in our production environment: mem_table_readers_total, which seems to grow linearly with the number of vectors that have payloads.

[Image: Screenshot 2024-02-12 at 4 37 43 PM]

I built this to scratch an itch and try to solve the memory problems plaguing us, but I am also interested in getting a concept ACK on this, as I'm sure the code could be improved.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@dacox
Author

dacox commented Feb 14, 2024

@timvisee @agourlay curious to get your thoughts on this when you have time

@agourlay
Member

@dacox Thanks for putting this together. I am not sure yet whether we want to expose RocksDB details in our telemetry.

In any case, you could try rebasing, since I merged a PR that upgrades RocksDB to a much newer version (#3624).

First, you might be able to see whether the memory usage improves.

Second, there is a new statistics API available that you could play with (rust-rocksdb/rust-rocksdb#853 & rust-rocksdb/rust-rocksdb#854).

@dacox
Author

dacox commented Feb 20, 2024

Thanks @agourlay I'll take a look at the new methods.

It makes sense that you don't want users digging into rocksdb internals. However, I think it could have real utility at a different "metrics reporting level" (i.e. not the default configuration for /telemetry and /metrics).

If there is weirdness relating to rocksdb memory usage, then without some way to dig into telemetry the cluster operator is flying blind.
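To make the "metrics reporting level" idea a bit more concrete, here is a minimal sketch of how such gating might look. Everything in it is hypothetical: the DetailLevel enum, the render_metrics function, and the metric names are invented for illustration and are not part of the existing /telemetry or /metrics code.

```rust
/// Hypothetical reporting levels; variant order defines the ordering,
/// so `Default < Storage`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DetailLevel {
    /// What every operator sees out of the box.
    Default,
    /// Opt-in level that also exposes storage-engine internals.
    Storage,
}

/// Render metric lines, adding the RocksDB counter only when the
/// requested level is at least `Storage`.
pub fn render_metrics(level: DetailLevel, mem_table_readers_total: u64) -> Vec<String> {
    // Placeholder for whatever the default output already contains.
    let mut lines = vec!["app_info 1".to_string()];
    if level >= DetailLevel::Storage {
        lines.push(format!(
            "rocksdb_mem_table_readers_total_bytes {}",
            mem_table_readers_total
        ));
    }
    lines
}
```

With something along these lines, the default output stays free of RocksDB details, while an operator debugging memory usage could opt in to the extra counters.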
