
High memory consumption due to ConfigMap watches #967

Open
jotak opened this issue Nov 2, 2023 · 1 comment
Labels
bug Something isn't working

Comments


jotak commented Nov 2, 2023

Before anything, note that I am not a Datadog user: I'm a developer of another OLM-based operator and, while investigating memory issues, out of curiosity I wanted to test a bunch of other operators to see who else was impacted by the same issue, and it seems the datadog-operator is. I haven't done a deep investigation on the datadog-operator in particular, so if you think this is a false positive, I apologize for the inconvenience and you can close this issue.

Describe what happened:

I ran a simple test: install a bunch of operators, monitor their memory consumption, then create a dummy namespace with many ConfigMaps in it. On some operators, memory consumption remained stable; on others, like this one, it increased linearly with the number of ConfigMaps created.

[Screenshot from 2023-10-27 09-03-34: memory consumption chart]

This could be on purpose, but my assumption is that there is little chance your operator actually needs to watch every ConfigMap (is that correct?). This is quite a common problem, documented here: https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/#overview :

"One of the pitfalls that many operators are failing into is that they watch resources with high cardinality like secrets possibly in all namespaces. This has a massive impact on the memory used by the controller on big clusters."

In my experience, with some customers this amounts to gigabytes of overhead. And I would add that it's not only about memory usage: it also stresses the Kube API server with a lot of traffic.

The article above suggests a remediation using cache configuration: if that solves the problem for you, great!
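
For illustration, here is a minimal sketch of that kind of remediation, assuming controller-runtime v0.15+ and a hypothetical `app.kubernetes.io/managed-by: my-operator` label; the names are placeholders, not the datadog-operator's actual code. The idea is to restrict the informer cache so that only the labelled ConfigMaps/Secrets are ever held in memory:

```go
package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	setupLog := ctrl.Log.WithName("setup")

	// Hypothetical label: only objects carrying it are cached by the operator.
	managed := labels.SelectorFromSet(labels.Set{"app.kubernetes.io/managed-by": "my-operator"})

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			// Restrict the informer cache for high-cardinality kinds so the
			// operator does not keep every ConfigMap/Secret in the cluster in memory.
			ByObject: map[client.Object]cache.ByObject{
				&corev1.ConfigMap{}: {Label: managed},
				&corev1.Secret{}:    {Label: managed},
			},
		},
	})
	if err != nil {
		setupLog.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Controllers would be registered with the manager here.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "manager exited with error")
		os.Exit(1)
	}
}
```

The trade-off is that the operator has to label the objects it cares about, and anything without the label becomes invisible to cached reads.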

But in case it's more complicated, you might want to chime in here: kubernetes-sigs/controller-runtime#2570 . I'm proposing to add more possibilities to controller-runtime regarding cache management, but for that I would like to probe the different use cases among OLM users, in order to understand whether the solution I'm suggesting would help others or not. I guess the goal is to find a solution that suits most of the OLM-based operators that still struggle with this, rather than each implementing its own custom cache management.
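
For completeness (and separate from what #2570 proposes), here is a minimal sketch of another knob that already exists in controller-runtime v0.15+: disabling the cached client for high-cardinality kinds, so that Get/List calls go straight to the API server instead of lazily starting a cluster-wide informer. Again, a hedged example with placeholder names:

```go
package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	setupLog := ctrl.Log.WithName("setup")

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Client: client.Options{
			Cache: &client.CacheOptions{
				// Reads of these kinds through the manager's client hit the API
				// server directly instead of starting a cluster-wide informer.
				DisableFor: []client.Object{
					&corev1.ConfigMap{},
					&corev1.Secret{},
				},
			},
		},
	})
	if err != nil {
		setupLog.Error(err, "unable to create manager")
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "manager exited with error")
		os.Exit(1)
	}
}
```

The downside is that every such read becomes an API call, so this only fits kinds that are read infrequently.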

Steps to reproduce the issue:

  • Install the operator
  • Watch memory used
  • kubectl create namespace test
  • for i in {1..500}; do kubectl create cm test-cm-$i -n test --from-file=<INSERT BIG FILE HERE> ; done

Additional environment details (Operating System, Cloud provider, etc):

Using OpenShift 4.14 on AWS

celenechang (Contributor) commented

Hi @jotak, really appreciate the issue and the details provided. We will look into this on our end.

CharlyF added the bug (Something isn't working) label Nov 22, 2023
Dog-Gone-Earl added a commit that referenced this issue May 7, 2024: Adding *Note* on increased memory usage with added namespaces, referencing #967