
High memory consumption due to ConfigMap watches #967

Open
jotak opened this issue Nov 2, 2023 · 1 comment
Labels
bug Something isn't working

Comments


jotak commented Nov 2, 2023

Before anything, note that I am not a Datadog user: I'm a developer of another OLM-based operator and, while investigating memory issues, out of curiosity I wanted to test a bunch of other operators to see who else was impacted by the same issue, and it seems the datadog-operator is. I haven't done a deep investigation on the datadog-operator in particular, so if you think this is a false positive, I apologize for the inconvenience and you can close this issue.

Describe what happened:

I ran a simple test: install a bunch of operators, monitor their memory consumption, then create a dummy namespace with many ConfigMaps in it. On some operators, memory consumption remained stable; on others, like this one, it increased linearly with the number of ConfigMaps created.

[Screenshot from 2023-10-27 09-03-34: memory consumption chart]

This could be on purpose, but my assumption is that there is little chance your operator actually needs to watch every ConfigMap (is that correct?). This is quite a common problem, documented here: https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/#overview :

"One of the pitfalls that many operators are failing into is that they watch resources with high cardinality like secrets possibly in all namespaces. This has a massive impact on the memory used by the controller on big clusters."

In my experience, with some customers this amounts to gigabytes of overhead. And I would add that it's not only about memory usage: it also stresses the Kube API server with a lot of traffic.

The article above suggests a remediation using cache configuration: if that solves the problem for you, great!
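
For illustration, here is a minimal sketch of that kind of remediation, assuming controller-runtime v0.15+ and a hypothetical `app.kubernetes.io/managed-by: my-operator` label; the names are placeholders, not the datadog-operator's actual code. The idea is to restrict the informer cache so that only the labelled ConfigMaps/Secrets are ever held in memory:

```go
package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	setupLog := ctrl.Log.WithName("setup")

	// Hypothetical label: only objects carrying it are cached by the operator.
	managed := labels.SelectorFromSet(labels.Set{"app.kubernetes.io/managed-by": "my-operator"})

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			// Restrict the informer cache for high-cardinality kinds so the
			// operator does not keep every ConfigMap/Secret in the cluster in memory.
			ByObject: map[client.Object]cache.ByObject{
				&corev1.ConfigMap{}: {Label: managed},
				&corev1.Secret{}:    {Label: managed},
			},
		},
	})
	if err != nil {
		setupLog.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Controllers would be registered with the manager here.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "manager exited with error")
		os.Exit(1)
	}
}
```

The trade-off is that the operator has to label the objects it cares about, and anything without the label becomes invisible to cached reads.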

But in case it's more complicated, you might want to chime in here: kubernetes-sigs/controller-runtime#2570 . I'm proposing to add more possibilities to controller-runtime regarding cache management, but for that I would like to probe the different use cases among OLM users, in order to understand whether the solution I'm suggesting would help others or not. I guess the goal is to find a solution that suits most of the OLM-based operators that still struggle with this, rather than each implementing its own custom cache management.
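
For completeness (and separate from what #2570 proposes), here is a minimal sketch of another knob that already exists in controller-runtime v0.15+: disabling the cached client for high-cardinality kinds, so that Get/List calls go straight to the API server instead of lazily starting a cluster-wide informer. Again, a hedged example with placeholder names:

```go
package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	setupLog := ctrl.Log.WithName("setup")

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Client: client.Options{
			Cache: &client.CacheOptions{
				// Reads of these kinds through the manager's client hit the API
				// server directly instead of starting a cluster-wide informer.
				DisableFor: []client.Object{
					&corev1.ConfigMap{},
					&corev1.Secret{},
				},
			},
		},
	})
	if err != nil {
		setupLog.Error(err, "unable to create manager")
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "manager exited with error")
		os.Exit(1)
	}
}
```

The downside is that every such read becomes an API call, so this only fits kinds that are read infrequently.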

Steps to reproduce the issue:

  • Install the operator
  • Watch memory used
  • kubectl create namespace test
  • for i in {1..500}; do kubectl create cm test-cm-$i -n test --from-file=<INSERT BIG FILE HERE> ; done

Additional environment details (Operating System, Cloud provider, etc):

Using OpenShift 4.14 on AWS

celenechang (Contributor) commented

Hi @jotak, really appreciate the issue and the details provided. We will look into this on our end.

CharlyF added the bug (Something isn't working) label Nov 22, 2023
Dog-Gone-Earl added a commit that referenced this issue May 7, 2024: Adding *Note* on increased memory usage with added namespaces, referencing #967