This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Out Of Memory issue on k8s #167

Closed
Sindvero opened this issue Oct 20, 2021 · 0 comments

Comments

@Sindvero

Hello everyone,

I'm trying to run the anomaly detection on k8s, but at some point I get an OOM error on the node where my anomaly-detection pod runs.

Here's my configuration file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-anomaly-detector
spec:
  selector:
    matchLabels:
      app: prometheus-anomaly-detector
  replicas: 3
  template:
    metadata:
      labels:
        app: prometheus-anomaly-detector
    spec:
      containers:
        - env:
            - name: FLT_PROM_URL
              value: "http:https://prometheus-k8s:9090"
            - name: FLT_METRICS_LIST
              value: 'kafka_controller_controllereventmanager_eventqueuetimems'
            - name: FLT_RETRAINING_INTERVAL_MINUTES
              value: "30"
            - name: FLT_ROLLING_TRAINING_WINDOW_SIZE
              value: "15d"
            - name: FLT_DEBUG_MODE
              value: "True"
            - name: APP_FILE
              value: "app.py"
            - name: MLFLOW_TRACKING_URI
              value: "http:https://localhost:5000"
          name: prometheus-anomaly-detector
          image: quay.io/aicoe/prometheus-anomaly-detector:latest
          ports:
          - name: metrics
            containerPort: 8080

I'm not sure how to change the config to avoid the OOM on the node. Has anyone run into this issue before? Should I increase the retraining interval?
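
For context, here is a sketch of what I was thinking of trying: shrink the rolling training window, retrain less often, and put an explicit memory request/limit on the container so the kubelet OOM-kills just the pod instead of pressuring the whole node. The window, interval, and memory values below are placeholders I haven't validated:

      containers:
        - name: prometheus-anomaly-detector
          image: quay.io/aicoe/prometheus-anomaly-detector:latest
          env:
            # other env vars unchanged
            - name: FLT_ROLLING_TRAINING_WINDOW_SIZE
              value: "3d"        # smaller training window than 15d (placeholder, not validated)
            - name: FLT_RETRAINING_INTERVAL_MINUTES
              value: "60"        # retrain less often (placeholder)
          resources:
            requests:
              memory: "1Gi"      # placeholder sizing, not validated
            limits:
              memory: "2Gi"      # container is OOM-killed at this limit instead of exhausting the node
          ports:
            - name: metrics
              containerPort: 8080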

Thanks
