This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Out Of Memory issue on k8s #167

Closed
Sindvero opened this issue Oct 20, 2021 · 0 comments

Comments

@Sindvero

Hello everyone,

I'm trying to run the anomaly detection on k8s, but at some point I get an OOM error on the node where my anomaly-detection pod runs.

Here's my configuration file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-anomaly-detector
spec:
  selector:
    matchLabels:
      app: prometheus-anomaly-detector
  replicas: 3
  template:
    metadata:
      labels:
        app: prometheus-anomaly-detector
    spec:
      containers:
        - env:
            - name: FLT_PROM_URL
              value: "http:https://prometheus-k8s:9090"
            - name: FLT_METRICS_LIST
              value: 'kafka_controller_controllereventmanager_eventqueuetimems'
            - name: FLT_RETRAINING_INTERVAL_MINUTES
              value: "30"
            - name: FLT_ROLLING_TRAINING_WINDOW_SIZE
              value: "15d"
            - name: FLT_DEBUG_MODE
              value: "True"
            - name: APP_FILE
              value: "app.py"
            - name: MLFLOW_TRACKING_URI
              value: "http:https://localhost:5000"
          name: prometheus-anomaly-detector
          image: quay.io/aicoe/prometheus-anomaly-detector:latest
          ports:
          - name: metrics
            containerPort: 8080

I'm not sure how to change the config to avoid the OOM on the node. Has anyone run into this issue before? Should I increase the retraining interval?
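
For context, here is a sketch of what I was thinking of trying: shrink the rolling training window, retrain less often, and put an explicit memory request/limit on the container so the kubelet OOM-kills just the pod instead of pressuring the whole node. The window, interval, and memory values below are placeholders I haven't validated:

      containers:
        - name: prometheus-anomaly-detector
          image: quay.io/aicoe/prometheus-anomaly-detector:latest
          env:
            # other env vars unchanged
            - name: FLT_ROLLING_TRAINING_WINDOW_SIZE
              value: "3d"        # smaller training window than 15d (placeholder, not validated)
            - name: FLT_RETRAINING_INTERVAL_MINUTES
              value: "60"        # retrain less often (placeholder)
          resources:
            requests:
              memory: "1Gi"      # placeholder sizing, not validated
            limits:
              memory: "2Gi"      # container is OOM-killed at this limit instead of exhausting the node
          ports:
            - name: metrics
              containerPort: 8080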

Thanks
