GatewayAPI Session Affinity not honored #1647

Open
ethankhall opened this issue May 20, 2024 · 1 comment
ethankhall commented May 20, 2024

Describe the bug

When configuring a Canary object to use session affinity with the Kubernetes Gateway API, as described in Session Affinity, I ran a k6 test to verify that users were assigned to a version and weren't shifted back on a successful deploy.

I noticed that within 1 second, all the users were assigned to the next version.

I believe this is happening because the HTTPRoute being created doesn't pin the user to the primary version.

HTTPRoute

spec:
  hostnames:
  - charmander.example.com
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: default-gateway
    namespace: istio-ingress
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: charmander-primary
      port: 9898
      weight: 0
    - group: ""
      kind: Service
      name: charmander-canary
      port: 9898
      weight: 100
    matches:
    - headers:
      - name: Cookie
        type: RegularExpression
        value: .*flagger-cookie.*nROEvCteRd.*
      path:
        type: PathPrefix
        value: /
  - backendRefs:
    - group: ""
      kind: Service
      name: charmander-primary
      port: 9898
      weight: 95
    - filters:
      - responseHeaderModifier:
          add:
          - name: Set-Cookie
            value: flagger-cookie=nROEvCteRd; Max-Age=3600
        type: ResponseHeaderModifier
      group: ""
      kind: Service
      name: charmander-canary
      port: 9898
      weight: 5
    matches:
    - path:
        type: PathPrefix
        value: /

Note: charmander is a deployment of ghcr.io/stefanprodan/podinfo.
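
To make the asymmetry in the route above easier to see, here is a minimal sketch in plain JavaScript that models the two rules by hand. It is not Flagger, Istio, or Gateway API code: `ROUTE_RULES` and `describeRouting` are hypothetical names, and the `setsCookie` field is just a shorthand for the ResponseHeaderModifier filter. The weights, cookie value, and regular expression are copied from the HTTPRoute in this issue, and the sketch assumes the cookie rule takes precedence whenever its header match succeeds.

```javascript
// Hand-written model of the two HTTPRoute rules in this issue.
// Hypothetical names; this is not Flagger, Istio, or Gateway API code.
const ROUTE_RULES = [
    {
        // Rule 1: requests that already carry the affinity cookie.
        cookiePattern: /.*flagger-cookie.*nROEvCteRd.*/,
        backends: [
            { name: 'charmander-primary', weight: 0, setsCookie: false },
            { name: 'charmander-canary', weight: 100, setsCookie: false },
        ],
    },
    {
        // Rule 2: everything else (no header match required).
        cookiePattern: null,
        backends: [
            { name: 'charmander-primary', weight: 95, setsCookie: false },
            { name: 'charmander-canary', weight: 5, setsCookie: true },
        ],
    },
];

// Returns the rule a request would hit, assuming the cookie rule wins
// whenever its header match succeeds.
function describeRouting(cookieHeader) {
    const rule = ROUTE_RULES.find(
        (r) => r.cookiePattern === null || r.cookiePattern.test(cookieHeader || '')
    );
    return {
        // Backends that can actually receive traffic under this rule.
        backends: rule.backends.filter((b) => b.weight > 0).map((b) => b.name),
        // Backends whose responses would pin the client via Set-Cookie.
        pinningBackends: rule.backends.filter((b) => b.setsCookie).map((b) => b.name),
    };
}

console.log(describeRouting('flagger-cookie=nROEvCteRd'));
console.log(describeRouting(''));
```

In this model nothing ever pins a client to charmander-primary, so a client that lands on the primary is evaluated against the 95/5 split again on its next request.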

To Reproduce

Kubernetes YAML and k6 script

---
# Source: charmander/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: charmander
  namespace: charmander
  labels:
    app.kubernetes.io/name: charmander
    app.kubernetes.io/component: "web"
spec:
  minReadySeconds: 5
  replicas: 3
  revisionHistoryLimit: 5
  progressDeadlineSeconds: 60
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/name: charmander
      app.kubernetes.io/component: "web"
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9797"
        unique-title: 'greetings from deploy v1'
      labels:
        app.kubernetes.io/name: charmander
        app.kubernetes.io/component: "web"
    spec:
      containers:
      - name: podinfod
        image: ghcr.io/stefanprodan/podinfo:6.5.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9898
          protocol: TCP
        - name: http-metrics
          containerPort: 9797
          protocol: TCP
        - name: grpc
          containerPort: 9999
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --port-metrics=9797
        - --grpc-port=9999
        - --grpc-service-name=podinfo
        - --level=info
        - --random-delay=false
        - --random-error=true
        env:
        - name: PODINFO_UI_COLOR
          value: "#34577c"
        - name: PODINFO_UI_MESSAGE
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['unique-title']
        startupProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 30
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi
---
# Source: charmander/templates/canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: charmander-canary
  namespace: charmander
spec:
  # when set to true, deploy will auto succeed, only use during an emergency.
  skipAnalysis: false
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: charmander
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 120
  service:
    gatewayRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: default-gateway
      namespace: istio-ingress
    hosts:
    - 'charmander.example.com'
    port: 9898
    targetPort: 9898
  analysis:
    interval: 1m
    maxWeight: 50
    metrics: []
    sessionAffinity:
      cookieName: flagger-cookie
      maxAge: 3600
    stepWeight: 10
    threshold: 5

Then run the k6 script:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const URL = "https://charmander.example.com/"
export const options = {
    // A number specifying the number of VUs to run concurrently.
    vus: 6,
    // A string specifying the total duration of the test run.
    duration: '600s',
    // Disable clearing cookies
    noCookiesReset: true
};

function parseRevision(resp) {
    try {
        return resp.json().message;
    } catch (e) {
        return null
    }
}

export function setup() {
    return { revision: null, changeCount: 0 };
}

export default function (data) {
    var resp = http.get(URL);

    var revision = parseRevision(resp);
    if (data.revision == null) {
        console.log(`VU initial version ${revision}`)
        data.revision = revision;
    }

    if (revision && revision !== data.revision) {
        data.changeCount++;
        console.log(data.revision + " : " + revision)
        data.revision = revision;
    }

    check(resp, { 'changeCount < 2': () => data.changeCount < 2 });
}

export function teardown(data) {
    console.log(data);
}

The output looks like this:

    scenarios: (100.00%) 1 scenario, 6 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 6 looping VUs for 10m0s (gracefulStop: 30s)

INFO[0000] VU initial version greetings from deploy v2   source=console
INFO[0000] VU initial version greetings from deploy v1   source=console
INFO[0000] VU initial version greetings from deploy v1   source=console
INFO[0000] VU initial version greetings from deploy v2   source=console
INFO[0000] VU initial version greetings from deploy v1   source=console
INFO[0000] VU initial version greetings from deploy v1   source=console
INFO[0000] greetings from deploy v1 : greetings from deploy v2  source=console
INFO[0000] greetings from deploy v1 : greetings from deploy v2  source=console
INFO[0000] greetings from deploy v1 : greetings from deploy v2  source=console
INFO[0001] greetings from deploy v1 : greetings from deploy v2  source=console
INFO[0600] {"changeCount":0,"revision":null}             source=console

     ✓ changeCount < 2

     █ setup

     █ teardown

     checks.........................: 100.00% ✓ 63985      ✗ 0
     data_received..................: 27 MB   46 kB/s
     data_sent......................: 3.0 MB  4.9 kB/s
     http_req_blocked...............: avg=50.85µs min=0s      med=1µs     max=695.65ms p(90)=1µs     p(95)=1µs
     http_req_connecting............: avg=11.94µs min=0s      med=0s      max=86.31ms  p(90)=0s      p(95)=0s
     http_req_duration..............: avg=55.93ms min=33.96ms med=53.5ms  max=461.31ms p(90)=64.63ms p(95)=78.13ms
       { expected_response:true }...: avg=56.53ms min=33.96ms med=53.33ms max=461.31ms p(90)=66.94ms p(95)=87.43ms
     http_req_failed................: 35.18%  ✓ 22515      ✗ 41470
     http_req_receiving.............: avg=1.57ms  min=6µs     med=46µs    max=308.44ms p(90)=122µs   p(95)=413.79µs
     http_req_sending...............: avg=80.69µs min=8µs     med=43µs    max=26.45ms  p(90)=85µs    p(95)=130µs
     http_req_tls_handshaking.......: avg=32.48µs min=0s      med=0s      max=301.47ms p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=54.28ms min=33.81ms med=53.06ms max=461.21ms p(90)=61.73ms p(95)=65.97ms
     http_reqs......................: 63985   106.637746/s
     iteration_duration.............: avg=56.24ms min=1.79µs  med=53.74ms max=772.84ms p(90)=64.99ms p(95)=78.55ms
     iterations.....................: 63985   106.637746/s
     vus............................: 6       min=6        max=6
     vus_max........................: 6       min=6        max=6


running (10m00.0s), 0/6 VUs, 63985 complete and 0 interrupted iterations
default ✓ [======================================] 6 VUs  10m0s
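
A side note on the teardown line in the output, which reports `{"changeCount":0,"revision":null}` even though VUs logged version changes. This is consistent with k6 handing each VU its own copy of the `setup()` data, so mutations made in the default function are not visible to `teardown()`. If per-VU tracking is the goal, one option is to keep the state in a module-scope object, since each VU runs its own JavaScript runtime. A minimal sketch, where `createRevisionTracker` is a hypothetical helper and not a k6 API:

```javascript
// Hypothetical helper: tracks revision changes seen by one virtual user.
function createRevisionTracker() {
    let revision = null;
    let changeCount = 0;
    return {
        // Record the revision from one response and return the running count.
        observe(next) {
            if (!next) return changeCount; // ignore failed or unparsable responses
            if (revision !== null && next !== revision) changeCount++;
            revision = next;
            return changeCount;
        },
        // Current state, e.g. for logging at the end of a run.
        snapshot() {
            return { revision, changeCount };
        },
    };
}

const tracker = createRevisionTracker();
tracker.observe('greetings from deploy v1');
tracker.observe('greetings from deploy v2');
console.log(tracker.snapshot());
```

In the script above this could be created once at module scope and fed with `parseRevision(resp)` on every iteration. Totals across all VUs would still need something like a custom k6 metric, because the tracker only sees one VU.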

Expected behavior

While a canary run is in progress, the share of users assigned to each version should roughly match the configured traffic weights.
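
As a rough illustration of the gap between expected and observed behavior, the sketch below compares the share of users on the canary under two simplified models. Both are assumptions made for illustration, not measurements: `canaryShareTwoWayPinning` assumes every user is pinned to whichever version served their first request, while `canaryShareCanaryOnlyPinning` assumes only canary users are pinned and every cookie-less request is routed independently by weight. The function names are hypothetical, and the 5% weight is taken from the HTTPRoute shown earlier in this issue.

```javascript
// Share of users on the canary if every user is pinned on first contact.
// The share stays at the configured weight regardless of request count.
function canaryShareTwoWayPinning(canaryWeightPercent, requestsPerUser) {
    if (requestsPerUser < 1) return 0;
    return canaryWeightPercent / 100;
}

// Share of users on the canary if only canary users are pinned:
// a user stays on the primary only by landing there on every
// independent weighted draw.
function canaryShareCanaryOnlyPinning(canaryWeightPercent, requestsPerUser) {
    const stayOnPrimary = 1 - canaryWeightPercent / 100;
    return 1 - Math.pow(stayOnPrimary, requestsPerUser);
}

// Compare both models for a 5% canary weight.
for (const n of [1, 10, 100]) {
    console.log(
        n,
        canaryShareTwoWayPinning(5, n).toFixed(3),
        canaryShareCanaryOnlyPinning(5, n).toFixed(3)
    );
}
```

Under the second model the canary share climbs toward 100% as each user sends more requests, which matches the direction of the k6 result above, where VUs that started on v1 moved to v2 almost immediately.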

Additional context

  • Flagger version: 1.36.1
  • Kubernetes version: 1.25
  • Service Mesh provider: GatewayAPI + Istio 1.20.3
  • Ingress provider: GatewayAPI

@ethankhall (Author)

Maybe related to #1532
