
Oasis operation broken after update to 1.3.3 #108

Open · behnle opened this issue Jul 22, 2024 · 16 comments

@behnle (Author) commented Jul 22, 2024

Dear NOMAD developers,
I operate a NOMAD Oasis with decentralized user management.
After updating NOMAD to version 1.3.3 (the latest Docker image), I am unable to log into my Oasis.
The setup is as follows:

  • NOMAD is installed from the official Docker images; the setup used to work.
  • Keycloak v25.0.2 runs as a separate instance, as it also serves as SSO provider for further systems.
  • NOMAD is registered as an OIDC client; Keycloak is configured in nomad.yaml.

Observations:

  • I cleared all cookies to make things reproducible.
  • When clicking on "LOGIN/REGISTER", I am successfully redirected to the login page of my Keycloak installation. After entering my credentials, I am redirected back to NOMAD, but the page looks the same as before. When clicking on "PUBLISH -> UPLOADS", NOMAD tells me "You have to login to use this functionality.", although I just logged in.
  • After logging in, there are two successful events for my user in the Keycloak log, one "LOGIN" event and one "CODE_TO_TOKEN" event, as one would expect.
  • The Firefox network analysis shows a successful transfer of a "token" file from Keycloak to NOMAD.
  • HOWEVER, when I look under "web storage" -> "Cookies", there is only the "terms of service" cookie but NOT the expected session cookie.
  • The very same Keycloak successfully authenticates Chemotion as an OIDC client and eLabFTW as a SAML client; hence, I believe the setup works in principle.
  • The setup worked flawlessly until I updated NOMAD to v1.3.3.

NOMAD stack:

[root@u-030-s007 nomad]# docker compose ps
WARN[0000] /dockerdata/nomad/docker-compose.yaml: `version` is obsolete 
NAME                   IMAGE                                                      COMMAND                  SERVICE    CREATED       STATUS                 PORTS
nomad_oasis_app        gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:latest   "./run.sh"               app        3 hours ago   Up 3 hours (healthy)   8000/tcp, 9000/tcp
nomad_oasis_elastic    docker.elastic.co/elasticsearch/elasticsearch:7.17.1       "/bin/tini -- /usr/l…"   elastic    3 hours ago   Up 3 hours (healthy)   9200/tcp, 9300/tcp
nomad_oasis_mongo      mongo:5.0.6                                                "docker-entrypoint.s…"   mongo      3 hours ago   Up 3 hours (healthy)   27017/tcp
nomad_oasis_north      gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:latest   "python -m nomad.cli…"   north      3 hours ago   Up 3 hours (healthy)   8000/tcp, 9000/tcp
nomad_oasis_proxy      nginx:latest                                               "/docker-entrypoint.…"   proxy      3 hours ago   Up 3 hours             0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp
nomad_oasis_rabbitmq   rabbitmq:3.11.5                                            "docker-entrypoint.s…"   rabbitmq   3 hours ago   Up 3 hours (healthy)   4369/tcp, 5671-5672/tcp, 15691-15692/tcp, 25672/tcp
nomad_oasis_worker     gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:latest   "./run-worker.sh"        worker     3 hours ago   Up 3 hours             8000/tcp, 9000/tcp

images:

[root@u-030-s007 nomad]# docker image ls
REPOSITORY                                                                      TAG       IMAGE ID       CREATED         SIZE
nginx                                                                           latest    fffffc90d343   4 weeks ago     188MB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               latest    8efd598a6dc5   7 weeks ago     2.03GB
nginx                                                                           <none>    e4720093a3c1   5 months ago    187MB
nginx                                                                           <none>    92b11f67642b   5 months ago    187MB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               <none>    279c097945fe   5 months ago    1.88GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               <none>    dae1849135eb   6 months ago    1.81GB
python                                                                          latest    e7177b0afd0e   7 months ago    1.02GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/jupyterlab        latest    f1b5e187ee1e   8 months ago    6.39GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/jupyterlab        prod      f1b5e187ee1e   8 months ago    6.39GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/nexus-webtop      latest    548857bf45d9   8 months ago    7.43GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/apmtools-webtop   latest    125e01c59a73   8 months ago    5.29GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/webtop            latest    603c690b7911   8 months ago    1.65GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/ellips-jupyter    latest    4e3e12da664c   8 months ago    6.22GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/xps-jupyter       latest    5bf19c880ab6   8 months ago    5.65GB
nginx                                                                           <none>    a8758716bb6a   9 months ago    187MB
jupyter/datascience-notebook                                                    latest    f78a42f3bc9a   9 months ago    5.92GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               v1.2.1    cc8dd7c53b3c   10 months ago   1.67GB
rabbitmq                                                                        3.11.5    3ddcc140fe5c   19 months ago   228MB
mongo                                                                           5.0.6     532c84506200   2 years ago     699MB
docker.elastic.co/elasticsearch/elasticsearch                                   7.17.1    515ab4fba870   2 years ago     618MB

The (redacted) keycloak part of nomad.yaml:

keycloak:
  server_url: 'https://keycloak.my-uni.de/'
  public_server_url: 'https://keycloak.my-uni.de/'
  realm_name: 'fdm'
  username: 'nomad-admin'
  password: censored
  client_id: 'nomad-1'
  client_secret: censored
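
For reference, with a Quarkus-based Keycloak such as 25.x there is no legacy /auth prefix anymore, so the realm's OIDC discovery document should be reachable directly, which is a quick sanity check for the values above:

    curl -s https://keycloak.my-uni.de/realms/fdm/.well-known/openid-configuration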

There are no obvious errors in the docker-compose logs of Keycloak or NOMAD, no errors in the Keycloak GUI, and no errors in the browser console; it just looks as if NOMAD does not set the session cookie.
Have there been any changes in NOMAD from 1.2 to 1.3 that would require reconfiguring the client settings in Keycloak?
What can I do to further track down the root cause of the issue?
The only possibly relevant warning is the following:

Cookie "Authorization" does not have a valid value for its "SameSite" attribute. Soon, cookies without the "SameSite" attribute, or with an invalid value for it, will be treated as "Lax". As a result, the cookie will no longer be sent in third-party contexts. If your application depends on this cookie being available in such contexts, please add the "SameSite=None" attribute to it. To learn more about the "SameSite" attribute, read https://developer.mozilla.org/docs/Web/HTTP/Headers/Set-Cookie/SameSite. (https://my-uni/nomad-oasis/gui/static/node_modules/universal-cookie/es6/Cookies.js) [Cookies.js:57:12]
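
For context, the warning is about the shape of the Set-Cookie header. A cookie that has to be sent in cross-site contexts would need roughly the following attributes; whether NOMAD's "Authorization" cookie actually has to cross site boundaries here is an assumption on my part:

    Set-Cookie: Authorization=Bearer%20<token>; Path=/; Secure; SameSite=None

Note that browsers only honor SameSite=None in combination with Secure.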

If it helps, I can also provide you with the client settings in Keycloak.

@lauri-codes (Contributor)

Hi @behnle!

This seems like a hard problem to track down. I cannot reproduce anything like this, e.g. when running a local Oasis with the central authentication, or in our current deployments, which are based on 1.3.3 and 1.3.4 and use the central authentication.

I think @blueraft or @markus1978 might know better whether anything critical changed in 1.3.3 that could cause this. Do you know which version of the nomad-fair Docker image worked for you previously, and whether you have updated your Keycloak version (or was it already 25.0.2 when you had a working NOMAD deployment)? This might help in tracking down the issue.

@blueraft (Contributor)

I don't believe anything changed with regard to Keycloak. @Sideboard was looking into updating the Keycloak version, but we are still using keycloak:16.1.1 in our examples.

@behnle (Author) commented Jul 23, 2024

Thanks for your replies @lauri-codes @blueraft. The last version that I remember working was a 1.2.2 (?) image (with SHA256 sum 279c097945fe553be09e8f50d0502f20210836eff3d8b5c6b2213f8297b32724).

docker image inspect
[root@u-030-s007 nomad]# docker image inspect 279c097945fe
[
    {
        "Id": "sha256:279c097945fe553be09e8f50d0502f20210836eff3d8b5c6b2213f8297b32724",
        "RepoTags": [],
        "RepoDigests": [
            "gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair@sha256:be4b78aa30969cd88b6ca23841a07282496649e0a98b7645b607b072ddf235a2"
        ],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2024-02-06T15:04:27.110797819+01:00",
        "DockerVersion": "",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "nomad",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "8000/tcp": {},
                "9000/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "LANG=C.UTF-8",
                "GPG_KEY=E3FF2839C048B25C084DEBE9B26995E310250568",
                "PYTHON_VERSION=3.9.18",
                "PYTHON_PIP_VERSION=23.0.1",
                "PYTHON_SETUPTOOLS_VERSION=58.1.0",
                "PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/049c52c665e8c5fd1751f942316e0a5c777d304f/public/get-pip.py",
                "PYTHON_GET_PIP_SHA256=7cfd4bdc4d475ea971f1c0710a5953bcc704d171f83c797b9529d9974502fcc6",
                "PYTHONPATH=/app/plugins"
            ],
            "Cmd": [
                "python3"
            ],
            "ArgsEscaped": true,
            "Image": "",
            "Volumes": {
                "/app/.volumes/fs": {}
            },
            "WorkingDir": "/app",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 1883886525,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/dockerdata/volumes/overlay2/115e904ebcd76c1bccfcda5549ab1681b895babfbe1968d16afa6677daa3bf26/diff:/dockerdata/volumes/overlay2/dc501ca5cfceb3e0334bf4caa3bccd2f2113a06cc9cb94f259362a9f5726b663/diff:/dockerdata/volumes/overlay2/d676ebd69562462e8b2be0084c697feab1a0426a9446cd5f5cdc402210842722/diff:/dockerdata/volumes/overlay2/ad6d73441a8db834f40746356502198b73ec3debfabd0185ce6ed9ede70f056c/diff:/dockerdata/volumes/overlay2/7294a6e95fc74c2a72da9f485fcb67095896fdb103969faba9971e9dd25e4582/diff:/dockerdata/volumes/overlay2/9d8ec0b0c1ca04bfe0293af096cde23979b7f707892ce372f14635718686645c/diff:/dockerdata/volumes/overlay2/af761b214b08f5ba00d1d01db95ff92213a4557a97a40b10ea5832d240f3151e/diff:/dockerdata/volumes/overlay2/433c9a4a3676ffe5e142deb6b01dff36581174cacf86293054770853c6f03ce9/diff:/dockerdata/volumes/overlay2/3efe44a4d06aa62463c0e0cdb7aa697d80891f174c2320c716c6f0fdf7d4e08d/diff:/dockerdata/volumes/overlay2/f5eb910c7070ddb5cab7071b6c841c80a2aecb28a1f0aeed9a51ad48f08b7c17/diff:/dockerdata/volumes/overlay2/a3e9dabd0f9f4cc114303a898c0bb565e9fab76b5ca4af66a91148912df4ff8b/diff:/dockerdata/volumes/overlay2/005be47edaf1c594bfaaff029c48eaf1f246688f4e81f5f2c54e3003168f0af0/diff:/dockerdata/volumes/overlay2/bd771adf438b9ff7270519229801783985b783cd689d88ea11318296fa360deb/diff",
                "MergedDir": "/dockerdata/volumes/overlay2/88f956e0ebbe0d92b456f097129f166b0deef067b767b2dd0ff65cef9d847b77/merged",
                "UpperDir": "/dockerdata/volumes/overlay2/88f956e0ebbe0d92b456f097129f166b0deef067b767b2dd0ff65cef9d847b77/diff",
                "WorkDir": "/dockerdata/volumes/overlay2/88f956e0ebbe0d92b456f097129f166b0deef067b767b2dd0ff65cef9d847b77/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:fb1bd2fc52827db4ce719cc1aafd4a035d68bc71183b3bc39014f23e9e5fa256",
                "sha256:da5d55102092b80b04fcb9e6cce42b12f7c53ed72cb1568811576763c9d40786",
                "sha256:c4e334227ccac6bda44f5768a5459ad5f8def8e9bb3df0e5323feffd89b9480b",
                "sha256:087aa9f40b611f4de7ee0079dfd3600cc038b8be247f82c6abf3b99df7a5624d",
                "sha256:18a1e69d7a2d521683b54e7deadc70dbd2b498b68ea2e05115e14c147a5497ff",
                "sha256:a0254b855be6bf5ecad7b09b7b97f28be0b2676e9fbb99b04610318d02cbe279",
                "sha256:e31f05acf7dd708f0f905abfc969cd41eb63afe3a61cc614a61d3be17aec75df",
                "sha256:666aa383b6a2a9c2be8d2f5fa200c20e909ef78b0a9d5f1ce35cb20c9100b100",
                "sha256:fd9f50d931645dab893e2da7c6e77d480117012892612785ffad21f3e57c9b04",
                "sha256:de8b1961466c64fc0577d8a297b3a38a0ee9da8f4ca4dc416f3e2f7acc9ff7c0",
                "sha256:2d93d7cba9761cdc66514bc06946c1ec8724d63ca1632fe15a579d2e679ca7bb",
                "sha256:32867b0930e582286f6909b92d3cfca0acea0cdfa91b5c2aeb2eb66a80b59c1d",
                "sha256:f2772f2d7f778831050d8d449ee9f0a8930cc9fccfcf3bf23766aa78922f6062",
                "sha256:2fb27a2af30200bbeee57f32f116200433ffc2333254894109c642b44739a3ea"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]
I also just downloaded the sample config folder and was able to deploy a 1.3.3 Oasis with Keycloak 16. But this version has been out of support for ages (I am still fighting with an attempt to replace it with Keycloak 25).

The last actions I took were, first, to update Keycloak from 24.0.5 to 25.0.2 (after which all clients were still working).
Afterwards I updated NOMAD from 1.2.2 (?)/the previous "latest" to 1.3.3, which then caused said issue. All other clients still work.
Unfortunately, the migration steps section does not mention any mandatory steps for going from 1.2.2 to 1.3.3, so I assumed that any mandatory action would be executed in the background on first start.
Before doing the update, I did not delete any container volume. Maybe there is some leftover old data that causes the issue. Can you tell me which container volumes I can safely delete without losing scientific data?
Is there any way to increase the log level and inspect the authentication process (specifically, why the token that is obviously sent never makes it into the web storage as a bearer token)?

@lauri-codes (Contributor) commented Jul 23, 2024

It should be possible to increase the Python log level in nomad.yaml (services.console_log_level) to something like DEBUG, but I would assume that the default level of WARNING would already catch any possible problem. I don't think anything persisted in the Docker volumes can affect the authentication process, and in general I would avoid removing any of the volumes unless absolutely necessary.
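
For reference, that setting would look something like this in nomad.yaml (a minimal sketch, assuming the default WARNING level is what you are running now):

    services:
      console_log_level: DEBUG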

My first thought would be that there is some incompatibility between Keycloak 25.0.2 and the Docker image for NOMAD 1.3.3 (but maybe also older versions of NOMAD). To try to reproduce the problem locally, we could spin up a Keycloak service with version 25.0.2 alongside the other services in the default docker-compose Oasis setup and see if things break. The other likely option could be some incompatibility in the JS authentication library.
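
A sketch of what such a test service could look like in the default docker-compose setup (the admin credentials and import mount are placeholders taken from the example configs, not a verified Keycloak 25 configuration):

      keycloak:
        image: quay.io/keycloak/keycloak:25.0.2
        environment:
          - KEYCLOAK_ADMIN=admin
          - KEYCLOAK_ADMIN_PASSWORD=password
          - KC_HTTP_ENABLED=true
          - KC_HOSTNAME_STRICT=false
        command: start-dev --import-realm
        volumes:
          - ./configs/keycloak-import/:/opt/keycloak/data/import:ro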

@blueraft (Contributor)

> deploy a 1.3.3 Oasis with Keycloak 16. But this version has been out of support for ages

I wasn't able to find release-cycle date info for Keycloak. 16.1.1 was released in Jan 2022, so I'd be surprised if it had already reached end of life.

@behnle (Author) commented Jul 23, 2024

Unfortunately, only the latest version of Keycloak receives security fixes (https://github.com/keycloak/keycloak/security/policy#supported-versions), and even if you buy LTS from Red Hat, the oldest version they provide backports for is now 22.x (https://access.redhat.com/articles/7033107). Keycloak has a terribly rapid release cycle; I wish they would spend more time on QA and less on agile feature development.

@blueraft (Contributor)

That's unfortunate. I'll check with Sascha about updating to v25 and let you know if we're able to fix the compatibility issue.

@behnle (Author) commented Jul 24, 2024

While I am still unable to explain and solve the issue, I can at least provide you with a set of config files for an MWE that reproduces it.
The example is adapted from here. Strip the .txt extension before using the files to replace those from the original example.
The realm import mechanism of Keycloak has been overhauled since v16; hence I placed the realm to be imported in a subdirectory below "configs". Credentials are still "admin"/"password".
The MWE behaves as in the original report: you are successfully redirected to Keycloak for SSO authorization and then redirected back, but NOMAD still treats you as if you were not logged in.
docker-compose.yaml.txt
nginx.conf.txt
nomad.yaml.txt
nomad-realm.json
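
For Keycloak 17+ the realm import is wired up roughly like this in the compose file (a sketch; the exact paths must match the mounts in the attached docker-compose.yaml.txt):

    volumes:
      - ./configs/keycloak-import/:/opt/keycloak/data/import:ro
    command: start-dev --import-realm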

@blueraft (Contributor)

Thank you for the MWE. I am able to reproduce this.

One thing I've noticed is that the Keycloak response to authorization requests does not include an Authorization cookie. Can you confirm whether it's the same for you?

With v25:

[Screenshot 2024-07-25 at 13:04:17]

With central NOMAD:

[Screenshot 2024-07-25 at 13:04:38]
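
One way to double-check this outside the browser is to look at the Set-Cookie headers of that response directly, e.g. along these lines (the URL is a placeholder for whatever request shows up in the network tab):

    curl -si 'https://localhost/<request-url-from-network-tab>' | grep -i '^set-cookie'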

@behnle (Author) commented Jul 25, 2024

Indeed, I can confirm that the web storage does not contain an Authorization cookie after authorization:
Pre-login:
[screenshot: pre-login]
After login:
[screenshot: post-login]
But there are clearly authorization and code-to-token events logged in KC:
[screenshot: events]
(you have to turn on user event logging first)
The question now is: where do these get lost?
Further down in the browser console, there is the following POST event:
[screenshot: header]
with the cookie tab being
[screenshot: cookie]
the request
[screenshot: request]
and the reply
[screenshot: reply]
The call stack of this POST is
[screenshot: callstack]
I would personally interpret this as KC indeed sending a bearer token, but for some obscure reason there is no cookie afterwards. I have to admit, though, that this is way outside my comfort zone.

@blueraft (Contributor)

Thank you for confirming. I'll take a look tomorrow with v1.2 to see if we are doing something differently there.

@blueraft (Contributor)

I used the same docker-compose file with nomad v1.2.1, and it doesn't work there either: same issue, with no Authorization cookie being sent back in the response headers by Keycloak. Something probably changed on the Keycloak side then, I'd imagine. Does v1.2.1 work for you with the same docker-compose file?

@lauri-codes (Contributor)

If I understood correctly, @behnle already tested that with Keycloak 24.0.5 everything worked fine in combination with NOMAD 1.3.3. So I would assume that something happened in the transition from 24.0.5 to 25.0.2.

It might be worthwhile to check whether v24.0.5 also works in the MWE, and then to check the Keycloak changelogs. Maybe 25.0.2 requires us to update our JS Keycloak version (keycloak-js in package.json)?
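
If that turns out to be the culprit, the fix would presumably be a version bump along these lines in package.json (a sketch; the exact target version should be taken from the Keycloak release notes, since keycloak-js releases track the server versions):

    "dependencies": {
      "keycloak-js": "25.0.2"
    }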

@behnle (Author) commented Jul 26, 2024

@blueraft I didn't try NOMAD 1.2.1 yet; that's certainly worth a try. I just have to figure out how to pull it from your registry.
@lauri-codes Yes and no (NOMAD continued to work in my production environment after updating KC to 24.0.5), but I just managed to get the MWE up and running. With the following Keycloak settings, and after dropping the KC healthcheck from the other containers, it seems to work:

  # keycloak user management
  keycloak:
    restart: unless-stopped
    #image: quay.io/keycloak/keycloak:16.1.1
    image: quay.io/keycloak/keycloak:24.0.5
    container_name: nomad_oasis_keycloak
    environment:
      - TZ=Europe/Berlin
      - PROXY_ADDRESS_FORWARDING=true
      - KEYCLOAK_ADMIN=admin
      - KEYCLOAK_ADMIN_PASSWORD=password
      - KEYCLOAK_USER=admin
      - KEYCLOAK_PASSWORD=password
      #- KEYCLOAK_FRONTEND_URL=http:https://localhost/keycloak/auth
      - KC_HOSTNAME_STRICT=false
      - KC_HTTP_ENABLED=true
      - KC_HTTP_PORT=8080
      - KC_PROXY=edge
      #- KC_LOG_LEVEL=DEBUG
      #- KC_HOSTNAME=http:https://localhost/keycloak/
      - KC_HOSTNAME_URL=http:https://localhost/keycloak/
      #- KEYCLOAK_IMPORT=/opt/keycloak/data/import/nomad-realm.json -Dkeycloak.profile.feature.upload_scripts=enabled
      - KEYCLOAK_EXTRA_ARGS_PREPENDED="--proxy-headers xforwarded --hostname-debug=true --http-enabled true --health-enabled=true --verbose"
      #- KEYCLOAK_EXTRA_ARGS="--import-realm --verbose"
    command: start-dev --import-realm
    volumes:
      - keycloak:/opt/keycloak/data
      - ./configs/keycloak-import/:/opt/keycloak/data/import:ro
    # healthcheck:
    #   test: ["CMD-SHELL", "exec 3<>/dev/tcp/127.0.0.1/9000;echo -e 'GET /health/ready HTTP/1.1\r\nhost: http:https://localhost\r\nConnection: close\r\n\r\n' >&3;if [ $? -eq 0 ]; then echo 'Healthcheck Successful';exit 0;else echo 'Healthcheck Failed';exit 1;fi;"]
    #   interval: 10s
    #   timeout: 10s
    #   retries: 30
    #   start_period: 30s

That is, I am able to perform the SSO login.
I did not dive that deep into the NOMAD code, but if you use a JS Keycloak package rather than a generic OIDC library, it might well require an update. Unfortunately, the KC developers tend to break compatibility on a daily basis.

@blueraft (Contributor)

> I just have to figure out how to pull it from your registry.

In the docker compose file, this would be for the app and the worker:

    image: gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.2.1

> That is, I am able to perform the SSO login.

Good to know this works; probably just some breaking change from 24 to 25, then.

@behnle (Author) commented Jul 31, 2024

I just checked the 1.3.4 image mentioned in #107 (comment); unfortunately, the problem still persists with exactly the same symptoms.
