Keycloak Randomly Crashing in Docker Swarm with SocketTimeoutException and Blocked Threads in Infinispan Operations #30585

Closed
2 tasks done
zadm opened this issue Jun 19, 2024 · 2 comments


zadm commented Jun 19, 2024

Before reporting an issue

  • I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

infinispan

Describe the bug

Keycloak is randomly crashing in my Docker Swarm setup. The crash is accompanied by stack traces indicating SocketTimeoutException and blocked threads in the Infinispan cache operations.

Version

17.0.1

Regression

  • The issue is a regression

Expected behavior

Keycloak should handle cache operations without crashing due to timeouts or blocked threads.

Actual behavior

Keycloak crashes randomly with the following stack traces:


2024-06-19 12:00:51,712 WARN  [org.infinispan.HOTROD] (Thread-0) ISPN004098: Closing connection [id: 0xc10ae209, L:/10.0.3.139:39214 - R:10.0.3.110/10.0.3.110:11222] due to transport error: java.net.SocketTimeoutException: ReplaceIfUnmodifiedOperation{offlineSessions, key=[B0x033E2466653061613936352D65376465..[39], value=[B0x03040B000000446F72672E6B6579636C..[1141], flags=0, connection=10.0.3.110/10.0.3.110:11222} timed out after 60000 ms
    at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:182)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

2024-06-19 12:00:55,227 WARN  [io.vertx.core.impl.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-5,5,main] has been blocked for 3548 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked
    at io.vertx.core.net.impl.ConnectionBase.lambda$handleException$4(ConnectionBase.java:357)
    at io.vertx.core.net.impl.ConnectionBase$$Lambda$1733/0x0000000841090840.handle(Unknown Source)
    at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:50)
    at io.vertx.core.impl.ContextImpl.emit(ContextImpl.java:274)
    at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:22)
    at io.vertx.core.net.impl.ConnectionBase.handleException(ConnectionBase.java:354)
    at io.vertx.core.http.impl.Http1xServerConnection.handleException(Http1xServerConnection.java:466)
    at io.vertx.core.net.impl.VertxHandler.exceptionCaught(VertxHandler.java:136)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:177)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
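
For triage, the two WARN lines above can be reduced to their key fields (operation, cache, server, timeout, blocked duration). The following is a minimal sketch, not part of Keycloak or Infinispan: the function name `summarize` and both regular expressions are hypothetical and were written only against the exact message formats quoted in this report, so other versions may need different patterns.

```python
import re

# Patterns written against the two WARN lines quoted in this report only;
# other Keycloak or Infinispan versions may format these messages differently.
HOTROD_TIMEOUT = re.compile(
    r"ISPN004098: .*?SocketTimeoutException: (?P<operation>\w+)\{(?P<cache>\w+),"
    r".*?connection=(?P<server>[^}]+)\} timed out after (?P<timeout_ms>\d+) ms"
)
BLOCKED_THREAD = re.compile(
    r"Thread Thread\[(?P<thread>[^,\]]+)[^\]]*\] has been blocked for "
    r"(?P<blocked_ms>\d+) ms, time limit is (?P<limit_ms>\d+) ms"
)


def summarize(line):
    """Return a dict describing a matching WARN line, or None if nothing matches."""
    m = HOTROD_TIMEOUT.search(line)
    if m:
        return {
            "kind": "hotrod_timeout",
            "operation": m["operation"],
            "cache": m["cache"],
            "server": m["server"],
            "timeout_ms": int(m["timeout_ms"]),
        }
    m = BLOCKED_THREAD.search(line)
    if m:
        return {
            "kind": "blocked_thread",
            "thread": m["thread"],
            "blocked_ms": int(m["blocked_ms"]),
            "limit_ms": int(m["limit_ms"]),
        }
    return None
```

Applied to the first warning, this reports a `ReplaceIfUnmodifiedOperation` on the `offlineSessions` cache against `10.0.3.110/10.0.3.110:11222` with a 60000 ms timeout. Applied to the second, it reports `vert.x-eventloop-thread-5` blocked for 3548 ms against a 2000 ms limit.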

How to Reproduce?

1. Deploy Keycloak in a Docker Swarm setup.
2. Configure Infinispan for caching (default configuration).
3. Monitor Keycloak operations until it crashes with the stack trace above.

Configuration

cache-ispn-remote.xml


<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Copyright 2019 Red Hat, Inc. and/or its affiliates
  ~ and other contributors as indicated by the @author tags.
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:infinispan:config:13.0"
        xsi:schemaLocation="urn:infinispan:config:13.0 https://infinispan.org/schemas/infinispan-config-13.0.xsd urn:infinispan:server:13.0 https://infinispan.org/schemas/infinispan-server-13.0.xsd ">

    <cache-container name="keycloak" statistics="true">
        <transport lock-timeout="60000"
                cluster="${infinispan.cluster.name:keycloak-cluster}"
                node-name="${infinispan.node.name:}"/>

        <local-cache name="realms">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>

        <local-cache name="users">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>

        <local-cache name="keys">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <expiration max-idle="3600000"/>
            <memory max-count="1000"/>
        </local-cache>

        <local-cache name="authorization">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>

        <distributed-cache name="sessions" owners="2">
            <expiration lifespan="-1"/>
            <remote-store cache="sessions" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </distributed-cache>

        <distributed-cache name="authenticationSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>

        <distributed-cache name="offlineSessions" owners="2">
            <expiration lifespan="-1"/>
            <remote-store cache="offlineSessions" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </distributed-cache>

        <distributed-cache name="clientSessions" owners="2">
            <expiration lifespan="-1"/>
            <remote-store cache="clientSessions" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </distributed-cache>

        <distributed-cache name="offlineClientSessions" owners="2">
            <expiration lifespan="-1"/>
            <remote-store cache="offlineClientSessions" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </distributed-cache>

        <distributed-cache name="loginFailures" owners="2">
            <expiration lifespan="-1"/>
            <remote-store cache="loginFailures" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </distributed-cache>

        <replicated-cache name="work">
            <expiration lifespan="-1"/>
            <remote-store cache="work" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </replicated-cache>

        <distributed-cache name="actionTokens" owners="2">
            <expiration max-idle="-1" lifespan="-1" interval="300000"/>
            <memory max-count="-1"/>
            <remote-store cache="actionTokens" xmlns="urn:infinispan:config:store:remote:13.0"
                          purge="false"
                          preload="false"
                          shared="true" segmented="false"
                          connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}">
                <remote-server host="ispn" port="${infinispan.bind.port:11222}"/>

                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </distributed-cache>
    </cache-container>
</infinispan>
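
In the file above, every `remote-store` sets `connect-timeout` (default 2000) and the only 60000 is the transport `lock-timeout`, so the 60000 ms in the HOTROD warning presumably comes from a client-side default rather than from an explicit setting here. That reading is an assumption, as is the attribute name `socket-timeout` checked below. A minimal sketch for listing the per-store timeouts, using a trimmed sample of the configuration (the helper name `remote_store_timeouts` is hypothetical):

```python
import xml.etree.ElementTree as ET

# Namespace-qualified tag of the remote-store element used in this configuration.
REMOTE_STORE = "{urn:infinispan:config:store:remote:13.0}remote-store"

# Trimmed sample with one cache from the configuration in this report.
# For the real file, ET.parse(path).getroot() can replace ET.fromstring().
SAMPLE = """
<infinispan xmlns="urn:infinispan:config:13.0">
  <cache-container name="keycloak">
    <distributed-cache name="offlineSessions" owners="2">
      <remote-store cache="offlineSessions" xmlns="urn:infinispan:config:store:remote:13.0"
                    connect-timeout="${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}"/>
    </distributed-cache>
  </cache-container>
</infinispan>
"""


def remote_store_timeouts(xml_text):
    """Map each remote-store's target cache to its timeout attributes (None if unset)."""
    root = ET.fromstring(xml_text)
    result = {}
    for store in root.iter(REMOTE_STORE):
        result[store.get("cache")] = {
            "connect-timeout": store.get("connect-timeout"),
            # Assumed attribute name; None means the file does not set it.
            "socket-timeout": store.get("socket-timeout"),
        }
    return result


if __name__ == "__main__":
    for cache, timeouts in remote_store_timeouts(SAMPLE).items():
        print(cache, timeouts)
```

The property placeholders such as `${env.KEYCLOAK_REMOTE_ISPN_CONN_TIMEOUT:2000}` are returned as literal strings; the sketch does not resolve them.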

Dockerfile


ARG VENDOR_VERSION=17.0.1
FROM quay.io/keycloak/keycloak:${VENDOR_VERSION} as builder

ENV KC_DB=postgres
ENV KC_HTTP_RELATIVE_PATH=/auth
ENV KC_METRICS_ENABLED=true
ENV KC_CACHE_CONFIG_FILE=cache-ispn.xml
ENV KC_HEALTH_ENABLED=true
ENV KC_FEATURES=token-exchange,scripts

COPY --chown=keycloak:keycloak ./custom/cache-ispn-remote.xml /opt/keycloak/conf/cache-ispn.xml

RUN /opt/keycloak/bin/kc.sh build

FROM quay.io/keycloak/keycloak:${VENDOR_VERSION}

COPY --from=builder /opt/keycloak/lib/quarkus/ /opt/keycloak/lib/quarkus/
WORKDIR /opt/keycloak

ARG VENDOR_VERSION=17.0.1
ARG GIT_REPO=keycloak/keycloak
ARG KEYCLOAK_DIST=https://github.com/keycloak/keycloak/releases/download/${VENDOR_VERSION}/keycloak-${VENDOR_VERSION}.tar.gz

USER root

RUN microdnf update -y && microdnf install -y glibc-langpack-en gzip hostname java-11-openjdk-headless openssl tar which git jq && microdnf clean all

ADD https://repo1.maven.org/maven2/co/elastic/apm/elastic-apm-agent/1.32.0/elastic-apm-agent-1.32.0.jar /usr/agent/
RUN chmod 644 /usr/agent/elastic-apm-agent-1.32.0.jar

COPY custom/psr-keycloak.sh /opt/psr-keycloak.sh
COPY themes /opt/keycloak/themes

RUN /opt/psr-keycloak.sh \
        && chown -R keycloak /opt/keycloak \
        && rm /opt/psr-keycloak.sh

USER keycloak

ENTRYPOINT [ "/opt/keycloak/bin/kc.sh" ]
CMD [ "start" ]

HEALTHCHECK --start-period=30s --interval=30s --timeout=3s --retries=5 \
            CMD curl --silent --fail --request GET http://localhost:8080/auth/health \
                | jq --exit-status '.status == "UP"' || exit 1

Anything else?

No response

@keycloak-github-bot

Thanks for reporting this issue, but as this is reported against an older and unsupported release we are not able to evaluate the issue. Please verify with the nightly build or the latest release.

If the issue can be reproduced in the nightly build or latest release add a comment with additional information, otherwise this issue will be automatically closed within 14 days.

@keycloak-github-bot

Due to lack of updates in the last 14 days this issue will be automatically closed.

keycloak-github-bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 5, 2024