[CI] SnapshotStressTestsIT testRandomActivities failing #109792

Closed
ywangd opened this issue Jun 17, 2024 · 2 comments · Fixed by #110083
Assignees
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@ywangd
Member

ywangd commented Jun 17, 2024

Build scan:
https://gradle-enterprise.elastic.co/s/w3j3jcaljzhnc/tests/:server:internalClusterTest/org.elasticsearch.snapshots.SnapshotStressTestsIT/testRandomActivities

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.SnapshotStressTestsIT.testRandomActivities" -Dtests.seed=1226FEEF453AB964 -Dtests.locale=de-LU -Dtests.timezone=Antarctica/Vostok -Druntime.java=22

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
Failure dashboard for org.elasticsearch.snapshots.SnapshotStressTestsIT#testRandomActivities

Failure excerpt:

java.lang.Exception: Test abandoned because suite timeout was reached.

  at __randomizedtesting.SeedInfo.seed([1226FEEF453AB964]:0)

@ywangd ywangd added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Jun 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Jun 17, 2024
@ywangd
Member Author

ywangd commented Jun 17, 2024

The actual error is shown below. It is a false alarm: the two generic thread pools are different. One belongs to the node (node_s4) and the other to the test itself (TrackedCluster), but the thread-name check does not take the node-name part into account. A potential fix is to use a plain Thread object on this line.

Jun 17, 2024 12:02:23 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNUNG: Uncaught exception in thread: Thread[#4155,elasticsearch[node_s4][generic][T#4],5,TGRP-SnapshotStressTestsIT]
java.lang.AssertionError: cannot complete future on thread Thread[#4155,elasticsearch[node_s4][generic][T#4],5,TGRP-SnapshotStressTestsIT] with waiter on thread Thread[#4233,elasticsearch[TrackedCluster][generic][T#1],5,TGRP-SnapshotStressTestsIT], could deadlock if pool was full
	at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1099)
	at org.elasticsearch.action.support.PlainActionFuture$Sync.get(PlainActionFuture.java:278)
	at org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:96)
	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:45)
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$OngoingRecoveries.awaitEmpty(PeerRecoverySourceService.java:306)
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.doStop(PeerRecoverySourceService.java:112)
	at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:80)
	at org.elasticsearch.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:99)
	at org.elasticsearch.core.IOUtils.close(IOUtils.java:71)
	at org.elasticsearch.core.IOUtils.close(IOUtils.java:119)
	at org.elasticsearch.node.Node.close(Node.java:579)
	at org.elasticsearch.test.InternalTestCluster$NodeAndClient.close(InternalTestCluster.java:1079)
	at org.elasticsearch.test.InternalTestCluster$NodeAndClient.closeForRestart(InternalTestCluster.java:1023)
	at org.elasticsearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:1901)
	at org.elasticsearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:1868)
	at org.elasticsearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:1858)
	at org.elasticsearch.snapshots.SnapshotStressTestsIT$TrackedCluster.lambda$startNodeRestarter$40(SnapshotStressTestsIT.java:1160)
	at org.elasticsearch.snapshots.SnapshotStressTestsIT$1.doRun(SnapshotStressTestsIT.java:159)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
	at __randomizedtesting.SeedInfo.seed([1226FEEF453AB964]:0)
	at org.elasticsearch.action.support.PlainActionFuture.assertCompleteAllowed(PlainActionFuture.java:416)
	at org.elasticsearch.action.support.PlainActionFuture.set(PlainActionFuture.java:137)
	at org.elasticsearch.action.support.PlainActionFuture.onResponse(PlainActionFuture.java:37)
	at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:276)
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$OngoingRecoveries.remove(PeerRecoverySourceService.java:271)
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.lambda$recoverWithFreshClusterState$2(PeerRecoverySourceService.java:184)
	at org.elasticsearch.action.ActionListenerImplementations$RunAfterActionListener.onFailure(ActionListenerImplementations.java:280)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
	at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:402)
	at org.elasticsearch.action.support.SubscribableListener$FailureResult.complete(SubscribableListener.java:394)
	at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:306)
	at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:331)
	at org.elasticsearch.action.support.SubscribableListener.onFailure(SubscribableListener.java:250)
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$recoverToTarget$3(RecoverySourceHandler.java:177)
	at org.elasticsearch.core.IOUtils.close(IOUtils.java:71)
	at org.elasticsearch.core.IOUtils.closeWhileHandlingException(IOUtils.java:169)
	at org.elasticsearch.core.IOUtils.closeWhileHandlingException(IOUtils.java:146)
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$recoverToTarget$4(RecoverySourceHandler.java:177)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:257)
	at org.elasticsearch.action.support.SubscribableListener$FailureResult.complete(SubscribableListener.java:394)
	at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:306)
	at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:331)
	at org.elasticsearch.action.support.SubscribableListener.onFailure(SubscribableListener.java:250)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
	at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
	at org.elasticsearch.action.support.SubscribableListener$FailureResult.complete(SubscribableListener.java:394)
	at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:306)
	at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:331)
	at org.elasticsearch.action.support.SubscribableListener.onFailure(SubscribableListener.java:250)
	at org.elasticsearch.indices.recovery.MultiChunkTransfer.onCompleted(MultiChunkTransfer.java:146)
	at org.elasticsearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:129)
	at org.elasticsearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:72)
	at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:97)
	at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:85)
	at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:73)
	at org.elasticsearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:83)
	at org.elasticsearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$5(MultiChunkTransfer.java:120)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:257)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
	at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
	at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
	at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onFailure(ActionListenerImplementations.java:317)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
	at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:402)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
	at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:402)
	at org.elasticsearch.action.support.RetryableAction.cancel(RetryableAction.java:99)
	at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.lambda$cancel$16(RemoteRecoveryTargetHandler.java:321)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
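
To make the false alarm concrete, here is a minimal sketch of how a check that compares only the executor part of a thread name treats the two threads from the trace above. It is not the actual `PlainActionFuture` code: the class name, the regex, the helper methods, and the `node-restarter` thread name are all hypothetical, and the only assumption taken from the log is that pool threads are named `elasticsearch[<node>][<executor>][T#<n>]`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Elasticsearch code.
class ThreadNameCheckSketch {
    // Assumed name shape, taken from the log: elasticsearch[<node>][<executor>][T#<n>]
    private static final Pattern NAME =
        Pattern.compile("^elasticsearch\\[([^\\]]+)\\]\\[([^\\]]+)\\]\\[T#\\d+\\]$");

    /** Executor part only, or null if the name does not look like a pool thread. */
    static String executorName(String threadName) {
        Matcher m = NAME.matcher(threadName);
        return m.matches() ? m.group(2) : null;
    }

    /** Node and executor parts joined, or null if the name does not match. */
    static String nodeAndExecutor(String threadName) {
        Matcher m = NAME.matcher(threadName);
        return m.matches() ? m.group(1) + "/" + m.group(2) : null;
    }

    /** Executor-only comparison: ignores which node the pool belongs to. */
    static boolean samePoolByExecutor(String a, String b) {
        String ea = executorName(a);
        return ea != null && ea.equals(executorName(b));
    }

    /** Comparison that also takes the node name into account. */
    static boolean samePoolByNodeAndExecutor(String a, String b) {
        String na = nodeAndExecutor(a);
        return na != null && na.equals(nodeAndExecutor(b));
    }

    public static void main(String[] args) {
        String completer = "elasticsearch[node_s4][generic][T#4]";
        String waiter = "elasticsearch[TrackedCluster][generic][T#1]";
        String plain = "node-restarter"; // hypothetical name for a plain Thread

        // Both names end in [generic], so the executor-only check reports a shared pool.
        System.out.println(samePoolByExecutor(completer, waiter));
        // With the node name included, the two pools are told apart.
        System.out.println(samePoolByNodeAndExecutor(completer, waiter));
        // A plain Thread's name does not match the pattern, so it is never flagged.
        System.out.println(samePoolByExecutor(completer, plain));
    }
}
```

Under these assumptions the executor-only comparison flags the node's thread and the test's thread as sharing a pool, while a waiter running on a plain Thread would not be flagged at all, which is why moving the test's blocking call off a pool-named thread would sidestep the assertion.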

@ywangd ywangd added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Jun 17, 2024
@nicktindall nicktindall self-assigned this Jun 24, 2024
nicktindall added a commit to nicktindall/elasticsearch that referenced this issue Jun 24, 2024
nicktindall added a commit to nicktindall/elasticsearch that referenced this issue Jun 24, 2024