
Investigate flaky tests after 1.3.11 release and re-activate #274

Open
simonbasle opened this issue Mar 11, 2022 · 0 comments
Assignees
Labels
status/need-investigation This needs more in-depth investigation type/bug A general bug type/chore A task not related to code (build, formatting, process, ...)

Comments

@simonbasle
Member

After #268 was merged, the `main` CI started to fail with seemingly flaky tests.
These have been marked as `@Ignore` to allow for a release on Tuesday, pending an investigation and a fix of the root cause (hopefully in the tests themselves).

The tests are:

```
reactor.kafka.receiver.KafkaReceiverTest > transactionalOffsetCommit FAILED
    org.apache.kafka.common.KafkaException: TransactionalId transactionalOffsetCommit: Invalid transition attempted from state READY to state ABORTING_TRANSACTION
```

(see https://github.com/reactor/reactor-kafka/runs/5513606287?check_suite_focus=true)

Stack trace:

```
        at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:1078)
        at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:1071)
        at org.apache.kafka.clients.producer.internals.TransactionManager.lambda$beginAbort$3(TransactionManager.java:372)
        at org.apache.kafka.clients.producer.internals.TransactionManager.handleCachedTransactionRequestResult(TransactionManager.java:1200)
        at org.apache.kafka.clients.producer.internals.TransactionManager.beginAbort(TransactionManager.java:369)
        at org.apache.kafka.clients.producer.KafkaProducer.abortTransaction(KafkaProducer.java:762)
        at reactor.kafka.sender.internals.DefaultTransactionManager.lambda$null$8(DefaultTransactionManager.java:81)
        at reactor.core.publisher.MonoRunnable.call(MonoRunnable.java:73)
        at reactor.core.publisher.MonoRunnable.call(MonoRunnable.java:32)
        at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:139)
        at reactor.core.publisher.MonoPublishOn$PublishOnSubscriber.run(MonoPublishOn.java:181)
        at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
        at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
```
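The error above comes from the producer's transaction state machine rejecting an abort when no transaction is in flight. The following is a simplified, hypothetical sketch of that kind of guard (the enum names match Kafka's states, but the transition table here is an illustrative reduction, not the real `TransactionManager` logic):

```java
// Simplified sketch of a transaction state-machine guard; the real, more
// permissive logic lives in org.apache.kafka.clients.producer.internals.TransactionManager.
public class TransactionStateDemo {
    enum State { UNINITIALIZED, INITIALIZING, READY, IN_TRANSACTION,
                 COMMITTING_TRANSACTION, ABORTING_TRANSACTION }

    // Aborting (or committing) is only legal once a transaction has begun,
    // which is why abortTransaction() from READY fails in the test.
    static boolean isTransitionValid(State from, State to) {
        switch (to) {
            case ABORTING_TRANSACTION:
            case COMMITTING_TRANSACTION:
                return from == State.IN_TRANSACTION;
            case IN_TRANSACTION:
                return from == State.READY;
            case READY:
                return from == State.INITIALIZING
                        || from == State.COMMITTING_TRANSACTION
                        || from == State.ABORTING_TRANSACTION;
            case INITIALIZING:
                return from == State.UNINITIALIZED;
            default:
                return false;
        }
    }

    static void transitionTo(State from, State to) {
        if (!isTransitionValid(from, to)) {
            throw new IllegalStateException(
                "Invalid transition attempted from state " + from + " to state " + to);
        }
    }

    public static void main(String[] args) {
        transitionTo(State.IN_TRANSACTION, State.ABORTING_TRANSACTION); // legal
        transitionTo(State.READY, State.ABORTING_TRANSACTION);          // throws, mirroring the failure
    }
}
```

Under this reading, the flakiness would be a race in the test that lets an abort be attempted while the producer is still (or already back) in `READY`.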
```
reactor.kafka.receiver.KafkaReceiverTest > autoCommitFailurePropagationAfterRetries FAILED
    java.lang.AssertionError: expected:<[8, 12]> but was:<[8, 0, 4, 8, 12]>
```

(see https://github.com/reactor/reactor-kafka/runs/5513811483?check_suite_focus=true)

Stack trace:

```
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:120)
        at org.junit.Assert.assertEquals(Assert.java:146)
        at reactor.kafka.AbstractKafkaTest.checkConsumedMessages(AbstractKafkaTest.java:169)
        at reactor.kafka.receiver.KafkaReceiverTest.sendReceiveWithRedelivery(KafkaReceiverTest.java:1430)
        at reactor.kafka.receiver.KafkaReceiverTest.autoCommitFailurePropagationAfterRetries(KafkaReceiverTest.java:685)
```
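The extra entries in `<[8, 0, 4, 8, 12]>` look like leftovers from an earlier receive phase surviving in shared test state, which matches the eventual fix ("Clear any remaining crud in `receivedMessages`"). A hedged, self-contained sketch of that failure mode (`receivedMessages` and `secondPhase` here are illustrative stand-ins, not the actual test code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a shared list that keeps entries across test phases
// produces phantom values in the second phase's assertion unless cleared.
public class ReceivedMessagesDemo {
    static final List<Integer> receivedMessages = new ArrayList<>();

    static List<Integer> secondPhase(List<Integer> fresh, boolean clearFirst) {
        if (clearFirst) {
            receivedMessages.clear(); // drop leftovers from the first phase
        }
        receivedMessages.addAll(fresh);
        return new ArrayList<>(receivedMessages);
    }

    public static void main(String[] args) {
        receivedMessages.addAll(Arrays.asList(8, 0, 4)); // residue from phase one
        // Without clearing, the assertion would see [8, 0, 4, 8, 12] instead of [8, 12].
        System.out.println(secondPhase(Arrays.asList(8, 12), true)); // [8, 12]
    }
}
```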
@simonbasle simonbasle added type/bug A general bug status/need-investigation This needs more in-depth investigation type/chore A task not related to code (build, formatting, process, ...) labels Mar 11, 2022
garyrussell added a commit to garyrussell/reactor-kafka that referenced this issue Mar 17, 2022
See reactor#274

This test had several problems:

The second part of the test was receiving from the original topic instead
of the destination topic, so it always succeeded in finding all the records.

When this was changed to consume from the proper topic, it always failed.

This was because the `.take(count)` in `receiveAndSendTransactions` caused
the flux to be canceled before the final commit took place.

Added a callback hook to the `TransactionManager` so we can test that the commit
is complete before terminating the flux.

It is not clear whether these fixes will resolve the original problem so I
have left diagnostics for future failure analysis.

Also capture test results in the publish action.
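The cancellation-vs-commit race described above (a `.take(count)` terminating the flux before the final async commit lands) can be sketched with plain JDK concurrency primitives. This is an illustrative stand-in for the callback hook added to the `TransactionManager`, not its actual API; `onCommitComplete`, `commitAsync`, and `receiveWithHook` are hypothetical names:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: without a completion hook, a consumer that stops as soon as it has
// enough records (like .take(count)) can cancel before the async commit runs.
public class CommitHookDemo {
    static final AtomicBoolean commitDone = new AtomicBoolean(false);
    static final CountDownLatch commitLatch = new CountDownLatch(1);

    // Stands in for the commit-complete callback hook on the transaction manager.
    static void onCommitComplete() {
        commitDone.set(true);
        commitLatch.countDown();
    }

    // Simulates the final commit completing asynchronously, after a delay.
    static void commitAsync(ExecutorService pool) {
        pool.submit(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            onCommitComplete();
        });
    }

    // The fix's shape: block on the hook instead of terminating immediately.
    public static boolean receiveWithHook() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        commitAsync(pool);
        boolean completed = commitLatch.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return completed && commitDone.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("commit completed before termination: " + receiveWithHook());
    }
}
```

The design point is the same as in the fix: the test must observe commit completion through an explicit signal rather than assume the flux's termination implies it.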
garyrussell added a commit that referenced this issue Jun 14, 2022
* KafkaReceiverTest.transactionalOffsetCommit Fixes

See #274

This test had several problems:

The second part of the test was receiving from the original topic instead
of the destination topic, so it always succeeded in finding all the records.

When this was changed to consume from the proper topic, it always failed.

This was because the `.take(count)` in `receiveAndSendTransactions` caused
the flux to be canceled before the final commit took place.

Added a callback hook to the `TransactionManager` so we can test that the commit
is complete before terminating the flux.

It is not clear whether these fixes will resolve the original problem so I
have left diagnostics for future failure analysis.

Also capture test results in the publish action.

* Cancel inner Flux from `.thenMany()` when complete.

* Fix autoCommitFailurePropagationAfterRetries

Clear any remaining crud in `receivedMessages`.