For some time we've had an issue where, when temporary network errors occur, one or more partitions sometimes simply stop processing. We've observed the following:
1. Partition X starts the commit procedure.
2. Partition Y then also wants to commit, but a commit is already in progress, so its offset is "enqueued".
3. The current commit, with X's offset, fails and will be retried by (roughly) merging its offsets with any new offsets that were queued for commit while the attempt was in flight.
4. A new commit attempt is made with the offsets for both X and Y, and it succeeds. The callback for X is executed, but not the one for Y. Typically, the flow that wanted to commit Y's offset is left waiting indefinitely.
Expected Behavior
The Mono from ReceiverOffset.commit should complete after step 4 above.
Actual Behavior
It never completes.
Possible Solution
In CommittableBatch.restoreOffsets, we are passed the CommitArgs to restore. Offsets are "merged" with any new ones that have been added, whereas callbackEmitters are simply overwritten. A quick look suggests that merging the lists should solve the issue, but there might be other things I've missed, of course.
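To illustrate the difference between overwriting and merging, here is a minimal, self-contained sketch. It is not the reactor-kafka source: the class CommitBatchSketch, its fields, and the methods restoreOverwriting, restoreMerging and simulate are all hypothetical names, and plain Runnable callbacks stand in for the real callbackEmitters. It only models the sequence described above (X commits, Y is enqueued, the attempt fails, the retry succeeds).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a commit batch. NOT the reactor-kafka implementation.
class CommitBatchSketch {
    // Offsets waiting to be committed, keyed by partition name.
    final Map<String, Long> pendingOffsets = new HashMap<>();
    // Callbacks to run once a commit that includes their offset succeeds.
    List<Runnable> pendingCallbacks = new ArrayList<>();

    // Snapshot of the offsets and callbacks taken by one commit attempt.
    static final class Attempt {
        final Map<String, Long> offsets;
        final List<Runnable> callbacks;

        Attempt(Map<String, Long> offsets, List<Runnable> callbacks) {
            this.offsets = offsets;
            this.callbacks = callbacks;
        }
    }

    void enqueue(String partition, long offset, Runnable callback) {
        pendingOffsets.merge(partition, offset, Math::max);
        pendingCallbacks.add(callback);
    }

    // Start a commit attempt: hand over everything pending and reset the batch.
    Attempt startCommit() {
        Attempt attempt = new Attempt(new HashMap<>(pendingOffsets), new ArrayList<>(pendingCallbacks));
        pendingOffsets.clear();
        pendingCallbacks.clear();
        return attempt;
    }

    // Behaviour described in the issue: offsets are merged, callbacks are overwritten.
    void restoreOverwriting(Attempt failed) {
        failed.offsets.forEach(pendingOffsets::putIfAbsent);
        pendingCallbacks = new ArrayList<>(failed.callbacks); // callbacks queued meanwhile are dropped
    }

    // Suggested behaviour: merge the callbacks as well as the offsets.
    void restoreMerging(Attempt failed) {
        failed.offsets.forEach(pendingOffsets::putIfAbsent);
        List<Runnable> merged = new ArrayList<>(failed.callbacks);
        merged.addAll(pendingCallbacks); // keep callbacks queued during the failed attempt
        pendingCallbacks = merged;
    }

    // Replays the scenario and returns the partitions whose callbacks ran after the retry.
    static List<String> simulate(boolean merge) {
        List<String> completed = new ArrayList<>();
        CommitBatchSketch batch = new CommitBatchSketch();

        batch.enqueue("X", 10L, () -> completed.add("X")); // X wants to commit
        Attempt first = batch.startCommit();                // commit for X is in flight
        batch.enqueue("Y", 20L, () -> completed.add("Y")); // Y is enqueued meanwhile

        // The first attempt fails and its contents are restored into the batch.
        if (merge) {
            batch.restoreMerging(first);
        } else {
            batch.restoreOverwriting(first);
        }

        Attempt retry = batch.startCommit();                // retry carries offsets for X and Y
        retry.callbacks.forEach(Runnable::run);             // retry succeeds
        return completed;
    }
}
```

In this model, simulate(false) returns only X, which corresponds to Y's Mono never completing, while simulate(true) returns X and Y. Whether the real fix needs more than merging the lists (for example, thread-safety around the restore) is not something this sketch addresses.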
dforsl changed the title from "New commit-listeners overwritten in case of commit failure" to "New commit-listeners are lost in case of commit failure" on Nov 12, 2019