
major: #18 Batch support and engine event system improvements #19

Merged: 2 commits into master from features/batching on Mar 14, 2022

Conversation

@astubbs (Contributor) commented Nov 4, 2020

#18 - parallel batching - this PR changes the engine to always use lists of records, where non-batch processing is simply the case where the batch size is one.
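As a rough illustration of that design, here is a minimal sketch with made-up names (not the engine's actual classes): the dispatch path always hands the user function a list of records, and single-record mode falls out as a batch size of one.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch only: illustrative stand-in for the engine's dispatch loop.
class BatchDispatchSketch<T> {

    private final int batchSize; // 1 == classic single-record processing

    BatchDispatchSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Hands queued records to the user function in lists of at most batchSize. */
    void dispatch(List<T> queued, Consumer<List<T>> userFunction) {
        for (int i = 0; i < queued.size(); i += batchSize) {
            int end = Math.min(i + batchSize, queued.size());
            userFunction.accept(queued.subList(i, end));
        }
    }

    public static void main(String[] args) {
        // With batchSize = 1 every "batch" is a one-element list - the non-batch case.
        new BatchDispatchSketch<String>(1)
                .dispatch(List.of("a", "b", "c"), batch -> System.out.println(batch));
    }
}
```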

Related:

Base automatically changed from interface to master November 5, 2020 11:42
@astubbs astubbs added the enhancement New feature or request label Nov 5, 2020
@astubbs astubbs linked an issue Nov 5, 2020 that may be closed by this pull request
@astubbs astubbs force-pushed the master branch 6 times, most recently from 42029e9 to 8fe9766 on November 10, 2020 14:32
@ismarslomic commented

I think the documentation should clarify the relation between the consumer property max.poll.records and the batchLevel argument of pollBatch(). Should we set both, or only the latter? If both, should they be set to the same value?

In addition, it would be useful if the documentation clarified how error handling and offset commits are handled in batch processing.
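As background for the question, a minimal sketch of the two layers involved; max.poll.records is the standard Kafka consumer property, while batchLevel follows this comment's naming and is hypothetical, not a real library option:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Sketch only: shows where each setting lives, not a recommendation.
class BatchConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Fetch layer (real Kafka property): caps how many records a single
        // Consumer#poll() call may return; the default is 500.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");

        // Library layer (hypothetical name from this thread): how many records the
        // user function receives per invocation. If the engine buffers records
        // internally across polls, the two values need not be equal.
        int batchLevel = 50;

        System.out.println("max.poll.records=" + props.get(ConsumerConfig.MAX_POLL_RECORDS_CONFIG)
                + ", batchLevel=" + batchLevel);
    }
}
```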

@astubbs (Contributor, Author) commented Jan 11, 2021

> I think the documentation should clarify the relation between the consumer property max.poll.records and the batchLevel argument of pollBatch(). Should we set both, or only the latter? If both, should they be set to the same value?

Good suggestion! Will do so - hold me to it.

> In addition, it would be useful if the documentation clarified how error handling and offset commits are handled in batch processing.

What sort of errors are you thinking about?

Off the top of my head: by default, if there is an error processing a batch of messages, the same thing happens as for a single message - the messages are all returned to the internal queues and retried after the default delay expires.
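A minimal sketch of those semantics as just described, with an illustrative queue and handler rather than the library's real internals:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: models the described behaviour, not the engine's actual code.
class BatchRetrySketch {
    private final Deque<List<String>> internalQueue = new ArrayDeque<>();

    void process(List<String> batch, Consumer<List<String>> userFunction) {
        try {
            userFunction.accept(batch);
            // Success: offsets for the whole batch become eligible for commit.
        } catch (RuntimeException e) {
            // Failure: the entire batch goes back on the internal queue and is
            // retried after the retry delay expires, same as a single failed record.
            internalQueue.addLast(batch);
        }
    }

    public static void main(String[] args) {
        var sketch = new BatchRetrySketch();
        sketch.process(List.of("a", "b"), batch -> { throw new RuntimeException("boom"); });
        System.out.println("requeued batches: " + sketch.internalQueue.size()); // 1
    }
}
```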

@ismarslomic commented

> What sort of errors are you thinking about?

> Off the top of my head: by default, if there is an error processing a batch of messages, the same thing happens as for a single message - the messages are all returned to the internal queues and retried after the default delay expires.

I was not thinking of specific errors; in general, it would be useful to have a section in the docs that states exactly what you said above, so users are aware of their responsibilities in this regard!

New feature request - "Retry settings"
It would be nice to have some flexibility to configure the built-in retry mechanisms in pollBatch(), for example enableRetry, maxRetryAttempts and waitDuration. Then you could either configure it for your needs or choose to handle retries yourself.
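To make the request concrete, a sketch of the proposed knobs on a hypothetical options type; enableRetry, maxRetryAttempts and waitDuration are the names proposed above and do not exist in the library:

```java
import java.time.Duration;

// Sketch only: hypothetical options holder for the knobs proposed in this comment.
class RetryOptionsSketch {
    final boolean enableRetry;      // turn the built-in retry on or off
    final int maxRetryAttempts;     // give up (skip, dead-letter, ...) after this many tries
    final Duration waitDuration;    // delay between attempts

    RetryOptionsSketch(boolean enableRetry, int maxRetryAttempts, Duration waitDuration) {
        this.enableRetry = enableRetry;
        this.maxRetryAttempts = maxRetryAttempts;
        this.waitDuration = waitDuration;
    }

    public static void main(String[] args) {
        var opts = new RetryOptionsSketch(true, 5, Duration.ofSeconds(10));
        System.out.printf("retry=%b attempts=%d wait=%s%n",
                opts.enableRetry, opts.maxRetryAttempts, opts.waitDuration);
    }
}
```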

@astubbs (Contributor, Author) commented Jan 14, 2021

> New feature request - "Retry settings"
> It would be nice to have some flexibility to configure the built-in retry mechanisms in pollBatch(), for example enableRetry, maxRetryAttempts and waitDuration. Then you could either configure it for your needs or choose to handle retries yourself.

Thanks again for your input! We should track this as a separate feature; it's something I've been planning. I've created #65 (enhanced retry epic). Let's continue the conversation there, and let me know if you have any further thoughts! I'll do something basic for the next version (enable/disable, then probably skip or die)...

@astubbs (Contributor, Author) commented Jan 14, 2021

@ismarslomic @JorgenRingen do you have any ordering (or relative ordering) expectations or requirements for the messages within the batches?

@astubbs (Contributor, Author) commented Jan 15, 2021

OK, the implementation is ready for testing. I'm not entirely satisfied with the interface, but let me know what you think. For example, we could make batching the only interface and have the default batch size be == 1. But then the user function would always need to be a consumer that takes a list of records, which is kind of odd. It depends on whether you think batching or single-record processing would be the more common use case.
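The trade-off being weighed, as a sketch with illustrative types (neither shape is the library's final API):

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch only: the two interface shapes under discussion.
class InterfaceShapesSketch {

    // Option A: a dedicated single-record entry point.
    static void pollSingle(Consumer<String> userFunction) {
        userFunction.accept("record-1");
    }

    // Option B: a batch-only entry point. A default batchSize of 1 subsumes
    // Option A, but the user function must always accept a List, which reads
    // oddly when processing one record at a time.
    static void pollBatch(int batchSize, Consumer<List<String>> userFunction) {
        userFunction.accept(List.of("record-1")); // batchSize == 1 -> one-element list
    }

    public static void main(String[] args) {
        pollSingle(r -> System.out.println("single: " + r));
        pollBatch(1, batch -> System.out.println("batch of one: " + batch));
    }
}
```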

@astubbs (Contributor, Author) commented Jan 21, 2021

Does anyone have time to test this branch out? If not, that's OK - I'll release it as 3.1.

@astubbs astubbs force-pushed the master branch 2 times, most recently from bf382c4 to 97d09fc on March 30, 2021 08:52
@astubbs (Contributor, Author) left a comment

step

@astubbs (Contributor, Author) left a comment

great! ;)

@astubbs (Contributor, Author) commented Feb 25, 2022

OK, there are a couple of test failures still - I think they need updating. But the batching feature seems feature-complete now. I still need to implement some tests for the vertx and reactor versions, but should have this merged early next week.

@astubbs astubbs mentioned this pull request Feb 25, 2022
@astubbs (Contributor, Author) left a comment

minor

@astubbs astubbs changed the title feature: #18 Add simple parallel batching support feature: #18 Add parallel batching support Mar 4, 2022
@astubbs (Contributor, Author) left a comment

getting there... lots of improvements / refactoring in this too

@astubbs astubbs changed the title feature: #18 Add parallel batching support major: #18 Batch support and engine event system improvements Mar 9, 2022
@astubbs astubbs force-pushed the features/batching branch 3 times, most recently from 2fba344 to cc1167e on March 9, 2022 16:06
@astubbs astubbs mentioned this pull request Mar 10, 2022
README.adoc (review comment resolved)
@astubbs (Contributor, Author) commented Mar 10, 2022

Target should be 100% now. Protected against a negative delta.

@astubbs astubbs force-pushed the features/batching branch 2 times, most recently from 96eb5a0 to b7b631c on March 14, 2022 15:15
- test for observing dropping average batch size
- performance: process-mailbox timeout calculation not needed, as the poll box is interrupted either by work results or by new messages from the broker poller
- improvements to the event-based system and LongPollingMockConsumer
- eventing and timing improvements and simplifications / refactorings; better batch test code reuse
- only sleep for the retry delay if starved
- remove the clean/dirty state cache - too complex and unnecessary; stay tuned for a replacement
- change the close ordering of operations to fix final commit execution
- teach ProgressTracker to use time
- batch size request: always request enough to fill all batches

Co-authored-by: Antony Stubbs <[email protected]>
Co-authored-by: Benedikt Linse <[email protected]>
@astubbs astubbs marked this pull request as ready for review March 14, 2022 15:45
@astubbs astubbs merged commit 360ebb2 into master Mar 14, 2022
Labels: enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

Add support for parallel processing of batches / bulk sets of messages