Circuit Breaker to stop processing records if HTTP endpoint errors cross a threshold #110

Closed
Tracked by #172
masaldaan opened this issue Apr 2, 2021 · 9 comments · Fixed by #198
Labels
enhancement New feature or request help wanted Extra attention is needed medium

Comments

@masaldaan

This is more of a question/suggestion in the vein of #82.
How can I stop/pause the processing/polling of records if the downstream system (e.g. the Slow GPS tracking system in the examples) starts returning 5xx errors?
I could add retries as suggested in #82 (which I really liked, kudos), but there might be a threshold, say 50 5xx errors in 5 secs, at which point, you'd like to pause processing any records.
You could then decide to restart after a pre-decided cooldown period, or have a switch that needs to be flipped manually after which you restart.
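The threshold-plus-cooldown idea above can be sketched as a small sliding-window counter. This is only an illustration under stated assumptions: `SlidingWindowBreaker` and all of its method names are hypothetical and not part of parallel-consumer, and the clock is injected purely so the behaviour can be exercised without sleeping.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a parallel-consumer API): opens after too many failures in a time window. */
class SlidingWindowBreaker {
    private final int maxFailures;      // e.g. 50 failures...
    private final long windowMillis;    // ...within e.g. 5 000 ms
    private final long cooldownMillis;  // how long to stay open before resuming automatically
    private final LongSupplier clock;   // current time in milliseconds
    private final Deque<Long> failureTimes = new ArrayDeque<>();
    private boolean open = false;
    private long openedAt = 0L;

    SlidingWindowBreaker(int maxFailures, long windowMillis, long cooldownMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
        this.cooldownMillis = cooldownMillis;
        this.clock = clock;
    }

    /** Call this whenever the downstream system returns a 5xx. */
    synchronized void recordFailure() {
        long now = clock.getAsLong();
        failureTimes.addLast(now);
        // Drop failures that have fallen out of the window.
        while (!failureTimes.isEmpty() && failureTimes.peekFirst() <= now - windowMillis) {
            failureTimes.removeFirst();
        }
        if (failureTimes.size() >= maxFailures) {
            open = true;
            openedAt = now;
            failureTimes.clear();
        }
    }

    /** True if records may be processed; closes the breaker again once the cooldown has elapsed. */
    synchronized boolean allowsWork() {
        if (open && clock.getAsLong() - openedAt >= cooldownMillis) {
            open = false;
        }
        return !open;
    }

    /** The manual switch: close the breaker regardless of the cooldown. */
    synchronized void reset() {
        open = false;
        failureTimes.clear();
    }
}
```

For the numbers in the question, this might be constructed as `new SlidingWindowBreaker(50, 5_000, cooldownMillis, System::currentTimeMillis)`, with `cooldownMillis` chosen by the operator. The class is synchronized so that several worker threads can share one instance, which is one possible answer to the parallel-consumer concern, not necessarily the best one.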

I could implement the circuit breaking myself, but I'm having trouble visualising how the pause/resume might work for parallel consumers.

I'd appreciate any pointers/help.

Many thanks!

@astubbs
Contributor

astubbs commented Apr 2, 2021

Any records from a partition or topic? Or any records of a certain key or field?

We could implement a stage in the system, upon ingestion of messages and retrieval of work, to conditionally schedule the work preemptively, before waiting for the failure. And again, when it’s scheduled, it could check a switch, which could be derived from the target system.

@masaldaan
Author

That is a good question, I was thinking of any records from a partition for a particular consumer group (since in the scenario I was envisioning, a consumer group services a single downstream system)

@masaldaan
Author

Conditional scheduling is actually how I'm handling it in vanilla & Spring-based Kafka consumers, but I am still reading up on offset management in parallel-consumers, so I did not want to offer up half-baked solutions.

@astubbs
Contributor

astubbs commented Apr 2, 2021

Ah ok, pausing everything is a bit different: much easier, and it would be more efficient. I’ve got a couple of ideas for both. I’ll push up a draft interface in a couple of days; let me know what you think.

For everything, the controller basically just needs to stop taking work. The broker poller will pause things automatically and resume them once the controller starts taking work again.

@astubbs astubbs added the medium label May 12, 2021
@astubbs astubbs added the help wanted Extra attention is needed label Jul 23, 2021
@masaldaan
Author

Hi @astubbs, I'll gladly try & help out with this issue. It will take me some time to actually get up to speed with the internals though, I hope that's alright.

@astubbs astubbs added the enhancement New feature or request label Jul 27, 2021
@astubbs
Contributor

astubbs commented Jul 27, 2021

FYI, the easiest way to do this is to wrap your user functions in a function which checks the return result of the user function, or tests something to see what the target host name is. Check if the hostname is in a map of disabled hosts, and if so, fail the function immediately (throw any exception). This will cause the message to go back into the queue and eventually be retried (you can plug in a custom retry delay calculator here too). The effect is that the messages will just be retried forever, but you can skip message processing immediately and fail fast. No changes to the framework are required. Let me know your thoughts.
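A minimal sketch of that wrapping approach, in plain Java with no framework calls: `HostCircuitBreaker`, `hostOf` and the exception type are all assumptions made for illustration, and a concurrent set stands in for the "map of disabled hosts". How the wrapped function is handed to the consumer is deliberately left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical wrapper (not a parallel-consumer API): fails fast for hosts marked as disabled. */
class HostCircuitBreaker {
    // Thread-safe set, since user functions may run on many worker threads.
    private final Set<String> disabledHosts = ConcurrentHashMap.newKeySet();

    void disable(String host) { disabledHosts.add(host); }

    void enable(String host) { disabledHosts.remove(host); }

    /**
     * Wraps a user function so that records bound for a disabled host throw immediately.
     * Per the comment above, the thrown exception is what sends the record back for retry.
     *
     * @param hostOf       extracts the target host name from a record (assumed to be cheap)
     * @param userFunction the real processing logic, e.g. the HTTP call
     */
    <T, R> Function<T, R> wrap(Function<T, String> hostOf, Function<T, R> userFunction) {
        return record -> {
            String host = hostOf.apply(record);
            if (disabledHosts.contains(host)) {
                throw new IllegalStateException("Circuit open for host " + host);
            }
            return userFunction.apply(record);
        };
    }
}
```

Something else, such as a failure counter or a manual switch, would call `disable` and `enable`; the wrapper itself only decides whether to skip the call.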

@astubbs
Contributor

astubbs commented Feb 15, 2022

FYI here's documentation for how you'd do this, with an example: https://github.com/confluentinc/parallel-consumer/tree/master#circuit-breaker-pattern

@astubbs
Contributor

astubbs commented Mar 3, 2022

@astubbs astubbs mentioned this issue May 16, 2022
64 tasks
@astubbs astubbs linked a pull request May 16, 2022 that will close this issue
5 tasks
@astubbs
Contributor

astubbs commented May 16, 2022

Implemented in:

If #193 doesn't suffice, let us know.


2 participants