Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add defensive code for dealing with loadbalancer concurrency tracking #4951

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

tysonnorris
Copy link
Contributor

Defensive code for concurrency tracking in sharding loadbalancer.

Description

We saw some cases where controller ack processing was incomplete for some activations, which turned out to be:

  • cluster state change would reset the concurrency tracking (for concurrent activations, -c > 1)
  • the concurrency tracking logic did not tolerate the resets, which happen several times for each cluster state change
  • the ack message processing would fail silently

This PR does 2 things:

  • add defensive code to the concurrency tracking logic (to tolerate these state resets during cluster state changes)
  • add extra logging to message processing to increase visibility of failed message handling

This was not caught in tests due to the multi-controller cluster state changes, plus concurrency, required to reproduce the symptom.

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

{
case Failure(e) =>
logging.error(this, s"Failed to process message for topic $topic : $e (stack trace included)")
e.printStackTrace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printing the stack this way excludes it from the logger formatting - is that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not intentional - can you refer me to an example of better formatting for stack trace?
Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it already part of the log line above anyway? I guess the broader question is: Why is the stack trace even needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is, and I forgot to remove the e.printStackTrace() when I changed the log line to include it... 😔 fixing that...

Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, direct map access is probably always bad 😅

{
case Failure(e) =>
logging.error(this, s"Failed to process message for topic $topic : $e (stack trace included)")
e.printStackTrace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it already part of the log line above anyway? I guess the broader question is: Why is the stack trace even needed here?

@@ -213,7 +214,13 @@ class MessageFeed(description: String,
outstandingMessages = outstandingMessages.tail

if (logHandoff) logging.debug(this, s"processing $topic[$partition][$offset] ($occupancy/$handlerCapacity)")
handler(bytes)
handler(bytes).andThen {
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the inner braces needed?

@style95 style95 added the stale old issue which needs to validate label May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale old issue which needs to validate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants