Avoid callback calls while lock is held. #862

Merged
merged 1 commit into from
Feb 13, 2020

Conversation

mattmoor
Member

This came up here: #861 (comment)

/assign @vaikas

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 11, 2019
@knative-prow-robot knative-prow-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 11, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2019
@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 11, 2019
@@ -70,7 +71,7 @@ func TestHappyPathsExact(t *testing.T) {
         // Not tracked yet
         {
                 trk.OnChanged(thing1)
-                if got, want := calls, 0; got != want {
+                if got, want := int(calls), 0; got != want {
Contributor

You can have want=int32(0) and increment it along the way, thus avoiding the ugly casts in every check.

Member Author

I am trying a different approach now.

@@ -158,12 +158,13 @@ func (i *impl) TrackReference(ref Reference, obj interface{}) error {
         // The simplest way of eliminating such a window is to call the
         // callback to "catch up" immediately following new
         // registrations.
-        i.cb(key)
+        defer i.cb(key)
Contributor

Would it be better to have a "locked" get func that returns nil or cb? Then the code here can be simplified and the unlocks won't be spread around.

Member Author

I don't know what you mean?

Contributor

Well, first of all, @vagababov, do you agree that we should not be holding the lock while making callbacks? If I'm just making up dumb things, then this change is moot.

Contributor

As a general approach, I do agree with that, since god knows what the callback might do.
What I meant here is that the unlock and lock are now far from each other. I'd prefer we have a "locked" function that returns the callback, and then the code is:

lock
defer getCB()
unlock

Member Author

🤔 I would expect that only the invocation (and not the read of i.cb) happens outside of the lock, but that is perhaps incorrect...
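
For reference, the Go specification says that a `defer` statement evaluates the function value and its arguments when the statement executes, and only the call itself is postponed until the surrounding function returns. A minimal sketch of that behavior follows; the `holder` type and the `deferEvalOrder` helper are hypothetical names invented for illustration and are not taken from the tracker code.

```go
package main

import "fmt"

// holder is a hypothetical stand-in for a type with a callback field.
type holder struct {
	cb func(key string)
}

// deferEvalOrder defers h.cb, swaps the field afterwards, and returns
// the order in which things actually ran.
func deferEvalOrder() []string {
	var log []string
	h := &holder{cb: func(k string) { log = append(log, "old:"+k) }}

	func() {
		// h.cb and "key" are evaluated here; only the call waits for return.
		defer h.cb("key")
		// Replacing the field now does not affect the deferred call.
		h.cb = func(k string) { log = append(log, "new:"+k) }
		log = append(log, "body done")
	}()
	return log
}

func main() {
	fmt.Println(deferEvalOrder()) // [body done old:key]
}
```

So with `defer i.cb(key)`, the read of `i.cb` happens at the defer statement, while the invocation happens at return. Whether the lock is still held at that point depends on where the unlock sits relative to the deferred call.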

Member Author

Are you suggesting that this is why race is freaking out?

Contributor

I haven't looked at the race errors, let me check.
But in general the further the unlock is from lock, the bigger the chance some path will forget to unlock it, or some return/panic will escape unlocking it.

Fun story: on one of my GCP projects we were seeing runaway latencies on some instances.
It turned out that we were using an LRU map whose Get was not read-lock safe (which is kind of reasonable, given that it has to update the last-used timestamp).
And we were using the lock(); x; unlock() pattern, where x would include the map read. That read would sometimes panic and leave the mutex locked, and the whole system would then grind to a halt.
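
The failure mode in this story can be reproduced in a few lines. The sketch below is an illustration only: `withoutDefer`, `withDefer`, and `stillLocked` are hypothetical helpers, the panic is simulated, and `sync.Mutex.TryLock` assumes Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"sync"
)

// withoutDefer uses the lock(); x; unlock() pattern.
// If body panics, Unlock is never reached.
func withoutDefer(mu *sync.Mutex, body func()) {
	mu.Lock()
	body()
	mu.Unlock()
}

// withDefer unlocks via defer, which also runs while a panic unwinds.
func withDefer(mu *sync.Mutex, body func()) {
	mu.Lock()
	defer mu.Unlock()
	body()
}

// stillLocked runs f with a panicking body, recovers from the panic,
// and reports whether the mutex was left locked.
func stillLocked(f func(*sync.Mutex, func())) bool {
	var mu sync.Mutex
	func() {
		defer func() { _ = recover() }()
		f(&mu, func() { panic("simulated panic inside the critical section") })
	}()
	if mu.TryLock() { // TryLock requires Go 1.18+
		mu.Unlock()
		return false
	}
	return true
}

func main() {
	fmt.Println(stillLocked(withoutDefer)) // true: the mutex stays locked
	fmt.Println(stillLocked(withDefer))    // false: defer released it
}
```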

Here you have 71 lines of separation between lock and unlock, 133-204 (well, each path is shorter, but I'll exaggerate), or 227-274, and each of those code lines can in theory panic.
So I'd make the code along the lines of:

func acquireCB() func(namespacedname) {
  lock()
  defer unlock()
  if selector == nil {...return x}
  ...
  return y
}

func (i *impl) OnChanged(obj interface{}) {
  ...
  cb := acquireCB(...)
  cb()
}

Same for track reference.
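
A runnable variation on this suggestion is sketched below, under stated assumptions: the `tracker` type, its fields, and the string keys are hypothetical simplifications rather than the real tracker implementation, and `acquire` returns a closure over the collected keys instead of the bare callback. The callback deliberately re-enters the tracker to show that nothing deadlocks once the lock is released first.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical, simplified stand-in for the real type.
type tracker struct {
	m       sync.Mutex
	cb      func(key string)
	mapping map[string][]string // tracked object -> keys to notify
}

// Track registers key as interested in obj.
func (t *tracker) Track(obj, key string) {
	t.m.Lock()
	defer t.m.Unlock()
	t.mapping[obj] = append(t.mapping[obj], key)
}

// acquire does all shared-state reads under the lock and returns a
// closure that performs the callbacks. Lock and unlock sit side by side.
func (t *tracker) acquire(obj string) func() {
	t.m.Lock()
	defer t.m.Unlock()
	keys := append([]string(nil), t.mapping[obj]...) // copy while locked
	cb := t.cb
	return func() {
		for _, k := range keys {
			cb(k)
		}
	}
}

// OnChanged invokes the callbacks after acquire has released the lock.
func (t *tracker) OnChanged(obj string) {
	notify := t.acquire(obj)
	notify()
}

// demo returns the keys seen by a callback that re-enters the tracker.
func demo() []string {
	var seen []string
	t := &tracker{mapping: map[string][]string{}}
	t.cb = func(key string) {
		seen = append(seen, key)
		t.Track("other", key) // takes the lock again; fine, it is not held here
	}
	t.Track("thing1", "ns/a")
	t.Track("thing1", "ns/b")
	t.OnChanged("thing1")
	return seen
}

func main() {
	fmt.Println(demo()) // [ns/a ns/b]
}
```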

Member Author

I've rewritten this as something much simpler.

@vaikas
Contributor

vaikas commented Dec 4, 2019

@mattmoor should we try to get this in before the next release?

@mattmoor
Member Author

mattmoor commented Dec 4, 2019

😅 Well the pkg release branch is already cut, so too late 😬

@vaikas
Contributor

vaikas commented Dec 13, 2019

/test pull-knative-pkg-unit-tests

@n3wscott
Contributor

n3wscott commented Feb 3, 2020

Ping on this, still want this to land?

@evankanderson evankanderson removed their request for review February 6, 2020 19:27
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 13, 2020
                for _, key := range keys {
                        cb(key)
                }
        }(i.cb) // read i.cb with the lock held
        defer i.m.Unlock()
Contributor

Since you're still deferring, move it next to the lock.
Same below.
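
The ordering of the two defers matters here because deferred calls run last-in-first-out. The sketch below illustrates that with a hypothetical, trimmed-down `impl` (its fields and string keys are assumptions, not the merged code): the callback closure is registered first and therefore runs after the deferred `Unlock`. The probe uses `sync.Mutex.TryLock`, which assumes Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"sync"
)

// impl is a hypothetical, trimmed-down tracker; field names are illustrative.
type impl struct {
	m       sync.Mutex
	cb      func(key string)
	mapping map[string][]string // changed object -> keys to notify
}

// OnChanged collects keys under the lock and calls back after unlocking.
func (i *impl) OnChanged(obj string) {
	var keys []string

	i.m.Lock()
	// Deferred calls run last-in-first-out: this closure is registered
	// first, so it runs after the deferred Unlock registered below.
	defer func(cb func(string)) {
		for _, key := range keys {
			cb(key)
		}
	}(i.cb) // the argument i.cb is evaluated here, with the lock held
	defer i.m.Unlock()

	keys = append(keys, i.mapping[obj]...)
}

// lockFreeDuringCallbacks reports, per callback, whether the mutex was free.
func lockFreeDuringCallbacks() []bool {
	var free []bool
	i := &impl{mapping: map[string][]string{"thing1": {"ns/a", "ns/b"}}}
	i.cb = func(string) {
		ok := i.m.TryLock() // TryLock requires Go 1.18+
		if ok {
			i.m.Unlock()
		}
		free = append(free, ok)
	}
	i.OnChanged("thing1")
	return free
}

func main() {
	fmt.Println(lockFreeDuringCallbacks()) // [true true]
}
```

Swapping the two defer statements would reverse the order, and the callbacks would then run with the lock still held.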

@vagababov
Contributor

/lgtm
😬

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2020
@knative-prow-robot knative-prow-robot merged commit df06299 into knative:master Feb 13, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

6 participants