Avoid callback calls while lock is held. #862
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mattmoor.
Updated 8ecdcdd to 56eb946.
tracker/enqueue_test.go (outdated)
@@ -70,7 +71,7 @@ func TestHappyPathsExact(t *testing.T) {
 // Not tracked yet
 {
 trk.OnChanged(thing1)
- if got, want := calls, 0; got != want {
+ if got, want := int(calls), 0; got != want {
You can have want=int32(0) and increment it along the way, thus avoiding the ugly casts in every check.
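A minimal sketch of that suggestion, assuming the test's call counter is an int32 updated through sync/atomic (presumably why the diff wraps calls in int(...)). The callCounter type and the onCall, get, and runChecks functions are hypothetical, and keys are plain strings rather than the tracker's real key type.

```go
package main

import "sync/atomic"

// callCounter is a hypothetical stand-in for the test's `calls` variable.
// It is an int32 so the callback can bump it with sync/atomic.
type callCounter struct {
	n int32
}

// onCall is the callback handed to the code under test; it only counts.
func (c *callCounter) onCall(key string) {
	atomic.AddInt32(&c.n, 1)
}

// get returns the current count as an int32.
func (c *callCounter) get() int32 {
	return atomic.LoadInt32(&c.n)
}

// runChecks keeps want as an int32, increments it whenever another
// callback is expected, and compares got != want with no int(...) casts.
func runChecks(c *callCounter) bool {
	want := int32(0)
	if got := c.get(); got != want {
		return false
	}
	c.onCall("thing1") // stands in for a tracked change firing the callback
	want++
	if got := c.get(); got != want {
		return false
	}
	return true
}
```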
I am trying a different approach now.
tracker/enqueue.go (outdated)
@@ -158,12 +158,13 @@ func (i *impl) TrackReference(ref Reference, obj interface{}) error {
 // The simplest way of eliminating such a window is to call the
 // callback to "catch up" immediately following new
 // registrations.
- i.cb(key)
+ defer i.cb(key)
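Deferred calls run last-in-first-out, so whether defer i.cb(key) actually runs outside the lock depends on how the unlock is arranged. The sketch below uses a hypothetical tracker type (not the PR's impl) and sync.Mutex.TryLock (Go 1.18+) only to observe whether the lock is held when the callback fires.

```go
package main

import "sync"

// tracker is a hypothetical, stripped-down stand-in for the tracker impl.
type tracker struct {
	m  sync.Mutex
	cb func(key string)
}

// lockHeld reports whether t.m is currently held by trying to take it.
func (t *tracker) lockHeld() bool {
	if t.m.TryLock() {
		t.m.Unlock()
		return false
	}
	return true
}

// trackDeferredUnlock defers Unlock first and the callback second.
// Because defers run LIFO, the callback runs BEFORE the unlock.
func (t *tracker) trackDeferredUnlock(key string) {
	t.m.Lock()
	defer t.m.Unlock()
	// ... registration work under the lock ...
	defer t.cb(key)
}

// trackExplicitUnlock releases the lock explicitly before returning,
// so the deferred callback runs with the lock already released.
func (t *tracker) trackExplicitUnlock(key string) {
	t.m.Lock()
	defer t.cb(key)
	// ... registration work under the lock ...
	t.m.Unlock()
}
```

Note that in trackExplicitUnlock a panic in the elided work would skip the Unlock and run the callback with the lock still held, which is the concern about spreading unlocks around raised below.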
Would it be better to have a "locked" getter func that returns either nil or the cb? Then the code here can be simplified and the unlocks won't be spread around.
I don't know what you mean?
Well, first of all, @vagababov, do you agree that we should not be holding the lock while making callbacks? If I'm just making up dumb things, then this change is moot.
As a general approach, I do agree with that, since god knows what the callback might do.
What I meant here is that the unlock and lock are now far from each other. I'd prefer we have a "locked" function that returns the callback and then the code is
lock
defer getCB()
unlock
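One concrete reason not to hold the lock while calling out: sync.Mutex is not reentrant, so a callback that calls back into the same object would block forever on Lock. A minimal sketch, assuming a hypothetical registry type whose callback receives the registry itself; notify copies the callback under the lock, releases it, and only then invokes it.

```go
package main

import "sync"

// registry is hypothetical: a mutex-guarded set of keys plus a callback.
// If notify invoked cb with r.mu held and cb called r.add, r.add would
// block forever on r.mu.Lock(), because sync.Mutex is not reentrant.
type registry struct {
	mu   sync.Mutex
	keys map[string]bool
	cb   func(r *registry, key string)
}

func newRegistry(cb func(r *registry, key string)) *registry {
	return &registry{keys: make(map[string]bool), cb: cb}
}

// add records a key; it takes the lock itself.
func (r *registry) add(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.keys[key] = true
}

// has reports whether key has been recorded.
func (r *registry) has(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.keys[key]
}

// notify reads the callback under the lock, releases it, and only then
// runs the callback, so a callback that re-enters r cannot deadlock.
func (r *registry) notify(key string) {
	r.mu.Lock()
	cb := r.cb
	r.mu.Unlock()
	if cb != nil {
		cb(r, key)
	}
}
```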
🤔 I would expect that only the invocation (and not the read of i.cb) happens outside of the lock, but that is perhaps incorrect...
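On when i.cb is read: the Go spec says a defer statement evaluates the function value and the call's parameters when the defer statement executes; only the invocation is postponed. So defer i.cb(key) reads i.cb at that point, with the lock held if it is held there. A small sketch with a hypothetical holder type that swaps its callback after deferring shows that the original callback is the one that runs.

```go
package main

import "sync"

// holder is a hypothetical type whose callback field may be swapped
// concurrently, so reads of cb must happen under mu.
type holder struct {
	mu sync.Mutex
	cb func(key string)
}

// fire defers h.cb(key) while holding the lock, then swaps h.cb.
// The defer statement has already captured the old h.cb and key,
// so the replacement is never called by this deferred call.
func (h *holder) fire(key string, replacement func(string)) {
	h.mu.Lock()
	defer h.cb(key)    // h.cb and key are evaluated now, with mu held
	h.cb = replacement // later writes do not affect the deferred call
	h.mu.Unlock()      // the deferred call itself runs after this
}
```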
Are you suggesting that this is why the race detector is freaking out?
I haven't looked at the race errors, let me check.
But in general, the further the unlock is from the lock, the bigger the chance that some path will forget to unlock it, or that some return or panic will escape unlocking it.
Fun story: on one of my GCP projects we were seeing runaway latencies on some instances. It turned out that we were using an LRU map whose Get was not read-lock safe (which is kind of reasonable, given it has to update the last-used timestamp). And we were using the lock(); x; unlock() pattern, where x would include a map read. That read would sometimes panic and leave the mutex locked, and the whole system would then grind to a halt.
Here you have 71 lines of separation between lock and unlock (lines 133-204; each path is shorter, but I'll exaggerate), or lines 227-274, and each of those lines can in theory panic.
So I'd make the code along the lines of:
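func (i *impl) acquireCB() func(types.NamespacedName) {
	i.m.Lock()
	defer i.m.Unlock() // released on every return path, and on panic
	if selector == nil {
		// ...
		return x
	}
	// ...
	return y
}

func (i *impl) OnChanged(obj interface{}) {
	// ...
	cb := i.acquireCB()
	cb(key) // key comes from the elided code; the lock is already released here
}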
Same for track reference.
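The failure mode from the LRU story, a panic between Lock and Unlock leaving the mutex held, can be reproduced in a few lines. This is a sketch with hypothetical helper names: callAndRecover stands in for any recover-based wrapper that keeps the process alive after the panic, and TryLock (Go 1.18+) is used only to observe the mutex state.

```go
package main

import "sync"

// readUnsafe unlocks only on the normal path; a panic in fn skips Unlock.
func readUnsafe(mu *sync.Mutex, fn func()) {
	mu.Lock()
	fn()
	mu.Unlock() // skipped if fn panics
}

// readSafe defers the unlock right next to the lock.
func readSafe(mu *sync.Mutex, fn func()) {
	mu.Lock()
	defer mu.Unlock() // runs even while a panic unwinds the stack
	fn()
}

// callAndRecover runs f and swallows any panic so the process keeps going.
func callAndRecover(f func()) {
	defer func() { _ = recover() }()
	f()
}

// stillLocked reports whether mu is held, using TryLock.
func stillLocked(mu *sync.Mutex) bool {
	if mu.TryLock() {
		mu.Unlock()
		return false
	}
	return true
}

// lockedAfterPanic runs a panicking fn through readSafe or readUnsafe
// and reports whether the mutex is left held afterwards.
func lockedAfterPanic(safe bool) bool {
	var mu sync.Mutex
	boom := func() { panic("boom") }
	if safe {
		callAndRecover(func() { readSafe(&mu, boom) })
	} else {
		callAndRecover(func() { readUnsafe(&mu, boom) })
	}
	return stillLocked(&mu)
}
```

Here lockedAfterPanic(false) reports true (the mutex is stuck), while lockedAfterPanic(true) reports false, which is the argument for keeping the deferred unlock right next to the lock.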
I've rewritten this as something much simpler.
Updated 56eb946 to e222640.
@mattmoor should we try to get this in before the next release?
😅 Well, the pkg release branch is already cut, so too late 😬
/test pull-knative-pkg-unit-tests
Ping on this, still want this to land?
Updated e222640 to 56a5c5e.
	for _, key := range keys {
		cb(key)
	}
}(i.cb) // read i.cb with the lock held
defer i.m.Unlock()
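A self-contained sketch of the ordering the snippet above relies on, assuming a reduced hypothetical impl type with string keys and a watched map (not the real tracker fields). The closure that runs the callbacks is deferred before Unlock, so Unlock runs first; i.cb is passed as an argument and is therefore read while the lock is held; and keys is captured by the closure, so keys appended later under the lock are visible when it runs.

```go
package main

import "sync"

// impl is a hypothetical, reduced stand-in for the tracker type.
type impl struct {
	m       sync.Mutex
	watched map[string][]string // hypothetical: object -> keys to notify
	cb      func(key string)
}

// onChanged collects keys under the lock and runs the callbacks after
// the deferred Unlock has fired (defers run last-in-first-out).
func (i *impl) onChanged(obj string) {
	i.m.Lock()
	var keys []string
	defer func(cb func(string)) {
		for _, key := range keys { // sees keys as appended below
			cb(key)
		}
	}(i.cb) // i.cb is evaluated here, with the lock held
	defer i.m.Unlock() // registered last, so it runs first

	keys = append(keys, i.watched[obj]...)
}
```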
Since you're still deferring, move this next to the lock.
Same below.
/lgtm
This came up here: #861 (comment)
/assign @vaikas