Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jcontext #43

Merged
merged 34 commits into from
Oct 27, 2020
Merged

Jcontext #43

merged 34 commits into from
Oct 27, 2020

Conversation

thejerf
Copy link
Owner

@thejerf thejerf commented Sep 18, 2020

@AudriusButkevicius , @lthibault , this is the PR I'm tentatively planning on going with.

It has all the testing, the merged PRs from the other branches, documentation updates. It's late where I am and I want to look at this with some fresh eyes and possibly gloss over the documentation once more before I push it, but this is my release candidate.

Thank you very much for working with me so much. I've placed a special thanks section in the README.md in this PR thanking Syncthing for working with me so well. If you'd like to see that phrasing changed please let me know.

Copy link
Contributor

@lthibault lthibault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just flagged a few documentation issues, and a few usability tweaks.

Thank you very much for working with me so much.

It is truly my pleasure. I really appreciate being involved in this project, and this request for review has brightened my day!

P.S.: any chance you might consider adding a CONTRIBUTORS.md? I'd love to be able to show off my name in there. 😆

v4/service.go Outdated
to prevent any corruption in the previous state from crashing the
service again.

The Supervisor will pass a ctx down to the service. That context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to explain the intended use of the context. This only tells me what not to do.

v4/supervisor.go Outdated
ctxMutex sync.Mutex
ctx context.Context
// This function cancels this supervisor specifically.
myCancel func()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming to something more descriptive like cancel (this is recognizable as the context’s cancel func, due to convention) or ctxCancel.

v4/supervisor.go Outdated
// participate in the log function propagation and recursive
// UnstoppedService report.
//
// It is legal for GetSupervisor to potentiall return nil, in which case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: potentially

return nil
}

var currentSupervisorIDL sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with atomic.AddUint32 ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I've never gotten a clear sense of exactly what contract the atomic functions provide, in particular, the read guarantees they provide, so I'm a bit nervous about putting this out in such a basic library.

(In particular, I'd really like to know for sure for non-suture-related code that an atomic pointer swap's result can be safely read without any atomic operations, but I can't find any documentation or person willing to guarantee that. I've been doing it anyhow in some code and it seems to work, and it seems like there can't be any other purpose for the operation since there is no AtomicRead, but I'd really like it to be stated somewhere.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it would be absolutely astonishing if the return value of CompareAndSwap, AddUint32, etc required synchronization. It would practically defeat the purpose of these functions.

there is no AtomicRead

There's the atomic.Load* family, but they seem to be the counterparty to atomic.Store*.

Personally, I feel comfortable taking that bet (in fact, I still can't bring myself to consider it a bet at all), but I also can't imagine the mutex presents a meaningful bottleneck, either. I guess there's no harm in playing it safe.

v4/supervisor.go Outdated
BadStopLogger func(*Supervisor, Service, string)

// FailureLogger is called when a service fails
FailureLogger func(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite often find myself writing log handlers to use github.com/lthibault/log as a logging backend. It’s quite tedious given the large arity of these functions. Consider passing in a struct instead.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I've thought about it, the more sensible that seems to be.

Let's go ahead and do it.

v4/service.go Outdated
service again.

The Supervisor will pass a ctx down to the service. That context
can not be independently used to cancel the service, as the supervisor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't actually cancel a context that was passed in to you, as you don't have a reference to the cancel function, so I am not sure this explanation even makes sense.

I think the important thing to note here is that the context is used for indication of the lifetime of the service, namely when the done channel is closed, the service should terminate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this as “if you derive a subcontext using WithCancel, don’t try to cancel with the resulting CancelFunc”.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? That cancellation will not propagate upwards.

The whole point of passing a context down to the service is so that it could abort all ongoing http requests and what not as the service is asked to terminate, to ensure prompt termination, so you definately want to pass down the context deeper, create child contexts, and by definition you always have to cancel the child contexts in order not to have leaks (even if the task scoped by the context is done, in which cancellation is just resource cleanup).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Audrius was correct in the first comment. I was simply mistaken because the child service won't get the cancel function passed to it.

Thank you both for going over the docs; I was just expecting to do it myself. I will hopefully get to reviewing these changes late today my time (I'm in EST).

v4/service.go Outdated
not be restarted and removed from the supervisor entirely.

ErrTerminateTree indicates that the entire supervision tree should be
restarted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restarted or terminated?
The comment says restarted, the error says terminated.

Atleast in syncthings case, we're loking for termination, as we want to unwind everything in case of a fatal database error or something along those lines.

v4/service.go Outdated
In Go 1.13 and greater, this is checked via errors.Is, so the error
can be further wrapped with whatever additional info you like. Prior
to Go 1.13, it will be checked via directly equality check, so the
distinguished errors can not be wrapped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot

v4/service.go Outdated
the Service should also have been performed.

If a service does not stop within the supervisor's timeout duration, a log
entry will be made with a descriptive string to that effect. This does
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you "make a log entry",

"the supervisor will log a message with a descriptive string to that effect"

// having terminated, and includes the timeout the supervisor
// itself waited before closing the liveness channel.
case <-s.ctx.Done():
return ErrTimeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is called without the service being started?
I suspect you'll have a nil pointer exception?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sendControl above it shouldn't be able to succeed until the supervisor is started.

ctx, myCancel := context.WithCancel(ctx)
s.ctxMutex.Lock()
s.ctx = ctx
s.ctxMutex.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in all cases, the only use of s.ctx is just accessing it's Done(), so perhaps you don't need to store the whole context, and could just store the done channel.

Also, select on a nil channel never "fires", so I think it might be safe to not even have a mutex for that?

func (s *Supervisor) Serve(ctx context.Context) error {
// context documentation suggests that it is legal for functions to
// take nil contexts, it's user's responsibility to never pass them in.
if ctx == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think supporting passing nil context is a bad idea, as this effectively means that the user gives up the option to stop the service tree, which I don't think should ever be encouraged, and suggests sloppy application design.

If the user wants to be sloppy, he should pass context.TODO() or something.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because a careful parsing of the context library documentation says it is the caller's responsibility to pass in a correct context, which I am reading as, it's OK for a receiver of a context to treat nil contexts as undefined behavior, basically, so it's OK to proceed without it.

You'll note I'm not documenting this option, or showing it in examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm with @AudriusButkevicius insofar as passing a nil context is a code smell. However, I'm also aligned with @thejerf in his "defensive programming" impulse.

What if we returned an error? This would tick both the box of discouraging bad design and account for invalid input.

// and the way it works, I think it's OK in this case. This is the
// exceptional case, basically.
ctxMutex sync.Mutex
ctx context.Context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use sync.AtomicValue

README.md Outdated
with [contexts](https://golang.org/pkg/context/). If you are using suture
for the first time, I recommend it.

[suture v3](https://godoc.org/gopkg.in/thejerf/suture.v3{ is the latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[suture v3](https://godoc.org/gopkg.in/thejerf/suture.v3{ is the latest
[suture v3](https://godoc.org/gopkg.in/thejerf/suture.v3) is the latest

v4/supervisor.go Outdated
s.m.Unlock()

response := make(chan serviceID)
if !s.sendControl(addService{service, serviceName(service), response}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sendControl now returns an error instead of a bool.

v4/supervisor.go Outdated
case s.control <- sm:
return nil
case <-doneChan:
return ErrTimeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ErrTimeout to represent that supervisor was stopped/context was cancelled feels like a source of confusion. I'd create a dedicated error like ErrSupervisorNotStarted.

@imsodin
Copy link
Contributor

imsodin commented Oct 8, 2020

@thejerf What is your plan or status on this? Sorry for the nagging, I definitely don't mean to complain. I am interested because I have a largish refactor of Syncthing's db layer brewing that uses the v4 api. Would you appreciate a PR taking a stab at resolving the review comments?
Again, thanks for all your work on suture and the great cooperation!

@thejerf
Copy link
Owner Author

thejerf commented Oct 8, 2020

Sorry, fell through life's cracks. I've pushed most of the updates suggested here just now.

I think I agree with lthibault that the logging should be done with one log function that takes a defined set of structs. It should define an interface that only our log messages conform to (to guarantee that no other ones will sneak in), and the log messages should be pre-configured with JSON struct tags for JSON log output. If you want to create a PR for that, that would be helpful.

Tests will need a bit of work; possible a logging object where you can register hooks for each of the incoming kinds of structs, to simulate the way they are used now.

I'll also try to work on it here at some point; I'll put a comment here when I start.

@imsodin
Copy link
Contributor

imsodin commented Oct 10, 2020

Thanks for the update!

I PRed a proposal for a new log/event interface, including adjusting tests: #44
Let me know what whether this fits more or less with what you had in mind.

v4/supervisor.go Outdated
@@ -694,7 +699,7 @@ terminate. A timeout value of 0 means to wait forever.

If a nil error is returned from this function, then the service was
terminated normally. If either the supervisor terminates or the timeout
passes, ErrTimeout is returned. (If this isn't even the right supervisor
passes, Err is returned. (If this isn't even the right supervisor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a find replace error? "Err"

thejerf and others added 5 commits October 11, 2020 21:26
v4: Add EventHook instead of various logging closures
1. Rename them all to start with Event, so they're contiguous in the
   godoc.
2. Add JSON serialization for all the events.
3. By suggestion, add a method to return map[string]interface{} for
   all events.
4. Include the original objects in the log, but mark the fields
   containing them as not to be serialized in JSON, so we don't
   serializing arbitrarily-large objects.
@thejerf thejerf merged commit ef1bd45 into master Oct 27, 2020
@thejerf
Copy link
Owner Author

thejerf commented Oct 27, 2020

@imsodin , @AudriusButkevicius , @lthibault , it is done and merged. This is my first attempt at making a true go module so let me know if I botched anything. But I did just quickly here make a new project that imported suture and it seemed to do what it should.

Thank you all for you work with me and I hope this is helpful for you.

I may still do some small doc tweaks but it's not anything worth holding a release for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants