Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix orphan workers on restart #1090

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dan-jensen
Copy link

@dan-jensen dan-jensen commented Jun 2, 2019

If 1 daemonized DJ worker is running and you execute restart with a
different n value, the existing worker will effectively be orphaned, in
that it continues to run through future stop/start/restart commands
with the new n value. The opposite is also true. If you change from an
n value of 2+ to 1 when you restart then the running workers will be
orphaned. This fixes that problem by looking in the pid_dir to identify
all worker process names and stopping them ALL prior to starting n new
workers.

This is a proposal for resolving #1091. Comments and alternative proposals very welcome.

If 1 daemonized DJ worker is running and you execute restart with a
different n value, the existing worker will effectively be orphaned, in
that it continues to run through future stop/start/restart commands
with the new n value. The opposite is also true. If you change from an
n value of 2+ to 1 when you restart then the running workers will be
orphaned. This fixes that problem by looking in the pid_dir to identify
all worker process names and stopping them ALL prior to starting n new
workers.
@dan-jensen dan-jensen force-pushed the fix-orphan-workers-on-restart branch from 83d496d to 62ac023 Compare June 2, 2019 13:29
@dan-jensen
Copy link
Author

Would love to hear your thoughts on this, @albus522.

@albus522
Copy link
Member

People do start/stop/restart independent groups of workers. So having restart stop anything it can find running would stop more than intended for certain use cases. A warning that additional workers may still be running might be all we can do here.

@dan-jensen
Copy link
Author

Hey thanks for the quick response, @albus522!

I could be wrong, but I think the existing code would stop more than intended in the case of independent groups of workers. Specifically, Delayed Job will stop everything with a process name of "delayed_job.#{worker_index}" (line 103/104 in this changeset). In fact, I think this change would actually improve that behavior. Because it relies on worker-specific PIDs instead of a shared naming convention, it won't stop workers running in the context of another project.

@albus522
Copy link
Member

Look above to the identifier section. That uses the same "delayed_job.*" structure.

@dan-jensen
Copy link
Author

@albus522, I'm sorry, I don't understand what you're getting at. I see you're talking about the section starting on line 93. But are you saying this changeset would introduce a problem with the use of the identifier argument?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants