Restart DelayedJob workers after they crash #5146

Merged 1 commit on Jun 30, 2023

Conversation

javierm
Member

@javierm javierm commented Jun 29, 2023

References

Background

We're receiving reports from Consul installations saying that, once in a while, DelayedJob processes stop.

In some cases we've solved this issue by monitoring these processes with Systemd or Monit, but implementing support for these tools in a way that works on existing installations is difficult.

On the other hand, DelayedJob provides a simple way to monitor its processes. It might not be as powerful as systemd, but it's much better than doing nothing, and it's easy to make it work on existing installations.

Objectives

  • Make it easier to maintain Consul installations running in production

Notes

We might switch to systemd in the future, particularly if we upgrade Puma (see pull request #4922).

Release Notes

⚠️ Starting DelayedJob now also creates monitor processes, which restart the DelayedJob workers if they crash (see pull request #5146). If you're already monitoring these workers with tools like monit or systemd, you might want to disable this feature. Also note that, in order to stop delayed job, we now need to pass the `-n` option: `RAILS_ENV=production bin/delayed_job -n 2 stop`.

DelayedJob offers the `--monitor` (aliased as `-m`) option to create a
process that monitors the workers and restarts them when they crash.
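
To illustrate the general idea of a monitor process, here is a minimal sketch of a supervisor loop in plain Ruby. It is not how DelayedJob implements `--monitor`; the `supervise` method, the `max_restarts` limit and the always-crashing "worker" are assumptions made up for this example.

```ruby
require "rbconfig"

# Toy supervisor (hypothetical, for illustration only): runs a command and
# starts it again whenever it exits with a non-zero status, up to
# max_restarts times. Returns how many times the command was restarted.
def supervise(command, max_restarts: 3)
  restarts = 0

  loop do
    pid = Process.spawn(*command)
    _, status = Process.wait2(pid)

    return restarts if status.success?          # clean exit: nothing to restart
    return restarts if restarts >= max_restarts # give up after too many crashes

    restarts += 1
  end
end

# Hypothetical "worker" that crashes immediately every time it starts
crashing_worker = [RbConfig.ruby, "-e", "exit 1"]
restarts = supervise(crashing_worker, max_restarts: 2)
puts "Restarted the worker #{restarts} times"
```

A real monitor typically runs as a separate long-lived process and has no restart limit; the limit here only keeps the example finite.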

This change implies that, in order to stop the delayed job workers, we
now need to pass the `-n` option when running `bin/delayed_job stop`:
`RAILS_ENV=production bin/delayed_job -n 2 stop`.
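
As a rough sketch of how the start and stop commands relate, the hypothetical helper below builds the argument list for `bin/delayed_job` so that both commands use the same worker count. The helper name and the choice to pass `-m` only on `start` are assumptions for this example, not code from this pull request.

```ruby
# Hypothetical helper (not part of Consul or DelayedJob): builds the argument
# list for bin/delayed_job so start and stop share the same -n value.
def delayed_job_command(action, workers: 2, monitor: true)
  args = ["bin/delayed_job", "-n", workers.to_s]
  # Assumption of this sketch: only `start` receives the monitor flag
  args << "-m" if monitor && action == :start
  args << action.to_s
end

start_cmd = delayed_job_command(:start)
stop_cmd = delayed_job_command(:stop)

puts start_cmd.join(" ") # bin/delayed_job -n 2 -m start
puts stop_cmd.join(" ")  # bin/delayed_job -n 2 stop

# To actually run a command from the application root, something like:
#   system({ "RAILS_ENV" => "production" }, *stop_cmd)
```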
Member

@Senen Senen left a comment


I tested it in the staging server by killing the delayed_job processes by hand, and the processes were restarted automatically.

I also tested a server reboot and the Capistrano delayed_job tasks (delayed_job:start, delayed_job:stop and delayed_job:restart).

Finally, I checked how a deployment creates monitoring processes on existing installations.

Consul Democracy automation moved this from Reviewing to Testing Jun 30, 2023
@javierm javierm merged commit db4db07 into master Jun 30, 2023
15 checks passed
Consul Democracy automation moved this from Testing to Release 2.0.0 Jun 30, 2023
@javierm javierm deleted the delayed_job_monitor branch June 30, 2023 15:23
@javierm javierm added Release notes and removed 2.0 labels Jun 30, 2023
Development

Successfully merging this pull request may close these issues.

Automatically restart delayed job processes when they stop
2 participants