./script/delayed_job restart bounces processes, but no new PIDs #3

Closed
chrisfinne opened this issue Aug 28, 2009 · 35 comments

Comments

@chrisfinne

The PID files are never created for the new process(es), so subsequent restarts and stops won't work.

Not sure if this is a problem with the daemons gem (I'm on the latest, 1.0.10) or with delayed_job.

I see this on my Mac and Ubuntu boxes.

@ghost

ghost commented Aug 31, 2009

I've had this problem too. I haven't been able to track it down, but it seems like an issue with the daemons library

@chrisfinne
Author

Here's my hack in my cap script to restart it. (I hard-code it to launch 3 delayed_job processes.)
http://gist.github.com/178397

@jerodsanto

I have this problem as well. Will try chrisfinne's hack until somebody comes up with a fix.

@rmm5t

rmm5t commented Sep 23, 2009

I just ran into this problem as well. It appears to be a problem with the daemons gem. When the daemons gem "restarts", it stops, sleeps for 1 sec, and then starts. PID file cleanup happens all over the place (I think incorrectly), and it looks like a race condition occurs. The daemons gem deletes PID files when you call stop, when the daemon exits normally, and when it traps a kill signal. It can take delayed_job a few seconds to respond to a kill signal because it first finishes what it was doing. Meanwhile, a new delayed_job daemon is forked (even after the daemons gem's 1 sec delay) and a new PID file is written. After the restart, the cleanup tasks of the original daemon clean up the PID files again, incorrectly blowing away the new PID file.

In summary, the problem looks to be the daemons gem, and that gem should wait for the original process to stop running before restarting a new process.
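The race described above can be illustrated with a small sketch. This is not the daemons gem's code and not the fix applied in any fork; `remove_pid_file_if_owner` is a hypothetical helper, and the PIDs are made up. It only shows one way a late cleanup by the old daemon could be prevented from deleting the file written by its replacement: delete the PID file only if it still records your own PID.

```ruby
require "tmpdir"

# Hypothetical helper (not part of daemons): delete the PID file only if it
# still records the PID of the process doing the cleanup.
def remove_pid_file_if_owner(path, own_pid)
  return false unless File.exist?(path)
  recorded = File.read(path).strip.to_i
  return false unless recorded == own_pid
  File.delete(path)
  true
end

Dir.mktmpdir do |dir|
  pid_file = File.join(dir, "delayed_job.pid")
  old_pid = 1111 # made-up PIDs for illustration
  new_pid = 2222

  File.write(pid_file, old_pid.to_s) # old daemon is running
  File.write(pid_file, new_pid.to_s) # restart: new daemon overwrites the file

  # The old daemon's late cleanup leaves the new daemon's file alone.
  removed = remove_pid_file_if_owner(pid_file, old_pid)
  puts "removed by old daemon: #{removed}"
  puts "pid file still present: #{File.exist?(pid_file)}"
end
```

A check like this still leaves a small window between reading and deleting the file, so waiting for the original process to exit before starting the new one remains the more robust approach.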

@rmm5t

rmm5t commented Sep 23, 2009

Turns out this is a known problem with the daemons gem. Too bad the simple fix hasn't been applied.
http://rubyforge.org/tracker/index.php?func=detail&aid=21050&group_id=524&atid=2084

@rmm5t

rmm5t commented Sep 23, 2009

There's a fork of daemons that fixes the problem. http://github.com/ghazel/daemons

$ sudo gem uninstall daemons
$ sudo gem install ghazel-daemons

v1.0.11 works well for me and I'm going to start freezing my apps that use delayed_job to the ghazel-daemons gem.

@chrisfinne
Author

closing

@rmm5t

rmm5t commented Sep 24, 2009

I'm not sure this is worth closing until either daemons is fixed, delayed_job puts an explicit dependency on ghazel-daemons, or delayed_job implements a workaround.

@jerodsanto

I agree. It's much easier for others to find if it is open, and technically the issue has not been resolved.

@chrisfinne
Author

I figured that since it was definitively proven to be another package's bug and a solid workaround beyond my ugly hack was detailed, I'd close it, but you make some good points, so I'll leave it open.

@bkeepers

So is everyone still having this issue with the latest version?

@rmm5t

rmm5t commented Sep 28, 2009

Brandon, Yes, the restart problem is in daemons-1.0.10 and has been there for about a year.

Edit: To clarify, ghazel-daemons-1.0.11 fixes the problem, but that fork is not well known and most people running delayed_job have daemons-1.0.10 installed.

@bkeepers

Has anyone tried to contact the maintainer of daemons? I'm thinking about just ditching it altogether and trying to figure out a different solution.

@rmm5t

rmm5t commented Sep 29, 2009

Brandon, The maintainer of daemons (Thomas Uehlinger) responded to the associated ticket about a year ago, but nothing since. I haven't tried to contact him myself. It doesn't look like there's been any activity in daemons since either.
http://rubyforge.org/tracker/index.php?func=detail&aid=21050&group_id=524&atid=2084

Perhaps daemon-spawn helps, though the gem on rubyforge is either not published yet or missing.
http://github.com/alexvollmer/daemon-spawn

@tcocca

tcocca commented Oct 6, 2009

chrisfinne's hack worked perfectly for me after struggling with this for so long.

Can this be overridden in config/deploy.rb instead of editing the plugin or gem?

@dlegg

dlegg commented Oct 8, 2009

One thing I have noticed out of all of this is that the restart option definitely doesn't work regardless. The pidfile gets blown away but the process isn't actually stopped or started, which means that you can start another process with the zombie hanging around. You have to do an explicit stop and then start if you want to restart delayed_job, and that seems to work. I haven't done enough testing yet, but with chrisfinne's script above, are we saying that a similar thing can happen with an explicit stop/start, hence the wait for the process to actually end?

@tcocca

tcocca commented Oct 8, 2009

dlegg, correct, the restart doesn't work. chrisfinne's script does an explicit stop, then keeps checking until the process has actually stopped and its PID has disappeared, and only then calls start again. This worked great for me. In the cap deploy you will see how long it takes for the process to actually stop from the "waiting for process to stop ..." text.

I would recommend trying this.
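The stop, wait, then start sequence can be sketched in plain Ruby. This is an illustration of the idea under stated assumptions, not chrisfinne's actual gist: `process_alive?` and `wait_for_exit` are hypothetical helper names, and the timeout and polling interval are arbitrary defaults.

```ruby
# True while a process with this PID exists. Signal 0 only probes for
# existence and delivers nothing to the process.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Poll until the process is gone. Returns true once it has exited, or
# false if it is still alive when the timeout expires.
def wait_for_exit(pid, timeout = 60, interval = 0.5)
  deadline = Time.now + timeout
  while process_alive?(pid)
    return false if Time.now >= deadline
    puts "waiting for process #{pid} to stop ..."
    sleep interval
  end
  true
end
```

In a deploy task one would run `script/delayed_job stop`, call `wait_for_exit` for each PID recorded before the stop, and only then run `script/delayed_job start`.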

I haven't tried the other version of the daemons gem though so I can't speak to that.

~ tom

@scottj97

chrisfinne's recipe works great, unless your deploy server is also your development server, in which case it will wait forever because it sees the 'cap delayed_job:restart' task in the process list. (I took care of that with another grep -v.)
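The extra `grep -v` amounts to excluding the deploy command itself from the list of matching processes. A rough Ruby equivalent, assuming the process list is available as text: the helper name `worker_lines` and the sample lines are made up for illustration, and the patterns would need adjusting to how the processes appear on a given machine.

```ruby
# Keep only process-list lines that look like delayed_job workers, dropping
# the Capistrano command and any grep helper that mention "delayed_job".
def worker_lines(ps_output)
  ps_output.lines.map(&:chomp).select do |line|
    line.include?("delayed_job") &&
      line !~ /\bgrep\b/ &&
      line !~ /\bcap\b/
  end
end

# Made-up process list for illustration.
sample = <<~PS
  deploy  4321  ruby script/delayed_job start
  deploy  4400  cap delayed_job:restart
  deploy  4410  grep delayed_job
PS

puts worker_lines(sample)
```

The word-boundary match on `cap` would also drop a worker whose command line happens to contain "cap" as a separate word, so a stricter pattern may be needed.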

What I don't understand is why an idle delayed_job server should take 20 seconds or more to exit??

@jimeh

jimeh commented Mar 26, 2010

I've gone a slightly different way in making sure that delayed_job stops properly. With a combination of lsof, grep, and awk I'm killing all ruby processes which have the specific application's delayed_jobs.log file open.

It's working quite well and fast for me so far:
http://gist.github.com/345494
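For readers who prefer Ruby to an awk pipeline, the PID extraction step might look like the sketch below. It is not the code from the gist: `ruby_pids_from_lsof` is a hypothetical helper, it assumes lsof's default column layout (COMMAND first, PID second), and the sample output and path are invented.

```ruby
# Extract the PIDs of ruby processes from the text printed by `lsof <file>`.
# Assumes the default layout: a header line, then COMMAND and PID as the
# first two whitespace-separated columns.
def ruby_pids_from_lsof(lsof_output)
  lsof_output.lines.drop(1)              # skip the header line
             .map(&:split)
             .select { |cols| cols.size >= 2 && cols[0] == "ruby" }
             .map { |cols| cols[1].to_i }
             .uniq
end

# Invented sample output for illustration.
sample = <<~LSOF
  COMMAND  PID   USER FD TYPE DEVICE SIZE/OFF NODE NAME
  ruby    5012 deploy 3w  REG    8,1     1024 1234 /path/to/app/log/delayed_job.log
  ruby    5013 deploy 3w  REG    8,1     1024 1234 /path/to/app/log/delayed_job.log
  tail    5100 deploy 3r  REG    8,1     1024 1234 /path/to/app/log/delayed_job.log
LSOF

p ruby_pids_from_lsof(sample)
```

Each returned PID could then be passed to `Process.kill("TERM", pid)`.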

UPDATE: Here's a MUCH better fix, which adds the changes from ghazel's fork to the 1.0.10 daemons gem via overloading:
http://gist.github.com/346160

@sunkencity

I still get the problem with no PID file whether I use the ghazel-daemons gem or monkeypatch the same changes. Right now I'm using chrisfinne's script to ensure shutdown, and that works. Such a solution seems to be the right one anyway: what if there's a long-running email job or the like?

@jimeh

jimeh commented Apr 12, 2010

@sunkencity: I've actually ended up using the ghazel-daemons gem in the end. It's a little tricky, as you need to do this in your environment.rb file:

config.gem "ghazel-daemons", :lib => "daemons"
gem "ghazel-daemons"
require "daemons"

@sunkencity

OK, thanks! I thought I had uninstalled daemons and everything was fine but I had forgotten that my capistrano automatically installs any missing gem dependencies, so I guess I can switch to that now.

Here's an extra task I use to make sure that the reload went well:

http://gist.github.com/363553

@jimeh

jimeh commented Apr 12, 2010

With the above environment.rb config you don't have to uninstall the daemons gem; the ghazel-daemons gem is force-loaded.

Also, I see you took a similar approach to mine when it comes to seeing if there are any orphaned DJ daemons running.

In case you might find it useful, here are the final delayed_job Capistrano tasks I've ended up using: http://gist.github.com/345494

@seboslaw

I am also seeing this problem :(
Running rails3-beta4 with delayed_job installed as a plugin (have tried it as a gem before), daemons (1.0.10 - ghazel-daemons-1.0.11 didn't make any difference) and ruby 1.8.7p249.

Weird thing is that it runs fine under OSX but quits right after the start on my production ubuntu box. "script/delayed_job run" runs fine on both...

@euanmaxwell

I hit this problem on Ubuntu 10.04 today. I'd upgraded the daemons gem to 1.1.0 yesterday and DJ stopped working: it claimed to be forking the workers, but they were immediately dying and the log file had some funny binary input. Removing 1.1.0 to force DJ to use 1.0.10 seems to have solved the problem for me.

@badnaam

badnaam commented Aug 2, 2010

I have this in my environment.rb:

config.gem 'delayed_job', :source => 'http://rubygems.org', :version => "2.1.0.pre"
config.gem "ghazel-daemons", :lib => "daemons", :source => 'http://gems.github.com'
gem "ghazel-daemons"
require "daemons"

But I still can't get delayed_job to restart from Capistrano.

desc "Restart the delayed_job process"
task :delayed_job_restart, :roles => :app do
    run "cd #{current_path};#{get_rails_env} script/delayed_job restart"
end

@MBO

MBO commented Sep 7, 2010

I have the same problem, so I wrote this task to restart delayed_job without instantly killing all jobs or waiting in an infinite loop if there are more DJs running on the server. Works so far. Requires a *NIX environment with awk and lsof.

http://gist.github.com/568143

@thoughtless

While this ticket is a real issue (and is related to https://github.com/collectiveidea/delayed_job/issues#issue/81 and https://github.com/collectiveidea/delayed_job/issues#issue/100), you must be careful when delayed job fails while writing nothing to delayed_job.log. Other problems besides this one can cause the same symptom. For example, if there is a database problem (such as an error in database.yml or a migration that hasn't been run yet) you could get very similar symptoms.

Always make sure you check BOTH delayed_job.log and production.log (or whatever environment you are running delayed_job in). Delayed Job's catch-all exception handler outputs to the rails log, not to delayed_job.log.

@christophercotton

In our case, when starting multiple workers (delayed_job -n 5) with restart, the PID files do not get created. The reason is that intermediate processes are created and die before getting a chance to write out the PIDs. For 'start', Daemons::Controller.run calls '@group.new_application.start'; for a restart it calls '@group.start_all'. start_all forks a new process for each application to start (even if there is only one), whereas 'start' just waits for the delayed_job to start correctly.

Processes in a restart

script/delayed_job (pid 1)
    (Daemons::ApplicationGroup) @group.start_all (which forks)
         application.start (pid 2)
              (since it isn't :ontop) call_as_daemon
                  delayed_worker_1 (pid 4)
         application.start (pid 3)
              (since it isn't :ontop) call_as_daemon
                  delayed_worker_1 (pid 5)

start_all (and pid 1) exits immediately after all the forks; it doesn't wait for each fork to finish. pid 2 normally has enough time to write out the PID file, though I'm guessing that if your system is fast about launching everything, maybe none of them will write out PID files. In our case delayed_job would get started but the PID file wouldn't be written, and then we would start getting multiple processes because it thought nothing had been started.

Our solution was just to put a "sleep 5" at the end of script/delayed_job. This seemed to be enough time to allow the PID file to get created.

Daemons really seems to be broken and should be fixed (either by waiting for the forks to finish, or by not forking during start_all), or Delayed_Job should move to something else as its main method of daemonizing.
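Instead of a fixed "sleep 5", the launching script could poll until the expected PID files exist. The following is only a sketch of that alternative, not something delayed_job or daemons provides: `wait_for_pid_files` is a hypothetical helper, and the `delayed_job*.pid` glob is an assumption about how the PID files are named.

```ruby
# Wait until at least `expected` PID files matching `pattern` exist in
# `pid_dir`. Returns the sorted list of files, or nil on timeout.
# The default pattern is an assumption about the PID file names.
def wait_for_pid_files(pid_dir, expected, timeout = 10, interval = 0.2,
                       pattern = "delayed_job*.pid")
  deadline = Time.now + timeout
  loop do
    found = Dir.glob(File.join(pid_dir, pattern))
    return found.sort if found.size >= expected
    return nil if Time.now >= deadline
    sleep interval
  end
end
```

Called at the end of the launch script with the number of workers requested, this returns as soon as the files appear and gives an explicit failure (nil) when they never do.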

@airblade

Why does it take so long (20s or more) for an idle delayed_job worker to stop?

I'm on delayed_job 2.1.4, daemons 1.1.0, Rails 3.0.10, and Ruby 1.9.2-p290.

/cc @scottj97

@thoughtless

The script Delayed Job uses to stop idle processes loads the entire Rails environment before shutting down the worker. As far as I know, this is not necessary. Theoretically all that is needed to make it faster is to write a shutdown script that doesn't load Rails. But I don't remember the specifics well enough to estimate how easy/hard that would be for Delayed Job.
I've been toying with something like delayed job (https://github.com/thoughtless/angael) which takes this approach. That gem uses a manager process which does not use Rails, but the worker processes can use Rails. You just need to send the worker manager SIGINT and it will perform a graceful shutdown. I'm not recommending my gem as a drop-in replacement for delayed job. Delayed job is a battle-tested solution. My gem aims to be better in certain circumstances, but it has only been used (to my knowledge) in a single production application. YMMV, etc.
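A shutdown script that avoids loading Rails only needs to read a PID file and send a signal. The sketch below shows that idea in plain Ruby; it is not code from Delayed Job or angael. `signal_from_pid_file` is a hypothetical helper, and which signal a given worker treats as a graceful stop is an assumption the caller has to verify.

```ruby
# Read a PID file and send that process a signal, without loading Rails.
# Returns the PID on success, or nil if the file is missing, its contents
# are not a number, or no such process exists (stale PID file).
def signal_from_pid_file(pid_file, signal = "TERM")
  pid = Integer(File.read(pid_file).strip)
  Process.kill(signal, pid)
  pid
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  nil
end
```

Because nothing but the Ruby standard library is involved, a script built on this starts in a fraction of the time needed to boot a Rails environment. How long the worker then takes to exit still depends on the worker itself.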

@andrewdsmith

According to the release announcement of 1.1.0, the broken behaviour reported here is fixed now. The referenced bug report has also been (long) closed by the maintainer. Not sure if issue #81 stops people upgrading, however.

@garethrees

The v2.0 branch still seems to use daemons 1.0.10. Daemons is now on 1.1.9. Is there a reason delayed_job v2.0 is not using this?

@johncant

johncant commented Oct 2, 2012

@garethrees, yeah. daemons 1.1.0 breaks delayed_job, causing issue #81, but downgrading to daemons 1.0.10 seems to fix it. Daemons 1.1.9 didn't work for me either.

@rchampourlier

daemons 1.1.0 wasn't working for me (neither script/delayed_job run nor start), and reverting to 1.0.10 only solved the run part.

So I decided to try the approach described here, using the daemon-spawn gem instead. Check this gist too.
