Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delaying failed jobs - fixed delay or exponential backoff + other fixes #227

Closed
wants to merge 1 commit into from

Conversation

niravmehta
Copy link

  • Support for delaying failed jobs - with fixed delay for each retry or exponential backoff calculation.
  • Way to get all delayed jobs from the queue.
  • Fix fo delay() not saving
  • Fix for "undefined" showing up via console.error when jobs failed and the error passed was a string
  • Honor original delay when retrying failed jobs - instead of processing them right away

…fixes

Support for delaying failed jobs - with fixed delay for each retry or
exponential backoff calculation.
Way to get all delayed jobs from the queue.
Fix fo delay() not saving
Fix for "undefined" showing up via console.error when jobs failed and
the error passed was a string
Honor original delay when retrying failed jobs - instead of processing
them right away
@niravmehta
Copy link
Author

I am building a HTTP POST catch/store/inspect/forward service and love the flexibility of Kue.

But Kue did not delay failed jobs - I found the delay and attempts mechanism was not working per a "user's" expectation.

Checked issues here and found a few alternatives. Spent a lot of time over last 5 days fixing the problems. :-)

This pull request fixes issues from #126 and adds a few other enhancements.

Hope you can merge it into the master soon.

BTW, one thing I couldn't fix was that in the UI, a failed-and-delayed job shows up both in failed and delayed list. When it completes eventually, it still shows in failed-and-completed lists. I believe this could be due to some race conditions in state saving.

@niravmehta
Copy link
Author

BTW, this is how one could use retry mechanism:

jobs.create("type", {...}).delay(1000).attempts(7).retryDelay(2000, 'exponential').save();

or

job.retryDelay(2000, 'fixed');
  • The "fixed" delay type will delay for given ms on every failure.
  • The "exponential" delay type will do exponential backoff calculation to find delay time

@rosskukulinski
Copy link

testing this in house and combined with @lukaszfiszer change to worker.js in #197. AWESOME

@rosskukulinski
Copy link

Noooope. @niravmehta After some more testing, it seems that delay is being completely ignored.

@niravmehta
Copy link
Author

@rosskukulinski Can you test it without the job removal code? The "job failed" event fires before the delay handling is processed. And if you want the job to be delayed, you certainly don't want to delete it until all attempts have been made.

Also, when you say the delay was ignored, do you mean the original delay you set? Or the retryDelay? The original delay will be used for the first execution of the job. Beyond that point, it will be the retryDelay (fixed or exponential).

Let me explain how this works so we are on the same page.

  • You set the delay, attempts, retryDelay while creating a job
  • Job is processed and fails
  • On failure, the worker checks if the job has any remaining attempts.
  • If there are attempts remaining, it will calculate the delay (if any) to be set.
  • The delay is set
  • When the next promote() call is executed, it will check if there are any delayed jobs that it needs to push to the work on (ie make inactive)
  • If there are, it sets them inactive, so that the next process operation will process them

Here are a few pointers on what could be going wrong if the code is not working as expected:

  • You may not be using the latest code from "master" branch. I tested it with changes in master upto July 22.
  • You haven't set a jobs.promote(interval in ms) call
  • The delay / retryDelays are set in seconds, and not ms (ms = sec * 1000)
  • The delay is not getting saved (this requires a ".set('delay')" call to the database. I have already handled this in my code, but if your code does not have it, the delay will always be the original delay (0 in case there was no original delay), and won't be updated with retryDelay
  • You are removing a job before it can be processed!

Here's how I debugged problems while I was testing this code:

  • Use redis-cli and found all keys with "q:*" or for a particular job / state. To figure out if the job was still in the DB (what shows on web UI can be different from what's in DB!)
  • Multiple console.log() statements at different places (along with Date.now()) to see if the code is taking the path I want it to

Hope this helps. And I am happy to solve if you find any bugs!

@rosskukulinski
Copy link

Hi @niravmehta,

I'll give it another shot today after reviewing your pointers.

It should also be noted in giant bold letters if you've changed delay/retryDelay to be in seconds now as opposed to ms. That's a breaking change for any existing code.

Thanks again for taking the time to work on this!

@niravmehta
Copy link
Author

Just to clarify, the delay and retryDelay times in my code are set in ms.

Setting them in seconds is a mistake, and that's what I was pointing out.

:-)

@slash-zero
Copy link

Hi,

Any idea when this will be merged in? Need the delay feature for something I am working on.

@etodanik
Copy link

Is this alive?

What are the issues with this? Can it be safely used in production?

@behrad
Copy link
Collaborator

behrad commented Mar 11, 2014

Not completely merged into 0.7.5 yet, but can be done in next update.

@etodanik
Copy link

There is another commit doing something similar.

Which one looks like it's going to make it into master? (they both have incompatible syntaxes).

This this one stable enough? I'm having trouble with the other one.

@behrad
Copy link
Collaborator

behrad commented Mar 12, 2014

I was supposed to define a natural api for this collecting different PRs and user scenarios, but had'nt have enough time to do it, but I will in near feature.
Please check #300 for later news...

@behrad behrad modified the milestones: 0.7.7, 0.7.1 Mar 27, 2014
@slash-zero
Copy link

@israelidanny not sure, but I am currently using this in production, as I need the retry functionality for failed jobs to work.

@behrad do you know when this will all be available in Kue?

@behrad
Copy link
Collaborator

behrad commented Mar 27, 2014

@slash-zero #300 (comment)

@behrad
Copy link
Collaborator

behrad commented Apr 19, 2014

#300 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants