-
-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduler overlaps on Daily schedule when PreventOverlapping is enabled #351
Comments
Thanks for reaching out + details on the issue. Will get back once I confirm the issue (I have an idea of what it is). |
Hey guys, did you find a solution for this bug? i am looking for this |
I believe the issue lies at this mutex 24 hour maximum duration |
@toutas you're correct. The issue is related (a) the 24-hour timeout and/or (b) the underlying storage mechanism (e.g. .NET Core in memory cache). I suspect there are issues where the cache entry is being evicted too soon or not being evicted when it should - which lead to 2 different issues (this one and another one where a job that did complete is still holding the mutex. |
looking through the code seems like the 24-hour timeout would be the sole culprit for this case, as it will ensure any locks being held for more than 1440 minutes are freed. not really knowledgeable about the dynamics of the solution, but maybe the timeout could have a per-scheduledEvent duration defaulting to 1440 minutes? one that could be set alongside the scheduledEvent. |
@toutas Here's the way things should work: Happy Path
Not So Happy PathThere are cases when certain types of .NET exceptions are not caught by For example, a The 24-hour timeout helps in this situation where a bad run of the job won't cause the lock to never unlock. Also, there's a scenario where jobs who are running in a poor state and have been running "too long" can "hog" the lock. Over an extended period (days), you might get a job that just hogs the lock and no jobs are ever run... So this is a trade-off -> should that job just hold the lock forever? Or should there be some play in this? So it's not a really black/white answer or solution. The same applies to if/when Coravel supported distributed locks. What if the application crashes before it can release a lock? The same applies... Potential Solutions
Again, this becomes not a very simple / straightforward solution to the problem. The quickest thing is to extend the 24-hour timeout to something longer, but that just pushes the issue - or to remove the timeout altogether. I'd favor removing the timeout - which means invocables that never complete or for some reason hog a mutex just never let it go... and that's a user error which needs to be fixes? But then - the issue will re-appear for distributed locks - something to consider if we want to keep the interface of the mutex the same but have the underlying storage (e.g. database, redis, etc.) do what it needs to do. Would appreciate any thoughts or input! 🤷♂️ |
There's also the distinction between a normal job and a "long running job". Maybe there's an opportunity to mark an invocable as a "long running job" where under the covers we don't timeout the mutex only for these... Sounds close to the "configure mutex timeout per invocable" solution - which is my favorite solution at the moment 🤷♂️ |
Personally I think a breaking change of some sort is favorable in this condition, as there is no way to know whether an invocable can be long-running or not unless the user puts a maximum expected time on it. And not requiring it to be set on every invocable will inevitably lead to people discovering it the hard way. As for the issues with .NET Core early cache evictions, non-catched exceptions and so on; if you are able to reproduce it or get something more tangible since you aren't certain of the behaviour, I would like to look into it. |
Description:
The PreventOverlapping works fine for the minute or hourly schedule but does not seem to work for a daily schedule.
I have a long running task that runs longer than 24 hours. Yet, a new overlapping task is created even though the process is still running.
I prepare the overlap prevention the following way:
//This is actually defined in a settings file, but added it here as a local variable
//cronSchedule = "00 03 * * " fails
//cronScheduleEveryMinute = " * * * *" works
//cronScheduleEveryHour = "00 */1 * * *" works
var string cronSchedule = "00 03 * * *";
builder.Services.UseScheduler(scheduler =>
{
scheduler.Schedule().Cron(cronSchedule).Zoned(TimeZoneInfo.Local)
.PreventOverlapping(nameof(MyJob));
});
Expected behavior
Expected that another task should not be created since the previous one is still running.
Actual behavior
A new task is created which overlaps with the previous one.
Steps to reproduce
//start the app with the cronSchedule, get it to trigger once and then change the date and time on your system one day forward while //the previous task still runs
var string cronSchedule = "00 03 * * *";
builder.Services.UseScheduler(scheduler =>
{
scheduler.Schedule().Cron(cronSchedule).Zoned(TimeZoneInfo.Local)
.PreventOverlapping(nameof(MyJob));
});
Exception(s)
No exception thrown
Coravel version
5.0.2
.NET Version
.Net 6.0
The text was updated successfully, but these errors were encountered: