-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reschedule replacement tasks in case of artificial failure due to missing spooling output size #22472
Conversation
…sing spooling output size The code introduced in trinodb#22298 was lacking rescheduling replacement tasks. As a result query execution got stuck after we observed missing spooling output stats for task.
5a2135e
to
0b871ee
Compare
@@ -2317,10 +2318,10 @@ public void finalizeUpdateOfExchangeSinkInstanceHandle(TaskId taskId, ExchangeSi | |||
partition.updateExchangeSinkInstanceHandle(taskId, updatedExchangeSinkInstanceHandle); | |||
} | |||
|
|||
public void taskFinished(TaskId taskId, TaskStatus taskStatus) | |||
public Optional<List<PrioritizedScheduledTask>> taskFinished(TaskId taskId, TaskStatus taskStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't mind seeing a blurb that describes what the return value is supposed to represent. Something along the lines of
This method may decide to fail the task, in which case it returns the taskFailed() return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will pass - partly because I am lazy - and partly because it is all enclosed to this class and quite straightforward to read.
@@ -1767,7 +1767,8 @@ public Void onRemoteTaskCompleted(RemoteTaskCompletedEvent event) | |||
TaskState taskState = taskStatus.getState(); | |||
StageExecution stageExecution = getStageExecution(taskId.getStageId()); | |||
if (taskState == TaskState.FINISHED) { | |||
stageExecution.taskFinished(taskId, taskStatus); | |||
Optional<List<PrioritizedScheduledTask>> failOverrideReplacementTasks = stageExecution.taskFinished(taskId, taskStatus); | |||
failOverrideReplacementTasks.ifPresent(prioritizedScheduledTasks -> prioritizedScheduledTasks.forEach(schedulingQueue::addOrUpdate)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. The task will only be "failed" if it's already finished. Why schedule a replacement? Is it because failing a task cancels downstream tasks automatically? Then aren't those downstream tasks what needs rescheduling, not the current task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want query to finish we need to either:
- explicitly mark it as failed after task failure
- create new task which will do the work the failed task failed to complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this task failed to complete, though?
Lines 2332 to 2333 in f8d373f
// it is rare but possible to get empty spooling output stats for task which completed successfully. | |
// As we need this information in FTE mode we need to fail such task artificially |
If this is true, then we're choosing neither of the two options above. It looks like we're creating a new task to redo the successfully completed work. I'm sure I'm missing something, but I doubt it's something obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - we are creating new task in place of the one which failed. It is new "attempt" but technically it is separate task.
.../java/io/trino/execution/scheduler/faulttolerant/EventDrivenFaultTolerantQueryScheduler.java
Show resolved
Hide resolved
@@ -2331,8 +2332,7 @@ public void taskFinished(TaskId taskId, TaskStatus taskStatus) | |||
// it is rare but possible to get empty spooling output stats for task which completed successfully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is rare but possible to get empty spooling output stats for task which completed successfully.
why it happens and can it be prevented instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is hard to prevent.
The TaskStatus and TaskInfo are two separate channels. There will always be a possibility that we get TaskStatus saying that task FINISHED but cannot get final TaskInfo. Currently we decide to not fail task in this siutation - but recreate final TaskInfo locally from information we have at hand. And it does not work with output spooling stats.
Alterantive would be to always kill task in such case - but I do not think it is better.
@@ -2317,10 +2318,10 @@ public void finalizeUpdateOfExchangeSinkInstanceHandle(TaskId taskId, ExchangeSi | |||
partition.updateExchangeSinkInstanceHandle(taskId, updatedExchangeSinkInstanceHandle); | |||
} | |||
|
|||
public void taskFinished(TaskId taskId, TaskStatus taskStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
post mortem: as it gets more complicated for unfamiliar reader, it would be good to add some javadoc what does it return as @dekimir suggested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - as you both want that I will add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code introduced in #22298 was lacking rescheduling replacement tasks. As a result query execution got stuck after we observed missing spooling output stats for task.
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: