Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reschedule replacement tasks in case of artificial failure due to missing spooling output size #22472

Conversation

losipiuk
Copy link
Member

The code introduced in #22298 was lacking rescheduling replacement tasks. As a result query execution got stuck after we observed missing spooling output stats for task.

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Fix rare error condition where query execution could block when executing with task retries enabled.

@cla-bot cla-bot bot added the cla-signed label Jun 21, 2024
@losipiuk losipiuk requested review from dekimir and sopel39 June 21, 2024 16:04
…sing spooling output size

The code introduced in trinodb#22298 was
lacking rescheduling replacement tasks. As a result query execution got
stuck after we observed missing spooling output stats for task.
@losipiuk losipiuk force-pushed the lukaszos/reschedule-replacement-tasks-in-case-of-artificial-failure-due-to-missing-spooling-output-size-dd6ec1 branch from 5a2135e to 0b871ee Compare June 21, 2024 16:05
@losipiuk losipiuk requested a review from findepi June 21, 2024 16:13
@@ -2317,10 +2318,10 @@ public void finalizeUpdateOfExchangeSinkInstanceHandle(TaskId taskId, ExchangeSi
partition.updateExchangeSinkInstanceHandle(taskId, updatedExchangeSinkInstanceHandle);
}

public void taskFinished(TaskId taskId, TaskStatus taskStatus)
public Optional<List<PrioritizedScheduledTask>> taskFinished(TaskId taskId, TaskStatus taskStatus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind seeing a blurb that describes what the return value is supposed to represent. Something along the lines of

This method may decide to fail the task, in which case it returns the taskFailed() return value.

Copy link
Member Author

@losipiuk losipiuk Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will pass - partly because I am lazy - and partly because it is all enclosed to this class and quite straightforward to read.

@@ -1767,7 +1767,8 @@ public Void onRemoteTaskCompleted(RemoteTaskCompletedEvent event)
TaskState taskState = taskStatus.getState();
StageExecution stageExecution = getStageExecution(taskId.getStageId());
if (taskState == TaskState.FINISHED) {
stageExecution.taskFinished(taskId, taskStatus);
Optional<List<PrioritizedScheduledTask>> failOverrideReplacementTasks = stageExecution.taskFinished(taskId, taskStatus);
failOverrideReplacementTasks.ifPresent(prioritizedScheduledTasks -> prioritizedScheduledTasks.forEach(schedulingQueue::addOrUpdate));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. The task will only be "failed" if it's already finished. Why schedule a replacement? Is it because failing a task cancels downstream tasks automatically? Then aren't those downstream tasks what needs rescheduling, not the current task?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want query to finish we need to either:

  • explicitly mark it as failed after task failure
  • create new task which will do the work the failed task failed to complete

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this task failed to complete, though?

// it is rare but possible to get empty spooling output stats for task which completed successfully.
// As we need this information in FTE mode we need to fail such task artificially

If this is true, then we're choosing neither of the two options above. It looks like we're creating a new task to redo the successfully completed work. I'm sure I'm missing something, but I doubt it's something obvious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - we are creating new task in place of the one which failed. It is new "attempt" but technically it is separate task.

@losipiuk losipiuk merged commit f8d373f into trinodb:master Jun 24, 2024
94 of 95 checks passed
@github-actions github-actions bot added this to the 451 milestone Jun 24, 2024
@@ -2331,8 +2332,7 @@ public void taskFinished(TaskId taskId, TaskStatus taskStatus)
// it is rare but possible to get empty spooling output stats for task which completed successfully.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is rare but possible to get empty spooling output stats for task which completed successfully.

why it happens and can it be prevented instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is hard to prevent.
The TaskStatus and TaskInfo are two separate channels. There will always be a possibility that we get TaskStatus saying that task FINISHED but cannot get final TaskInfo. Currently we decide to not fail task in this siutation - but recreate final TaskInfo locally from information we have at hand. And it does not work with output spooling stats.
Alterantive would be to always kill task in such case - but I do not think it is better.

@@ -2317,10 +2318,10 @@ public void finalizeUpdateOfExchangeSinkInstanceHandle(TaskId taskId, ExchangeSi
partition.updateExchangeSinkInstanceHandle(taskId, updatedExchangeSinkInstanceHandle);
}

public void taskFinished(TaskId taskId, TaskStatus taskStatus)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post mortem: as it gets more complicated for unfamiliar reader, it would be good to add some javadoc what does it return as @dekimir suggested

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - as you both want that I will add a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants