fix(core): Ensure executions cannot resume if already running #10014

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

@ivov ivov commented Jul 11, 2024

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Jul 11, 2024

cypress bot commented Jul 11, 2024

4 flaky tests on run #5880 ↗︎

0 399 0 0 Flakiness 4

Details:

🌳 🖥️ browsers:node18.12.0-chrome107 🤖 ivov 🗃️ e2e/*
Project: n8n Commit: 5eb9c6728c
Status: Passed Duration: 05:23 💡
Started: Jul 11, 2024 1:43 PM Ended: Jul 11, 2024 1:48 PM
Flakiness: 5-ndv.cy.ts • 2 flaky tests
NDV > should not retrieve remote options when required params throw errors
NDV > Stop listening for trigger event from NDV

Flakiness: 10-undo-redo.cy.ts • 1 flaky test
Undo/Redo > should undo/redo adding connected nodes

Flakiness: 24-ndv-paired-item.cy.ts • 1 flaky test
NDV > resolves expression with default item when input node is not parent, while still pairing items

✅ All Cypress E2E specs passed

@ivov ivov merged commit d651be4 into master Jul 11, 2024
27 checks passed
@ivov ivov deleted the pay-1554-error-no-active-execution-found branch July 11, 2024 13:49
@despairblue

This is not fixing the race condition. It will only make it less likely, but the race condition is still there.

The race condition comes from the time between reading the execution from the db and updating its status in the db:

Read:

const execution = await this.executionRepository.findSingleExecution(executionId, {
	includeData: true,
	unflattenData: true,
});

Write:

await this.executionRepository.updateStatus(executionId, 'running');

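On my machine in main mode, the gap between the read and the write is ca. 35 ms. Any second request that reads within that window still sees the old status, so the gap needs to be effectively 0 for the race condition to be gone.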
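To make the window concrete, here is a minimal in-memory sketch of the read-then-write pattern. Everything in it is hypothetical (`store`, `sleep`, `resumeReadThenWrite`, `demoRace`); it does not use the real `executionRepository`, and the 35 ms delay just stands in for the gap measured above.

```typescript
type Status = 'waiting' | 'running';

// Hypothetical in-memory stand-in for the executions table.
const store = new Map<string, Status>([['1', 'waiting']]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Read first, write later: the guard is based on a read that may already be stale.
async function resumeReadThenWrite(id: string): Promise<boolean> {
	const status = store.get(id); // read
	if (status !== 'waiting') return false; // guard
	await sleep(35); // simulated gap between read and write
	store.set(id, 'running'); // write
	return true; // the caller would go on to resume the execution
}

// Fire two resumes at once and count how many got past the guard.
async function demoRace(): Promise<number> {
	store.set('1', 'waiting');
	const results = await Promise.all([resumeReadThenWrite('1'), resumeReadThenWrite('1')]);
	return results.filter(Boolean).length;
}
```

In this sketch `demoRace()` resolves to 2: both calls read `waiting` before either one writes `running`, so both proceed.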

You can still reproduce this by running curl [resumeURL] & curl [resumeURL].
You can also run hey [resumeURL], which ends up restarting the execution ca. 10 times:
image

This is an improvement over what it was before:
image
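For anyone who wants to script the reproduction instead of using curl or hey, a rough sketch could look like the following. `fireConcurrently` and `RequestFn` are hypothetical names, and the request function is injected so the helper itself makes no assumption about what the resume endpoint returns.

```typescript
// A request function performs one call to the URL and returns the HTTP status code.
type RequestFn = (url: string) => Promise<number>;

// Send `count` requests at the same time and tally the status codes that come back.
async function fireConcurrently(
	url: string,
	count: number,
	request: RequestFn,
): Promise<Map<number, number>> {
	const statuses = await Promise.all(Array.from({ length: count }, () => request(url)));
	const histogram = new Map<number, number>();
	for (const status of statuses) {
		histogram.set(status, (histogram.get(status) ?? 0) + 1);
	}
	return histogram;
}
```

Against a real instance on Node 18+, the request function could be `async (url) => (await fetch(url)).status`; comparing the tally with the number of restarted executions shows how many requests slipped through.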

What I had in mind as a fix was doing pessimistic locking on the application layer, e.g. updating the status only if it is still waiting, and continuing only if that update actually changed a row:

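		// ...
		// Claim the execution in a single update that matches only while the status is still 'waiting'.
		const result = await this.executionRepository.update(
			{ id: executionId, status: 'waiting' },
			{ status: 'running' },
		);

		// No row matched, typically because another request has already claimed this execution.
		if (result.affected === 0) {
			throw new ConflictError(`The execution "${executionId}" is already running.`);
		}

		// Only the request that won the update loads the data and continues.
		const execution = await this.executionRepository.findSingleExecution(executionId, {
			includeData: true,
			unflattenData: true,
		});

		// ...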

With that in place we get this:
image
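The effect of the conditional update can be sketched in memory as well. This is only an illustration under assumptions: `executions`, `delay`, `markRunningIfWaiting`, `resumeWithConditionalUpdate` and `countResumes` are hypothetical, and the synchronous check-and-set stands in for a single database update that matches on the `waiting` status.

```typescript
type ExecutionStatus = 'waiting' | 'running';

// Hypothetical in-memory stand-in for the executions table.
const executions = new Map<string, ExecutionStatus>([['1', 'waiting']]);

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-in for one conditional update: check and write happen in a single
// synchronous step, so no other caller can run in between.
function markRunningIfWaiting(id: string): { affected: number } {
	if (executions.get(id) !== 'waiting') return { affected: 0 };
	executions.set(id, 'running');
	return { affected: 1 };
}

async function resumeWithConditionalUpdate(id: string): Promise<boolean> {
	await delay(1); // simulated round trip before the update
	const result = markRunningIfWaiting(id);
	if (result.affected === 0) return false; // someone else already claimed it
	await delay(35); // simulated time spent loading the execution data
	return true;
}

// Fire `concurrency` resumes at once and count how many were allowed to proceed.
async function countResumes(concurrency: number): Promise<number> {
	executions.set('1', 'waiting');
	const attempts = Array.from({ length: concurrency }, () => resumeWithConditionalUpdate('1'));
	const results = await Promise.all(attempts);
	return results.filter(Boolean).length;
}
```

Here `countResumes(10)` resolves to 1: only the first caller to reach the check-and-set finds the status `waiting`, and the other nine see `affected: 0`.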

The problem is that this may bypass the execution recovery, or interact with it in ways that I can't foresee.

What do you think? @ivov @valya

3 participants