Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-120317: Lock around global state in the tokenize module #120318

Merged
merged 13 commits into from
Jul 16, 2024

Conversation

lysnikolaou
Copy link
Contributor

@lysnikolaou lysnikolaou commented Jun 10, 2024

@Fidget-Spinner
Copy link
Member

This LGTM. Can you add the reproducer to test.test_free_threading please? I'll approve after!

@lysnikolaou
Copy link
Contributor Author

@Fidget-Spinner I added more locking around global state. Reviewing each commit seprately might make it easier.

Python/Python-tokenize.c Outdated Show resolved Hide resolved
Python/Python-tokenize.c Show resolved Hide resolved
Python/Python-tokenize.c Outdated Show resolved Hide resolved
Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a refactor, so good so far.

Python/Python-tokenize.c Show resolved Hide resolved
Python/Python-tokenize.c Show resolved Hide resolved
Python/Python-tokenize.c Outdated Show resolved Hide resolved
Python/Python-tokenize.c Show resolved Hide resolved
@lysnikolaou
Copy link
Contributor Author

Feedback addressed.

@Fidget-Spinner I used _Py_atomic_store_int instead of the macro. Is that okay?

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to ignore the following comments if this part of the tokenizer is not perf sensitive.

Python/Python-tokenize.c Outdated Show resolved Hide resolved
Python/Python-tokenize.c Outdated Show resolved Hide resolved
@lysnikolaou
Copy link
Contributor Author

All feedback addressed. Thanks a lot for the reviews and the help @Fidget-Spinner and @erlend-aasland!

@lysnikolaou lysnikolaou changed the title gh-120317: Lock around tokenizer calls in the tokenizer module gh-120317: Lock around global state in the tokenize module Jun 10, 2024
Python/Python-tokenize.c Outdated Show resolved Hide resolved
Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM



@threading_helper.requires_working_threading()
class TestTokenize(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this locally: it crashes spectacularly without the fix :)

@@ -194,7 +252,13 @@ tokenizeriter_next(tokenizeriterobject *it)
}
if (it->done || type == ERRORTOKEN) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to protect the read? Otherwise we may get weird partial reads no?

@pablogsal
Copy link
Member

Actually, one question here: shouldn't we be protecting all reads of the it structure that are not under critical sections? Otherwise we are getting some partial reads no? I am missing something?

@pablogsal
Copy link
Member

For example, this read is not protected, no?:

    Py_ssize_t lineno = ISSTRINGLIT(type) ? it->tok->first_lineno : it->tok->lineno;
     Py_ssize_t end_lineno = it->tok->lineno;

@lysnikolaou lysnikolaou force-pushed the tokenize-thread-safety branch 2 times, most recently from 20c7cb2 to 1c51daf Compare June 11, 2024 20:34
@lysnikolaou
Copy link
Contributor Author

After some offline discussion with @pablogsal, we decided it makes more sense to just lock around all of tokenizeriter_next.

@lysnikolaou lysnikolaou added the needs backport to 3.13 bugs and security fixes label Jun 11, 2024
try:
r = next(it)
tokens.append(tokenize.TokenInfo._make(r))
time.sleep(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this test faster? It's going to run on every CI job. I think we should aim for 0.1 seconds or less for these sorts of tests. (It's currently ~3 seconds).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the amount of time each thread sleeps to 0.03 secs, which means that the whole test now takes about 0.1 seconds. Also verified that the test still fails without the fix and succeeds with it.

@lysnikolaou
Copy link
Contributor Author

lysnikolaou commented Jul 1, 2024

@pablogsal Can you do a final review here when you have some spare cycles?

Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the PR and the comments and it LGTM

I also ran the tests locally in free threaded mode to confirm

@lysnikolaou lysnikolaou merged commit 8549559 into python:main Jul 16, 2024
34 checks passed
@miss-islington-app
Copy link

Thanks @lysnikolaou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@lysnikolaou lysnikolaou deleted the tokenize-thread-safety branch July 16, 2024 09:36
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 16, 2024
…honGH-120318)

(cherry picked from commit 8549559)

Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: Pablo Galindo <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Jul 16, 2024

GH-121841 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jul 16, 2024
lysnikolaou added a commit that referenced this pull request Jul 16, 2024
…-120318) (#121841)

(cherry picked from commit 8549559)

Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: Pablo Galindo <[email protected]>
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants