
MemoryError in coverage report on Python 3.12 #1785

Closed
cvzi opened this issue May 17, 2024 · 5 comments
Labels
bug (Something isn't working) · not our bug (The problem was elsewhere)

Comments

@cvzi

cvzi commented May 17, 2024

Describe the bug
On Python 3.12, an error occurs during coverage report; earlier Python versions seem to be fine:

C:\Users\cuzi\Desktop\coverage_large_dict>coverage report
MemoryError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python312\Scripts\coverage.exe\__main__.py", line 7, in <module>
  File "C:\Python312\Lib\site-packages\coverage\cmdline.py", line 970, in main
    status = CoverageScript().command_line(argv)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\cmdline.py", line 708, in command_line
    total = self.coverage.report(
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\control.py", line 1084, in report
    return reporter.report(morfs, outfile=file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\report.py", line 181, in report
    for fr, analysis in get_analysis_to_report(self.coverage, morfs):
  File "C:\Python312\Lib\site-packages\coverage\report_core.py", line 100, in get_analysis_to_report
    analysis = coverage._analyze(morf)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\control.py", line 942, in _analyze
    return analysis_from_file_reporter(data, self.config.precision, file_reporter, filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\results.py", line 31, in analysis_from_file_reporter
    statements = file_reporter.lines()
                 ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\python.py", line 194, in lines
    return self.parser.statements
           ^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\python.py", line 189, in parser
    self._parser.parse_source()
  File "C:\Python312\Lib\site-packages\coverage\parser.py", line 271, in parse_source
    self._raw_parse()
  File "C:\Python312\Lib\site-packages\coverage\parser.py", line 154, in _raw_parse
    tokgen = generate_tokens(self.text)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\phystokens.py", line 179, in generate_tokens
    return list(tokenize.generate_tokens(readline))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\tokenize.py", line 577, in _generate_tokens_from_c_tokenizer
    yield TokenInfo._make(info)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\collections\__init__.py", line 449, in _make
    result = tuple_new(cls, iterable)
             ^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: <built-in method __new__ of type object at 0x00007FFFB9EE6270> returned a result with an exception set

CI logs: https://github.com/cvzi/coverage_large_dict/actions/runs/9125997761/job/25093333034

To Reproduce

  • Python 3.12 on Windows and Ubuntu

  • coverage.py 7.5.1 with C extension

  • coverage debug sys

    -- sys -------------------------------------------------------
                   coverage_version: 7.5.1
                    coverage_module: C:\Python312\Lib\site-packages\coverage\__init__.py
                               core: -none-
                            CTracer: available
               plugins.file_tracers: -none-
                plugins.configurers: -none-
          plugins.context_switchers: -none-
                  configs_attempted: C:\Users\cuzi\Desktop\coverage_large_dict\.coveragerc
                                     C:\Users\cuzi\Desktop\coverage_large_dict\setup.cfg
                                     C:\Users\cuzi\Desktop\coverage_large_dict\tox.ini
                                     C:\Users\cuzi\Desktop\coverage_large_dict\pyproject.toml
                       configs_read: C:\Users\cuzi\Desktop\coverage_large_dict\pyproject.toml
                        config_file: None
                    config_contents: -none-
                          data_file: -none-
                             python: 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]
                           platform: Windows-11-10.0.22621-SP0
                     implementation: CPython
                         executable: C:\Python312\python.exe
                       def_encoding: utf-8
                        fs_encoding: utf-8
                                pid: 4188
                                cwd: C:\Users\cuzi\Desktop\coverage_large_dict
                               path: C:\Python312\Scripts\coverage.exe
                                     C:\Python312\python312.zip
                                     C:\Python312\DLLs
                                     C:\Python312\Lib
                                     C:\Python312
                                     C:\Python312\Lib\site-packages
                        environment: COVERAGE_CORE = sysmon
                                     TEMP = C:\Users\cuzi\AppData\Local\Temp
                                     TMP = C:\Users\cuzi\AppData\Local\Temp
                       command_line: C:\Python312\Scripts\coverage debug sys
             sqlite3_sqlite_version: 3.45.1
                 sqlite3_temp_store: 0
            sqlite3_compile_options: ATOMIC_INTRINSICS=0, COMPILER=msvc-1938, DEFAULT_AUTOVACUUM,
                                     DEFAULT_CACHE_SIZE=-2000, DEFAULT_FILE_FORMAT=4,
                                     DEFAULT_JOURNAL_SIZE_LIMIT=-1, DEFAULT_MMAP_SIZE=0, DEFAULT_PAGE_SIZE=4096,
                                     DEFAULT_PCACHE_INITSZ=20, DEFAULT_RECURSIVE_TRIGGERS,
                                     DEFAULT_SECTOR_SIZE=4096, DEFAULT_SYNCHRONOUS=2,
                                     DEFAULT_WAL_AUTOCHECKPOINT=1000, DEFAULT_WAL_SYNCHRONOUS=2,
                                     DEFAULT_WORKER_THREADS=0, DIRECT_OVERFLOW_READ, ENABLE_FTS3, ENABLE_FTS4,
                                     ENABLE_FTS5, ENABLE_MATH_FUNCTIONS, ENABLE_RTREE, MALLOC_SOFT_LIMIT=1024,
                                     MAX_ATTACHED=10, MAX_COLUMN=2000, MAX_COMPOUND_SELECT=500,
                                     MAX_DEFAULT_PAGE_SIZE=8192, MAX_EXPR_DEPTH=1000, MAX_FUNCTION_ARG=127,
                                     MAX_LENGTH=1000000000, MAX_LIKE_PATTERN_LENGTH=50000,
                                     MAX_MMAP_SIZE=0x7fff0000, MAX_PAGE_COUNT=0xfffffffe, MAX_PAGE_SIZE=65536,
                                     MAX_SQL_LENGTH=1000000000, MAX_TRIGGER_DEPTH=1000,
                                     MAX_VARIABLE_NUMBER=32766, MAX_VDBE_OP=250000000, MAX_WORKER_THREADS=8,
                                     MUTEX_W32, OMIT_AUTOINIT, SYSTEM_MALLOC, TEMP_STORE=1, THREADSAFE=1
    
    

  • What versions of what packages do you have installed? Only coverage and pytest 8.2.0

  • What code shows the problem? I suspect the problem is the large dictionary written as a literal on a single line. Something like this (a sketch that generates a comparable file follows this list):

    a_large_dict_literal = {'\U0001F947':{'en':':1st_place_medal:','status':2,'E':3,'de':':goldmedaille:','es':':medalla_de_oro:','fr':':médaille_d’or:','ja':':金メダル:','ko':':금메달:','pt':':medalha_de_ouro:','it':':medaglia_d’oro:','fa':':مدال_طلا:','id':':medali_emas:','zh':':金牌:','ru':':золотая_медаль:','tr':':birincilik_madalyası:','ar':':ميدالية_مركز_أول:'},'\U0001F948':{'en':':2nd_place_medal:','status':2,'E':3,'de':':silbermedaille:','es':':medalla_de_plata:','fr':':médaille_d’argent:','ja':':銀メダル:','ko':':은메달:','pt':':medalha_de_prata:','it':':medaglia_d’argento:','fa':':مدال_نقره:','id':':medali_perak:','zh':':银牌:','ru':':серебряная_медаль:','tr':':ikincilik_madalyası:','ar':':ميدالية_مركز_ثان:'}, ...

    See this 3 MB file: https://github.com/cvzi/coverage_large_dict/blob/main/largedict/__init__.py

  • What commands should we run to reproduce the problem?
    This repository is a minimal example: https://github.com/cvzi/coverage_large_dict

    git clone https://github.com/cvzi/coverage_large_dict.git
    cd coverage_large_dict
    pip install pytest coverage
    coverage run -m pytest
    coverage report
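
For reference, here is a minimal sketch of a script that generates a comparable one-line dict literal. The file name, key names, and entry count are made up; the real file is the linked largedict/__init__.py.

# generate_one_line_dict.py -- hypothetical helper, not part of the
# coverage_large_dict repository. It writes a module whose only statement
# is a multi-megabyte dict literal on a single physical line, mimicking
# the shape of largedict/__init__.py.
entries = ", ".join(
    f"'key_{i}': {{'en': ':label_{i}:', 'status': 2}}" for i in range(50_000)
)
with open("largedict_oneline.py", "w", encoding="utf-8") as f:
    f.write("a_large_dict_literal = {" + entries + "}\n")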
    

Additional context

When the same large dictionary is formatted in a pretty way, with newlines and spaces, the problem doesn't occur.

@cvzi added the bug and needs triage labels on May 17, 2024
@devdanzin
Contributor

Confirmed: it didn't raise a MemoryError, but it consumed 19 GB of RAM before I killed it.

@devdanzin
Contributor

This can be reproduced by converting the large dict to a string, then running:

import io
import tokenize

import largedict

text = largedict.d  # the large dict literal as one long source string
readline = io.StringIO(text).readline  # tokenize expects a readline callable
list(tokenize.generate_tokens(readline))  # memory blows up here

The excessive memory usage comes from the huge list that is created here:

return list(tokenize.generate_tokens(readline))

Given that the list is only used to iterate tokens, it might be better to simply return tokenize.generate_tokens(readline).
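
A sketch of that idea, written as a hypothetical standalone function rather than coverage's actual code:

import io
import tokenize
from collections.abc import Iterator

def generate_tokens_lazy(text: str) -> Iterator[tokenize.TokenInfo]:
    # Yield tokens one at a time instead of materializing them in a list.
    # Caveat: a generator can only be consumed once, so this wouldn't work
    # if the token list is cached and re-iterated elsewhere.
    readline = io.StringIO(text).readline
    yield from tokenize.generate_tokens(readline)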

@nedbat
Owner

nedbat commented May 17, 2024

Hmm, I don't get it. You say:

The excessive memory usage comes from the huge list that is created here:
return list(tokenize.generate_tokens(readline))

But reformatting the file to be on multiple lines makes it work fine, with a list of 450050 tokens. So it's not just the number of tokens; there's something about the single long line that is the problem, and it's internal to tokenize?

@devdanzin
Contributor

So it's not just the number of tokens, there's something about the one-line that is the problem, and it's internal to tokenize?

You're right: I confirmed that removing the list from coverage would reduce the memory usage, but I didn't think it through.

When calling list(tokenize.generate_tokens(readline)), memory blows up with a large single-line dict, while the same dict broken across multiple lines causes no memory issues. I'm adding this information to the CPython issue.
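
A minimal measurement sketch of that comparison, using a small synthetic dict instead of the repository's 3 MB file (tracemalloc numbers are indicative only, and the entry count is kept low so the buggy case stays manageable):

import io
import tokenize
import tracemalloc

def peak_tokenize_bytes(text: str) -> int:
    # Peak traced memory, in bytes, while tokenizing `text`.
    tracemalloc.start()
    list(tokenize.generate_tokens(io.StringIO(text).readline))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# The same synthetic dict in two layouts: one physical line vs. many.
entries = [f"'k{i}': {i}" for i in range(2_000)]
one_line = "d = {" + ", ".join(entries) + "}\n"
many_lines = "d = {\n" + ",\n".join(entries) + "\n}\n"

print("one line:  ", peak_tokenize_bytes(one_line))
print("many lines:", peak_tokenize_bytes(many_lines))

On an affected Python 3.12 build, the one-line figure should come out far larger; on a fixed build the two should be comparable.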

@nedbat
Owner

nedbat commented May 17, 2024

Looks like python/cpython#119118 is narrowing down to a fix, so I'll close this.
