
MemoryError in coverage report on Python 3.12 #1785

Closed
cvzi opened this issue May 17, 2024 · 5 comments
Labels
bug (Something isn't working) · not our bug (The problem was elsewhere)

Comments

@cvzi

cvzi commented May 17, 2024

Describe the bug
On Python 3.12, an error occurs during coverage report; earlier Python versions seem to be fine:

C:\Users\cuzi\Desktop\coverage_large_dict>coverage report
MemoryError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python312\Scripts\coverage.exe\__main__.py", line 7, in <module>
  File "C:\Python312\Lib\site-packages\coverage\cmdline.py", line 970, in main
    status = CoverageScript().command_line(argv)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\cmdline.py", line 708, in command_line
    total = self.coverage.report(
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\control.py", line 1084, in report
    return reporter.report(morfs, outfile=file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\report.py", line 181, in report
    for fr, analysis in get_analysis_to_report(self.coverage, morfs):
  File "C:\Python312\Lib\site-packages\coverage\report_core.py", line 100, in get_analysis_to_report
    analysis = coverage._analyze(morf)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\control.py", line 942, in _analyze
    return analysis_from_file_reporter(data, self.config.precision, file_reporter, filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\results.py", line 31, in analysis_from_file_reporter
    statements = file_reporter.lines()
                 ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\python.py", line 194, in lines
    return self.parser.statements
           ^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\python.py", line 189, in parser
    self._parser.parse_source()
  File "C:\Python312\Lib\site-packages\coverage\parser.py", line 271, in parse_source
    self._raw_parse()
  File "C:\Python312\Lib\site-packages\coverage\parser.py", line 154, in _raw_parse
    tokgen = generate_tokens(self.text)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\coverage\phystokens.py", line 179, in generate_tokens
    return list(tokenize.generate_tokens(readline))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\tokenize.py", line 577, in _generate_tokens_from_c_tokenizer
    yield TokenInfo._make(info)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\collections\__init__.py", line 449, in _make
    result = tuple_new(cls, iterable)
             ^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: <built-in method __new__ of type object at 0x00007FFFB9EE6270> returned a result with an exception set

CI logs: https://github.com/cvzi/coverage_large_dict/actions/runs/9125997761/job/25093333034

To Reproduce

  • Python 3.12 on Windows and Ubuntu

  • coverage.py 7.5.1 with C extension

  • coverage debug sys

    -- sys -------------------------------------------------------
                   coverage_version: 7.5.1
                    coverage_module: C:\Python312\Lib\site-packages\coverage\__init__.py
                               core: -none-
                            CTracer: available
               plugins.file_tracers: -none-
                plugins.configurers: -none-
          plugins.context_switchers: -none-
                  configs_attempted: C:\Users\cuzi\Desktop\coverage_large_dict\.coveragerc
                                     C:\Users\cuzi\Desktop\coverage_large_dict\setup.cfg
                                     C:\Users\cuzi\Desktop\coverage_large_dict\tox.ini
                                     C:\Users\cuzi\Desktop\coverage_large_dict\pyproject.toml
                       configs_read: C:\Users\cuzi\Desktop\coverage_large_dict\pyproject.toml
                        config_file: None
                    config_contents: -none-
                          data_file: -none-
                             python: 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]
                           platform: Windows-11-10.0.22621-SP0
                     implementation: CPython
                         executable: C:\Python312\python.exe
                       def_encoding: utf-8
                        fs_encoding: utf-8
                                pid: 4188
                                cwd: C:\Users\cuzi\Desktop\coverage_large_dict
                               path: C:\Python312\Scripts\coverage.exe
                                     C:\Python312\python312.zip
                                     C:\Python312\DLLs
                                     C:\Python312\Lib
                                     C:\Python312
                                     C:\Python312\Lib\site-packages
                        environment: COVERAGE_CORE = sysmon
                                     TEMP = C:\Users\cuzi\AppData\Local\Temp
                                     TMP = C:\Users\cuzi\AppData\Local\Temp
                       command_line: C:\Python312\Scripts\coverage debug sys
             sqlite3_sqlite_version: 3.45.1
                 sqlite3_temp_store: 0
            sqlite3_compile_options: ATOMIC_INTRINSICS=0, COMPILER=msvc-1938, DEFAULT_AUTOVACUUM,
                                     DEFAULT_CACHE_SIZE=-2000, DEFAULT_FILE_FORMAT=4,
                                     DEFAULT_JOURNAL_SIZE_LIMIT=-1, DEFAULT_MMAP_SIZE=0, DEFAULT_PAGE_SIZE=4096,
                                     DEFAULT_PCACHE_INITSZ=20, DEFAULT_RECURSIVE_TRIGGERS,
                                     DEFAULT_SECTOR_SIZE=4096, DEFAULT_SYNCHRONOUS=2,
                                     DEFAULT_WAL_AUTOCHECKPOINT=1000, DEFAULT_WAL_SYNCHRONOUS=2,
                                     DEFAULT_WORKER_THREADS=0, DIRECT_OVERFLOW_READ, ENABLE_FTS3, ENABLE_FTS4,
                                     ENABLE_FTS5, ENABLE_MATH_FUNCTIONS, ENABLE_RTREE, MALLOC_SOFT_LIMIT=1024,
                                     MAX_ATTACHED=10, MAX_COLUMN=2000, MAX_COMPOUND_SELECT=500,
                                     MAX_DEFAULT_PAGE_SIZE=8192, MAX_EXPR_DEPTH=1000, MAX_FUNCTION_ARG=127,
                                     MAX_LENGTH=1000000000, MAX_LIKE_PATTERN_LENGTH=50000,
                                     MAX_MMAP_SIZE=0x7fff0000, MAX_PAGE_COUNT=0xfffffffe, MAX_PAGE_SIZE=65536,
                                     MAX_SQL_LENGTH=1000000000, MAX_TRIGGER_DEPTH=1000,
                                     MAX_VARIABLE_NUMBER=32766, MAX_VDBE_OP=250000000, MAX_WORKER_THREADS=8,
                                     MUTEX_W32, OMIT_AUTOINIT, SYSTEM_MALLOC, TEMP_STORE=1, THREADSAFE=1
    
    

  • What versions of what packages do you have installed? Only coverage and pytest 8.2.0

  • What code shows the problem? I suspect the problem is the large dictionary written as a literal on a single line. Something like this (a sketch that generates a comparable file follows this list):

    a_large_dict_literal = {'\U0001F947':{'en':':1st_place_medal:','status':2,'E':3,'de':':goldmedaille:','es':':medalla_de_oro:','fr':':médaille_d’or:','ja':':金メダル:','ko':':금메달:','pt':':medalha_de_ouro:','it':':medaglia_d’oro:','fa':':مدال_طلا:','id':':medali_emas:','zh':':金牌:','ru':':золотая_медаль:','tr':':birincilik_madalyası:','ar':':ميدالية_مركز_أول:'},'\U0001F948':{'en':':2nd_place_medal:','status':2,'E':3,'de':':silbermedaille:','es':':medalla_de_plata:','fr':':médaille_d’argent:','ja':':銀メダル:','ko':':은메달:','pt':':medalha_de_prata:','it':':medaglia_d’argento:','fa':':مدال_نقره:','id':':medali_perak:','zh':':银牌:','ru':':серебряная_медаль:','tr':':ikincilik_madalyası:','ar':':ميدالية_مركز_ثان:'}, ...

    See this 3 MB file: https://github.com/cvzi/coverage_large_dict/blob/main/largedict/__init__.py

  • What commands should we run to reproduce the problem?
    This repository is a minimal example: https://github.com/cvzi/coverage_large_dict

    git clone https://github.com/cvzi/coverage_large_dict.git
    cd coverage_large_dict
    pip install pytest coverage
    coverage run -m pytest
    coverage report
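
For reference, here is a minimal sketch of a script that generates a comparable one-line dict literal. The file name, key names, and entry count are made up; the real file is the linked largedict/__init__.py.

# generate_one_line_dict.py -- hypothetical helper, not part of the
# coverage_large_dict repository. It writes a module whose only statement
# is a multi-megabyte dict literal on a single physical line, mimicking
# the shape of largedict/__init__.py.
entries = ", ".join(
    f"'key_{i}': {{'en': ':label_{i}:', 'status': 2}}" for i in range(50_000)
)
with open("largedict_oneline.py", "w", encoding="utf-8") as f:
    f.write("a_large_dict_literal = {" + entries + "}\n")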
    

Additional context

When the same large dictionary is formatted in a pretty way, with newlines and spaces, the problem doesn't occur.

@cvzi added the bug and needs triage labels on May 17, 2024
@devdanzin
Contributor

Confirmed: it didn't raise a MemoryError, but it consumed 19 GB of RAM before I killed it.

@devdanzin
Contributor

This can be reproduced by converting the large dict to a string, then running:

import io
import tokenize

import largedict

text = largedict.d  # the large dict literal as one long source string
readline = io.StringIO(text).readline  # tokenize expects a readline callable
list(tokenize.generate_tokens(readline))  # memory blows up here

The excessive memory usage comes from the huge list that is created here:

return list(tokenize.generate_tokens(readline))

Given that the list is only used to iterate tokens, it might be better to simply return tokenize.generate_tokens(readline).
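
A sketch of that idea, written as a hypothetical standalone function rather than coverage's actual code:

import io
import tokenize
from collections.abc import Iterator

def generate_tokens_lazy(text: str) -> Iterator[tokenize.TokenInfo]:
    # Yield tokens one at a time instead of materializing them in a list.
    # Caveat: a generator can only be consumed once, so this wouldn't work
    # if the token list is cached and re-iterated elsewhere.
    readline = io.StringIO(text).readline
    yield from tokenize.generate_tokens(readline)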

@nedbat
Owner

nedbat commented May 17, 2024

Hmm, I don't get it. You say:

The excessive memory usage comes from the huge list that is created here:
return list(tokenize.generate_tokens(readline))

But reformatting the file to be on multiple lines makes it work fine, with a list of 450050 tokens. So it's not just the number of tokens; there's something about the single long line that is the problem, and it's internal to tokenize?

@devdanzin
Contributor

So it's not just the number of tokens, there's something about the one-line that is the problem, and it's internal to tokenize?

You're right: I confirmed that removing the list from coverage would reduce the memory usage, but I didn't think it through.

When calling list(tokenize.generate_tokens(readline)), memory blows up with a large single-line dict, while the same dict broken across multiple lines causes no memory issues. I'm adding this information to the CPython issue.
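
A minimal measurement sketch of that comparison, using a small synthetic dict instead of the repository's 3 MB file (tracemalloc numbers are indicative only, and the entry count is kept low so the buggy case stays manageable):

import io
import tokenize
import tracemalloc

def peak_tokenize_bytes(text: str) -> int:
    # Peak traced memory, in bytes, while tokenizing `text`.
    tracemalloc.start()
    list(tokenize.generate_tokens(io.StringIO(text).readline))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# The same synthetic dict in two layouts: one physical line vs. many.
entries = [f"'k{i}': {i}" for i in range(2_000)]
one_line = "d = {" + ", ".join(entries) + "}\n"
many_lines = "d = {\n" + ",\n".join(entries) + "\n}\n"

print("one line:  ", peak_tokenize_bytes(one_line))
print("many lines:", peak_tokenize_bytes(many_lines))

On an affected Python 3.12 build, the one-line figure should come out far larger; on a fixed build the two should be comparable.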

@nedbat
Owner

nedbat commented May 17, 2024

Looks like python/cpython#119118 is narrowing down to a fix, so I'll close this.
