Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Windows, test_wrong_alias_doesnt_work fails if the error message contains bytes that UTF-8 cannot decode #1777

Closed
devdanzin opened this issue May 5, 2024 · 4 comments · Fixed by #1780
Labels
bug Something isn't working

Comments

@devdanzin
Copy link
Contributor

Describe the bug
Running the tests on Windows, tests/test_process.py::AliasedCommandTest::test_wrong_alias_doesnt_work fails if the error message for the wrong alias contains bytes that UTF-8 cannot decode.

To Reproduce
How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:

  1. Python version: Python 3.12.3
  2. coverage.py version: 7.5.2a0.dev1
  3. Packages: pytest 8.1.1
  4. Code that shows the problem and commands to run:

The issue happens on this line:

output_str = output.decode(output_encoding()).replace("\r", "")

From this code:

def run_command(cmd: str) -> tuple[int, str]:
    """Run a command in a sub-process.

    Returns the exit status code and the combined stdout and stderr.

    """
    # Subprocesses are expensive, but convenient, and so may be over-used in
    # the test suite.  Use these lines to get a list of the tests using them:
    if 0:  # pragma: debugging
        with open("/tmp/processes.txt", "a") as proctxt:  # type: ignore[unreachable]
            print(os.getenv("PYTEST_CURRENT_TEST", "unknown"), file=proctxt, flush=True)

    # In some strange cases (PyPy3 in a virtualenv!?) the stdout encoding of
    # the subprocess is set incorrectly to ascii.  Use an environment variable
    # to force the encoding to be the same as ours.
    sub_env = dict(os.environ)
    sub_env['PYTHONIOENCODING'] = output_encoding()

    proc = subprocess.Popen(
        cmd,
        shell=True,
        env=sub_env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    output, _ = proc.communicate()
    status = proc.returncode

    # Get the output, and canonicalize it to strings with newlines.
    output_str = output.decode(output_encoding()).replace("\r", "")  # <= Here
    return status, output_str

The issue is that on Windows output can have cp1252 as encoding and output_encoding() will return UTF-8, leading to e.g.

 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 13: invalid continuation byte

It can be reproduced by running the following code in Windows Terminal on Windows 11:

from tests.helpers import run_command
print(run_command('echo 1'))  # Prints "(0, '1\n')"
print(run_command('echo é'))  # Error

Expected behavior
run_command should be able to handle output from subprocess.Popen correctly.

Additional context
I see some ways to fix this, not sure what is best. I'd prefer calling locale.getpreferredencoding() in run_commands, as it's about the encoding used to generate the error message. Another way would be to change the order in which output_encoding tries to get the right encoding. And the way I like the least, try decoding as it's currently done and use locale.getpreferredencoding() if that fails.

I'll send a PR once we decide what's the best solution.

@devdanzin devdanzin added bug Something isn't working needs triage labels May 5, 2024
@nedbat
Copy link
Owner

nedbat commented May 8, 2024

output_encoding tries to get the encoding from sys.stdout, which I guess is utf-8 because it's Python's stdout. But the subprocess in that test is writing cp1252 bytes because it isn't Python at all, it's a bogus "coverage2" command which isn't found.

I'm curious what non-ASCII characters are in the subprocess output?

@devdanzin
Copy link
Contributor Author

The error message received is:

b"'coverage2' n\xc6o \x82 reconhecido como um comando interno\r\nou externo, um programa oper\xa0vel ou um arquivo em lotes.\r\n"

Which decodes to:

'coverage2' não é reconhecido como um comando interno
ou externo, um programa operável ou um arquivo em lotes.

Hm, there's a problem with my previous analysis: even though cp1252 is able to decode the output string, it's the wrong encoding. Running chcp shows that Windows Terminal is using cp850 instead.

So I'm not sure using locale.getpreferredencoding() is the right thing to do, as that gives cp1252 and it only works by chance. We'd get the right precise answer from device_encoding(0) instead.

@nedbat
Copy link
Owner

nedbat commented May 9, 2024

Sounds like os.device_encoding(0) is the way to go then?

@devdanzin
Copy link
Contributor Author

I believe so, after looking for many alternatives. I've submitted #1780 so you can check a proposed way of doing it.

nedbat added a commit that referenced this issue May 11, 2024
…run_command() (Fix #1777) (#1780)

* Use os.device_encoding(0) to figure out stdin encoding in run_command().

* Use the encoding of stdout (fd 1) to decode the output of subprocess.Popen.

Co-authored-by: Ned Batchelder <[email protected]>

* refactor to remove a helper used in only one place

---------

Co-authored-by: Ned Batchelder <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants