Use system default encoding when passing code to PIPE #209

Open
NoAnyLove opened this issue Sep 20, 2017 · 2 comments

@NoAnyLove

On Windows, if the source code contains non-ASCII characters, autopep8 and yapf fail to format it. For example,

# coding: utf-8
print("中文")

Formatting the code above with autopep8 outputs

"format_test.py" 3L, 38C
Trying definition from g:formatdef_autopep8
Evaluated formatprg: autopep8 - --max-line-length=80
Using python 3 code...
Formatter autopep8 has errors: b'Traceback (most recent call last):\r\n  File "e:\\python36\\lib\\runpy.py", line 193, in _run_module_as_main\r\n    "__main__", mod_spec)\r\n  File "e:\\python36\\lib\\runpy.py", line 85, in _run_code\r\n    exec(code, run_globals)\r\n  File "E:\\Python36\\Scripts\\autopep8.exe\\__main__.py", line 9, in <module>\r\n  File "e:\\python36\\lib\\site-packages\\autopep8.py", line 3803, in main\r\n    fix_code(sys.stdin.read(), args, encoding=encoding))\r\nUnicodeEncodeError: \'gbk\' codec can\'t encode character \'\\udcad\' in position 25: illegal multibyte sequence\r\n'
Definition in 'g:formatdef_autopep8' was unsuccessful.
No format definitions were successful.
Removing trailing whitespace...
Retabbing...
Autoindenting...
2 lines to indent... 
3 lines indented 

and yapf outputs

Trying definition from g:formatdef_yapf
Evaluated formatprg: yapf --style="{based_on_style:pep8,indent_width:4,column_limit:80}" -l 1-3
Using python 3 code...
Formatter yapf has errors: b'Traceback (most recent call last):\r\n  File "e:\\python36\\lib\\runpy.py", line 193, in _run_module_as_main\r\n    "__main__", mod_spec)\r\n  File "e:\\python36\\lib\\runpy.py", line 85, in _run_code\r\n    exec(code, run_globals)\r\n  File "E:\\Python36\\Scripts\\yapf.exe\\__main__.py", line 9, in <module>\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\__init__.py", line 306, in run_main\r\n    sys.exit(main(sys.argv))\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\__init__.py", line 177, in main\r\n    file_resources.WriteReformattedCode(\'<stdout>\', reformatted_source)\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\yapflib\\file_resources.py", line 99, in WriteReformattedCode\r\n    py3compat.EncodeAndWriteToStdout(reformatted_code)\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\yapflib\\py3compat.py", line 80, in EncodeAndWriteToStdout\r\n    sys.stdout.buffer.write(s.encode(encoding))\r\nUnicodeEncodeError: \'utf-8\' codec can\'t encode character \'\\udcad\' in position 25: surrogates not allowed\r\n'
Definition in 'g:formatdef_yapf' was unsuccessful.
No format definitions were successful.
Removing trailing whitespace...
Retabbing...
Autoindenting...
2 lines to indent... 
3 lines indented 

This issue is related to #25, and only occurs on Windows. The reason is as follows.

Python 3 uses UTF-8 as its default encoding, and so do most Linux systems, so source code passed through a pipe is UTF-8 on both ends there. On Windows, it gets tricky. The following code checks the encoding Python uses for stdin on Windows:

import sys
import os

print("Is a tty: {}".format(os.isatty(sys.stdin.fileno())))
print(sys.stdin.encoding)

> python3 test_stdin_encoding.py
Is a tty: True
utf-8

> echo "hello" | python3 test_stdin_encoding.py
Is a tty: False
cp936

Windows does not always use UTF-8 (code page 65001) as the default console encoding. In fact, it rarely does. We can change the system settings to force Windows to use UTF-8, but I think that is beyond the scope of this issue.

On the other hand, vim-autoformat always encodes the source code as UTF-8 and passes it to the pipe, but the formatter has no idea about the encoding and may assume it is the system's default encoding.
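
The mismatch can be reproduced without Vim or a formatter. This is a minimal sketch, assuming the formatter's stdin decodes the piped UTF-8 bytes as GBK (cp936) with the `surrogateescape` error handler, which is what the `\udcad` in the tracebacks suggests. The helper names `misread_as` and `can_encode` are made up for illustration.

```python
SOURCE = '# coding: utf-8\nprint("中文")\n'


def misread_as(raw, assumed_encoding):
    """Decode bytes the way a reader assuming `assumed_encoding` would."""
    # Assumption: undecodable bytes are kept as lone surrogates
    # (surrogateescape), as the '\udcad' in the tracebacks suggests.
    return raw.decode(assumed_encoding, errors="surrogateescape")


def can_encode(text, encoding):
    """Return True if `text` can be strictly encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# vim-autoformat side: the buffer is always sent as UTF-8 bytes.
raw = SOURCE.encode("utf-8")

# Formatter side on a cp936 system: the same bytes are read as GBK.
misread = misread_as(raw, "gbk")

print(ascii(misread))
print(can_encode(misread, "gbk"), can_encode(misread, "utf-8"))
```

The misread text contains lone surrogates, so writing it back out fails with either codec, which matches the `UnicodeEncodeError` from autopep8 (`gbk`) and from yapf (`utf-8`).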

I would recommend getting the encoding at run time and using it to encode the code. For example,

# L249-250 in autoformat.vim
encoding = sys.stdin.encoding
text = bytes(os.linesep.join(vim.current.buffer[:]) + os.linesep, encoding)

# L276 in autoformat.vim
stdoutdata = stdoutdata.decode(encoding)

This should fix the issue described above as well as #25. It should also have no negative effect on other systems and encodings.
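
Outside of Vim, the same idea can be sketched as a standalone function. This is only an illustration of the proposal, not the actual autoformat.vim code: `run_formatter` and `pipe_encoding` are hypothetical names, and the fallback to `locale.getpreferredencoding(False)` when `sys.stdin` reports no encoding is my own assumption.

```python
import locale
import os
import subprocess
import sys


def pipe_encoding():
    """Guess the encoding a child process will assume for piped stdin."""
    # Assumption: fall back to the locale's preferred encoding when
    # sys.stdin is missing or reports no encoding.
    stdin_encoding = getattr(sys.stdin, "encoding", None)
    return stdin_encoding or locale.getpreferredencoding(False)


def run_formatter(lines, cmd, encoding=None):
    """Pipe `lines` through `cmd`, encoding and decoding consistently."""
    encoding = encoding or pipe_encoding()
    # Encode the buffer with the runtime encoding instead of a fixed UTF-8.
    text = (os.linesep.join(lines) + os.linesep).encode(encoding)
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdoutdata, stderrdata = proc.communicate(text)
    if proc.returncode != 0:
        raise RuntimeError(stderrdata.decode(encoding, errors="replace"))
    # Decode the formatter output with the same encoding.
    return stdoutdata.decode(encoding)
```

A call would look like `run_formatter(vim.current.buffer[:], ["autopep8", "-"])`, with the encoding picked up at run time; passing `encoding=` explicitly is mainly useful for testing.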

@chtenb
Member

chtenb commented Sep 20, 2017

Thanks for supplying this information. I wasn't aware of the python code writing in a different encoding than the system default. This indeed needs to be solved, and I will have a look at it when I have time.

@chtenb chtenb self-assigned this Oct 16, 2018
@chtenb chtenb added the bug label Oct 21, 2018
@NewUserHa

+1
Hope this gets fixed soon, please!

They already fixed it in google/yapf#449!
