Script fails with exception when encountering UTF-8 character #9

Closed
GoogleCodeExporter opened this issue Aug 23, 2015 · 12 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Run the ./gitinspector.py script against a repo with a UTF-8 character in an author's name


What is the expected output? What do you see instead?

Traceback (most recent call last):
  File "./gitinspector.py", line 136, in <module>
    __run__.output()
  File "./gitinspector.py", line 57, in output
    outputable.output(changes.ChangesOutput(self.hard))
  File "/Users/tajima/Downloads/gitinspector/outputable.py", line 37, in output
    outputable.output_text()
  File "/Users/tajima/Downloads/gitinspector/changes.py", line 240, in output_text
    print(i.ljust(20)[0:20], end=" ")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 8: ordinal not in range(128)

The person's name has a ü
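
The failure can be reproduced outside gitinspector. This is a minimal sketch that assumes a made-up author name (the real one is not given in this report) and a hypothetical helper, encode_or_error; it only shows that encoding a ü with the ascii codec raises the same kind of error, while UTF-8 succeeds:

```python
# -*- coding: utf-8 -*-
# Hypothetical author name; the real one is not given in the report.
AUTHOR = u"J\u00fcrgen Example"


def encode_or_error(text, encoding):
    """Return (encoded_bytes, None) on success or (None, exception) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc


# The ascii codec cannot represent U+00FC, so this yields an exception object.
data, err = encode_or_error(AUTHOR, "ascii")
print(err)

# UTF-8 can represent it, so this yields bytes and no error.
data, err = encode_or_error(AUTHOR, "utf-8")
print(repr(data))
```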

What version of the product are you using? On what operating system?

Mac OS Snow Lion
 ./gitinspector.py  --version
gitinspector 0.2.2


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 12 Jul 2013 at 10:26

@GoogleCodeExporter
Author

Thanks for the report. It "should" work just fine (TM) ;). It looks like it 
might be a terminal issue; the string itself is in UTF-8. Maybe something is 
causing it to fall back to ascii upon print?

Could you try running the following little script from your terminal?:

----

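import locale
import sys

# Parenthesized print works on both Python 2 and Python 3.
print(locale.getpreferredencoding())  # encoding implied by the locale settings
print(sys.getdefaultencoding())       # Python's default string encoding
print(sys.stdout.encoding)            # encoding used when printing to the terminal
print(sys.stdin.encoding)             # encoding used when reading from the terminal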

----

Tell me what output you get. My output is:

UTF-8
ascii
UTF-8
UTF-8

/Adam Waldenberg

Original comment by [email protected] on 12 Jul 2013 at 11:47

  • Added labels: OpSys-OSX

@GoogleCodeExporter
Author

Hi,
I got:

$ python
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> import sys
>>>
>>> print locale.getpreferredencoding()
US-ASCII
>>> print sys.getdefaultencoding()
ascii
>>> print sys.stdout.encoding
US-ASCII
>>> print sys.stdin.encoding
US-ASCII

Original comment by [email protected] on 12 Jul 2013 at 11:53

@GoogleCodeExporter
Author

So, looking into it a bit further, on my system I have:

$ locale
LANG="en_CA.US-ASCII"
LC_COLLATE="en_CA.US-ASCII"
LC_CTYPE="en_CA.US-ASCII"
LC_MESSAGES="en_CA.US-ASCII"
LC_MONETARY="en_CA.US-ASCII"
LC_NUMERIC="en_CA.US-ASCII"
LC_TIME="en_CA.US-ASCII"
LC_ALL=

So, then I did a bit of looking up and I'm supposed to add the following to my 
.bash_login:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

When I do that, running your little script I get this now:

$ python
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> import sys
>>>
>>> print locale.getpreferredencoding()
UTF-8
>>> print sys.getdefaultencoding()
ascii
>>> print sys.stdout.encoding
UTF-8
>>> print sys.stdin.encoding
UTF-8
>>>


Now it's working fine.
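
Why the two exports help: POSIX locale lookup checks LC_ALL first, then the category variable (LC_CTYPE for character encoding), then LANG. The sketch below mimics that precedence on a plain dict. The function name effective_ctype is made up for illustration, the "before" dict assumes only LANG was actually set, and nothing here queries the real C library:

```python
def effective_ctype(env):
    """Return (variable, value) for the first non-empty locale variable.

    Mimics the POSIX precedence LC_ALL > LC_CTYPE > LANG; falls back to
    the "C" locale when none of them is set.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return name, value
    return None, "C"


# Assumed environment before the change: only LANG set, LC_ALL empty.
before = {"LANG": "en_CA.US-ASCII", "LC_ALL": ""}
# Environment after adding the two export lines.
after = dict(before, LC_ALL="en_US.UTF-8", LANG="en_US.UTF-8")

print(effective_ctype(before))
print(effective_ctype(after))
```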

Thanks for your help and super fast response!! :)

Original comment by [email protected] on 12 Jul 2013 at 12:00

@GoogleCodeExporter
Author

Aha. So that is the problem then. The easiest fix is to switch terminal 
encoding to UTF-8.

Take a look at this post (seems to be a common issue):
https://yzisin.wordpress.com/2012/01/09/how-to-fix-locale-issues-in-mac-os-x-lion-terminal/

You could also try to just run the following before starting gitinspector; if 
it behaves anything like a normal unix terminal it should also work:

LANG=en_US.UTF-8

/Adam Waldenberg

Original comment by [email protected] on 12 Jul 2013 at 12:13

@GoogleCodeExporter
Author

I see that you beat me to it :). Great that it is working.

/Adam Waldenberg


Original comment by [email protected] on 12 Jul 2013 at 12:14

@GoogleCodeExporter
Author

Marking as Semi-Invalid as it's not really an issue in gitinspector. However, 
some kind of fix or improvement could be added to help alleviate the problems 
caused by a non-unicode terminal encoding.
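
One possible shape for such an improvement, purely as a sketch and not something gitinspector is known to implement: check up front whether the output encoding can represent non-ASCII text and warn if it cannot. The helper name is_unicode_capable and the sample characters are assumptions made for illustration:

```python
import codecs

# Sample taken from the two characters reported in this thread (ü and “).
SAMPLE = u"\u00fc\u201c"


def is_unicode_capable(encoding_name):
    """Return True if the named codec exists and can encode SAMPLE."""
    if not encoding_name:
        # sys.stdout.encoding can be None, e.g. when output is not a terminal.
        return False
    try:
        codecs.lookup(encoding_name)
        SAMPLE.encode(encoding_name)
    except (LookupError, UnicodeEncodeError):
        return False
    return True


for name in ("US-ASCII", "UTF-8", None):
    print(name, is_unicode_capable(name))
```

A caller could pass sys.stdout.encoding to this check and print a hint about the locale settings before any output is attempted.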

/Adam Waldenberg 

Original comment by [email protected] on 12 Jul 2013 at 12:26

  • Changed state: Semi-Invalid

@GoogleCodeExporter
Author

Issue 54 has been merged into this issue.

Original comment by [email protected] on 27 Dec 2014 at 6:23

@dgruss

dgruss commented May 15, 2021

Doesn't seem to work for me. Locales were already configured properly I think.

>>> import locale
>>> import sys
>>> 
>>> print locale.getpreferredencoding()
UTF-8
>>> print sys.getdefaultencoding()
ascii
>>> print sys.stdout.encoding
UTF-8
>>> print sys.stdin.encoding
UTF-8
>>> 

Still:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

@adam-waldenberg
Member

adam-waldenberg commented May 15, 2021

@dgruss It's not - because it's trying to encode a UTF-8 character into ascii - which it can't do. This is not a locale issue, but rather a terminal configuration issue.

Your problem is:

>>> print sys.getdefaultencoding()
ascii

So it's doing exactly what it should. Either change the terminal encoding to whatever the repo uses (UTF-8 in this case), or use the environment variable PYTHONIOENCODING to force it into UTF-8 regardless of what the terminal says.

You can read more about it here:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING

Redirecting to a file should also do the trick, because that defaults to UTF-8 regardless.
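
A rough way to check that PYTHONIOENCODING takes effect is to start a child interpreter with the variable set and ask it what sys.stdout.encoding is. This is only a sketch: the helper name child_stdout_encoding is made up, and it tests whichever interpreter runs the script rather than gitinspector itself:

```python
import codecs
import os
import subprocess
import sys

# One-liner executed by the child interpreter.
CHILD = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout codec name."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    # Normalize spelling differences such as "UTF-8" vs "utf-8".
    return codecs.lookup(out.decode("ascii").strip()).name


print(child_stdout_encoding("utf-8"))
```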

@dgruss

dgruss commented May 15, 2021 via email

@adam-waldenberg
Member

adam-waldenberg commented May 16, 2021

No. The encoding for the terminal where you run gitinspector will always be the same. It doesn't matter what the source encoding is. Essentially, your problem here is that Python is trying to convert and show a character that is not available in the ascii charset. A UTF-8 destination, on the other hand, will support most characters and the conversion will work.

We can't display any data in the terminal if it's inherently impossible to do so. If the terminal doesn't support a certain character - it just doesn't. Python has "ignore" and "replace" error handlers that you can use when encoding. However, doing so would cause non-deterministic behavior where running on different terminals could create different results - something that's not desirable.
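
To illustrate the trade-off, here is a small sketch of what those error handlers do to a string containing the U+201C character from the error above. The sample text is made up; the handlers are standard arguments to str.encode:

```python
# Sample text wrapped in the curly quotes U+201C and U+201D.
TEXT = u"\u201cquoted\u201d"

# Each handler produces different bytes from the same input on an ASCII target.
for handler in ("ignore", "replace", "backslashreplace"):
    print(handler, TEXT.encode("ascii", handler))

# On a UTF-8 target nothing is lost, so the output differs again.
print("utf-8", TEXT.encode("utf-8"))
```

The same input yields four different byte strings depending on the target encoding and handler, which is the inconsistency described above.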

@dgruss

dgruss commented May 16, 2021

Ok, then I'd add one more solution to the list here as PYTHONIOENCODING didn't change anything on my server:

Add to /usr/lib/python2.7/sitecustomize.py the code:

import sys
sys.setdefaultencoding('UTF-8')

Works then.
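
Why sitecustomize.py rather than the script itself: as far as I know, Python 2's site module removes sys.setdefaultencoding once startup finishes (sitecustomize is imported just before that), and Python 3 does not have the function at all. A quick check, assuming a normally started interpreter (not python -S); the helper name is made up:

```python
import sys


def can_set_default_encoding():
    """True only while sys.setdefaultencoding is still exposed."""
    return hasattr(sys, "setdefaultencoding")


# Expected to be False in an ordinary script on either Python 2 or 3.
print(can_set_default_encoding())
```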

3 participants