pytest cannot deal with utf-8 encoded __repr__ of a custom object #678

Closed
pytestbot opened this issue Feb 10, 2015 · 7 comments
Labels
type: bug problem that needs to be addressed

@pytestbot
Contributor

Originally reported by: Roman Bolshakov (BitBucket: roolebo, GitHub: roolebo)


I have a test module that uses Beautiful Soup to parse some test data. I added an assertion to check that a variable (to which I assigned the parsing result) is an instance of the unicode type. Because of a bug in my code, a list of various objects was returned instead of the expected unicode string, so the assertion fired. On top of that, I got a totally unexpected UnicodeDecodeError from pytest.

Here's how it could be reproduced: https://gist.github.com/roolebo/ca816a26cdc0a8b17226

It turned out that Beautiful Soup returns a UTF-8 encoded byte string when repr is invoked on a Tag object. The gist above can be reduced to a reproduction without the Beautiful Soup dependency:

#!python
# coding=utf-8
def test_unicode_repr():
    class Foo(object):
        a = 1

        def __repr__(self):
            return '<b class="boldest">Б</b>'
    f = Foo()
    assert 0 == f.a
#!python

lines = ['assert 0 == 1', '{1 = <b class="boldest">\xd0</b>.a', '}']

    def _format_lines(lines):
        """Format the individual lines

        This will replace the '{', '}' and '~' characters of our mini
        formatting language with the proper 'where ...', 'and ...' and ' +
        ...' text, taking care of indentation along the way.

        Return a list of formatted lines.
        """
        result = lines[:1]
        stack = [0]
        stackcnt = [0]
        for line in lines[1:]:
            if line.startswith('{'):
                if stackcnt[-1]:
                    s = u('and   ')
                else:
                    s = u('where ')
                stack.append(len(result))
                stackcnt[-1] += 1
                stackcnt.append(0)
>               result.append(u(' +') + u('  ')*(len(stack)-1) + s + line[1:])
E               UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 23: ordinal not in range(128)

../venv/lib/python2.7/site-packages/_pytest/assertion/util.py:104: UnicodeDecodeError
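
The traceback above comes from Python 2, where concatenating a unicode object with a byte string triggers an implicit decode of the bytes using the ASCII codec. The following is a minimal sketch that performs the same decode explicitly, written for Python 3 (an assumption, since the report targets Python 2.7); it is an illustration of the failure mode, not pytest's code. The variable names `prefix` and `error` are chosen here for illustration.

```python
# Python 3 sketch: do by hand the ASCII decode that Python 2 performs
# implicitly when a unicode object is concatenated with a byte string.

# The explanation line from the traceback, as UTF-8 bytes (what repr() returned).
line = '{1 = <b class="boldest">Б</b>.a'.encode("utf-8")

# Text prefix built the same way as in _format_lines for one nesting level.
prefix = " +" + "  " * 1 + "where "

error = None
try:
    result = prefix + line[1:].decode("ascii")
except UnicodeDecodeError as exc:
    # The first non-ASCII byte (0xd0, the lead byte of "Б") cannot be decoded.
    error = exc
    print(exc)
```

The failing byte is reported at position 23 of `line[1:]`, which matches the position shown in the traceback.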

@pytestbot
Contributor Author

Original comment by Andrey Gusev (BitBucket: nex2hex, GitHub: nex2hex):


fix:

#!python

result.append(u(' +') + u('  ')*(len(stack)-1) + s + line[1:].decode('utf-8'))
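
One caveat with calling `.decode` unconditionally is that it assumes `line` is always a byte string. A guarded variant could look like the sketch below, written for Python 3 where `bytes` and `str` are distinct types. `ensure_text` is a hypothetical helper name introduced here for illustration; it is not part of pytest.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding only when it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The explanation line from the report, as UTF-8 bytes.
line = '{1 = <b class="boldest">Б</b>.a'.encode("utf-8")

# Same concatenation as in _format_lines, for one nesting level.
formatted = " +" + "  " * 1 + "where " + ensure_text(line[1:])
print(formatted)
```

Text that is already decoded passes through unchanged, so the helper can be applied to every line regardless of its type.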

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


Please prepare a PR with a test.

Also, for the actual fix: the decode should not be strict, e.g. errors='ignore' or errors='replace'.
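
To illustrate why a non-strict decode matters: the `lines` value in the traceback shows a lone `\xd0` byte, which is not valid UTF-8 on its own, so a strict UTF-8 decode of that data would fail as well. The sketch below (Python 3, with variable names chosen here for illustration) compares the three standard error handlers on that byte sequence.

```python
# A byte string containing only the first byte of the two-byte UTF-8
# sequence for "Б", as shown in the traceback's `lines` value.
truncated = b'<b class="boldest">\xd0</b>'

strict_failed = False
try:
    truncated.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD for the undecodable byte.
replaced = truncated.decode("utf-8", errors="replace")

# "ignore" silently drops the undecodable byte.
ignored = truncated.decode("utf-8", errors="ignore")

print(strict_failed, replaced, ignored)
```

Of the two, 'replace' keeps a visible marker where data was lost, which may be preferable in an assertion message; 'ignore' yields cleaner output but hides the problem.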

@pytestbot pytestbot added the type: bug problem that needs to be addressed label Jun 15, 2015
@The-Compiler
Member

Any update, @nex2hex? A PR would be much appreciated! If you have any trouble, let us know and we'll be happy to help.

@RonnyPfannschmidt
Member

this is fixed in #878

@nicoddemus
Member

@RonnyPfannschmidt is this fixed? Can we close this?

@RonnyPfannschmidt
Member

Not yet, git destroyed my updated PR; it's on my agenda for today.

@RonnyPfannschmidt
Member

the merge was done before
