test: memoryview of Apache Arrow buffers (#9) #11

maartenbreddels · 2020-09-11T11:53:27Z

This tests the issue mentioned in #9, which currently fails for me with:


    def test_arrow():
        import pyarrow as pa
        ar = pa.array(['foo', 'bar'])
        blake = blake3()
>       blake.update(memoryview(ar.buffers()[1]))
E       BufferError: Incompatible type as buffer

tests/test_blake3.py:98: BufferError

oconnor663 · 2020-09-11T15:34:25Z

The failure seems to be at this line, with PyBuffer::get returning an error. I'm not very familiar with the details of Python's buffer protocol, or with whatever quirks there might be in how PyO3 wraps it. I'm also unfamiliar with PyArrow. But as far as I can tell, this is either a bug in PyO3, a bug in PyArrow, or expected behavior?

oconnor663 · 2020-09-11T15:39:36Z

It looks like the memoryview part might be a red herring. I get the same error if I try to use the buffers object directly.

oconnor663 · 2020-09-11T15:58:01Z

Ok, putting some debugging into PyO3 clarifies this. The issue is that the PyArrow buffer has a signed char type. You can see that like this:

    >>> import pyarrow
    >>> ar = pyarrow.array(["foo", "bar"])
    >>> memoryview(ar.buffers()[1]).format
    'b'

Here's the table of format specifiers. Compare that to the unsigned char format of a regular bytes object:

    >>> memoryview(b"foobar").format
    'B'

That leads to a fix. If I replace u8 with i8 in the PyBuffer::get call, then the test passes. (But of course everything else breaks.) Perhaps what we should do is to try to get the buffer as both u8 and i8, and use whichever one succeeds? This assumes that we're ok with pointer casting i8's to u8's...maybe that would have a weird result on hypothetical non-two's-complement machines?
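For callers stuck on a version without such a fix, the same idea can be sketched on the Python side. This is only an illustration, not what the Rust fix does: `as_unsigned_bytes` is a hypothetical helper, `hashlib.sha256` stands in for the hasher, and an `array.array('b', ...)` stands in for the PyArrow buffer so that the snippet runs without PyArrow installed.

```python
import array
import hashlib


def as_unsigned_bytes(obj):
    """Hypothetical helper: view a byte-sized buffer as unsigned chars without copying."""
    view = memoryview(obj)
    # Only reinterpret formats that are exactly one byte wide per item.
    if view.format not in ("B", "b", "c"):
        raise BufferError(f"expected a byte-sized format, got {view.format!r}")
    # cast() reinterprets the same memory; it does not copy the data.
    return view.cast("B")


# Stand-in for a PyArrow data buffer: a signed char buffer (format 'b').
signed = array.array("b", b"foobar")
view = as_unsigned_bytes(signed)

# Hashing the reinterpreted view matches hashing the equivalent bytes object.
same_digest = hashlib.sha256(view).digest() == hashlib.sha256(b"foobar").digest()

# Negative signed values map to their unsigned byte representation.
reinterpreted = as_unsigned_bytes(array.array("b", [-1, 0, 127])).tolist()
```

On a two's-complement machine the `-1` item reads back as `255`, which is the pointer-cast behavior discussed above.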

maartenbreddels · 2020-09-11T19:13:52Z

I think that's a good approach. In the end, it's just bytes being processed; the type information will probably be lost anyway (in the blake library), and nobody should be making any memory copies.
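The "no memory copies" point can be checked from Python with the standard library alone. This is a minimal sketch that assumes an `array.array('b', ...)` behaves like the Arrow buffer for this purpose; the variable names are illustrative.

```python
import array

# A signed char buffer, standing in for the Arrow data buffer.
signed = array.array("b", [1, 2, 3])

# Reinterpret the same memory as unsigned chars.
unsigned_view = memoryview(signed).cast("B")

# Writing through the original object is visible through the view,
# which shows the view shares memory rather than holding a copy.
signed[0] = -1
first_byte = unsigned_view[0]
```

Because the view shares the underlying memory, the only thing that changes is how each byte is interpreted.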

oconnor663 · 2020-09-12T00:27:53Z

I've pushed beb4e32, which includes a fix and some tests, so I'll close out this PR. Thanks for pointing me to this!

maartenbreddels · 2020-09-12T04:56:59Z

Great, thanks! Are you planning a release soon? (from mobile phone)


Changes since 0.1.6: - Add support for hashing buffers of signed chars. This is uncommon, but it can come up with e.g. PyArrow. See #9 and #11.

oconnor663 · 2020-09-12T10:12:45Z

Just pushed version 0.1.7 :)

test: memoryview of Apache Arrow buffers (oconnor663#9)

4e3a6f2

maartenbreddels force-pushed the test_arrow_memoryview branch from 725d40c to 4e3a6f2 Compare September 11, 2020 11:54

oconnor663 closed this Sep 12, 2020

oconnor663 pushed a commit that referenced this pull request Sep 12, 2020

version 0.1.7

96b27cb

