Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modest improvement to FixedLenByteArray BYTE_STREAM_SPLIT arrow decoder #6222

Merged
merged 4 commits into from
Aug 13, 2024

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Aug 9, 2024

Which issue does this PR close?

Related to #6219.

Rationale for this change

When this decoder was added, little thought was given to performance. The changes here have an impact that varies by architecture.

Old 2.2GHz Core i7

group                                                                                                   master                                 opt_read
-----                                                                                                   ------                                 --------
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                        1.00      2.2±0.04ms        ? ?/sec    1.03      2.3±0.48ms        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                       1.00      2.3±0.04ms        ? ?/sec    1.01      2.3±0.04ms        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                         1.00      2.2±0.09ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.74      3.3±0.07ms        ? ?/sec    1.00  1904.9±45.03µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.29      3.1±0.05ms        ? ?/sec    1.00      2.4±0.05ms        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.72      3.3±0.06ms        ? ?/sec    1.00  1923.3±37.33µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.28   831.4±14.84µs        ? ?/sec    1.00   651.8±13.35µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, half NULLs       1.12  1473.3±33.07µs        ? ?/sec    1.00  1314.2±26.78µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, no NULLs         1.27   841.0±17.96µs        ? ?/sec    1.00   659.7±13.25µs        ? ?/sec
arrow_array_reader/FIXED_LENGTH_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs           1.00  1027.5±24.48µs        ? ?/sec    1.00  1028.8±24.90µs        ? ?/sec
arrow_array_reader/FIXED_LENGTH_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs          1.00  1927.8±45.32µs        ? ?/sec    1.00  1927.2±38.97µs        ? ?/sec
arrow_array_reader/FIXED_LENGTH_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs            1.00  1034.4±27.58µs        ? ?/sec    1.00  1034.2±32.82µs        ? ?/sec

newer 3.6GHz Core i7-12700K

group                                                                                                   master                                 opt_read
-----                                                                                                   ------                                 --------
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                        1.00    873.0±8.77µs        ? ?/sec    1.00    875.9±5.62µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                       1.00   1039.7±7.19µs        ? ?/sec    1.01   1048.3±6.85µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                         1.00    874.6±4.54µs        ? ?/sec    1.00    878.5±9.08µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.30   963.5±10.85µs        ? ?/sec    1.00    743.8±3.88µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.10  1216.2±16.67µs        ? ?/sec    1.00   1103.1±6.30µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.29    966.9±7.71µs        ? ?/sec    1.00    747.5±3.44µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.36    347.3±2.89µs        ? ?/sec    1.00    255.5±2.11µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, half NULLs       1.04    684.2±2.43µs        ? ?/sec    1.00    655.9±4.87µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, no NULLs         1.37    347.6±5.69µs        ? ?/sec    1.00    254.5±1.07µs        ? ?/sec
arrow_array_reader/FIXED_LENGTH_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs           1.01    367.8±5.65µs        ? ?/sec    1.00    365.7±2.58µs        ? ?/sec
arrow_array_reader/FIXED_LENGTH_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs          1.01    927.8±8.27µs        ? ?/sec    1.00    919.6±4.94µs        ? ?/sec
arrow_array_reader/FIXED_LENGTH_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs            1.00    373.1±3.23µs        ? ?/sec    1.00    373.0±5.44µs        ? ?/sec

What changes are included in this PR?

Replaces the use of Vec::push with direct access via slices.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 9, 2024
@etseidl
Copy link
Contributor Author

etseidl commented Aug 11, 2024

Updated numbers.

2.2GHz Core i7

arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.99      3.3±0.07ms        ? ?/sec    1.00  1669.6±32.91µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.37      3.1±0.05ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.97      3.3±0.06ms        ? ?/sec    1.00  1679.6±34.58µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.34   831.4±14.84µs        ? ?/sec    1.00   619.9±11.99µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, half NULLs       1.09  1473.3±33.07µs        ? ?/sec    1.00  1356.3±33.21µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, no NULLs         1.33   841.0±17.96µs        ? ?/sec    1.00   631.5±16.27µs        ? ?/sec

3.6GHz

arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.54   963.5±10.85µs        ? ?/sec    1.00    626.7±4.82µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.16  1216.2±16.67µs        ? ?/sec    1.00   1049.0±6.41µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.53    966.9±7.71µs        ? ?/sec    1.00    630.4±3.76µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.45    347.3±2.89µs        ? ?/sec    1.00    239.8±2.33µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, half NULLs       1.03    684.2±2.43µs        ? ?/sec    1.00    662.6±3.21µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, no NULLs         1.47    347.6±5.69µs        ? ?/sec    1.00    236.0±1.40µs        ? ?/sec

Copy link
Contributor

@wiedld wiedld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to avoid bounds checking.
👍🏼 benchmarked results.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @etseidl and @wiedld for the review

@alamb alamb merged commit c1b3d98 into apache:master Aug 13, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants