Always use Unsafe as the default PinotBufferFactory #13639

gortiz · 2024-07-17T10:49:07Z

We have been using UnsafePinotBufferFactory for a while with no regression detected. I think it is a good time to start working on the plan to remove LArray dependency (see #12810)

This PR changes the default PinotBufferFactory used in Pinot. Before we were using LArrayPinotBufferFactory on Java < 16 but give LArray doesn't work in Java >= 16, we used UnsafePinotBufferFactory in newer JVMs. With this PR, we always use UnsafePinotBufferFactory.

Users that want to still use LArrayPinotBufferFactory can do that by providing the following config to Pinot:

pinot.offheap.buffer.factory=org.apache.pinot.segment.spi.memory.LArrayPinotBufferFactory

Remember that this is not recommended given LArray implementation has been proved to be incorrect several times (see for example https://github.com/apache/pinot/pull/10774).\

A side effect of this PR is that Pinot should be able to be executed (and probably compiled) in Mac computers using aarch64 (like M1, M2, etc) without having to enable Rosetta. We need to open a parallel PR to change this doc page and probably this one. There are still some issues in ARMs (AFAIR CLP lib does not include a binary for Mac + arm64, @suddendust may know more about that) and maybe other build pages that may not be updated.

codecov-commenter · 2024-07-17T11:35:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.99%. Comparing base (59551e4) to head (93a2108).
Report is 782 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #13639      +/-   ##
============================================
+ Coverage     61.75%   61.99%   +0.24%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2553     +117     
  Lines        133233   140468    +7235     
  Branches      20636    21849    +1213     
============================================
+ Hits          82274    87085    +4811     
- Misses        44911    46749    +1838     
- Partials       6048     6634     +586

Flag	Coverage Δ
custom-integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration	`<0.01% <0.00%> (-0.01%)`	⬇️
integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration2	`0.00% <0.00%> (ø)`
java-11	`61.91% <100.00%> (+0.20%)`	⬆️
java-21	`61.88% <100.00%> (+0.26%)`	⬆️
skip-bytebuffers-false	`61.97% <100.00%> (+0.22%)`	⬆️
skip-bytebuffers-true	`61.83% <100.00%> (+34.10%)`	⬆️
temurin	`61.99% <100.00%> (+0.24%)`	⬆️
unittests	`61.99% <100.00%> (+0.24%)`	⬆️
unittests1	`46.45% <100.00%> (-0.44%)`	⬇️
unittests2	`27.76% <0.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

siddharthteotia · 2024-07-18T20:47:03Z

@gortiz - how will this work for existing segments in production that currently LBuffer writer and therefore depend on the corresponding reader ?

I see that in #13545 you mentioned about incrementally working towards deleting LArray starting by making the new implementation as default one.

I understand that new segments will pick up the new writer (Unsafe based). But how do we take care of rebuilding existing segments ? We need to have an approach that will work for production users. Otherwise we will always have to keep the Larray reader code around.

May be we should do something like we did for segment v2 -> v3 format migration? IIRC, during pre-processor we detected V2 and moved to V3.

Thoughts @vvivekiyer @Jackie-Jiang ?

gortiz · 2024-07-19T05:30:13Z

Good question. The answer is... it doesn't matter 😄 .

The library used to implement buffers does not require or modify the buffer on disk. I mean, a segment might have been created using LArray and can be read using Unsafe and the other way around. These libraries do not modify the content on disk in any way. They just change the jvm/system calls used to load the content on memory.

Basically LArray uses some JNI+sun.misc.Unsafe calls to allocate and load memory while Unsafe just use sun.misc.Unsafe. But in the end both libraries end up mmaping the content of a file indicated with an offset and length.

vvivekiyer

@siddharthteotia Verified that this will not cause any issues for us. We already have many hosts running java 17 and using Unsafe.

gortiz requested review from siddharthteotia and Jackie-Jiang July 17, 2024 10:49

This was referenced Jul 17, 2024

Remove larray #13545

Closed

Remove LArray #12810

Open

Always use Unsafe as the default PinotBufferFactory

93a2108

gortiz force-pushed the disable-larray-by-default branch from e8159fe to 93a2108 Compare July 18, 2024 08:36

gortiz mentioned this pull request Jul 18, 2024

mvn clean install fail on Mac with M1 chip #10945

Closed

siddharthteotia requested a review from vvivekiyer July 18, 2024 20:47

vvivekiyer approved these changes Jul 22, 2024

View reviewed changes

gortiz merged commit 0d33790 into apache:master Jul 29, 2024
19 of 20 checks passed

gortiz deleted the disable-larray-by-default branch July 29, 2024 05:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Always use Unsafe as the default PinotBufferFactory #13639

Always use Unsafe as the default PinotBufferFactory #13639

gortiz commented Jul 17, 2024

codecov-commenter commented Jul 17, 2024 •

edited

Loading

siddharthteotia commented Jul 18, 2024

gortiz commented Jul 19, 2024

vvivekiyer left a comment

Always use Unsafe as the default PinotBufferFactory #13639

Always use Unsafe as the default PinotBufferFactory #13639

Conversation

gortiz commented Jul 17, 2024

codecov-commenter commented Jul 17, 2024 • edited Loading

Codecov Report

siddharthteotia commented Jul 18, 2024

gortiz commented Jul 19, 2024

vvivekiyer left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jul 17, 2024 •

edited

Loading