Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use Unsafe as the default PinotBufferFactory #13639

Merged
merged 1 commit into from
Jul 29, 2024

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Jul 17, 2024

We have been using UnsafePinotBufferFactory for a while with no regression detected. I think it is a good time to start working on the plan to remove LArray dependency (see #12810)

This PR changes the default PinotBufferFactory used in Pinot. Before we were using LArrayPinotBufferFactory on Java < 16 but give LArray doesn't work in Java >= 16, we used UnsafePinotBufferFactory in newer JVMs. With this PR, we always use UnsafePinotBufferFactory.

Users that want to still use LArrayPinotBufferFactory can do that by providing the following config to Pinot:

pinot.offheap.buffer.factory=org.apache.pinot.segment.spi.memory.LArrayPinotBufferFactory

Remember that this is not recommended given LArray implementation has been proved to be incorrect several times (see for example https://github.com/apache/pinot/pull/10774).\

A side effect of this PR is that Pinot should be able to be executed (and probably compiled) in Mac computers using aarch64 (like M1, M2, etc) without having to enable Rosetta. We need to open a parallel PR to change this doc page and probably this one. There are still some issues in ARMs (AFAIR CLP lib does not include a binary for Mac + arm64, @suddendust may know more about that) and maybe other build pages that may not be updated.

This was referenced Jul 17, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.99%. Comparing base (59551e4) to head (93a2108).
Report is 782 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13639      +/-   ##
============================================
+ Coverage     61.75%   61.99%   +0.24%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2553     +117     
  Lines        133233   140468    +7235     
  Branches      20636    21849    +1213     
============================================
+ Hits          82274    87085    +4811     
- Misses        44911    46749    +1838     
- Partials       6048     6634     +586     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.91% <100.00%> (+0.20%) ⬆️
java-21 61.88% <100.00%> (+0.26%) ⬆️
skip-bytebuffers-false 61.97% <100.00%> (+0.22%) ⬆️
skip-bytebuffers-true 61.83% <100.00%> (+34.10%) ⬆️
temurin 61.99% <100.00%> (+0.24%) ⬆️
unittests 61.99% <100.00%> (+0.24%) ⬆️
unittests1 46.45% <100.00%> (-0.44%) ⬇️
unittests2 27.76% <0.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@siddharthteotia
Copy link
Contributor

@gortiz - how will this work for existing segments in production that currently LBuffer writer and therefore depend on the corresponding reader ?

I see that in #13545 you mentioned about incrementally working towards deleting LArray starting by making the new implementation as default one.

I understand that new segments will pick up the new writer (Unsafe based). But how do we take care of rebuilding existing segments ? We need to have an approach that will work for production users. Otherwise we will always have to keep the Larray reader code around.

May be we should do something like we did for segment v2 -> v3 format migration? IIRC, during pre-processor we detected V2 and moved to V3.

Thoughts @vvivekiyer @Jackie-Jiang ?

@gortiz
Copy link
Contributor Author

gortiz commented Jul 19, 2024

Good question. The answer is... it doesn't matter 😄 .

The library used to implement buffers does not require or modify the buffer on disk. I mean, a segment might have been created using LArray and can be read using Unsafe and the other way around. These libraries do not modify the content on disk in any way. They just change the jvm/system calls used to load the content on memory.

Basically LArray uses some JNI+sun.misc.Unsafe calls to allocate and load memory while Unsafe just use sun.misc.Unsafe. But in the end both libraries end up mmaping the content of a file indicated with an offset and length.

Copy link
Contributor

@vvivekiyer vvivekiyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddharthteotia Verified that this will not cause any issues for us. We already have many hosts running java 17 and using Unsafe.

@gortiz gortiz merged commit 0d33790 into apache:master Jul 29, 2024
19 of 20 checks passed
@gortiz gortiz deleted the disable-larray-by-default branch July 29, 2024 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants