-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ship plasma store with Ray #7901
Conversation
Can one of the admins verify this patch? |
Test FAILed. |
2a86a91
to
86a546f
Compare
Test PASSed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test FAILed. |
Can I ask what is the motivation for this? And also, what is the eventual goal for the Plasma store that's inside Arrow? Will it evolve in parallel with this one, or is it planned to be removed? (CC @pcmoritz) If Arrow's copy isn't getting removed once this PR is merged, I feel like we should consider whether it might make sense to keep the patches separate, so that they're easier to maintain, since it'll let us just pull the changes from upstream. Otherwise the code will diverge from upstream and it'll become difficult to reconcile changes between the two. (For what it's worth, I was hoping to upstream some of the patches, but only after I was finished with the initial part of the Windows port, so that I could e.g. be sure they wouldn't need further modifications.) |
Test PASSed. |
Test PASSed. |
@mehrdadn I think the final goal is to disentangle with arrow. With this PR, we have shipped the plasma store with Ray, and windows patches are applied. |
Test FAILed. |
Test FAILed. |
e7ea882
to
2b47bc6
Compare
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
A quick question: For the future, how will we maintain the changes in Arrow community? |
Same question here. What is the purpose of doing this? Will it be maintained separately? |
@winningsix @ConeyLiu For now we will develop the Plasma Store separately in Ray. This is mainly due to performance and architecture issues. For example, Plasma Store still computes digests when sealing objects, while this is no longer necessary for Ray (#8980); and a standalone Plasma Store is becoming the IPC bottleneck for Ray, so now we are moving forward to contain it in a thread. Also, the upstream Plasma Store is lack of windows support because it is written in a pure unix style, by moving it under Ray, we no longer need to maintain these patches, and we can quickly fix compatibility issues. All these changes break the Plasma Store interface (or the dependencies), so we can no longer sync with the Arrow upstream. However, we are still willing to contribute to other open source projects. If we find the new object store in Ray very useful for common users, we could upload it to Arrow (just like how we originally did for Plasma) or as an independent project. |
@suquark thanks for the answer. |
@suquark I am thinking it's not a good practice to have a self-maintained component.
I would like to bring this into further discussion whether we can maintain a same code base for Plasma. Like digest compute, it's required for general object store. But It's under Ray or our internal use case (we use Plasma as Spark data source), we can disable it for better performance. I think the Arrow version is maintainable by introducing a new configuration for that. We did have a plan for that. For API changes, that could be a problem. In fact, we introduce a new set of async API based on your boost change in Arrow. We're thinking introducing a new client protocol to support API backward compatibility. Any other cases not covered we need to address? cc @jikunshang |
@winningsix thanks for your response! Somehow I didn't receive notifications from the closed PR. maybe we can move our conversation to Ray Slack if it is convenient for you, and we can also discuss with other ray maintainers. I am sure that Ray will still contain a different plasma store in the future (and the final goal is total integrating Plasma into Ray as a new distributed store). But we can still contribute to the Arrow Plasma Store upstream based on our experience of Plasma Store and share as much components as possible. |
@suquark Yes, let's move to Slack for boarder discussion. Hope we can develop a good practice balancing keeping the backward compatibility and new features. |
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now. This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma. Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading. A related discussion for the "Arrow's fork into ray" can be found in #7901 I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR. Signed-off-by: harelos <[email protected]> Signed-off-by: harelos <[email protected]>
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now. This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma. Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading. A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901 I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR. Signed-off-by: harelos <[email protected]> Signed-off-by: harelos <[email protected]> Signed-off-by: Capiru <[email protected]>
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now. This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma. Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading. A related discussion for the "Arrow's fork into ray" can be found in #7901 I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR. Signed-off-by: harelos <[email protected]> Signed-off-by: harelos <[email protected]>
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now. This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma. Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading. A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901 I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR. Signed-off-by: harelos <[email protected]> Signed-off-by: harelos <[email protected]> Signed-off-by: tmynn <[email protected]>
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now. This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma. Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading. A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901 I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR. Signed-off-by: harelos <[email protected]> Signed-off-by: harelos <[email protected]> Signed-off-by: tmynn <[email protected]>
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now. This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma. Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading. A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901 I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR. Signed-off-by: harelos <[email protected]> Signed-off-by: harelos <[email protected]> Signed-off-by: tmynn <[email protected]>
Why are these changes needed?
This PR copies files from the arrow/plasma upstream, and windows patches are applied to them.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.