Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ship plasma store with Ray #7901

Merged
merged 7 commits into from
Jun 4, 2020
Merged

Conversation

suquark
Copy link
Member

@suquark suquark commented Apr 5, 2020

Why are these changes needed?

This PR copies files from the arrow/plasma upstream, and windows patches are applied to them.

Related issue number

Checks

@suquark suquark mentioned this pull request Apr 5, 2020
6 tasks
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24255/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24261/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24264/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24266/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24269/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24277/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24278/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24279/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24280/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24281/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24282/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24283/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24284/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24286/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24310/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24311/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24323/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24324/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24326/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24329/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24327/
Test FAILed.

@mehrdadn
Copy link
Contributor

mehrdadn commented Apr 7, 2020

Can I ask what is the motivation for this? And also, what is the eventual goal for the Plasma store that's inside Arrow? Will it evolve in parallel with this one, or is it planned to be removed? (CC @pcmoritz)

If Arrow's copy isn't getting removed once this PR is merged, I feel like we should consider whether it might make sense to keep the patches separate, so that they're easier to maintain, since it'll let us just pull the changes from upstream. Otherwise the code will diverge from upstream and it'll become difficult to reconcile changes between the two.

(For what it's worth, I was hoping to upstream some of the patches, but only after I was finished with the initial part of the Windows port, so that I could e.g. be sure they wouldn't need further modifications.)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24336/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24343/
Test PASSed.

@suquark
Copy link
Member Author

suquark commented Apr 7, 2020

@mehrdadn I think the final goal is to disentangle with arrow. With this PR, we have shipped the plasma store with Ray, and windows patches are applied.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24368/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24393/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26621/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26622/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26623/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26653/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26678/
Test PASSed.

@ericl ericl merged commit ea05ebe into ray-project:master Jun 4, 2020
@winningsix
Copy link

A quick question: For the future, how will we maintain the changes in Arrow community?

@ConeyLiu
Copy link
Contributor

Same question here. What is the purpose of doing this? Will it be maintained separately?

@suquark
Copy link
Member Author

suquark commented Jun 18, 2020

@winningsix @ConeyLiu For now we will develop the Plasma Store separately in Ray. This is mainly due to performance and architecture issues. For example, Plasma Store still computes digests when sealing objects, while this is no longer necessary for Ray (#8980); and a standalone Plasma Store is becoming the IPC bottleneck for Ray, so now we are moving forward to contain it in a thread. Also, the upstream Plasma Store is lack of windows support because it is written in a pure unix style, by moving it under Ray, we no longer need to maintain these patches, and we can quickly fix compatibility issues.

All these changes break the Plasma Store interface (or the dependencies), so we can no longer sync with the Arrow upstream. However, we are still willing to contribute to other open source projects. If we find the new object store in Ray very useful for common users, we could upload it to Arrow (just like how we originally did for Plasma) or as an independent project.

@ConeyLiu
Copy link
Contributor

@suquark thanks for the answer.

@winningsix
Copy link

@suquark I am thinking it's not a good practice to have a self-maintained component.

  1. Plasma is a very important component to Ray.
  2. Arrow Plasma is not used by many other components.
    Code diverge is not good for both communities.

I would like to bring this into further discussion whether we can maintain a same code base for Plasma.

Like digest compute, it's required for general object store. But It's under Ray or our internal use case (we use Plasma as Spark data source), we can disable it for better performance. I think the Arrow version is maintainable by introducing a new configuration for that. We did have a plan for that.

For API changes, that could be a problem. In fact, we introduce a new set of async API based on your boost change in Arrow. We're thinking introducing a new client protocol to support API backward compatibility.

Any other cases not covered we need to address?

cc @jikunshang

@suquark
Copy link
Member Author

suquark commented Jun 25, 2020

@winningsix thanks for your response! Somehow I didn't receive notifications from the closed PR. maybe we can move our conversation to Ray Slack if it is convenient for you, and we can also discuss with other ray maintainers.

I am sure that Ray will still contain a different plasma store in the future (and the final goal is total integrating Plasma into Ray as a new distributed store). But we can still contribute to the Arrow Plasma Store upstream based on our experience of Plasma Store and share as much components as possible.

@winningsix
Copy link

@suquark Yes, let's move to Slack for boarder discussion. Hope we can develop a good practice balancing keeping the backward compatibility and new features.

@suquark suquark deleted the copy_plasma_files branch July 9, 2020 18:29
pcmoritz pushed a commit that referenced this pull request Dec 17, 2022
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now.

This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma.

Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading.

A related discussion for the "Arrow's fork into ray" can be found in #7901

I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR.

Signed-off-by: harelos <[email protected]>

Signed-off-by: harelos <[email protected]>
Capiru pushed a commit to Capiru/ray that referenced this pull request Dec 22, 2022
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now.

This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma.

Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading.

A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901

I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR.

Signed-off-by: harelos <[email protected]>

Signed-off-by: harelos <[email protected]>
Signed-off-by: Capiru <[email protected]>
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now.

This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma.

Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading.

A related discussion for the "Arrow's fork into ray" can be found in #7901

I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR.

Signed-off-by: harelos <[email protected]>

Signed-off-by: harelos <[email protected]>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 16, 2023
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now.

This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma.

Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading.

A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901

I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR.

Signed-off-by: harelos <[email protected]>

Signed-off-by: harelos <[email protected]>
Signed-off-by: tmynn <[email protected]>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now.

This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma.

Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading.

A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901

I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR.

Signed-off-by: harelos <[email protected]>

Signed-off-by: harelos <[email protected]>
Signed-off-by: tmynn <[email protected]>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
The current documentation states that Ray uses Arrow Plasma Object Store, which is incorrect for ~ 2 years now.

This PR corrects this, and adds a short historical note on Ray's in-memory object store and Arrow's plasma.

Note that as Arrow's Plasma will be deprecated in Arrow's version 10.0.0, this mistake is even more so misleading.

A related discussion for the "Arrow's fork into ray" can be found in ray-project#7901

I'd be happy if you can suggest a more accurate description of the historical note added to the docs in this PR.

Signed-off-by: harelos <[email protected]>

Signed-off-by: harelos <[email protected]>
Signed-off-by: tmynn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants