[FEATURE] Transform custom dataset to deeplake dataset/database/vectorstore conveniently using DDP #2602
Comments
@davidbuniat Thanks. It's really urgent for me.
Hey @ChawDoe! Thanks for opening the issue. Let us look into whether any of our current workflows will satisfy your use case and we'll get back to you in a few days.
Thanks! I hope that I have explained my use case clearly.
It would be nice to have an append call where auto commit will find the best memory-time trade-off in a for loop.
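For illustration, a minimal sketch of such a loop with the current API, where the commit interval has to be hand-tuned (the auto commit proposal would pick it automatically); `compute_embeddings` is a hypothetical stand-in for the real per-sample computation:

```python
import numpy as np
import deeplake

def compute_embeddings():
    """Hypothetical stand-in for the expensive per-sample computation."""
    for _ in range(1000):
        yield np.random.rand(512).astype(np.float32)

ds = deeplake.empty("./precomputed_tensors", overwrite=True)
ds.create_tensor("embedding")

COMMIT_EVERY = 256  # hand-tuned memory-time trade-off today

for i, emb in enumerate(compute_embeddings()):
    ds.embedding.append(emb)
    if (i + 1) % COMMIT_EVERY == 0:
        ds.commit(f"checkpoint at sample {i + 1}")
```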
@FayazRahman Hi, do you have any updates on this?
Sorry @ChawDoe, I haven't been able to work on this yet; I will update here as soon as I make any headway.
Description
Here is my use case:
I have 4 GPU nodes on AWS for training (which includes computing tensors).
I want to save the pre-computed tensors to Deep Lake (as a Dataset, database, or vector store) so that the next training run saves a lot of time.
I use accelerate as my distributed parallel framework.
So my framework works like this:
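For concreteness, here is a minimal sketch of the loop (the dataset and `compute_tensors` below are stand-ins for my real ones; each accelerate process works on its own shard of the batches):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

def compute_tensors(batch: torch.Tensor) -> torch.Tensor:
    """Stand-in for the expensive per-sample computation."""
    return batch * 2.0

accelerator = Accelerator()  # one process per GPU across the 4 nodes

dataset = TensorDataset(torch.randn(1024, 16))  # stand-in for my custom dataset
loader = accelerator.prepare(DataLoader(dataset, batch_size=32))  # sharded per rank

for (batch,) in loader:
    tensors = compute_tensors(batch)
    # Goal: persist `tensors` to a Deep Lake dataset here, so the next
    # training run can read them back instead of recomputing them.
```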
Note that once the Deep Lake dataset is constructed, the next training run can read the tensors I need from it instead of computing them again.
The problem is that there is no convenient way to do this from multiple DDP processes at once.
So, is there any feature to transform a custom dataset into a Deep Lake dataset?
If there were a function that worked like this:
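(A hypothetical sketch: `dataset_to_deeplake` is not a real deeplake API and only illustrates the one-call conversion I am after. The body uses the existing `deeplake.compute` transform, which as far as I can tell parallelizes over local workers only, not across DDP ranks.)

```python
import numpy as np
import deeplake

def dataset_to_deeplake(data_in, dest, num_workers=4):
    """Hypothetical one-call converter from an indexable dataset to Deep Lake."""
    ds = deeplake.empty(dest, overwrite=True)
    ds.create_tensor("embedding")

    @deeplake.compute
    def ingest(sample_in, sample_out):
        sample_out.embedding.append(sample_in)

    # Fans the samples out over local workers and writes into `ds`.
    ingest().eval(data_in, ds, num_workers=num_workers)
    return ds

# Stand-in inputs representing pre-computed tensors.
inputs = [np.random.rand(512).astype(np.float32) for _ in range(100)]
ds = dataset_to_deeplake(inputs, "./converted_dataset")
```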
Or could you point me to a standard workflow that solves this?
I don't know which method is best for this scenario.
The documentation does not cover this problem; #2596 also points to it.
Use Cases
Distributed parallel computing and saving to deeplake.