[feature request] Define DataSet API to allow users to specify more options of data input for training #157

Open
hongchaodeng opened this issue Aug 2, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@hongchaodeng

Currently, KubeDL workloads require users to write the PVC and volumeMounts config to consume the source dataset. This also implies that users need to put the dataset into a PV first. This incurs very heavy overhead, especially for users who might not use the same infrastructure to produce the dataset. For example, users who created the dataset locally might not know how to put the data into a PV in a cloud-managed k8s cluster.

To solve this problem and improve user experience, we should define a DataSet API that lets users specify more options for training data input: S3 buckets, NAS storage, HTTP file servers, etc. KubeDL controllers should be able to handle the creation of PVs and the other k8s internals needed to store and transfer data under the hood.
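
To make the proposal a bit more concrete, here is a rough sketch of what the spec types for such an API could look like in Go. None of this exists in KubeDL today: the type names (`DataSetSpec`, `DataSetSource`, `S3Source`, `NASSource`, `HTTPSource`), their fields, and the `Validate` helper are all hypothetical and only illustrate the "pick one source, let the controller do the rest" idea.

```go
package main

import (
	"errors"
	"fmt"
)

// NOTE: every type below is a hypothetical illustration,
// not part of the existing KubeDL API.

// S3Source points at objects in an S3 bucket.
type S3Source struct {
	Bucket string `json:"bucket"`
	Prefix string `json:"prefix,omitempty"`
}

// NASSource points at a directory exported by a NAS server.
type NASSource struct {
	Server string `json:"server"`
	Path   string `json:"path"`
}

// HTTPSource points at a file served over HTTP.
type HTTPSource struct {
	URL string `json:"url"`
}

// DataSetSource is a one-of: exactly one field should be set.
type DataSetSource struct {
	S3   *S3Source   `json:"s3,omitempty"`
	NAS  *NASSource  `json:"nas,omitempty"`
	HTTP *HTTPSource `json:"http,omitempty"`
}

// DataSetSpec is what a user would write instead of PVC and volumeMounts config.
type DataSetSpec struct {
	Source    DataSetSource `json:"source"`
	MountPath string        `json:"mountPath"`
}

// Validate checks that exactly one source type is specified.
func (s DataSetSource) Validate() error {
	n := 0
	if s.S3 != nil {
		n++
	}
	if s.NAS != nil {
		n++
	}
	if s.HTTP != nil {
		n++
	}
	if n != 1 {
		return errors.New("exactly one of s3, nas, or http must be set")
	}
	return nil
}

func main() {
	spec := DataSetSpec{
		Source:    DataSetSource{S3: &S3Source{Bucket: "train-data", Prefix: "mnist/"}},
		MountPath: "/data",
	}
	if err := spec.Source.Validate(); err != nil {
		fmt.Println("invalid dataset source:", err)
		return
	}
	fmt.Println("valid dataset source, mount at", spec.MountPath)
}
```

With a shape like this, the user only states where the data lives and where it should appear in the training container. How the controller turns that into PVs, PVCs, and any data transfer is left open here.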

@hongchaodeng
Author

/assign @SimonCqk

@SimonCqk added the enhancement label on Aug 19, 2021