Is there a way to disable data sampling? #1005

Closed
haozhouamzn opened this issue Jul 28, 2023 · 1 comment
Labels
feature request New feature or request

haozhouamzn commented Jul 28, 2023

Is your feature request related to a problem? Please describe.
It looks like when multiple datasets are configured, e.g. in local_setup.yml, only a certain percentage of each dataset is sampled. For example, with two datasets of 100 samples each and weights [1,2], training ends up with 33 samples from dataset A and 66 samples from dataset B.

Is there a way to keep 100 samples of dataset A and 100 samples of dataset B?
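
The arithmetic in the example above can be reproduced with a minimal sketch. It assumes a fixed budget of 100 samples that is split in proportion to the normalized weights and rounded down; `split_by_weights` is a hypothetical helper written for illustration, not the project's actual blending code.

```python
from fractions import Fraction


def split_by_weights(total_samples, weights):
    """Split a sample budget across datasets in proportion to their weights.

    Hypothetical helper: assumes the counts are floored after normalizing.
    """
    fractions = [Fraction(w) for w in weights]
    total_weight = sum(fractions)
    # Exact rational arithmetic avoids floating-point surprises when flooring.
    return [int(total_samples * f / total_weight) for f in fractions]


# Assumed budget of 100 samples, weights [1, 2] as in the report.
print(split_by_weights(100, [1, 2]))  # [33, 66]
```

Under this assumption, the per-dataset counts depend on both the weights and the total number of samples requested, not on the dataset sizes.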

Describe the solution you'd like
A flag, or instructions on which code needs to be changed.

Describe alternatives you've considered
Overwrite the ratio in helper.cpp.

Additional context
I have a very large dataset, and I have to partition it into multiple smaller ones in order to convert it to mmap files. I want to train the model on all of the data.

StellaAthena (Member) commented

Is there a reason why using weights of [1, 1] doesn't accomplish what you want?
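
To see why equal weights may be enough here, consider a rough sketch under the assumption that samples are drawn in proportion to normalized weights from a budget equal to the combined dataset size. Both `split_by_weights` and `weights_from_sizes` are hypothetical helpers for illustration, not functions from the project.

```python
from fractions import Fraction


def split_by_weights(total_samples, weights):
    """Hypothetical proportional split of a sample budget (floored)."""
    fractions = [Fraction(w) for w in weights]
    total_weight = sum(fractions)
    return [int(total_samples * f / total_weight) for f in fractions]


def weights_from_sizes(sizes):
    """Hypothetical helper: weights proportional to each dataset's size."""
    total = sum(sizes)
    return [Fraction(size, total) for size in sizes]


# Two datasets of 100 samples each with equal weights and a budget of 200.
print(split_by_weights(200, [1, 1]))  # [100, 100]

# Unequal partitions: size-proportional weights recover every partition in full.
sizes = [100, 250, 50]
print(split_by_weights(sum(sizes), weights_from_sizes(sizes)))  # [100, 250, 50]
```

With equally sized partitions, size-proportional weights reduce to [1, 1]. Whether every sample is actually visited also depends on how many samples training requests in total, which this sketch simply assumes equals the combined dataset size.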
