Unable to create a small sample of 1000 train and 100 using MultilabelStratifiedShuffleSplit #15
meltedhead, thank you for catching this bug. I don't think I ever tested with train_size set to anything other than None. As a workaround, you could do the following:
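One approach that never passes train_size to the splitter is a two-stage split. This is a sketch, not necessarily the workaround the maintainer intended. First, draw a subset of train_size + test_size rows as the "test" side of one split. Then split that subset using only test_size. The names `two_stage_sample`, `split_fn`, and `random_split` are hypothetical. `random_split` is an unstratified stand-in so the sketch runs without extra packages.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: split_fn(labels, test_size, random_state) returns
# (train_indices, test_indices), where test_size is an absolute row count.
SplitFn = Callable[[Sequence, int, int], Tuple[List[int], List[int]]]


def two_stage_sample(labels: Sequence, train_size: int, test_size: int,
                     split_fn: SplitFn, random_state: int = 0):
    """Draw train_size + test_size rows, then split them. train_size is never
    passed to split_fn, so how the splitter handles it does not matter."""
    n_subset = train_size + test_size
    if n_subset > len(labels):
        raise ValueError("train_size + test_size exceeds the number of samples")
    if n_subset < len(labels):
        # Stage 1: keep only the "test" side of a split sized to the subset.
        _, subset_idx = split_fn(labels, n_subset, random_state)
    else:
        subset_idx = list(range(len(labels)))
    subset_labels = [labels[i] for i in subset_idx]
    # Stage 2: split the subset; returned indices are relative to the subset.
    local_train, local_test = split_fn(subset_labels, test_size, random_state)
    # Map subset-relative indices back to rows of the full dataset.
    train_idx = [subset_idx[i] for i in local_train]
    test_idx = [subset_idx[i] for i in local_test]
    return train_idx, test_idx


def random_split(labels: Sequence, test_size: int, random_state: int):
    """Plain shuffle split (NOT stratified), used only so this sketch runs
    without extra packages; swap in a MultilabelStratifiedShuffleSplit wrapper."""
    idx = list(range(len(labels)))
    random.Random(random_state).shuffle(idx)
    return idx[test_size:], idx[:test_size]


labels = [[i % 2, (i // 3) % 2] for i in range(5000)]
train_idx, test_idx = two_stage_sample(labels, train_size=1000, test_size=100,
                                       split_fn=random_split)
print(len(train_idx), len(test_idx))  # 1000 100
```

With iterstrat installed, `split_fn` could wrap `MultilabelStratifiedShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)` and return `next(msss.split(X, y))`. Here `y` is a NumPy label-indicator matrix and `X` is any array with one row per sample. As discussed further down this thread, the splitter may favor stratification over exact sizes, so each stage might return slightly more or fewer rows than requested.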
Hi there, I don't know if it helps, but I see the same behavior in this case, where only test_size is set:
The above prints:
but I expected:
Ah, just read that in the doc of
Given that the data in the above case should be very evenly distributed, I wonder whether a split that matches the requested test size exactly is really that uncommon.
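When the actual test size differs from the request, it can help to check whether the deviation buys better label balance. The following is a minimal sketch. It assumes `labels` is a list of 0/1 indicator rows (one column per label) and that the index lists come from any splitter. `label_rates` and `split_report` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def label_rates(labels: Sequence[Sequence[int]], idx: Sequence[int]) -> List[float]:
    """Fraction of the rows in idx that are positive for each label column."""
    n_labels = len(labels[0])
    if not idx:
        return [0.0] * n_labels
    counts = [0] * n_labels
    for i in idx:
        for j, value in enumerate(labels[i]):
            counts[j] += value
    return [c / len(idx) for c in counts]


def split_report(labels: Sequence[Sequence[int]], train_idx: Sequence[int],
                 test_idx: Sequence[int], requested_test: int) -> Dict[str, object]:
    """Compare the requested test size with the actual one, alongside the
    per-label positive rates on each side of the split."""
    return {
        "test_size_error": len(test_idx) - requested_test,
        "train_rates": label_rates(labels, train_idx),
        "test_rates": label_rates(labels, test_idx),
    }


toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(split_report(toy_labels, [0, 1], [2, 3], requested_test=1))
# {'test_size_error': 1, 'train_rates': [0.5, 0.5], 'test_rates': [0.5, 0.5]}
```

If the rates on both sides are close while the size error is small, the deviation is probably the splitter trading exact size for balance rather than a bug.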
Hi trent-b,
Thanks for this repository; I hope you can help with my issue. I have a large JSON dataset from which I want to create a smaller sample set using MultilabelStratifiedShuffleSplit.
I then call the function as follows:
train_idx, test_idx = mlb_train_test_split(labels, test_size=1000, train_size=200, random_state=0)
When I check the resulting indices, I see far more than 200 rows. Is there a limitation? The labels array has roughly 500,000 entries.
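The following sketch shows how requested sizes usually resolve to row counts under scikit-learn's ShuffleSplit convention. Integers are absolute counts. Floats are fractions, with the test side rounded up and the train side rounded down. An unspecified side takes the remainder. Whether iterstrat resolves sizes identically is an assumption. `expected_split_sizes` is a hypothetical helper, and unlike scikit-learn it raises an error instead of applying a default when both sizes are None.

```python
import math
from typing import Optional, Tuple, Union

Size = Optional[Union[int, float]]


def expected_split_sizes(n_samples: int, test_size: Size,
                         train_size: Size) -> Tuple[int, int]:
    """Resolve requested sizes to (n_train, n_test)."""
    def resolve(size: Size, round_fn) -> Optional[int]:
        if size is None:
            return None
        if isinstance(size, float):
            # Fractions of the dataset are rounded with the given function.
            return int(round_fn(size * n_samples))
        return int(size)  # integers are absolute row counts

    n_test = resolve(test_size, math.ceil)
    n_train = resolve(train_size, math.floor)
    if n_test is None and n_train is None:
        raise ValueError("specify at least one of test_size or train_size")
    # A side that was not specified takes whatever rows remain.
    if n_train is None:
        n_train = n_samples - n_test
    if n_test is None:
        n_test = n_samples - n_train
    if n_train + n_test > n_samples:
        raise ValueError("train_size + test_size exceeds n_samples")
    return n_train, n_test


n = 500_000
print(expected_split_sizes(n, test_size=1000, train_size=200))   # (200, 1000)
# If train_size were ignored, the train side would take the remainder:
print(expected_split_sizes(n, test_size=1000, train_size=None))  # (499000, 1000)
```

With the values from the call above, a working train_size should give 200 train rows. If train_size is ignored, the train side falls back to the remainder of 499,000 rows. That would be consistent with the "far more than 200 rows" observation and with the train_size bug acknowledged earlier in this thread.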