Update gpt2_dataset.py #974

Merged
merged 2 commits into from
Jun 22, 2023

Conversation

StellaAthena
Member

@StellaAthena StellaAthena commented Jun 14, 2023

Fixes #972

I have checked that it runs on one example and that the code appears to be mathematically correct. I also checked the current main branch of Megatron-DS, but it has a substantially different structure now and doesn't have this assertion check at all. The comments in both our version and theirs imply that this change is correct.
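
For readers without the diff at hand, here is a minimal sketch of how an off-by-one between the two index lengths can arise. It is not the code from this PR. It assumes a Megatron-style layout in which the sample index stores one boundary row per sample plus a final end row, while the shuffle index is a permutation over the samples themselves. The helper names `build_toy_indices` and `lengths_consistent` are hypothetical and exist only for illustration.

```python
import random


def build_toy_indices(num_samples, seed=0):
    """Build toy stand-ins for the two indices (hypothetical helper).

    Assumption: the sample index holds num_samples + 1 boundary rows
    (one start row per sample plus a final end row), and the shuffle
    index is a permutation of range(num_samples).
    """
    # Each row is (document position, offset); the values are placeholders.
    sample_idx = [(i, 0) for i in range(num_samples + 1)]
    shuffle_idx = list(range(num_samples))
    random.Random(seed).shuffle(shuffle_idx)
    return sample_idx, shuffle_idx


def lengths_consistent(sample_idx, shuffle_idx):
    """Check the lengths under the assumed layout.

    The shuffle index should be exactly one entry shorter than the
    sample index, so comparing the raw lengths for equality would
    report a mismatch even when the indices agree.
    """
    return len(shuffle_idx) == len(sample_idx) - 1


sample_idx, shuffle_idx = build_toy_indices(8)
print(len(sample_idx), len(shuffle_idx))
print(lengths_consistent(sample_idx, shuffle_idx))
```

Under that assumed layout, a check that compares the two lengths directly would warn on every run, which is the kind of symptom described in #972.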

@Quentin-Anthony Quentin-Anthony merged commit 2534e3d into main Jun 22, 2023
2 checks passed
@Quentin-Anthony Quentin-Anthony deleted the StellaAthena-patch-4 branch June 22, 2023 01:54
Development

Successfully merging this pull request may close these issues.

WARNING: shuffle index length is not equal to sample index length
2 participants