
Dataload fix #979

Merged (14 commits, Jul 6, 2023)
Conversation

@jahatef (Collaborator) commented Jun 22, 2023

In #958

https://github.com/EleutherAI/gpt-neox/pull/979/files#diff-e2b248f8c422a601bcb0b7d93f96c1dff070f2694737e2b69f1def64ab9c1844L276

When data = None, _keys was set to ["text", "label"].

This led to access errors:

https://github.com/EleutherAI/gpt-neox/pull/979/files#diff-e2b248f8c422a601bcb0b7d93f96c1dff070f2694737e2b69f1def64ab9c1844R280-R287

where data_b does not contain any values for the "label" key.
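
A minimal, standalone reproduction of that failure mode, using a plain dictionary in place of the broadcast batch; the variable names mirror the description above, and the snippet is illustrative rather than the repository's code.

```python
# Hypothetical reproduction: "label" is requested unconditionally, but no label
# data was loaded, so the batch dictionary never gets that key.
_keys = ["text", "label"]            # keys requested unconditionally (as in #958)
data_b = {"text": [101, 102, 103]}   # stand-in for the broadcast batch; no labels

try:
    labels = data_b["label"]         # this is the access that fails
except KeyError as missing:
    print(f"missing key: {missing}")
```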

Instead,

https://github.com/EleutherAI/gpt-neox/pull/979/files#diff-e2b248f8c422a601bcb0b7d93f96c1dff070f2694737e2b69f1def64ab9c1844R306

was added so that the "label" key is only accessed if neox_args.label_data_paths is set in the config file, which is the setting that determines whether a labeled dataset is used.
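
A minimal sketch of that guard, assuming a neox_args object with the label_data_paths attribute named above; the helper function and the stubbed config are hypothetical, not the code merged in this PR.

```python
from types import SimpleNamespace

def select_batch_keys(neox_args):
    """Hypothetical helper: only request "label" when a labeled dataset is configured."""
    keys = ["text"]
    if getattr(neox_args, "label_data_paths", None):
        keys.append("label")  # safe: label data will actually be present in the batch
    return keys

# Unlabeled run: "label" is never requested, so the KeyError above cannot occur.
print(select_batch_keys(SimpleNamespace(label_data_paths=None)))        # ['text']
# Labeled run: the config points at label data, so the key is included.
print(select_batch_keys(SimpleNamespace(label_data_paths=["labels"])))  # ['text', 'label']
```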

@CLAassistant commented Jun 22, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ StellaAthena
✅ Quentin-Anthony
❌ Quentin TastyRice


Quentin TastyRice seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Review comment on configs/1-3B.yml (outdated, resolved)
@StellaAthena (Member) commented

@Quentin-Anthony I have made the requested changes. I'm finding that training hangs in multi-node settings with tp > 1, as reported in #985. However, I see the same on main and don't think this PR is to blame.

@StellaAthena (Member) left a comment


After applying this fix to both nodes it now works. Whoops xD

@StellaAthena merged commit 700219b into main on Jul 6, 2023
1 of 2 checks passed
@StellaAthena deleted the dataloadFix branch on July 6, 2023 at 21:54