Run eval harness during training #367

Merged
merged 5 commits into main from eval_harness_update
Aug 31, 2021

Conversation

sdtblck
Contributor

@sdtblck sdtblck commented Jun 24, 2021

addresses #366

The generation tasks aren't that fast, so it's best to stick to the log-likelihood tasks (you can add them to the yaml with the "eval_tasks" parameter).

e.g.

"eval_tasks": ["lambada", "wikitext", "piqa"],

@sdtblck sdtblck requested a review from a team as a code owner June 24, 2021 19:35
Member

@StellaAthena StellaAthena left a comment

It looks like it's locking on piqa. We need to check the HF dataset builder internals and make sure it's thread-safe.

@StellaAthena StellaAthena linked an issue Jun 24, 2021 that may be closed by this pull request
@StellaAthena
Member

StellaAthena commented Jul 11, 2021

Notes on this PR's behavior:

  1. Running just lambada works fine
  2. Running just piqa works fine
  3. Running both results in the code freezing as shown below, regardless of which is listed first:
Running loglikelihood requests
  0%|          | 0.00/1.82M [00:00<?, ?byte/s]Running loglikelihood requests
Running loglikelihood requests
100%|##########| 1.82M/1.82M [00:01<00:00, 1.79Mbyte/s]File downloaded. Checksum: 4aa8d02cd17c719165fc8a7887fddd641f43fcafa4b1c806ca8abc31fabdb226

Running loglikelihood requests
 42%|####1     | 3668/8827 [00:40<00:36, 139.70it/s]

@StellaAthena
Member

mathqa + piqa has been running for over 8 hours without any problems. I’m wondering if the core problem is about mixing HF and non-HF tasks.

@sdtblck
Contributor Author

sdtblck commented Aug 31, 2021

Hey @StellaAthena can you test out the above change, and let me know if it makes a difference?
I try to pre-download all the tasks on local rank 0, to get rid of any multithreading problems
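
For anyone reading along, here is a minimal sketch of the pre-download idea, not the code in this PR. The function name and its `download` and `barrier` parameters are hypothetical; in a real run `barrier` would typically be `torch.distributed.barrier`, and `download` would be whatever call makes the harness fetch and cache a task's data.

```python
from typing import Callable, Iterable, List


def predownload_on_local_rank_zero(
    task_names: Iterable[str],
    local_rank: int,
    download: Callable[[str], None],
    barrier: Callable[[], None],
) -> List[str]:
    """Fetch task data once per node, then synchronise all ranks.

    `download` and `barrier` are injected so the sketch stays
    framework-agnostic; both are assumptions, not harness APIs.
    """
    downloaded: List[str] = []
    if local_rank == 0:
        # Only one process per node touches the network and the cache,
        # so concurrent writers cannot race on the same files.
        for name in task_names:
            download(name)
            downloaded.append(name)
    # Every rank, including rank 0, waits here until the data is in place.
    barrier()
    return downloaded
```

After the barrier returns, every rank can load the tasks from the local cache without triggering a download of its own.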

@StellaAthena
Member

> Hey @StellaAthena can you test out the above change, and let me know if it makes a difference?
> I try to pre-download all the tasks on local rank 0, to get rid of any multithreading problems

It's on today's TODO list :)

@sdtblck
Contributor Author

sdtblck commented Aug 31, 2021

Just some peace of mind for @StellaAthena that this is definitely working :)
Screenshot from 2021-08-31 16-36-22
Screenshot from 2021-08-31 16-38-07

@StellaAthena StellaAthena merged commit 069f856 into main Aug 31, 2021
@StellaAthena StellaAthena deleted the eval_harness_update branch August 31, 2021 14:42

Successfully merging this pull request may close these issues.

Eval Harness doesn't log during training
3 participants