Inverse Scaling Tasks? #1442

Open
RylanSchaeffer opened this issue Feb 18, 2024 · 11 comments · May be fixed by #1589
Labels
feature request A feature that isn't implemented yet. good first issue Good for newcomers help wanted Contributors and extra help welcome.

Comments

@RylanSchaeffer
Contributor

Apologies if this has been asked before, but I couldn't find the answer in lm_eval/tasks or in any issues. Are there plans to add the Inverse Scaling tasks (https://github.com/inverse-scaling/prize) to the lm-evaluation-harness?

@haileyschoelkopf haileyschoelkopf added help wanted Contributors and extra help welcome. feature request A feature that isn't implemented yet. good first issue Good for newcomers labels Feb 19, 2024
@haileyschoelkopf
Contributor

Hasn't been asked before!

Supporting these tasks as originally implemented would be very nice! We ourselves probably won't have the bandwidth for it soon, but if anyone wishes to contribute them we'd be happy to assist and review.

@h-albert-lee
Contributor

That implementation looks interesting, do you mind if I try it?

@haileyschoelkopf
Contributor

Yes, that'd be fantastic if you're interested!

@h-albert-lee
Contributor

Thank you for assigning. I'll get to work soon!

@h-albert-lee
Contributor

To head off any possible issues, I'm currently asking in the Inverse Scaling Slack whether it's okay to implement these tasks. I will start implementing them as soon as I get approval.

@h-albert-lee
Contributor

@RylanSchaeffer @haileyschoelkopf The initial implementation is done; all that's left is to verify that it reproduces the results in the paper. I'll make a pull request once I've verified the results.

@RylanSchaeffer
Contributor Author

RylanSchaeffer commented Mar 9, 2024 via email

@h-albert-lee
Contributor

@haileyschoelkopf
Hi! I compared the scores reported in the paper with the scores I obtained through lm-eval-harness using the OPT models from 125M through 6.7B, and there is a slight difference. Does lm-eval-harness tolerate a certain amount of score difference?

@h-albert-lee
Contributor

Alternatively, I could use the evaluation code from the Inverse Scaling Prize as a custom metric. I didn't hear back from the inverse-scaling team, but I did find some related work on the authors' GitHub.

@haileyschoelkopf
Contributor

@h-albert-lee That’s great progress!

Would you be able to open a PR with your implementation and resulting scores so we can discuss there? It’s hard to say without being able to look at the implementation differences/concrete numbers.

@h-albert-lee
Contributor

@haileyschoelkopf Thanks a lot! I'll run pre-commit and post a pull request with my experimental results soon.

@h-albert-lee h-albert-lee linked a pull request Mar 16, 2024 that will close this issue
Projects
Status: In progress

3 participants