
Add tasks for performance on long context lengths #1748

Open
nairbv opened this issue Apr 25, 2024 · 1 comment
Labels: feature request (A feature that isn't implemented yet.)

Comments

nairbv (Contributor) commented Apr 25, 2024

There are a couple of papers with benchmarks for very long context lengths that don't seem to be available in lm-evaluation-harness. It would be great to have one of these, or something similar, for measuring a model's ability to extract information from long context windows, which is important for RAG.

haileyschoelkopf (Contributor) commented:
Needle-in-a-haystack might also be a nice-to-have, though I think more difficult / "natural" long-context evals should be prioritized.
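
For anyone sketching out what such a task would measure: a bare-bones needle-in-a-haystack probe fits in a few lines. The snippet below is a minimal sketch, not the lm-evaluation-harness task API; the `generate` callable, the filler text, and the passcode needle are all assumptions for illustration.

```python
# Illustrative needle-in-a-haystack probe. Not the lm-evaluation-harness API:
# `generate` is an assumed callable mapping a prompt string to the model's
# completion, and the passcode needle is made up.

FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret passcode is 7142. "
QUESTION = "\n\nWhat is the secret passcode?"

def build_haystack(num_sentences: int, depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) in filler."""
    sentences = [FILLER] * num_sentences
    sentences.insert(int(depth * num_sentences), NEEDLE)
    return "".join(sentences)

def needle_score(generate, num_sentences: int = 2000,
                 depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Fraction of insertion depths at which the model retrieves the passcode."""
    hits = sum("7142" in generate(build_haystack(num_sentences, d) + QUESTION)
               for d in depths)
    return hits / len(depths)
```

Sweeping `num_sentences` up to the model's context limit, and the depth grid more finely, gives the usual retrieval-by-depth heatmap; a "natural" long-context eval would replace the synthetic filler with real documents.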

haileyschoelkopf added the feature request label Apr 26, 2024