Issues: EleutherAI/lm-evaluation-harness

Issues list

The problem about the overall score of BBH and GPQA datasets (label: asking questions; for clarification / support on library usage)
#2101 by marvelcell was closed Jul 15, 2024
Same accuracy and std with different seeds (label: asking questions)
#2089 by yogi9879 was closed Jul 15, 2024
How is the score calculated in open-llm-leaderboard (label: asking questions)
#2088 by marvelcell was closed Jul 11, 2024
TinyBenchmark/TinyMMLU broken?
#2068 by skramer-dev was closed Jul 9, 2024
LLM leader board setting for mmlu.
#2066 by dsj96 was closed Jul 8, 2024
lm_eval --tasks list return nothing?
#2043 by fahadh4ilyas was closed Jul 2, 2024
Per-sample perplexity of a continuation?
#2040 by YilunZhou was closed Jun 29, 2024
Using chat template with vllm engine
#2033 by mohit-rag was closed Jun 28, 2024
vllm backend faild
#2028 by chunniunai220ml was closed Jun 27, 2024
Test Open LLM Leaderboard 2 (label: asking questions)
#2026 by matouk98 was closed Jul 3, 2024
Does it support Triton server? (label: asking questions)
#2018 by AndyZZt was closed Jun 25, 2024
mmlu evaluation fail
#2005 by jxiw was closed Jun 21, 2024