Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

Open 4


219 Open 727 Closed


Issues list

Getting "jinja2.exceptions.TemplateError: System role not supported" exception with some tasks using --apply_chat_template

#2109 by chimezie was closed Jul 16, 2024

The problem about the overall score of BBH and GPQA datasets (label: asking questions, for asking for clarification / support on library usage)

#2101 by marvelcell was closed Jul 15, 2024

not getting same accuracy as on the leaderboard when evaluating locally.

#2091 by sorobedio was closed Jul 13, 2024

Same accuracy and std with different seeds (label: asking questions)

#2089 by yogi9879 was closed Jul 15, 2024

How is the score calculated in open-llm-leaderboard (label: asking questions)

#2088 by marvelcell was closed Jul 11, 2024

How do I send the api key when using local-chat-completions?

#2078 by URRealHero was closed Jul 8, 2024

Adding a metric and an aggregation requires knowledge of input

#2071 by notrichardren was closed Jul 5, 2024

TinyBenchmark/TinyMMLU broken?

#2068 by skramer-dev was closed Jul 9, 2024

LLM leader board setting for mmlu.

#2066 by dsj96 was closed Jul 8, 2024

package version conflict while launching leaderboard2 eval

#2065 by dhiaEddineRhaiem was closed Jul 8, 2024

Error Running New Open LLM Leaderboard Tasks

#2064 by annekethvij was closed Jul 5, 2024

Inconsistent format of doc_to_text in the task.yaml files?

#2062 by andrew0411 was closed Jul 3, 2024

Can I see all raw inputs to models and raw outputs from models?

#2061 by zsaladin was closed Jul 3, 2024

lm_eval --tasks list return nothing?

#2043 by fahadh4ilyas was closed Jul 2, 2024

Per-sample perplexity of a continuation?

#2040 by YilunZhou was closed Jun 29, 2024

The problem of generate responses with my own trained model

#2035 by marvelcell was closed Jun 29, 2024

Using chat template with vllm engine

#2033 by mohit-rag was closed Jun 28, 2024

vllm backend faild

#2028 by chunniunai220ml was closed Jun 27, 2024

--log_samples not saving all inference output

#2027 by zitgit was closed Jul 9, 2024

Test Open LLM Leaderboard 2 (label: asking questions)

#2026 by matouk98 was closed Jul 3, 2024

YAML config was updated, but the project still remains the same as before

#2021 by 2018211801 was closed Jun 27, 2024

Does it support Triton server? (label: asking questions)

#2018 by AndyZZt was closed Jun 25, 2024

Running on custom model, getting 'TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

#2016 by Fchaubard was closed Jun 24, 2024

mmlu evaluation fail

#2005 by jxiw was closed Jun 21, 2024

piqa task need add trust_remote_code true in piqa.yml

#1985 by changwangss was closed Jun 19, 2024
