EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

212 Open 713 Closed

Issues list

Results is weird for Qwen2-1.5B

#1944 by SefaZeng was closed Jun 24, 2024

mmlu evaluation fail

#2005 by jxiw was closed Jun 21, 2024

piqa task need add trust_remote_code true in piqa.yml

#1985 by changwangss was closed Jun 19, 2024

How to enable trust_remote_code when encountered programmatically via get_task_dict?

#1980 by Jack-Khuu was closed Jun 18, 2024

Add a way to instantiate from HF.AutoModel (again)

#1978 by dmitrii-palisaderesearch was closed Jun 19, 2024

incomplete task list

#1972 by hlzhang109 was closed Jun 19, 2024

TemplateLM#_encode_pair() only works for HF transformers auto-models

#1966 by Birch-san was closed Jun 14, 2024

Error while installing

#1965 by surya-narayanan was closed Jun 19, 2024

Multi-gpu evaluation with external library usage.

#1960 by xinghaow99 was closed Jun 13, 2024

Cannot load model 'local-chat-completions' and 'local-completions'

#1957 by awesom112 was closed Jun 12, 2024

Keep getting error: 'VLLM' object has no attribute 'AUTO_MODEL_CLASS'

#1953 by andrew0411 was closed Jun 12, 2024

.ipynb_checkpoints causes eval harness to fail

#1952 by johnwee1 was closed Jun 13, 2024

Plans for a new release?

#1951 by nathan-weinberg was closed Jul 1, 2024

llama3-base gsm8k score

#1896 by rangehow was closed May 29, 2024

Save fewshot_as_multiturn argument in results.json

#1941 by djstrong was closed Jun 19, 2024

Format of Personal Defined Dataset for Evaluation

#1937 by OscarC9912 was closed Jun 9, 2024

Parallel GPU evaluation using simple_evaluate /evaluate functions? #1934

#1935 by PalaashAgrawal was closed Jun 7, 2024

Parallel GPU evaluation using simple_evaluate /evaluate functions?

#1934 by Naitik1502 was closed Jun 7, 2024

--trust_remote_code does it actually do anything? (label: bug)

#1932 by devzzzero was closed Jun 19, 2024

build commit_id=b281b09, I cannot find lm-eval command.

#1920 by jieheroli was closed Jun 4, 2024

Add New Benchmark

#1915 by khalil-Hennara was closed Jun 10, 2024

accuracy precision

#1911 by lernerjenny was closed Jun 4, 2024

social_iqa choices do not use actual answers

#1908 by ozgurcelik was closed May 31, 2024

Fewshot seed only set when overriding num_fewshot (label: bug)

#1906 by stoical07 was closed Jun 3, 2024

Load sentencepiece tokenizer for evaluation

#1904 by ayushsml was closed May 30, 2024
