Pulse · EleutherAI/lm-evaluation-harness

August 28, 2024 – September 28, 2024

39 Active pull requests

79 Active issues

v0.4.4
published Sep 5, 2024

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add new benchmark: Catalan bench
#2154 commented on Sep 27, 2024 • 10 new comments
Add new benchmark: Galician bench
#2155 commented on Sep 27, 2024 • 8 new comments
Add new benchmark: Basque bench
#2153 commented on Sep 27, 2024 • 5 new comments
Add new benchmark: Spanish bench
#2157 commented on Sep 27, 2024 • 4 new comments
Minor features
#2249 commented on Sep 14, 2024 • 3 new comments
Draft - Support ov models via genai
#1862 commented on Sep 3, 2024 • 3 new comments
[Draft] llm-as-judge
#2251 commented on Sep 25, 2024 • 1 new comment
Add new benchmark: Portuguese bench
#2156 commented on Sep 27, 2024 • 1 new comment
[rank1]: huggingface_hub.utils._errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url:
#2202 commented on Aug 28, 2024 • 0 new comments
TypeError: argument 'ids': 'NoneType' object cannot be converted to 'Sequence'
#2178 commented on Aug 29, 2024 • 0 new comments
Add KoCommonGEN v2 benchmark
#2208 commented on Aug 28, 2024 • 0 new comments
Evaluate Gemma with Chat Template
#2069 commented on Sep 5, 2024 • 0 new comments
Supporting Multimodality
#2014 commented on Sep 5, 2024 • 0 new comments
Allow Task objects to defer dataset download
#1558 commented on Sep 9, 2024 • 0 new comments
GPT2 eval in lambada_openai, acc only 0.325
#2159 commented on Sep 9, 2024 • 0 new comments
How to use Custom Prompt during Evaluation
#2131 commented on Sep 12, 2024 • 0 new comments
Medical specialities
#2113 commented on Sep 18, 2024 • 0 new comments
Chat template fix
#2058 commented on Sep 11, 2024 • 0 new comments
Fix partial caching of openai models
#1997 commented on Aug 29, 2024 • 0 new comments
Confusion matrix metric
#1921 commented on Sep 8, 2024 • 0 new comments
mlx Model (loglikelihood & generate_until)
#1902 commented on Sep 9, 2024 • 0 new comments
Low results on TriviaQA
#1292 commented on Sep 17, 2024 • 0 new comments
add context-based requests processing
#1571 commented on Sep 6, 2024 • 0 new comments
HellaSwag with UnicodeDecodeError
#1757 commented on Sep 26, 2024 • 0 new comments
Add long context evaluation benchmarks such as LongBench and LEval.
#2180 commented on Sep 23, 2024 • 0 new comments
eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found."
#1829 commented on Sep 23, 2024 • 0 new comments
The response is too short to extract answer on GPQA. What should I set to extend it?
#2081 commented on Sep 18, 2024 • 0 new comments
Inconsistent evaluation results with Chat Template
#1841 commented on Sep 17, 2024 • 0 new comments

Overview

1 Release published by 1 person

23 Pull requests merged by 10 people

16 Pull requests opened by 11 people

42 Issues closed by 13 people

37 Issues opened by 33 people

28 Unresolved conversations