Pulse · EleutherAI/lm-evaluation-harness

June 11, 2024 – June 18, 2024

Fix task.py and evaluator.py
#1954 opened Jun 12, 2024
Mmlu Pro
#1961 opened Jun 13, 2024
Add BertaQA dataset tasks
#1964 opened Jun 13, 2024
Fix OpenAI API discrepancies
#1969 opened Jun 14, 2024
mela
#1970 opened Jun 16, 2024
Add GigaChat API
#1973 opened Jun 17, 2024
Fix local completion huggingface tokenizer
#1975 opened Jun 17, 2024
add persianmmlu benchmark for assessing Persian Language understanding
#1979 opened Jun 17, 2024
Add Task: CBT
#1981 opened Jun 18, 2024
Update interface.md
#1982 opened Jun 18, 2024
Added ArabicMMLU
#1987 opened Jun 18, 2024
main
#1988 opened Jun 18, 2024
[Fix] Replace generic exception classes with a more specific ones
#1989 opened Jun 18, 2024

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Adding LLaVa support
#1832 commented on Jun 17, 2024 • 10 new comments
#1442 inverse scaling tasks implementation
#1589 commented on Jun 12, 2024 • 5 new comments
Plans for a new release?
#1951 commented on Jun 13, 2024 • 3 new comments
--trust_remote_code does it actually do anything?
#1932 commented on Jun 17, 2024 • 2 new comments
The output of ceval is not as the same format at the official version?
#1945 commented on Jun 13, 2024 • 1 new comment
Add support to azure-openai deployed models
#1733 commented on Jun 13, 2024 • 1 new comment
Add parallel processing for OpenAI completion models
#1460 commented on Jun 13, 2024 • 1 new comment
Added CommonsenseQA task
#1721 commented on Jun 18, 2024 • 1 new comment
add arc_challenge_mt
#1900 commented on Jun 12, 2024 • 1 new comment
mlx Model (loglikelihood & generate_until)
#1902 commented on Jun 18, 2024 • 1 new comment
Confusion matrix metric
#1921 commented on Jun 18, 2024 • 1 new comment
[New Task] Add Paloma benchmark
#1928 commented on Jun 18, 2024 • 1 new comment
Easier unitxt tasks loading and removal of unitxt library dependency
#1933 commented on Jun 13, 2024 • 1 new comment
Alghafa benchmark
#1946 commented on Jun 15, 2024 • 1 new comment
Multiple issues Encountered During Tasks Verification
#1885 commented on Jun 18, 2024 • 0 new comments