Insights: EleutherAI/lm-evaluation-harness
Overview
8 Pull requests merged by 7 people
- Fix self assignment in neuron_optimum.py (#1990 merged Jun 18, 2024)
- add trust_remote_code for piqa (#1983 merged Jun 18, 2024)
- fix: add directory filter to os.walk to ignore 'ipynb_checkpoints' (#1956 merged Jun 13, 2024)
- Make `scripts.write_out` error out when no splits match (#1796 merged Jun 13, 2024)
- Fix `--gen_kwargs` and VLLM (`temperature` not respected) (#1800 merged Jun 13, 2024)
- `samples` is newline delimited (#1930 merged Jun 13, 2024)
- Fix self.max_tokens in anthropic_llms.py (#1848 merged Jun 12, 2024)
- Fix a tiny typo in `docs/interface.md` (#1955 merged Jun 12, 2024)
13 Pull requests opened by 13 people
- Fix task.py and evaluator.py (#1954 opened Jun 12, 2024)
- Mmlu Pro (#1961 opened Jun 13, 2024)
- Add BertaQA dataset tasks (#1964 opened Jun 13, 2024)
- Fix OpenAI API discrepancies (#1969 opened Jun 14, 2024)
- mela (#1970 opened Jun 16, 2024)
- Add GigaChat API (#1973 opened Jun 17, 2024)
- Fix local completion huggingface tokenizer (#1975 opened Jun 17, 2024)
- add persianmmlu benchmark for assessing Persian Language understanding (#1979 opened Jun 17, 2024)
- Add Task: CBT (#1981 opened Jun 18, 2024)
- Update interface.md (#1982 opened Jun 18, 2024)
- Added ArabicMMLU (#1987 opened Jun 18, 2024)
- main (#1988 opened Jun 18, 2024)
- [Fix] Replace generic exception classes with a more specific ones (#1989 opened Jun 18, 2024)
9 Issues closed by 8 people
- `piqa` task need add trust_remote_code true in piqa.yml (#1985 closed Jun 19, 2024)
- How to enable trust_remote_code when encountered programmatically via get_task_dict? (#1980 closed Jun 18, 2024)
- TemplateLM#_encode_pair() only works for HF transformers auto-models (#1966 closed Jun 14, 2024)
- .ipynb_checkpoints causes eval harness to fail (#1952 closed Jun 13, 2024)
- Multi-gpu evaluation with external library usage. (#1960 closed Jun 13, 2024)
- Suboptimal Performance on Generation Tasks (#1353 closed Jun 13, 2024)
- Keep getting error: 'VLLM' object has no attribute 'AUTO_MODEL_CLASS' (#1953 closed Jun 12, 2024)
- Cannot load model 'local-chat-completions' and 'local-completions' (#1957 closed Jun 12, 2024)
- `openai.BadRequestError` when running `lm_eval` with piqa task using vLLM's OpenAI compatible server (#1735 closed Jun 12, 2024)
11 Issues opened by 9 people
- Long time testing Qwen2-72B (#1984 opened Jun 18, 2024)
- Add a way to instantiate from HF.AutoModel (again) (#1978 opened Jun 17, 2024)
- What is the output_type in the metric for? (#1976 opened Jun 17, 2024)
- incomplete task list (#1972 opened Jun 16, 2024)
- Ubelievable long time when host the gguf mode ? (#1971 opened Jun 16, 2024)
- OpenAI completions model not using OpenAI Completion API properly to extract LogProbs (#1967 opened Jun 14, 2024)
- Error while installing (#1965 opened Jun 14, 2024)
- How to use a vllm hosted model? (#1963 opened Jun 13, 2024)
- Error when chat template is not a string (#1962 opened Jun 13, 2024)
- Making torch dep optional? (#1959 opened Jun 12, 2024)
- Wandb logger can't handle groups with heterogenous metrics (#1958 opened Jun 12, 2024)
15 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Adding LLaVa support (#1832 commented on Jun 17, 2024 • 10 new comments)
- #1442 inverse scaling tasks implementation (#1589 commented on Jun 12, 2024 • 5 new comments)
- Plans for a new release? (#1951 commented on Jun 13, 2024 • 3 new comments)
- --trust_remote_code does it actually do anything? (#1932 commented on Jun 17, 2024 • 2 new comments)
- The output of ceval is not as the same format at the official version? (#1945 commented on Jun 13, 2024 • 1 new comment)
- Add support to azure-openai deployed models (#1733 commented on Jun 13, 2024 • 1 new comment)
- Add parallel processing for OpenAI completion models (#1460 commented on Jun 13, 2024 • 1 new comment)
- Added CommonsenseQA task (#1721 commented on Jun 18, 2024 • 1 new comment)
- add arc_challenge_mt (#1900 commented on Jun 12, 2024 • 1 new comment)
- mlx Model (loglikelihood & generate_until) (#1902 commented on Jun 18, 2024 • 1 new comment)
- Confusion matrix metric (#1921 commented on Jun 18, 2024 • 1 new comment)
- [New Task] Add Paloma benchmark (#1928 commented on Jun 18, 2024 • 1 new comment)
- Easier unitxt tasks loading and removal of unitxt library dependancy (#1933 commented on Jun 13, 2024 • 1 new comment)
- Alghafa benchmark (#1946 commented on Jun 15, 2024 • 1 new comment)
- Multiple issues Encountered During Tasks Verification (#1885 commented on Jun 18, 2024 • 0 new comments)