Releases: huggingface/lighteval

v0.3.0

29 Mar 16:42, commit e6fdcc7

Release Note

This release introduces the new extended tasks feature, new documentation, and many other patches for improved stability.
It also adds the following new tasks:

Big Bench Hard: https://huggingface.co/papers/2210.09261
AGIEval: https://huggingface.co/papers/2304.06364
TinyBench:
MT-Bench: https://huggingface.co/papers/2306.05685
AlGhafa Benchmarking Suite: https://aclanthology.org/2023.arabicnlp-1.21/

MT-Bench marks the introduction of multi-turn prompting as well as an LLM-as-a-judge metric.

New tasks

Add BBH by @clefourrier in #7, @bilgehanertan in #126
Add AGIEval by @clefourrier in #121
Adding TinyBench by @clefourrier in #104
Adding support for Arabic benchmarks: AlGhafa benchmarking suite by @alielfilali01 in #95
Add mt-bench by @NathanHB in #75

Features

Extended Tasks! by @clefourrier in #101, @lewtun in #108, @NathanHB in #122, #123
Added support for launching inference endpoint with different model dtypes by @shaltielshmid in #124

Documentation

Adding LICENSE by @clefourrier in #86, @NathanHB in #89
Make it clearer in the README that the leaderboard uses the harness by @clefourrier in #94

Small patches

Update huggingface-hub for compatibility with datasets 2.18 by @clefourrier in #84
Tidy up dependency groups by @lewtun in #81
Bump GitPython by @NathanHB in #90
Sets a max length for the MATH task by @clefourrier in #83
Fix parallel data processing bug by @clefourrier in #92
Change the eos condition for GSM8K by @clefourrier in #85
Fixing rolling loglikelihood management by @clefourrier in #78
Fixes input length management for generative evals by @clefourrier in #103
Reorder addition of instruction in chat template by @clefourrier in #111
Ensure chat models terminate generation with EOS token by @lewtun in #115
Fix push details to hub by @NathanHB in #98
Small fixes to InferenceEndpointModel by @shaltielshmid in #112
Fix import typo autogptq by @clefourrier in #116
Fixed the loglikelihood method in inference endpoints models by @clefourrier in #119
Fix TextGenerationResponse import from hfh by @Wauplin in #129
Do not use deprecated list_files_info by @Wauplin in #133
Update test workflow name to 'Tests' by @Wauplin in #134

New Contributors

@shaltielshmid made their first contribution in #112
@bilgehanertan made their first contribution in #126
@Wauplin made their first contribution in #129

Full Changelog: v0.2.0...v0.3.0

Contributors

Wauplin, shaltielshmid, and 5 other contributors


v0.2.0 (Latest)

01 Mar 14:31 by NathanHB, commit ab05db9

Release Note

This release focuses on customization and personalization: it's now possible to define custom metrics, not just custom tasks (see the README for the full mechanism).
It also includes small stability fixes and new tasks. We chose to split community tasks from the main library source to make maintenance easier.

Better community task handling

New mechanism for evaluation contributions by @clefourrier in #47
Adding the custom metrics system by @clefourrier in #65

New tasks

Add GPQA by @clefourrier in #42
Adding support for Arabic benchmarks : AceGPT benchmarking suite by @alielfilali01 in #44
IFEval by @clefourrier in #48

Features

Add an automatic system to compute average for tasks with subtasks by @clefourrier in #41
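
To illustrate what averaging over subtasks means, here is a minimal sketch. It is not lighteval's implementation: the `<task>:<subtask>` naming, the `_average` result key, and the helper `add_subtask_averages` are hypothetical, and it assumes an unweighted (macro) mean of each metric across a task's subtasks.

```python
from statistics import mean

# Hypothetical per-subtask results keyed "<task>:<subtask>" (naming assumed for illustration).
results = {
    "mmlu:anatomy": {"acc": 0.50},
    "mmlu:astronomy": {"acc": 0.70},
    "mmlu:virology": {"acc": 0.60},
    "gsm8k": {"acc": 0.30},
}


def add_subtask_averages(results: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Return a copy of `results` with one hypothetical "<task>:_average" entry per task with subtasks."""
    grouped: dict[str, dict[str, list[float]]] = {}
    for name, metrics in results.items():
        if ":" not in name:
            continue  # standalone task: nothing to average
        task = name.split(":", 1)[0]
        for metric, value in metrics.items():
            grouped.setdefault(task, {}).setdefault(metric, []).append(value)

    averaged = dict(results)
    for task, metrics in grouped.items():
        # Unweighted mean: every subtask counts equally, whatever its number of samples.
        averaged[f"{task}:_average"] = {m: mean(vals) for m, vals in metrics.items()}
    return averaged


print(add_subtask_averages(results)["mmlu:_average"])
```

A macro average is the simplest choice; weighting each subtask by its sample count would be an equally plausible alternative.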

Small patches

Typos: #27, #28, #29, #30, #34
Better README: #26, #37, #55
Patch fix to match with config update/simplification in nanotron by @thomwolf in #35
Bump transformers to 4.38 by @NathanHB in #46
Small fix to be able to use extensions of nanotron configs by @thomwolf in #58
Remove the eos token override in the Default Config Task by @clefourrier in #54
Update leaderboard task set by @lewtun in #60
Fixes wikitext prompts + some patches on tg models by @clefourrier in #64
Fix unset generation size by @clefourrier in #76
Update ruff by @clefourrier in #71
Relax sentencepiece version by @lewtun in #74
Better chat template system by @clefourrier in #38

✨ Community Contributions

@ledrui made their first contribution in #26
@alielfilali01 made their first contribution in #44
@lewtun made their first contribution in #55

Full Changelog: v0.1.1...v0.2.0

Contributors

ledrui, thomwolf, and 4 other contributors


v0.1.1

09 Feb 11:29 by thomwolf, commit adf1031

Small patch for the PyPI release

Include tasks_table.jsonl in package


v0.1.0

08 Feb 10:27 by NathanHB, commit 468d144

Init

LightEval 🌤️

A lightweight LLM evaluation suite

Context

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

We're releasing it with the community in the spirit of building in the open.

Note that it is still very early, so don't expect 100% stability ^^'
In case of problems or questions, feel free to open an issue!

Full Changelog: https://github.com/huggingface/lighteval/commits/v0.1
