Standardize metrics #1167 (Draft)

wants to merge 26 commits into base: main

Commits (26 total; the diff below shows changes from 2 commits):
e7cd7d6  sample metrics that have both sample-wise and set-wise operations (lintangsutawika, Dec 19, 2023)
1d262a5  change how metrics are registered (lintangsutawika, Dec 19, 2023)
028f04c  loglikelihood and loglikelihood rolling modified (lintangsutawika, Dec 19, 2023)
6117c50  changed how metrics are calculated (lintangsutawika, Dec 19, 2023)
a808c66  Merge branch 'main' of https://github.com/EleutherAI/lm-evaluation-ha… (lintangsutawika, Dec 19, 2023)
c6a9158  update (lintangsutawika, Dec 27, 2023)
4d49dd0  aggregation to compute_metric (lintangsutawika, Dec 28, 2023)
9d6bc92  aggregation to compute_metric (lintangsutawika, Dec 28, 2023)
3888193  simplify registry (lintangsutawika, Dec 28, 2023)
039832e  removed passthrough fn (lintangsutawika, Dec 28, 2023)
e5b245c  remove aggregation (lintangsutawika, Dec 28, 2023)
20c10df  kwargs are added to metric_fn through partial at the beginning (lintangsutawika, Dec 28, 2023)
6a336b1  use HFEvaluateAdaptor for hf metrics (lintangsutawika, Dec 28, 2023)
150f11f  revert to just load metric_fn (lintangsutawika, Dec 28, 2023)
99ce4ef  process hf evaluate metrics (lintangsutawika, Dec 28, 2023)
439dca5  list tuple for string based multigpu collection (lintangsutawika, Dec 29, 2023)
aaf64aa  readded suport for aggregation (lintangsutawika, Jan 2, 2024)
787b23f  readd aggregation (lintangsutawika, Jan 2, 2024)
703e0d5  adjusted aggregation config (lintangsutawika, Jan 2, 2024)
2a573a1  adjust to be backwards compatible (lintangsutawika, Jan 2, 2024)
2054c2e  revert (lintangsutawika, Jan 2, 2024)
dfb4183  revert (lintangsutawika, Jan 2, 2024)
cda25fe  Merge branch 'main' into standardize_metrics (lintangsutawika, Jan 2, 2024)
470fb31  resolved git conflict (lintangsutawika, Jan 2, 2024)
dfb036b  resolved again (lintangsutawika, Jan 2, 2024)
de46fb9  reformat (lintangsutawika, Jan 2, 2024)
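Commit 20c10df above describes binding kwargs to `metric_fn` through `partial` "at the beginning", i.e. configuration-supplied options are attached once at registration time rather than threaded through every call. A minimal sketch of that pattern, assuming a hypothetical metric `exact_match` with an `ignore_case` option (not the harness's actual API):

```python
from functools import partial


def exact_match(prediction, reference, ignore_case=False):
    """Toy sample-wise metric: 1.0 on an exact match, else 0.0."""
    if ignore_case:
        prediction, reference = prediction.lower(), reference.lower()
    return 1.0 if prediction == reference else 0.0


# Bind config-supplied kwargs up front; downstream code then calls the
# metric with only the per-sample positional arguments.
metric_fn = partial(exact_match, ignore_case=True)

print(metric_fn("Paris", "paris"))  # 1.0
```

The advantage is that the evaluation loop can treat every metric as a uniform two-argument callable, regardless of how many options its underlying implementation exposes.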
lm_eval/api/registry.py — 21 changes: 0 additions & 21 deletions

@@ -1,19 +1,10 @@
-<<<<<<< HEAD
 import os
 import logging
 import evaluate
 import collections
 from functools import partial
 
 from lm_eval.api.model import LM
-=======
-import logging
-
-import evaluate
-
-from lm_eval.api.model import LM
-
->>>>>>> 4d10ad56b1ffe569467eee2297e2317c99313118
 
 eval_logger = logging.getLogger("lm-eval")
 
@@ -129,7 +120,6 @@ def decorate(fn):
     return decorate
 
 
-<<<<<<< HEAD
 def get_metric(name):
 
     if name in METRIC_REGISTRY:
@@ -139,17 +129,6 @@ def get_metric(name):
 
 
 def get_evaluate(name, **kwargs):
-=======
-def get_metric(name, hf_evaluate_metric=False):
-    if not hf_evaluate_metric:
-        if name in METRIC_REGISTRY:
-            return METRIC_REGISTRY[name]
-        else:
-            eval_logger.warning(
-                f"Could not find registered metric '{name}' in lm-eval, searching in HF Evaluate library..."
-            )
->>>>>>> 4d10ad56b1ffe569467eee2297e2317c99313118
 
     try:
 
         class HFEvaluateAdaptor:
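The conflicted hunk above contrasts two shapes of `get_metric`: the HEAD side only consults the local registry, while the incoming side warns and falls back to the HF Evaluate library when a name is unregistered. A minimal sketch of the registry-with-fallback shape, using an injected stub loader in place of `evaluate.load` so the example stays self-contained (all names here are illustrative, not the harness's exact code):

```python
import logging

eval_logger = logging.getLogger("lm-eval")

METRIC_REGISTRY = {}


def register_metric(name):
    """Decorator that records a metric function under a string key."""
    def decorate(fn):
        METRIC_REGISTRY[name] = fn
        return fn
    return decorate


@register_metric("acc")
def acc(prediction, reference):
    return 1.0 if prediction == reference else 0.0


def get_metric(name, fallback_loader=None):
    """Return a registered metric; otherwise try an external loader
    (in the real code, the HF Evaluate library)."""
    if name in METRIC_REGISTRY:
        return METRIC_REGISTRY[name]
    eval_logger.warning(
        f"Could not find registered metric '{name}' in lm-eval, "
        "searching in HF Evaluate library..."
    )
    if fallback_loader is not None:
        return fallback_loader(name)
    raise KeyError(f"Metric '{name}' not found")
```

Keeping the fallback behind an explicit parameter makes the lookup order (local registry first, external library second) easy to test in isolation.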
lm_eval/api/task.py — 6 changes: 6 additions & 0 deletions

@@ -19,6 +19,9 @@
     mean,
     weighted_perplexity,
 <<<<<<< HEAD
+<<<<<<< HEAD
+=======
+>>>>>>> cda25fef4e1df2f4bc2dab3ec6668ae9f5bf7296
     bits_per_byte,
 )
 from lm_eval.api.registry import (
@@ -27,6 +30,7 @@
     get_aggregation,
     METRIC_REGISTRY,
     DEFAULT_METRIC_REGISTRY,
+<<<<<<< HEAD
 =======
 )
 from lm_eval.api.registry import (
@@ -37,6 +41,8 @@
     get_metric_aggregation,
     is_higher_better,
 >>>>>>> 4d10ad56b1ffe569467eee2297e2317c99313118
+=======
+>>>>>>> cda25fef4e1df2f4bc2dab3ec6668ae9f5bf7296
 )
 from lm_eval.filters import build_filter_ensemble
 from lm_eval.prompts import get_prompt
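Several commits in this branch (e7cd7d6, 4d49dd0, 9d6bc92) concern metrics that have both a sample-wise operation and a set-wise aggregation, with the aggregation moved into `compute_metric`. A minimal sketch of that split, with hypothetical names standing in for the harness's actual functions:

```python
def sample_accuracy(prediction, reference):
    """Sample-wise operation: score a single (prediction, reference) pair."""
    return 1.0 if prediction == reference else 0.0


def mean(scores):
    """Set-wise operation: reduce per-sample scores to one number."""
    return sum(scores) / len(scores)


def compute_metric(samples, metric_fn=sample_accuracy, aggregation=mean):
    """Apply the sample-wise metric to each pair, then aggregate the set."""
    return aggregation([metric_fn(p, r) for p, r in samples])


print(compute_metric([("a", "a"), ("b", "c")]))  # 0.5
```

Separating the two stages matters for corpus-level metrics (e.g. perplexity or BLEU-style scores), where the set-wise step is not a simple mean and cannot be recovered from per-sample values averaged naively.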