
Easier unitxt tasks loading and removal of unitxt library dependency #1933

Merged
merged 9 commits into from
Jul 8, 2024

Conversation

elronbandel
Contributor

Easier loading for our lm-eval-harness users. This is significant for our user base and can help bring more users to the lm-eval-harness platform.

Signed-off-by: Elron Bandel <[email protected]>
@CLAassistant

CLAassistant commented Jun 6, 2024

CLA assistant check
All committers have signed the CLA.

@elronbandel elronbandel changed the title Updated unitxt loading Easier unitxt tasks loading and removal of unitxt library dependency Jun 6, 2024
Contributor

@haileyschoelkopf haileyschoelkopf left a comment


Hi @elronbandel , thank you for this PR!

Just two small points before we can merge:

  • I made a comment about naming conventions for unitxt tasks: which do you prefer?
  • +1 to testing the small change in __init__.py on some of the other Python tasks in the repo if possible: squad_completion, fda, and swde would be nice tasks to test this on.

@elronbandel
Contributor Author

Thank you for reviewing my PR so quickly @haileyschoelkopf @LSinev!

  1. I adjusted every use of ConfigurableTask to accept the config.

  2. The following command worked: lm_eval --model hf --model_args pretrained=google/flan-t5-small --tasks fda,squad_completion,squadv2,swde --device cpu --limit 10
    The output is as expected:

| Tasks            | Version | Filter | n-shot | Metric       | Value | Stderr |
|------------------|---------|--------|--------|--------------|-------|--------|
| fda              | 0       | none   | 0      | contains     | 0.0   | ± N/A  |
| squad_completion | 0       | none   | 0      | contains     | 0.2   | ± N/A  |
| squadv2          | 3       | none   | 0      | exact        | 60.0  | ± N/A  |
|                  |         | none   | 0      | f1           | 60.0  | ± N/A  |
|                  |         | none   | 0      | HasAns_exact | 100.0 | ± N/A  |
|                  |         | none   | 0      | HasAns_f1    | 100.0 | ± N/A  |
|                  |         | none   | 0      | NoAns_exact  | 0.0   | ± N/A  |
|                  |         | none   | 0      | NoAns_f1     | 0.0   | ± N/A  |
|                  |         | none   | 0      | best_exact   | 70.0  | ± N/A  |
|                  |         | none   | 0      | best_f1      | 70.0  | ± N/A  |
| swde             | 0       | none   | 0      | contains     | 0.3   | ± N/A  |

@LSinev
Contributor

LSinev commented Jun 9, 2024

I believe one should proactively find every task affected by this small change in __init__.py, then check and update them accordingly.
For example, one could search for Python tasks containing the string class: !function, then find every class that may be affected: not only ConfigurableTask descendants like (but not limited to) https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/squadv2/task.py#L49, but also Task descendants like (but not limited to) https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/scrolls/task.py#L111.

@elronbandel
Contributor Author

elronbandel commented Jun 11, 2024

@LSinev Do you think it might be safer to use this function:

import inspect

def has_config_in_constructor(cls):
    """Determine whether the constructor of the provided class accepts a parameter named 'config'.

    Args:
        cls (type): The class to be inspected.

    Returns:
        bool: True if a 'config' parameter exists in the constructor, False otherwise.
    """
    constructor = getattr(cls, "__init__", None)
    return "config" in inspect.signature(constructor).parameters if constructor else False

Then conditioning on it:

if has_config_in_constructor(config["class"]):
    task = config["class"](config=config)
else:
    task = config["class"]()
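For illustration, this dispatch can be exercised against two toy task classes (LegacyTask and ConfigAwareTask are hypothetical names for this sketch, not classes from the harness):

```python
import inspect

def has_config_in_constructor(cls):
    """Return True if cls.__init__ accepts a parameter named 'config'."""
    constructor = getattr(cls, "__init__", None)
    return "config" in inspect.signature(constructor).parameters if constructor else False

class LegacyTask:
    # Old-style task: constructor takes no config argument.
    def __init__(self):
        self.config = None

class ConfigAwareTask:
    # New-style task: constructor accepts a config dict.
    def __init__(self, config=None):
        self.config = config

def build_task(config):
    # Same conditional dispatch as proposed above: pass config only
    # when the class's constructor can accept it.
    if has_config_in_constructor(config["class"]):
        return config["class"](config=config)
    return config["class"]()

old = build_task({"class": LegacyTask})       # constructed without config
new = build_task({"class": ConfigAwareTask})  # receives the config dict
```

Both styles of task class go through the same code path, which is what makes the change backward compatible.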

@LSinev
Contributor

LSinev commented Jun 13, 2024

This has_config_in_constructor check could also be a good basis for a new test.
It also preserves compatibility with Python tasks already written in other repos and included in benchmarking via the --include_path option; they will not fail after the update.

The idea behind updating all Python tasks in this repo accordingly is not only a fully working repo, but also providing useful, working examples for new task creators. With the has_config_in_constructor check there will be two supported options: with or without config. So a thorough documentation update is also needed, since the current guidance of "do it the way the provided example does" in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#configuring-python-classes is no longer the only way.

That said, easing unitxt task loading may ripple into many changes across the repo; alternatively, it may be simpler to convert unitxt to a ConfigurableTask and improve the config options through that class.

@elronbandel
Contributor Author

elronbandel commented Jul 3, 2024

Current change:

  • Fix any task using "class: !function"
  • Update documentation
  • Add backward compatibility mechanism

Details:

Fixing tasks using "class: !function"

By running grep -rl 'class: \!function' ./ | xargs -n 1 basename | cut -d. -f1

we get everything that needs to be updated:

unitxt
fda
scrolls_qmsum
scrolls_quality
scrolls_narrativeqa
scrolls_qasper
scrolls_summscreenfd
scrolls_govreport
scrolls_contractnli
squadv2
swde
squad_completion

All the classes can be tested end to end with: lm_eval --model hf --model_args pretrained=google/flan-t5-small --tasks fda,scrolls_qmsum,squadv2,swde,squad_completion --device cpu --limit 10
Output:

| Tasks            | Version | Filter | n-shot | Metric       | Value    | Stderr |
|------------------|---------|--------|--------|--------------|----------|--------|
| fda              | 0       | none   | 0      | contains     | 0.0000   | ± N/A  |
| scrolls_qmsum    | 2       | none   | 0      | rouge1       | 7.5993   | ± N/A  |
|                  |         | none   | 0      | rouge2       | 0.4706   | ± N/A  |
|                  |         | none   | 0      | rougeL       | 5.9726   | ± N/A  |
| squad_completion | 0       | none   | 0      | contains     | 0.2000   | ± N/A  |
| squadv2          | 3       | none   | 0      | HasAns_exact | 100.0000 | ± N/A  |
|                  |         | none   | 0      | HasAns_f1    | 100.0000 | ± N/A  |
|                  |         | none   | 0      | NoAns_exact  | 0.0000   | ± N/A  |
|                  |         | none   | 0      | NoAns_f1     | 0.0000   | ± N/A  |
|                  |         | none   | 0      | best_exact   | 70.0000  | ± N/A  |
|                  |         | none   | 0      | best_f1      | 70.0000  | ± N/A  |
|                  |         | none   | 0      | exact        | 60.0000  | ± N/A  |
|                  |         | none   | 0      | f1           | 60.0000  | ± N/A  |
| swde             | 0       | none   | 0      | contains     | 0.3000   | ± N/A  |

Similarly, when I revert the change in squad_completion as an example, I still get the same result:

| Tasks            | Version | Filter | n-shot | Metric   | Value | Stderr |
|------------------|---------|--------|--------|----------|-------|--------|
| squad_completion | 0       | none   | 0      | contains | 0.2   | ± N/A  |

This verifies that we have full backward compatibility.
@LSinev @haileyschoelkopf can you review again?

@haileyschoelkopf
Contributor

I will look at this today @elronbandel , sorry for letting it sit so long!

Contributor

@haileyschoelkopf haileyschoelkopf left a comment


@elronbandel the PR looks good to me!

We're just finishing up a couple of loose ends on #1741 and will have that merged either later today or, at worst, by the weekend due to the US holiday tomorrow (the PR is ready; we're just confirming some behavior changes with task authors).

I intend to merge this PR after fixing up the merge conflicts and retesting here, as soon as that other one goes through.

@haileyschoelkopf
Contributor

Merged into main; the last test failure is just due to the tests not installing optional dependencies for the unitxt metrics. LGTM, thanks @elronbandel!

@haileyschoelkopf haileyschoelkopf merged commit ad80f55 into EleutherAI:main Jul 8, 2024
8 of 9 checks passed