Pile dataset not found #1338

Closed
RenatoGeh opened this issue Jan 23, 2024 · 1 comment

@RenatoGeh

Hi,

I'm trying to run the pile group through lm_eval.simple_evaluate, but I am getting the following error.

Traceback (most recent call last):
  File "/scratch/renatolg/tokens/harness.py", line 72, in <module>
    res = lm_eval.simple_evaluate(**eval_args, bootstrap_iters=0)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/evaluator.py", line 122, in simple_evaluate
    task_dict = lm_eval.tasks.get_task_dict(tasks)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 255, in get_task_dict
    task_obj = get_task_dict(task_name)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 275, in get_task_dict
    task_name: get_task(task_name=task_element, config=config),
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 217, in get_task
    return TASK_REGISTRY[task_name](config=config)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/api/task.py", line 622, in __init__
    self.download(self.config.dataset_kwargs)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/api/task.py", line 717, in download
    self.dataset = datasets.load_dataset(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/load.py", line 1852, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/builder.py", line 373, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/builder.py", line 539, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'pile_arxiv' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']

I thought the pile group (and its subsets, such as pile_arxiv) was included in lm-evaluation-harness?

Just to clarify, I did successfully initialize tasks with lm_eval.tasks.initialize_tasks().

Thanks

@haileyschoelkopf
Collaborator

Hi!

To run the Pile tasks, you'll need to apply the fix described in #731 and have a local copy of the Pile, since it is no longer downloadable via the Eye.
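
For anyone debugging the same ValueError, here is a minimal diagnostic sketch that checks a harness task name against the config names listed in the traceback above. The `AVAILABLE` list is copied from that error message; the helper `resolve_config` and the idea of stripping a `pile_` prefix are illustrative assumptions, not part of lm-evaluation-harness or `datasets`.

```python
# Config names copied verbatim from the ValueError in the traceback above.
AVAILABLE = [
    "all", "enron_emails", "europarl", "free_law", "hacker_news",
    "nih_exporter", "pubmed", "pubmed_central", "ubuntu_irc", "uspto", "github",
]


def resolve_config(task_name, available, prefix="pile_"):
    """Hypothetical helper: return a matching builder config name, or None.

    Tries the task name as-is first, then with the assumed `pile_` prefix
    removed. This only diagnoses a name mismatch; it does not make the
    underlying data downloadable.
    """
    if task_name in available:
        return task_name
    short = task_name[len(prefix):] if task_name.startswith(prefix) else task_name
    return short if short in available else None


if __name__ == "__main__":
    for name in ("pile_arxiv", "pile_github"):
        match = resolve_config(name, AVAILABLE)
        print(f"{name}: {match if match is not None else 'no matching config'}")
```

In this listing, `pile_arxiv` has no counterpart even without the prefix, which is consistent with the error being about the hosted dataset rather than a typo in the task name. To list configs for a dataset directly, `datasets.get_dataset_config_names(path)` can be used, though the dataset path the Pile tasks load from is not shown in this thread.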
