ValueError: BuilderConfig 'pile_freelaw' not found., issue on running PILE eval #1714

Harryalways317 · 2024-04-16T19:40:02Z

Execution Command

!lm_eval --model hf \
    --model_args pretrained=hvadaparty/Featherlite-2.5-Mistral-7B \
    --tasks self_consistency,realtoxicityprompts,toxigen,pile \
    --device cuda:0 \
    --batch_size auto --device cuda

Error Message

/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2024-04-16:19:37:29,582 INFO     [__main__.py:251] Verbosity set to INFO
2024-04-16:19:37:33,734 INFO     [__main__.py:335] Selected Tasks: ['pile', 'realtoxicityprompts', 'self_consistency', 'toxigen']
2024-04-16:19:37:33,734 INFO     [__main__.py:336] Loading selected tasks...
2024-04-16:19:37:33,734 INFO     [evaluator.py:131] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-04-16:19:37:33,736 INFO     [huggingface.py:162] Using device 'cuda:0'
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:01<00:00,  1.57it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for EleutherAI/pile contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/EleutherAI/pile
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
Downloading builder script: 100%|██████████| 9.53k/9.53k [00:00<00:00, 55.6MB/s]
Downloading readme: 100%|██████████████████| 14.2k/14.2k [00:00<00:00, 47.1MB/s]
Traceback (most recent call last):
  File "/usr/local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/__main__.py", line 342, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 192, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 420, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 270, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 253, in _load_individual_task_or_group
    **dict(collections.ChainMap(*map(fn, subtask_list))),
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 161, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 150, in load_task
    task_object = ConfigurableTask(config=config)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 782, in __init__
    self.download(self.config.dataset_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 871, in download
    self.dataset = datasets.load_dataset(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2523, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2232, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'pile_freelaw' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']
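
The message shows the task asking for a config named `pile_freelaw`, while the builder script only exposes names such as `free_law`. Below is a minimal sketch, assuming only the message format shown in the traceback, that pulls out the requested name and the available list and picks the closest match. The helper names (`parse_builder_config_error`, `closest_config`) are hypothetical and not part of `lm_eval` or `datasets`. This is a diagnostic aid only and does not by itself make the Pile tasks loadable.

```python
import ast
import difflib
import re

# Error text copied verbatim from the traceback above.
ERROR = (
    "BuilderConfig 'pile_freelaw' not found. Available: "
    "['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', "
    "'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']"
)


def parse_builder_config_error(message):
    """Return (requested_name, available_names) from a BuilderConfig error message."""
    match = re.search(
        r"BuilderConfig '([^']+)' not found\. Available: (\[.*\])", message
    )
    if match is None:
        raise ValueError("message does not look like a BuilderConfig error")
    requested = match.group(1)
    # The available names are printed as a Python list literal.
    available = ast.literal_eval(match.group(2))
    return requested, available


def closest_config(requested, available):
    """Fuzzy-match the requested name against the available config names."""
    # Assumption: the task-side name carries a "pile_" prefix that the
    # dataset-side names do not, as in 'pile_freelaw' vs 'free_law'.
    candidate = requested.removeprefix("pile_")
    matches = difflib.get_close_matches(candidate, available, n=1)
    return matches[0] if matches else None


requested, available = parse_builder_config_error(ERROR)
print(requested, "->", closest_config(requested, available))
```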


LeeGitHub1 · 2024-07-20T07:31:34Z

Did you solve this problem? I ran into the same error.

haileyschoelkopf · 2024-07-21T15:32:47Z

The Pile tasks are broken because the Pile is no longer hosted. I intend to upload a tokenized version of the validation sets and update these tasks so that they are usable again.
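
Until the tasks are updated, one way to keep the rest of an evaluation running is to leave the Pile out of the `--tasks` argument. Here is a minimal sketch, assuming Pile tasks are named `pile` or start with `pile_` (as `pile_freelaw` in the error suggests); `drop_pile_tasks` is a hypothetical helper, not an `lm_eval` function.

```python
# Task list copied from the --tasks argument in the original report.
requested_tasks = "self_consistency,realtoxicityprompts,toxigen,pile"


def drop_pile_tasks(task_arg):
    """Remove the Pile group and any pile_* subtasks from a comma-separated task list."""
    kept = [
        task
        for task in task_arg.split(",")
        # Assumption: Pile tasks are named "pile" or begin with "pile_".
        if task != "pile" and not task.startswith("pile_")
    ]
    return ",".join(kept)


# The result can be passed back to lm_eval as the --tasks value.
print(drop_pile_tasks(requested_tasks))
```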

haileyschoelkopf self-assigned this Jul 21, 2024
