
ValueError: BuilderConfig 'pile_freelaw' not found when running Pile eval #1714

Open
Harryalways317 opened this issue Apr 16, 2024 · 2 comments

Comments

@Harryalways317
Contributor

Execution Command

!lm_eval --model hf \
    --model_args pretrained=hvadaparty/Featherlite-2.5-Mistral-7B \
    --tasks self_consistency,realtoxicityprompts,toxigen,pile \
    --device cuda:0 \
    --batch_size auto --device cuda

Error Message

/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2024-04-16:19:37:29,582 INFO     [__main__.py:251] Verbosity set to INFO
2024-04-16:19:37:33,734 INFO     [__main__.py:335] Selected Tasks: ['pile', 'realtoxicityprompts', 'self_consistency', 'toxigen']
2024-04-16:19:37:33,734 INFO     [__main__.py:336] Loading selected tasks...
2024-04-16:19:37:33,734 INFO     [evaluator.py:131] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-04-16:19:37:33,736 INFO     [huggingface.py:162] Using device 'cuda:0'
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:01<00:00,  1.57it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for EleutherAI/pile contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/EleutherAI/pile
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
Downloading builder script: 100%|██████████| 9.53k/9.53k [00:00<00:00, 55.6MB/s]
Downloading readme: 100%|██████████████████| 14.2k/14.2k [00:00<00:00, 47.1MB/s]
Traceback (most recent call last):
  File "/usr/local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/__main__.py", line 342, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 192, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 420, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 270, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 253, in _load_individual_task_or_group
    **dict(collections.ChainMap(*map(fn, subtask_list))),
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 161, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 150, in load_task
    task_object = ConfigurableTask(config=config)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 782, in __init__
    self.download(self.config.dataset_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 871, in download
    self.dataset = datasets.load_dataset(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2523, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2232, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'pile_freelaw' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']
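The traceback shows `datasets` rejecting the config name requested by the lm-eval Pile task (`'pile_freelaw'`), because the dataset's loading script only defines the names listed in the error, such as `'free_law'`. Below is a minimal sketch, using only the standard library's `difflib`, of a pre-flight check that compares a requested config name against that list and suggests near-matches. The helper `suggest_config` is hypothetical and not part of lm-eval or `datasets`. The list is copied from the error message above. According to the maintainer's reply below, renaming the config alone is unlikely to fix the task, because the Pile data is no longer hosted.

```python
import difflib

# Builder config names copied verbatim from the ValueError above (assumption:
# this list reflects the EleutherAI/pile loading script at the time of the error).
AVAILABLE_CONFIGS = [
    "all", "enron_emails", "europarl", "free_law", "hacker_news",
    "nih_exporter", "pubmed", "pubmed_central", "ubuntu_irc", "uspto", "github",
]


def suggest_config(requested: str, available: list[str]) -> str:
    """Hypothetical helper: return `requested` if it is a valid config name,
    otherwise raise ValueError listing the closest available names."""
    if requested in available:
        return requested
    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops any below the default cutoff of 0.6.
    matches = difflib.get_close_matches(requested, available, n=3)
    hint = f" Did you mean: {matches}?" if matches else ""
    raise ValueError(f"BuilderConfig {requested!r} not found.{hint}")


if __name__ == "__main__":
    try:
        suggest_config("pile_freelaw", AVAILABLE_CONFIGS)
    except ValueError as err:
        print(err)
```

Checking names up front like this turns a deep traceback into a one-line hint, but it only diagnoses the naming mismatch; it does not make the missing data available.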
@LeeGitHub1

Did you solve this problem? I have encountered the same error.

@haileyschoelkopf
Contributor

The Pile tasks are broken because the Pile is no longer hosted. I am intending to upload a tokenized version of the validation sets and update these tasks, which will then once again be usable.

@haileyschoelkopf haileyschoelkopf self-assigned this Jul 21, 2024

3 participants