Evaluating gsm8k from a local dataset folder fails with "ValueError: BuilderConfig 'main' not found." #1829

Open
Jp-17 opened this issue May 12, 2024 · 0 comments

Jp-17 commented May 12, 2024

I have the same problem as in issue #1347.

I just want to evaluate gsm8k from a local dataset folder, because the network in China cannot reach Hugging Face while lm-eval is running.

I followed the "Using Local Datasets" section of the guide ( https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#beautifying-table-display ).

I used "dataset.save_to_disk()" to save the gsm8k dataset to a local folder, "llm/dataset/gsm8k". Then I set gsm8k.yaml to
"
task: try_gsm8k
dataset_path: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k
dataset_name: main
"
or
"
task: try_gsm8k
dataset_path: gsm8k
dataset_kwargs:
  data_dir: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/
dataset_name: main
"
Neither of them works, and both show the same error:
"
  File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'main' not found. Available: ['default']
"
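
The error lists only a 'default' config, which suggests the folder is not being read as the Hub gsm8k dataset with its 'main' config. As a quick check of what is actually on disk, here is a minimal sketch using only the standard library. It assumes the usual "save_to_disk()" layout (a dataset_dict.json at the root of a saved DatasetDict, a state.json inside a saved single split); the function name describe_saved_dataset is hypothetical.

```python
import json
from pathlib import Path


def describe_saved_dataset(root):
    """Guess what kind of save_to_disk() output lives in `root`.

    Assumed layout (not taken from the issue itself):
      - DatasetDict: <root>/dataset_dict.json listing the split names
      - Dataset:     <root>/state.json next to the .arrow shards
    """
    root = Path(root)
    marker = root / "dataset_dict.json"
    if marker.is_file():
        # The marker file stores the split names under the "splits" key.
        splits = json.loads(marker.read_text()).get("splits", [])
        return {"kind": "DatasetDict", "splits": splits}
    if (root / "state.json").is_file():
        return {"kind": "Dataset", "splits": []}
    return {"kind": "unknown", "splits": []}
```

If this reports a DatasetDict or Dataset, the folder was written by "save_to_disk()". Such folders are normally read back with "datasets.load_from_disk()" and do not carry a config name such as 'main'.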

However, when I set gsm8k.yaml to
"
task: gsm8k
dataset_path: arrow # original gsm8k
dataset_kwargs:
  data_files:
    train: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/train/data-00000-of-00001.arrow
    test: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/test/data-00000-of-00001.arrow
dataset_name: main
"
it works. But this is inconvenient, because I also want to evaluate the MMLU benchmark, which contains many subtasks, and I would have to add "data_files" under "dataset_kwargs" to every subtask YAML by hand.
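
As a possible stopgap, the per-task "data_files" entries could be generated instead of written by hand. The sketch below is only an illustration under assumptions: it expects each config folder to contain one sub-folder per split holding .arrow shards (as in the gsm8k/main/train and gsm8k/main/test paths above), and the helper names build_data_files and render_task_yaml are hypothetical, not part of lm-eval.

```python
from pathlib import Path


def build_data_files(config_dir):
    """Map each split sub-folder of `config_dir` to its sorted .arrow shards."""
    config_dir = Path(config_dir)
    data_files = {}
    for split_dir in sorted(p for p in config_dir.iterdir() if p.is_dir()):
        shards = sorted(str(f) for f in split_dir.glob("*.arrow"))
        if shards:  # skip folders that hold no arrow data
            data_files[split_dir.name] = shards
    return data_files


def render_task_yaml(task, config_dir, dataset_name=None):
    """Render YAML text in the same shape as the working config above."""
    lines = [
        f"task: {task}",
        "dataset_path: arrow",
        "dataset_kwargs:",
        "  data_files:",
    ]
    for split, shards in build_data_files(config_dir).items():
        lines.append(f"    {split}:")
        lines.extend(f"      - {shard}" for shard in shards)
    if dataset_name is not None:
        lines.append(f"dataset_name: {dataset_name}")
    return "\n".join(lines) + "\n"
```

For example, render_task_yaml("try_gsm8k", "/mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main", dataset_name="main") would produce text equivalent to the working YAML, with each split listed as a one-item list. Looping over the MMLU subtask folders in the same way would avoid editing each YAML manually, assuming they were saved with the same layout.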

Any help would be appreciated.
