I have the same problem as in issue #1347.
I just want to evaluate gsm8k from a local dataset folder, because Hugging Face is not reachable from my network in China while running lm-eval.
I followed the "Using Local Datasets" part of the guide ( https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#beautifying-table-display ).
I used "dataset.save_to_disk()" to save the gsm8k dataset to a local folder, "llm/dataset/gsm8k", then set gsm8k.yaml to
"
task: try_gsm8k
dataset_path: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k
dataset_name: main
"
or
"
task: try_gsm8k
dataset_path: gsm8k
dataset_kwargs:
  data_dir: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/
dataset_name: main
"
Neither works; both fail with the same error:
"
  File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'main' not found. Available: ['default']
"
However, when I set gsm8k.yaml to
"
task: gsm8k
dataset_path: arrow # original gsm8k
dataset_kwargs:
  data_files:
    train: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/train/data-00000-of-00001.arrow
    test: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/test/data-00000-of-00001.arrow
dataset_name: main
"
it works. However, this is not convenient: I also want to evaluate the mmlu benchmark, which contains many subtasks, and I would have to edit every subtask yaml to list its "data_files" under "dataset_kwargs" by hand.
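To show what I mean by the manual work, here is a rough sketch of how those entries could be generated instead of typed. It assumes every saved subset follows the same layout as my gsm8k folder (root/config/split/*.arrow). The helper names `arrow_data_files` and `render_task_yaml` are made up for illustration and are not part of lm-eval or datasets. I have only tried single file paths per split in the yaml, so listing the shards as a yaml list is also an assumption.

```python
from pathlib import Path


def arrow_data_files(root, config):
    """Map each split folder under root/config to its sorted .arrow shards.

    Assumes the layout root/<config>/<split>/*.arrow, as in the gsm8k folder above.
    """
    config_dir = Path(root) / config
    data_files = {}
    for split_dir in sorted(p for p in config_dir.iterdir() if p.is_dir()):
        shards = sorted(str(f) for f in split_dir.glob("*.arrow"))
        if shards:  # skip folders that hold no Arrow files
            data_files[split_dir.name] = shards
    return data_files


def render_task_yaml(task, root, config):
    """Render a yaml fragment shaped like the working config above (hypothetical helper)."""
    lines = [
        f"task: {task}",
        "dataset_path: arrow",
        "dataset_kwargs:",
        "  data_files:",
    ]
    for split, shards in arrow_data_files(root, config).items():
        lines.append(f"    {split}:")
        lines.extend(f"      - {shard}" for shard in shards)
    lines.append(f"dataset_name: {config}")
    return "\n".join(lines) + "\n"
```

For example, `render_task_yaml("gsm8k", "/mnt/nfs/vault/jiangp/llm/dataset/gsm8k", "main")` would produce one such fragment, and a loop over the mmlu subset folders could produce the rest. It still feels like a workaround rather than the intended way to use a local dataset.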
Any help would be appreciated.