Story_cloze fails #14

Closed
FabienRoger opened this issue Feb 2, 2023 · 6 comments · Fixed by #29
Labels
bug Something isn't working

Comments

@FabienRoger
Collaborator

FabienRoger commented Feb 2, 2023

This dataset is not recognized by huggingface

load raw dataset story-cloze from module.
Downloading and preparing dataset story_cloze/2016 to /home/ubuntu/.cache/huggingface/datasets/story_cloze/2016-data_dir=.%2Fdatasets%2Frawdata/0.0.0/45cead0538c3deb72d731a7990e60835c2c9c5d5d5b1e95a7dd47ccf593671e4...


Generating validation split:   0%|          | 0/1871 [00:00<?, ? examples/s]
Iterating over prefixes::   0%|          | 0/1 [03:59<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/elk/lib/python3.9/site-packages/datasets/builder.py", line 1570, in _prepare_split_single
    for key, record in generator:
  File "/home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/story_cloze/45cead0538c3deb72d731a7990e60835c2c9c5d5d5b1e95a7dd47ccf593671e4/story_cloze.py", line 112, in _generate_examples
    with open(filepath, encoding="utf-8") as csv_file:
  File "/home/ubuntu/miniconda3/envs/elk/lib/python3.9/site-packages/datasets/streaming.py", line 69, in wrapper
    return function(*args, use_auth_token=use_auth_token, **kwargs)
  File "/home/ubuntu/miniconda3/envs/elk/lib/python3.9/site-packages/datasets/download/streaming_download_manager.py", line 445, in xopen
    return open(main_hop, mode, *args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/elk/elk/datasets/rawdata/cloze_test_val__spring2016 - cloze_test_ALL_val.csv'
@FabienRoger
Collaborator Author

FabienRoger commented Feb 2, 2023

Probably due to these lines. Does someone know why they are here?

if set_name != "story-cloze":
    raw_set = load_dataset(*getLoadName(set_name))
else:
    raw_set = load_dataset(*getLoadName(set_name), data_dir="./datasets/rawdata")
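One way to keep this special case while making it easier to debug is to build the extra keyword arguments in a single helper and resolve the relative path up front. This is only a sketch: `extra_load_kwargs` and `STORY_CLOZE_DATA_DIR` are hypothetical names that are not in the repo, and the default directory is simply the one from the snippet above.

```python
from pathlib import Path

# Assumption: same relative directory as in the snippet above.
STORY_CLOZE_DATA_DIR = Path("./datasets/rawdata")


def extra_load_kwargs(set_name, data_dir=STORY_CLOZE_DATA_DIR):
    """Return extra keyword arguments for load_dataset (hypothetical helper)."""
    if set_name != "story-cloze":
        # Every other dataset is loaded without extra arguments.
        return {}
    # Resolve against the current working directory so that any later
    # error message shows the absolute path that was actually used.
    return {"data_dir": str(Path(data_dir).resolve())}
```

With such a helper, the branch would reduce to `raw_set = load_dataset(*getLoadName(set_name), **extra_load_kwargs(set_name))`.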

FabienRoger added the bug label Feb 2, 2023
@lauritowal
Collaborator

Probably due to these lines. Does someone know why they are here?

if set_name != "story-cloze":
    raw_set = load_dataset(*getLoadName(set_name))
else:
    raw_set = load_dataset(*getLoadName(set_name), data_dir="./datasets/rawdata")

We are cleaning up the whole generation process right now. I'll have a look at this too.

@haileyschoelkopf

@FabienRoger
Collaborator Author

The files are super lightweight (<1MB). I'm considering adding them to the repo rather than asking users to download them, or at least adding a simple .sh script that downloads them from somewhere. @norabelrose, what do you think about it?

@norabelrose
Member

@FabienRoger I'm not sure the people who made the dataset would be happy with us posting it in this repo? Presumably they ask people to fill out the form for a reason. I'd prefer to just remove the special case for story-cloze from the repo

@FabienRoger
Collaborator Author

I think we should still keep the special case code around, at least for replications. But I'll add an appropriate error message that tells you which form you should fill out (I filled it out and received the data automatically). You can't just remove the special case and still use story-cloze, because huggingface raises an error if you try to load it without a data directory:

datasets.builder.ManualDownloadError: The dataset story_cloze with config 2016 requires manual data.
Please follow the manual download instructions:
 To use Story Cloze you have to download it manually. Please fill this google form (http://goo.gl/forms/aQz39sdDrO). Complete the form. Then you will receive a download link for the dataset. Load it using: `datasets.load_dataset('story_cloze', data_dir='path/to/folder/folder_name')`
Manual data can be loaded with:
 datasets.load_dataset("story_cloze", data_dir="<path/to/manual/data>")
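A pre-flight check along these lines could produce the friendlier error message mentioned above. It is a minimal sketch, not the fix that was merged: `require_story_cloze_files` is a hypothetical helper, the file name is the one from the traceback at the top of this issue, and the form URL is the one from the huggingface message.

```python
from pathlib import Path

# Taken from the huggingface error message quoted above.
FORM_URL = "http://goo.gl/forms/aQz39sdDrO"
# File name taken from the FileNotFoundError in the original traceback.
VAL_FILE = "cloze_test_val__spring2016 - cloze_test_ALL_val.csv"


def require_story_cloze_files(data_dir):
    """Return the path to the validation CSV, or raise with instructions."""
    path = Path(data_dir) / VAL_FILE
    if not path.is_file():
        raise FileNotFoundError(
            f"Story Cloze data not found at '{path}'. "
            f"Fill out the form at {FORM_URL}, download the files, "
            f"and place them in '{data_dir}'."
        )
    return path
```

Calling this before `load_dataset(..., data_dir=...)` would surface the download instructions immediately instead of a bare `FileNotFoundError` during split generation.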
