Add ASR task for SUPERB #2619

Merged
merged 11 commits into from
Jul 13, 2021

lewtun (Member) commented Jul 9, 2021

This PR starts building up the SUPERB benchmark by including the ASR task as described in the SUPERB paper and s3prl instructions.

Usage:

from datasets import load_dataset 

asr = load_dataset("superb", "asr")
# DatasetDict({
#     train: Dataset({
#         features: ['file', 'text', 'speaker_id', 'chapter_id', 'id'],
#         num_rows: 28539
#     })
#     validation: Dataset({
#         features: ['file', 'text', 'speaker_id', 'chapter_id', 'id'],
#         num_rows: 2703
#     })
#     test: Dataset({
#         features: ['file', 'text', 'speaker_id', 'chapter_id', 'id'],
#         num_rows: 2620
#     })
# })
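
The printed structure above can be turned into a quick sanity check. The following is only a minimal sketch: `check_splits` is a hypothetical helper (it is not part of `datasets`), and the expected values are copied from the output shown in this comment, so they are assumptions about this PR's state rather than guarantees about later versions.

```python
# Expected schema and sizes, copied from the DatasetDict printed above.
EXPECTED_FEATURES = ["file", "text", "speaker_id", "chapter_id", "id"]
EXPECTED_ROWS = {"train": 28539, "validation": 2703, "test": 2620}


def check_splits(split_info):
    """Hypothetical helper: return a list of mismatches (empty if none).

    split_info maps a split name to a (feature_names, num_rows) pair.
    """
    problems = []
    for split, expected_rows in EXPECTED_ROWS.items():
        if split not in split_info:
            problems.append(f"missing split: {split}")
            continue
        features, num_rows = split_info[split]
        if list(features) != EXPECTED_FEATURES:
            problems.append(f"{split}: unexpected features {list(features)}")
        if num_rows != expected_rows:
            problems.append(f"{split}: expected {expected_rows} rows, got {num_rows}")
    return problems


# Stand-in for a loaded dataset, so the sketch runs without downloading anything.
observed = {split: (EXPECTED_FEATURES, rows) for split, rows in EXPECTED_ROWS.items()}
print(check_splits(observed))  # prints []
```

With the dataset actually loaded as `asr`, the input could be built as `{name: (ds.column_names, ds.num_rows) for name, ds in asr.items()}`.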

I've used the GLUE benchmark as a guide for filling out the README.

To move fast during the evaluation PoC I propose to merge one task at a time, so we can continue building the training / evaluation framework in parallel.

Note: codewise this PR is ready for review - I'll add the missing YAML tags once #2620 is merged :)

@lewtun changed the title from "Add ASR task" to "Add ASR task for SUPERB" on Jul 9, 2021

lewtun (Member, Author) commented Jul 12, 2021

Wait until #2620 is merged before pushing the README tags in this PR

@lewtun marked this pull request as ready for review July 12, 2021 15:33

albertvillanova (Member) left a comment

#2620 is already merged into master. You can merge master into this branch.

albertvillanova (Member) left a comment

Thanks!

One question: aren't you adding task_templates to the _info method (and to the dataset_infos.json)?

lewtun (Member, Author) commented Jul 13, 2021

Thanks!

One question: aren't you adding task_templates to the _info method (and to the dataset_infos.json)?

great catch! i've now added the asr task template (along with a mapping from superb task -> template) and updated the dataset_infos.json :)
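
The "mapping from superb task -> template" idea can be sketched roughly as below. The actual code added to `datasets/superb/superb.py` is not shown in this thread, so everything here is an assumption made for illustration: `AsrTemplate`, `TASK_TEMPLATES` and `templates_for` are hypothetical names, and the dataclass is a stand-in rather than the library's own task template class. The column names `file` and `text` come from the features listed in the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrTemplate:
    """Hypothetical stand-in for an ASR task template.

    It only records which dataset columns hold the audio path and the transcript.
    """
    audio_file_path_column: str
    transcription_column: str


# Hypothetical mapping from SUPERB config name to its task template.
# Only "asr" exists in this PR; further tasks would be added one at a time.
TASK_TEMPLATES = {
    "asr": AsrTemplate(audio_file_path_column="file", transcription_column="text"),
}


def templates_for(config_name):
    """Return the templates for a config as a list (empty if none is registered)."""
    template = TASK_TEMPLATES.get(config_name)
    return [template] if template is not None else []


print(templates_for("asr"))
```

Keeping the lookup in one dictionary means a builder's `_info` method can ask for the templates of the active config without branching on the task name, which fits the plan of merging one task at a time.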

albertvillanova (Member) left a comment

Good!

I have a suggested refactoring... Tell me what you think! :)

Co-authored-by: Albert Villanova del Moral <[email protected]>
lewtun (Member, Author) commented Jul 13, 2021

Good!

I have a suggested refactoring... Tell me what you think! :)

your approach is much more elegant - i've included your suggestions 🙏

albertvillanova (Member) left a comment

Thank you.

@albertvillanova added this to the 1.10 milestone Jul 13, 2021
@albertvillanova merged commit ddb5d80 into huggingface:master Jul 13, 2021
@lewtun deleted the add-superb branch July 13, 2021 13:01
This was referenced Jul 15, 2021
@anton-l mentioned this pull request Aug 10, 2021