Addition of the Hamilton 2022 dataset #87

Closed
wants to merge 13 commits into from

Conversation

Member

@jteijema jteijema commented Nov 2, 2022

Add a new dataset

This is a request for the addition of a new dataset. Below is metadata regarding the dataset.

Dataset information

This is a multilingual dataset spanning English, German, Italian, and French research literature. It accompanies a systematic review on the topic of using music as part of teaching language to children aged 2 through 18.

Dataset Reference

The dataset is from a paper currently in preprint, titled "The effectiveness of using songs for teaching second or foreign languages to primary and secondary school learners: a systematic review."

Authors

  "Hamilton, C.",
  "Schulz, J.",
  "Chalmers, H.",
  "Murphy, V."

Reference URL

TBD

Doi

TBD

Dataset URL

https://osf.io/3u982/

Dataset Name

Hamilton_2022

License

CC BY 4.0

Member

@J535D165 J535D165 left a comment

Thanks, see comments.

datasets/Hamilton_2022/etl_doi.py (outdated)
df["label"] = [1 if "Included" in note else 0 for note in df.full_labels]

# save results to file
df[["title", "abstract", "label"]].to_csv("Hamilton_2022.csv", index=False)
Member

This is the current format. Can you also add the future format? See PRs of @gimoAI.
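
For reference, the labelling rule in the reviewed line can be sketched on toy data as below. This is a minimal sketch, not the script from the PR: `label_from_note` is a hypothetical helper and the note strings are made up. It applies the same substring rule, and additionally assumes that a missing note should count as excluded, since `"Included" in note` raises a `TypeError` when `note` is `None` or NaN.

```python
from typing import Optional


def label_from_note(note: Optional[str]) -> int:
    """Return 1 if the screening note contains "Included", else 0.

    Hypothetical helper: same rule as the reviewed line, except that a
    missing note (None or a non-string such as NaN) yields 0 instead of
    raising a TypeError.
    """
    if not isinstance(note, str):
        return 0
    return 1 if "Included" in note else 0


# Made-up notes for illustration; the real values live in `full_labels`.
notes = ["Included: full text", "Excluded: wrong population", None, float("nan")]
labels = [label_from_note(note) for note in notes]

# In the ETL script this would read:
# df["label"] = [label_from_note(note) for note in df.full_labels]
```

The match is case-sensitive, so a note spelled "included" would be labelled 0 under this rule, exactly as in the reviewed line.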

Member Author

@jteijema jteijema Nov 10, 2022

There are no DOIs in this dataset. Do you want IDs of type ISSN, to see how far we can get with those, or should we do title-based DOI retrieval?
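
One way title-based DOI retrieval could look is sketched below. It assumes the Crossref REST API `works` endpoint with its `query.bibliographic` parameter as the lookup service, which is not something the thread settles on. The helper names, the 0.9 similarity threshold, and the example DOIs are hypothetical. The sketch only builds the request URL and picks a match from candidates shaped like Crossref's `message.items`; the HTTP call itself is left out.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlencode


def crossref_query_url(title: str, rows: int = 3) -> str:
    """Build a Crossref works query URL for a bibliographic title search."""
    params = {"query.bibliographic": title, "rows": rows}
    return "https://api.crossref.org/works?" + urlencode(params)


def best_doi(title: str, items: list, threshold: float = 0.9) -> Optional[str]:
    """Return the DOI of the candidate whose title is most similar to `title`.

    Returns None when no candidate reaches `threshold` (an assumed cut-off),
    so that weak matches are not silently accepted.
    """
    best_score, best = 0.0, None
    for item in items:
        for candidate in item.get("title", []):
            score = SequenceMatcher(None, title.lower(), candidate.lower()).ratio()
            if score > best_score:
                best_score, best = score, item.get("DOI")
    return best if best_score >= threshold else None


# Made-up candidates shaped like Crossref's `message.items`.
items = [
    {"DOI": "10.0000/example.1", "title": ["Songs in the language classroom"]},
    {"DOI": "10.0000/example.2", "title": ["A completely different paper"]},
]
doi = best_doi("Songs in the Language Classroom", items)
```

Records that return `None` would still need manual checking, which matters for a dataset where many entries may not have a DOI at all.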

@@ -0,0 +1,20 @@
{
"dataset_id": "Hamilton_2022",
"url": "",
Member

Educated guess?

Member Author

I have sent out an email requesting a link. I'll update with new info.

Member Author

So, what happens when a dataset doesn't have any other page than the OSF page? @J535D165

Member Author

This dataset consists mostly of records that link to book chapters, so it will be difficult to get it into the standard DOI + label format we are aiming for in future versions. Let's close this PR for now and keep it as an option for a later date.
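
The concern can be made concrete with a quick coverage check. This is a minimal sketch on made-up records: the `doi` and `label` column names are assumptions about the DOI + label format, which the thread does not spell out.

```python
import csv
import io

# Made-up records; in this dataset many entries are book chapters without a DOI.
records = [
    {"title": "Journal article", "doi": "10.0000/example.1", "label": 1},
    {"title": "Book chapter", "doi": "", "label": 0},
    {"title": "Another chapter", "doi": None, "label": 1},
]

# Keep only records with a non-empty DOI and measure how much survives.
with_doi = [r for r in records if r.get("doi")]
coverage = len(with_doi) / len(records)

# Write the surviving records as DOI + label rows (assumed column names).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["doi", "label"], extrasaction="ignore")
writer.writeheader()
writer.writerows(with_doi)
doi_label_csv = buffer.getvalue()
```

A low `coverage` value means most labelled records would be dropped from a DOI-keyed export, including relevant ones.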

@jteijema jteijema closed this Nov 24, 2022
@jteijema jteijema deleted the Hamilton_2022 branch November 24, 2022 14:17