
Scrape metadata of research articles from psychological academic journals such as Psychological Science, Collabra: Psychology, and Journal of Cognition to feed the database of Curate Science.


dominik-lenda/web-scraping-curate-science


CurateScienceBots

The project scrapes metadata of research articles from academic journals such as Psychological Science, Collabra: Psychology, and Journal of Cognition to feed the Curate Science database.

Curate Science is a platform that aims to help verify the transparency and credibility of research.

Extracted data

Each scraped article produces a record like this sample:

{
    'title': 'A New Replication Norm for Psychology',
    'year': '2015',
    'article_type': 'Original research report',
    'doi': '10.1525/collabra.23',
    'keywords': 'Independent replication, cumulative knowledge, replication norm',
    'peer_review_url': 'http:https://dx.doi.org/10.1525/collabra.23.opr',
    'conflict_of_interests': 'The author declares that they have no competing interests.',
    'views': '2244',
    'downloads': '412'
}
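The record above stores every value as a string. Below is a minimal sketch of a Scrapy item pipeline that normalizes such records, assuming items are plain dicts with the field names shown in the sample. The class name CleanArticlePipeline and its cleaning rules (integer counters, a keyword list, stripping a doubled scheme such as http:https://) are hypothetical and not part of this repository.

```python
import re


class CleanArticlePipeline:
    """Hypothetical pipeline: normalise a scraped article record (a plain dict).

    Field names follow the sample record; the cleaning rules are assumptions.
    """

    # Matches a doubled scheme prefix such as "http:" in "http:https://...".
    _DOUBLE_SCHEME = re.compile(r"^https?:(?=https?://)")

    def process_item(self, item, spider):
        # Convert the counters from digit strings to integers when present.
        for field in ("views", "downloads"):
            value = item.get(field)
            if value is not None and str(value).strip().isdigit():
                item[field] = int(value)

        # Split the comma-separated keyword string into a list of keywords.
        keywords = item.get("keywords")
        if isinstance(keywords, str):
            item["keywords"] = [k.strip() for k in keywords.split(",") if k.strip()]

        # Drop a duplicated scheme prefix from the peer-review URL.
        url = item.get("peer_review_url")
        if isinstance(url, str):
            item["peer_review_url"] = self._DOUBLE_SCHEME.sub("", url)

        return item
```

To try it, you would register the class in the project's settings.py under the ITEM_PIPELINES setting with an order value such as 300; the module path in that setting depends on where you put the file.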

Spiders

This project contains three spiders, which you can list with the list command:

$ scrapy list
collabra
jofcognition
psych_science

To learn more about how spiders work, go through the Scrapy Tutorial.
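Each spider visits article pages and pulls out fields like those in the sample record. As an illustration only, here is a standard-library sketch that reads Highwire-style citation_* meta tags, which many journal sites embed for Google Scholar. Whether Collabra, Journal of Cognition, or Psychological Science expose these tags, and the META_TO_FIELD mapping itself, are assumptions; the project's spiders may use Scrapy selectors on entirely different elements.

```python
from html.parser import HTMLParser

# Hypothetical mapping from <meta name="citation_*"> tags to record fields.
META_TO_FIELD = {
    "citation_title": "title",
    "citation_doi": "doi",
    "citation_publication_date": "year",
    "citation_keywords": "keywords",
}


class CitationMetaParser(HTMLParser):
    """Collect selected <meta name=... content=...> values from a page."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        field = META_TO_FIELD.get(attributes.get("name"))
        content = attributes.get("content")
        # The first occurrence of each tag wins; later duplicates are ignored.
        if field and content and field not in self.record:
            self.record[field] = content.strip()


def extract_metadata(html):
    """Return a dict of the fields found in the page's citation_* meta tags."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    # Keep only the four-digit year from a date such as "YYYY/MM/DD".
    if "year" in parser.record:
        parser.record["year"] = parser.record["year"][:4]
    return parser.record
```

Inside a Scrapy spider, the same lookup is usually written with selectors, for example response.xpath('//meta[@name="citation_doi"]/@content').get().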

Running the spiders

You can run a spider with the scrapy crawl command, for example:

$ scrapy crawl collabra
$ scrapy crawl jofcognition
$ scrapy crawl psych_science
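To crawl all three journals in one go, a small driver script can call scrapy crawl for each spider name in turn. This is a sketch under two assumptions: it runs from the project root (where scrapy.cfg lives), and it writes one CSV file per spider through the -o option of scrapy crawl. The function names build_crawl_commands and run_all are hypothetical.

```python
import subprocess

# Spider names as reported by `scrapy list`.
SPIDERS = ("collabra", "jofcognition", "psych_science")


def build_crawl_commands(spiders, suffix=".csv"):
    """Return one `scrapy crawl NAME -o NAME<suffix>` command per spider."""
    return [["scrapy", "crawl", name, "-o", f"{name}{suffix}"] for name in spiders]


def run_all(spiders=SPIDERS):
    """Run each spider in sequence as a separate `scrapy crawl` process."""
    for command in build_crawl_commands(spiders):
        # check=True raises CalledProcessError and stops the loop on failure.
        subprocess.run(command, check=True)
```

Calling run_all() runs the spiders one after another. Scrapy's CrawlerProcess API is an in-process alternative if you would rather not spawn subprocesses.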

If you want to save the scraped data to a file, you can pass the -o option:

$ scrapy crawl psych_science -o psychological_science.csv
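Once a crawl has written a CSV file, the standard csv module is enough for quick checks. The sketch below returns the most-viewed articles; it assumes the file has a views column holding plain integers, as in the sample record, and skips rows where that value is missing. The function name top_articles is hypothetical.

```python
import csv


def top_articles(csv_lines, column="views", n=5):
    """Return the n rows with the highest integer value in `column`.

    `csv_lines` may be an open file or any iterable of CSV lines
    whose first line is the header row.
    """
    scored = []
    for row in csv.DictReader(csv_lines):
        value = (row.get(column) or "").strip()
        if value.isdigit():  # skip rows where the counter is missing or non-numeric
            scored.append((int(value), row))
    # Sort on the number only, so rows (dicts) are never compared directly.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:n]]
```

For example, open psychological_science.csv with newline="" and encoding="utf-8", pass the file object to top_articles, and print each returned row's title and views.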
