getpapers TUTORIAL for EUPMC search

VAISHALI ARORA edited this page Jun 14, 2020 · 9 revisions

Retrieving papers from EUPMC using getpapers

getpapers is a simple yet powerful tool for querying repositories of scholarly articles with a one-line command. Full instructions for installation and use are given at getpapers OVERVIEW; please download and install it before continuing. Our first search query will be 'viral epidemics'.

Step 1

When we type 'getpapers' in the terminal, we get the following information:

Usage: getpapers [options]

  Options:

    -h, --help                output usage information
    -V, --version             output the version number
    -q, --query <query>       search query (required)
    -o, --outdir <path>       output directory (required - will be created if not found)
    --api <name>              API to search [eupmc, crossref, ieee, arxiv] (default: eupmc)
    -x, --xml                 download fulltext XMLs if available
    -p, --pdf                 download fulltext PDFs if available
    -s, --supp                download supplementary files if available
    -t, --minedterms          download text-mined terms if available
    -l, --loglevel <level>    amount of information to log (silent, verbose, info*, data, warn, error, or debug)
    -a, --all                 search all papers, not just open access
    -n, --noexecute           report how many results match the query, but don't actually download anything
    -f, --logfile <filename>  save log to specified file in output directory as well as printing to terminal
    -k, --limit <int>         limit the number of hits and downloads
    --filter <filter object>  filter by key value pair, passed straight to the crossref api only
    -r, --restart             restart file downloads after failure

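Use the command getpapers -q "viral epidemics" -n to see how many results match. -n is no-execute mode: it reports the number of matching results without downloading anything. Only open access results are counted unless -a is also given.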
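As a rough sketch of how the no-execute count could be used in a script, the helper below pulls the number out of the "Found N ... results" log line. The function name count_hits is hypothetical (it is not part of getpapers), and the sketch assumes a POSIX-style shell and the log wording shown in this tutorial.

```shell
# Hypothetical helper (not part of getpapers): print the number of hits
# for a query without downloading anything.
count_hits() {
    # $1 = search query (quoted so it stays a single argument)
    # -n is no-execute mode; 2>&1 merges both output streams so the
    # "Found N ... results" line is captured whichever stream it is on.
    getpapers -q "$1" -n 2>&1 |
        sed -n 's/.*Found \([0-9][0-9]*\) .*/\1/p'
}

# Example call (needs getpapers installed and network access):
# count_hits "viral epidemics"
```

If the log format of your getpapers version differs, the sed pattern would need adjusting.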

Let's first download 100 papers in XML format in the AMI directory by issuing the command getpapers -q "viral epidemics" -x -k 100 --outdir test. The -o (--outdir) option creates the output directory for the articles, here test. The -k option limits the download to 100 papers. The -x option downloads XML copies of the articles (these can be turned into HTML later using ami). The -p option downloads PDF copies. These have the same text but in a different format.

It will look something like this:

C:\Users\Kareena\Desktop\openVirus\cmder
λ getpapers --query "viral epidemics" -k -10 --outdir test
info: Searching using eupmc API
info: Found 16918 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api
info: Limiting to -10 hits
Retrieving results [------------------------------] 0% (eta 0.0s)
info: Done collecting results
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt

Step 2

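Note that the run captured above did not include -x (and passed -10 rather than 100 to -k), so it saved only the result metadata and eupmc_fulltext_html_urls.txt, the list of EuPMC full-text HTML links. With -x, the full-text XML files are downloaded as well.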

Let's download the PDF versions of the above 100 papers with the command getpapers -q "viral epidemics" -p -k 100 --outdir test_pdf

These results will all be in PDF format and stored in the test_pdf folder. Now we have the files in two formats: PDF as well as XML.
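Both formats can also be requested in a single run, since -x and -p may be given together. The wrapper below is only a sketch: fetch_fulltext is a hypothetical name, and the arguments in the example call are the ones used in this tutorial.

```shell
# Hypothetical wrapper (not part of getpapers): download XML and PDF
# copies of the same papers into one output directory.
fetch_fulltext() {
    # $1 = search query, $2 = maximum number of papers, $3 = output directory
    getpapers -q "$1" -x -p -k "$2" -o "$3"
}

# Example call (needs getpapers installed and network access):
# fetch_fulltext "viral epidemics" 100 test
```

Quoting "$1" inside the function keeps a multi-word query as one argument when it is passed on to getpapers.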

Let us look at another query, on n95 face masks.

Step 1

Let us first see how many open access results are found for the query n95 face masks.

Type:

getpapers -q "n95 face masks" -n

Always enclose the query in double quotes; otherwise the shell splits it into separate words instead of passing it as a single query.
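The effect of the quotes can be seen without running getpapers at all. The short sketch below uses a throwaway function, count_args (a hypothetical name), that prints how many arguments the shell hands over. It assumes a POSIX-style shell; the Windows Command Prompt treats double quotes similarly.

```shell
# Hypothetical demo function: print the number of arguments received.
count_args() { echo "$#"; }

count_args -q n95 face masks      # unquoted: prints 4 (-q, n95, face, masks)
count_args -q "n95 face masks"    # quoted: prints 2 (-q and the whole query)
```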

The result will be like this:

info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
info: Found 869 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api

Step 2

To get all papers on n95 face masks, not just the open access ones, use the option

-a, --all : search all papers, not just open access

Then run:

getpapers -q "n95 face masks" -a -k 100 -o n95

The result will be like this:

info: Searching using eupmc API
info: Found 1074 results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.2 reported by api
info: Limiting to 100 hits
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt

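Because neither -x nor -p was given, this saves only the result metadata (the JSON records and the HTML URL list) to the directory n95. Add -x to download the full-text .xml files as well.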
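One way to see what a run actually saved is to count the files in the output directory. The sketch below assumes that getpapers writes one folder per paper containing eupmc_result.json, plus fulltext.xml or fulltext.pdf when those were requested and available. These file names are an assumption about the output layout, and count_files is a hypothetical helper name.

```shell
# Hypothetical helper: count files with a given name anywhere under a directory.
count_files() {
    # $1 = output directory, $2 = exact file name to look for
    # tr strips the padding some wc implementations add around the number.
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# Example calls (file names are assumptions about the getpapers layout):
# count_files n95 eupmc_result.json   # metadata records
# count_files n95 fulltext.xml        # full-text XML files
# count_files n95 fulltext.pdf        # full-text PDF files
```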

To download the open access PDF files, use

getpapers -q "n95 face masks" -p -k 100 -o n95

The result will look like this:

https://drive.google.com/file/d/1JMNgzJTajFqNg1XWWgpX3HLIh4ruZkrm/view?usp=sharing

To restrict a query to particular dates

For example, to search for viral epidemics papers published in particular years, use the following syntax:

getpapers -q "viral epidemics PUB_YEAR:[2018 TO 2019]" -k 100 -p -x -o viral_epidemics
PUB_YEAR restricts the results to papers published in the years given (open access only, unless -a is added).

The results will look like this:

https://drive.google.com/file/d/1Nc3UtBkIUTmG6VB2N4UaALpHQnC7ya3F/view?usp=sharing
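When the year range changes often, the query string can be assembled by a small helper. The sketch below is illustrative only: year_query is a hypothetical name, and the check that the first year is not after the last is an added convenience, not something getpapers requires.

```shell
# Hypothetical helper: build a query restricted to a range of publication years.
year_query() {
    # $1 = search terms, $2 = first year, $3 = last year
    if [ "$2" -gt "$3" ]; then
        echo "year_query: first year must not be after last year" >&2
        return 1
    fi
    printf '%s PUB_YEAR:[%s TO %s]\n' "$1" "$2" "$3"
}

# Example call (needs getpapers installed and network access):
# getpapers -q "$(year_query "viral epidemics" 2018 2019)" -k 100 -p -x -o viral_epidemics
```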


Query for Retrieving Papers from Specific Journals

Beta Tester: Ambreen Hamadani

Specific journals can be searched by adding a Journal: field to the getpapers --query.

E.g.: getpapers --query "viral epidemics Journal:medrxiv" --limit 100 --outdir jun_14_new_3

The output will contain only papers from medRxiv.

Commandline output:

C:\Users\xxx>getpapers --query "viral epidemics Journal:medrxiv" --limit 100 --outdir jun_14_new_3
info: Searching using eupmc API
info: Found 10 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.3 reported by api
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
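To compare how many papers a journal holds on several topics before downloading anything, the Journal: filter can be combined with no-execute mode in a loop. This is a sketch under the assumptions of this tutorial: journal_counts is a hypothetical name, and the topics in the example call are the two queries used on this page.

```shell
# Hypothetical helper: report hit counts for several topics in one journal.
journal_counts() {
    # $1 = journal name as used in the Journal: field; remaining args = topics
    journal=$1
    shift
    for topic in "$@"; do
        echo "== $topic Journal:$journal"
        # -n only reports the number of matches; nothing is downloaded
        getpapers -q "$topic Journal:$journal" -n
    done
}

# Example call (needs getpapers installed and network access):
# journal_counts medrxiv "viral epidemics" "n95 face masks"
```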

Crosscheck

Crosschecking may be done using Europe PMC

Tester 3: Vaishali Arora

To make Journal-specific searches using getpapers:

  1. Open the Command Prompt.
  2. Give the command as: getpapers -q "n95 face masks Journal:medrxiv" -p -k 50 -o n95

Result: Successful download of PDF files with 0 errors, 2 warnings

Screenshot: https://drive.google.com/file/d/1n8OC1TR1_RP7V3NkrY6UY5PUfvtsRxQl/view?usp=sharing