Ferret

Installing Ferret on MacOS

Go (currently doesn't work on MacOS):

brew install golang

go get github.com/MontFerret/ferret

Binary: Download the ferret_darwin_x86_64.tar.gz archive from the Ferret releases page, extract it into a local directory and create an alias pointing to the ferret binary inside it:

alias ferret="/your/local/directory/ferret_darwin_x86_64/ferret"
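An alias typed at the prompt only lasts for the current shell session. A minimal sketch for making it permanent, assuming zsh (the default shell on recent macOS, which reads ~/.zshrc; bash on Ubuntu reads ~/.bashrc instead). The helper name persist_ferret_alias is hypothetical, and the path in the usage line is the placeholder from the alias above:

```shell
#!/bin/sh
# Hypothetical helper: append a ferret alias to a shell startup file,
# but only if the file does not already define one (safe to re-run).
persist_ferret_alias() {
    rc_file=$1      # e.g. "$HOME/.zshrc" on macOS, "$HOME/.bashrc" on Ubuntu
    ferret_bin=$2   # full path to the extracted ferret binary
    touch "$rc_file"
    if ! grep -q '^alias ferret=' "$rc_file"; then
        printf 'alias ferret="%s"\n' "$ferret_bin" >> "$rc_file"
    fi
}

# Usage (placeholder path, adjust to where you extracted the archive):
# persist_ferret_alias "$HOME/.zshrc" "/your/local/directory/ferret_darwin_x86_64/ferret"
```

Open a new terminal, or run source ~/.zshrc, for the alias to take effect.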

To test the installation, type the ferret command:

$ ferret
Welcome to Ferret REPL 0.10.1
Please use `exit` or `Ctrl-D` to exit this program.
>

Further information and tutorials about Ferret can be found here.

Sample Ferret code for scraping a bioRxiv page (save it as get_data.fql):

LET doc = DOCUMENT(@url, { driver: "cdp" })
LET authors = (
    FOR auth IN ELEMENTS(doc, '.highwire-citation-authors')
        RETURN {
            firstname: INNER_TEXT(auth, '.nlm-given-names'),
            surname: INNER_TEXT(auth, '.nlm-surname'),
            orcid_id: auth.a
        }
)

RETURN {
    abstract: INNER_TEXT(doc, '.abstract'),
    acknowledgements: INNER_TEXT(doc, '.ack'),
    title: INNER_TEXT(doc, '.highwire-cite-title'),
    pub_time: ELEMENT(doc, 'meta[name="description"]'),
    authors: authors,
    sections: INNER_TEXT_ALL(doc, '[id^="sec-"]')
}

Ferret Command:

ferret --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\" get_data.fql
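The backslashes stop the shell from consuming the double quotes, so Ferret receives the URL with its quotes intact (assuming, as the escaping suggests, that Ferret reads the value after url: as a JSON literal, where a string needs quotes). The sketch below only uses printf and does not call Ferret; it shows that the escaped form and a single-quoted form produce the same argument, while unescaped quotes are stripped:

```shell
#!/bin/sh
# Backslash-escaped form, exactly as in the ferret command above.
escaped=$(printf '%s' --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\")

# Equivalent form: single quotes keep the inner double quotes literal.
single=$(printf '%s' --param='url:"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full"')

# Unescaped double quotes are removed by the shell before the program sees them.
bare=$(printf '%s' --param=url:"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full")

printf '%s\n' "$escaped" "$single" "$bare"
```

The first two lines printed are identical and keep the quotes; the third has lost them.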

Installing Ferret on Ubuntu

Refresh the package index:

sudo apt-get update

Make a folder to hold Ferret, download and extract the release, then make the binary executable:

mkdir ~/ferret
cd ~/ferret
wget https://github.com/MontFerret/ferret/releases/download/v0.10.2/ferret_linux_x86_64.tar.gz
tar -zxvf ferret_linux_x86_64.tar.gz
chmod +x ferret
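The release archives named on this page follow the pattern ferret_<os>_x86_64.tar.gz (darwin for macOS, linux for Ubuntu). A minimal sketch that builds the download URL from a version and OS and installs into a directory, assuming other releases keep the same naming (check the releases page first). Both function names are hypothetical, and install_ferret is aimed at Ubuntu since it uses wget, which macOS does not ship by default:

```shell
#!/bin/sh
# Hypothetical helper: build a release URL, assuming archives are named
# ferret_<os>_x86_64.tar.gz as for v0.10.2.
ferret_release_url() {
    version=$1   # e.g. v0.10.2
    os=$2        # "linux" or "darwin"
    printf 'https://github.com/MontFerret/ferret/releases/download/%s/ferret_%s_x86_64.tar.gz\n' \
        "$version" "$os"
}

# Hypothetical helper: download, extract into $1, make the binary executable.
install_ferret() {
    dest=$1
    url=$(ferret_release_url "$2" "$3")
    mkdir -p "$dest" || return 1
    # Stream the archive into tar; tar fails if the download produced nothing.
    wget -qO- "$url" | tar -xzf - -C "$dest" || return 1
    chmod +x "$dest/ferret"
}

# Usage: install_ferret "$HOME/ferret" v0.10.2 linux
```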

Now install Docker, then pull and start a headless Chrome container for Ferret's cdp driver to connect to on port 9222:

sudo apt install docker.io
sudo docker pull alpeware/chrome-headless-stable
sudo docker run -d -p=0.0.0.0:9222:9222 --name=chrome-headless -v /tmp/chromedata/:/data alpeware/chrome-headless-stable
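Before running Ferret with the cdp driver, it can help to confirm that the container is accepting connections: Chrome's DevTools HTTP endpoint /json/version answers with a short JSON description of the browser once it is up. A minimal polling sketch, assuming curl is installed and the port mapping above; wait_for_chrome is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: poll Chrome's DevTools endpoint until it answers
# or the attempts run out. Returns 0 when reachable, 1 otherwise.
wait_for_chrome() {
    endpoint=${1:-http://127.0.0.1:9222/json/version}
    attempts=${2:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: no progress output
        if curl -fs --max-time 2 "$endpoint" >/dev/null 2>&1; then
            return 0
        fi
        # Wait between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage: wait_for_chrome && echo "headless Chrome is up"
```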

Set up an alias to point to Ferret

alias ferret="~/ferret/ferret"

Create get_data.fql containing the script above, for example by running nano get_data.fql and pasting it in.

Then run the retrieval, writing the JSON result to getdata.json:

ferret --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\" get_data.fql > getdata.json
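Bash does not expand aliases in non-interactive scripts by default, so a script that automates this step should call the binary by path. A minimal sketch, assuming the ~/ferret/ferret location used in the alias above and that python3 is available to check that the result is well-formed JSON (python3 -m json.tool exits non-zero on invalid input). The function names and the FERRET_BIN variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file parses as JSON.
json_ok() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# Hypothetical helper: run get_data.fql for one URL and check the output.
# FERRET_BIN defaults to the install location used on this page.
scrape_biorxiv() {
    url=$1
    out=$2
    # \"...\" keeps literal double quotes around the URL, as in the command above.
    "${FERRET_BIN:-$HOME/ferret/ferret}" --param=url:\""$url"\" get_data.fql > "$out" || return 1
    json_ok "$out"
}

# Usage:
# scrape_biorxiv "https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full" getdata.json
```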

PLEASE SHOW THE OUTPUT
