petermr edited this page May 4, 2020 · 4 revisions

Installing Ferret on macOS

Go (this method currently doesn't work on macOS):

brew install golang

go get github.com/MontFerret/ferret

Binary: Download the ferret_darwin_x86_64.tar.gz binary from the Ferret releases page, extract it into a local directory, and create an alias that points to it:

alias ferret="/your/local/directory/ferret_darwin_x86_64/ferret"

To test the installation, run the ferret command; it should start the REPL:

$ ferret
Welcome to Ferret REPL 0.10.1
Please use `exit` or `Ctrl-D` to exit this program.
>  

Further information and tutorials about ferret can be found here

Sample Ferret code for scraping a bioRxiv page (save it as get_data.fql):

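// Load the page passed in as the url parameter, using the Chrome (cdp) driver
LET doc = DOCUMENT(@url, { driver: "cdp" })
// Collect one record per author entry in the citation block
LET authors = (
     FOR auth IN ELEMENTS(doc, '.highwire-citation-authors')
       RETURN {
            firstname : INNER_TEXT(auth,'.nlm-given-names'),
            surname : INNER_TEXT(auth,'.nlm-surname'),
            orcid_id : auth.a
       }
)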

RETURN {
        abstract: INNER_TEXT(doc, '.abstract'),
        acknowledgements: INNER_TEXT(doc,'.ack'),
        title:  INNER_TEXT(doc,'.highwire-cite-title'),
        pub_time: ELEMENT(doc, 'meta[name="description"]'),
        authors: authors,
        sections: INNER_TEXT_ALL(doc, '[id^="sec-"]')
}

Ferret Command:

ferret --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\" get_data.fql
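In the command above, the backslash-escaped quotes reach Ferret as literal double quotes around the URL. The following is a minimal sketch of a helper that builds that argument so the escaping does not have to be typed by hand. The function name ferret_param is hypothetical (it is not part of Ferret), and the sketch assumes a POSIX-compatible shell and a value that contains no double quotes.

```shell
# Hypothetical helper (not part of Ferret): build a --param argument whose
# value is wrapped in literal double quotes, as in the command above.
ferret_param() {
  # $1 = parameter name, $2 = string value (must not contain double quotes)
  printf '%s' "--param=$1:\"$2\""
}

# Print the argument that would be passed to ferret for the example URL.
ferret_param url "https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full"
echo
```

With the helper defined, the same query could be run as ferret "$(ferret_param url "https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full")" get_data.fql.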

Installing Ferret on Ubuntu

Refresh the package lists:

sudo apt-get update

Create a folder for Ferret, download and extract the release archive, then make the binary executable:

mkdir ~/ferret
cd ~/ferret
wget https://github.com/MontFerret/ferret/releases/download/v0.10.2/ferret_linux_x86_64.tar.gz
tar -zxvf ferret_linux_x86_64.tar.gz
chmod +x ferret

Now install Docker, then pull and start a headless Chrome container listening on port 9222:

sudo apt install docker.io
sudo docker pull alpeware/chrome-headless-stable
sudo docker run -d -p=0.0.0.0:9222:9222 --name=chrome-headless -v /tmp/chromedata/:/data alpeware/chrome-headless-stable
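Before running a query, it may help to confirm that the container is actually answering on port 9222. Chrome's remote-debugging interface serves a small JSON document at /json/version, so a check along the following lines is one option. This is a sketch only: the function name chrome_debug_info is hypothetical, and it assumes curl is installed, which the steps above do not cover.

```shell
# Hypothetical helper: query Chrome's remote-debugging endpoint.
# Assumes curl is installed and the container publishes port 9222 as above.
chrome_debug_info() {
  # $1 = host:port; defaults to the port mapped in the docker run command
  curl --silent --fail "http://${1:-localhost:9222}/json/version"
}

# Usage (run once the container has started):
#   chrome_debug_info && echo "headless Chrome is reachable"
```

If the call fails, sudo docker ps shows whether the chrome-headless container is still running.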

Set up an alias to point to Ferret

alias ferret="~/ferret/ferret"
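An alias defined at the prompt lasts only for the current shell session. One way to keep it, sketched below, is to append the same line to a shell startup file if it is not already there. The function name persist_ferret_alias is hypothetical, and the choice of ~/.bashrc in the usage line assumes the default bash shell on Ubuntu.

```shell
# Hypothetical helper: append the ferret alias to a shell startup file
# unless an identical line is already present.
persist_ferret_alias() {
  rc_file="$1"   # e.g. "$HOME/.bashrc"
  alias_line='alias ferret="~/ferret/ferret"'
  touch "$rc_file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF "$alias_line" "$rc_file" || printf '%s\n' "$alias_line" >> "$rc_file"
}

# Usage (interactive shell):
#   persist_ferret_alias "$HOME/.bashrc" && . "$HOME/.bashrc"
```

Running the helper twice leaves a single alias line in the file, so it is safe to repeat.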

Create a get_data.fql file containing the sample code above, for example by opening nano and pasting the code in.

Then run the retrieval, redirecting the output to a JSON file:

ferret --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\" get_data.fql > getdata.json

PLEASE SHOW THE OUTPUT
