Fail to generate the CTD dataset #11

Open
freesunshine0316 opened this issue Apr 1, 2019 · 1 comment

Comments

@freesunshine0316

[lsong10@bhg0031 bran]$ ./extract.sh
Downloading Pubtator dump
--2019-03-31 21:09:22-- ftp://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator/bioconcepts2pubtator_offsets.gz
=> ‘/home/lsong10/ws/exp.dep_forest/bran/data/ctd/bioconcepts2pubtator_offsets.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 2607:f220:41e:250::7
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused.
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|2607:f220:41e:250::7|:21... failed: Network is unreachable.
Converting data from pubtator to tsv format
usage: process_CDR_data.py [-h] -i INPUT_FILE -d OUTPUT_DIR -f
OUTPUT_FILE_SUFFIX [-s MAX_SEQ] [-a FULL_ABSTRACT]
[-p PUBMED_FILTER] [-r RELATIONS]
[-w WORD_PIECE_CODES] [-t SHARDS]
[-x EXPORT_ALL_EPS] [-n EXPORT_NEGATIVES]
[-e ENCODING] [-m MAX_DISTANCE]
process_CDR_data.py: error: argument -a/--full_abstract: expected one argument
split: extra operand ‘up’
Try 'split --help' for more information.
map relations to smaller set
awk: cmd. line:1: fatal: cannot open file `positive_0_genia' for reading (No such file or directory)
seperate data into train dev test
positive train 50 500
positive dev 50 500
positive test 50 500
negative train 50 500
awk: cmd. line:1: fatal: cannot open file `negative_0_genia' for reading (No such file or directory)
negative dev 50 500
awk: cmd. line:1: fatal: cannot open file `negative_0_genia' for reading (No such file or directory)
negative test 50 500
awk: cmd. line:1: fatal: cannot open file `negative_0_genia' for reading (No such file or directory)

@patverga
Owner

Sorry for the delayed response. It looks like a network issue caused the download of the initial file to fail: "Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused." Every later error in the log follows from this, because each of the subsequent steps requires that initial file.
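
A minimal Python sketch of a fail-fast download step, under stated assumptions: it is not part of the bran repository, and the names `candidate_urls`, `is_valid_gzip`, `download`, `urllib_fetch` and `main` are hypothetical. It tries the FTP URL from the log first, then the same path over `https://` (NCBI's FTP host also serves its tree over HTTPS, but whether this particular file is still published at that path is an assumption). It only reports success once the result decompresses as a complete gzip file, so a refused connection ends the run with one clear error instead of the cascade of conversion, `split`, and `awk` errors seen above.

```python
from __future__ import annotations

import gzip
import shutil
import sys
import urllib.request
from pathlib import Path
from typing import Callable, Iterable

# URL and destination copied from the log above (assumption: the file may no
# longer be published at this path).
FTP_URL = "ftp://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator/bioconcepts2pubtator_offsets.gz"
DEST = Path("data/ctd/bioconcepts2pubtator_offsets.gz")


def candidate_urls(url: str) -> list[str]:
    """Return the original URL plus an https:// variant of an ftp:// URL."""
    urls = [url]
    if url.startswith("ftp://"):
        urls.append("https://" + url[len("ftp://"):])
    return urls


def is_valid_gzip(path: Path) -> bool:
    """True if path is a non-empty file that decompresses fully (like `gzip -t`).

    This reads the whole file, which can be slow for a large dump.
    """
    if not path.is_file() or path.stat().st_size == 0:
        return False
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError):  # bad header or truncated stream
        return False


def download(urls: Iterable[str], dest: Path,
             fetch: Callable[[str, Path], None]) -> str:
    """Try each URL in turn; return the first one that yields a valid gzip."""
    errors = []
    for url in urls:
        try:
            fetch(url, dest)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
            continue
        if is_valid_gzip(dest):
            return url
        errors.append(f"{url}: downloaded file is missing or not valid gzip")
    raise RuntimeError("all downloads failed:\n" + "\n".join(errors))


def urllib_fetch(url: str, dest: Path) -> None:
    """Stream url to dest using only the standard library."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def main() -> None:
    """Download the dump, or exit non-zero so later steps never run without it."""
    try:
        used = download(candidate_urls(FTP_URL), DEST, urllib_fetch)
    except RuntimeError as exc:
        sys.exit(f"error: {exc}")  # prints to stderr, exit status 1
    print(f"downloaded {DEST} from {used}")
```

Calling `main()` as the first step, and stopping the surrounding pipeline on a non-zero exit status (for example with `set -e` in a calling shell script), turns a network failure like the one in this log into a single error message. The `fetch` parameter is injected so the retry and validation logic can be exercised without network access.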
