Fail to generate the CTD dataset #11
Comments
Sorry for the delayed response. It looks like a network issue caused the download of the initial file to fail: "Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused." All of the subsequent errors are printed because each of the following steps requires this initial file.
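If it helps, here is a rough sketch of a guard that stops the pipeline as soon as the download fails, so the later steps do not produce misleading errors. The function names `require_file` and `fetch_or_fail` are hypothetical and not part of the repository's `extract.sh`. The sketch assumes `wget` is the downloader, as in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from the repository's extract.sh.

# Succeed only if the given file exists and is non-empty.
require_file() {
    local path="$1"
    if [ ! -s "$path" ]; then
        echo "Missing or empty input: $path (did the download fail?)" >&2
        return 1
    fi
}

# Download a URL to a destination path, then verify the result.
# wget returns a non-zero status when the connection is refused;
# --retry-connrefused makes it retry instead of giving up at once.
fetch_or_fail() {
    local url="$1" dest="$2"
    wget --tries=3 --retry-connrefused -O "$dest" "$url" || return 1
    # wget -O can leave an empty file behind on failure, so check it.
    require_file "$dest"
}

# Example use inside a script (the variables are placeholders):
#   fetch_or_fail "$PUBTATOR_URL" "$PUBTATOR_FILE" || exit 1
```

With a check like this, a refused connection would end the run at the download step. The conversion, `split`, and `awk` steps would then not run against a missing file.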
[lsong10@bhg0031 bran]$ ./extract.sh
Downloading Pubtator dump
--2019-03-31 21:09:22-- ftp://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator/bioconcepts2pubtator_offsets.gz
=> ‘/home/lsong10/ws/exp.dep_forest/bran/data/ctd/bioconcepts2pubtator_offsets.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 2607:f220:41e:250::7
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused.
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|2607:f220:41e:250::7|:21... failed: Network is unreachable.
Converting data from pubtator to tsv format
usage: process_CDR_data.py [-h] -i INPUT_FILE -d OUTPUT_DIR -f
OUTPUT_FILE_SUFFIX [-s MAX_SEQ] [-a FULL_ABSTRACT]
[-p PUBMED_FILTER] [-r RELATIONS]
[-w WORD_PIECE_CODES] [-t SHARDS]
[-x EXPORT_ALL_EPS] [-n EXPORT_NEGATIVES]
[-e ENCODING] [-m MAX_DISTANCE]
process_CDR_data.py: error: argument -a/--full_abstract: expected one argument
split: extra operand ‘up’
Try 'split --help' for more information.
map relations to smaller set
awk: cmd. line:1: fatal: cannot open file positive_0_genia' for reading (No such file or directory)
seperate data into train dev test
positive train 50 500
positive dev 50 500
positive test 50 500
negative train 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory)
negative dev 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory)
negative test 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory)