error at end of process #15

Closed
splaisan opened this issue Oct 13, 2017 · 6 comments

@splaisan

Flo runs, but dies at the end during CDS extraction.
Any idea how I can fix this?

...
ln -s /data/nanopore/2741_MinION/flo_results/R64_genomic_cleaned.gff run/../R64_genomic_cleaned/input.gff
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb cds run/source.fa run/target.fa run/../R64_genomic_cleaned/input.gff run/../R64_genomic_cleaned/lifted_cleaned.gff > run/../R64_genomic_cleaned/unmapped.txt
gt extractfeat -type CDS -join -retainids -seqfile run/source.fa -matchdescstart run/../R64_genomic_cleaned/input.gff > run/../R64_genomic_cleaned/input.cds.fa
gt extractfeat: error: the file run/../R64_genomic_cleaned/input.gff is not sorted (example: line 5 and 6)
/usr/lib/ruby/vendor_ruby/rake/file_utils.rb:66:in `block in create_shell_runner': Command failed with status (1): [gt extractfeat -type CDS -join -retainids ...] (RuntimeError)
        from /usr/lib/ruby/vendor_ruby/rake/file_utils.rb:57:in `sh'
        from /usr/lib/ruby/vendor_ruby/rake/file_utils_ext.rb:37:in `sh'
        from /data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb:25:in `extract_cds'
        from /data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb:45:in `<main>'
rake aborted!
Command failed with status (1): [/data/nanopore/2741_MinION/flo_results/flo...]
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:56:in `block (2 levels) in <top (required)>'
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:40:in `each'
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
@yeban
Collaborator

yeban commented Oct 29, 2017

Hi. If you pre-process the input GFF using genometools, that should make this problem go away:

gt gff3 -tidy -sort -addids -retainids input.gff > input_sorted_and_tidied.gff

I will include this in the documentation. Thanks.
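To confirm that a GFF is in sorted order before re-running flo, a rough check along these lines may help. This is a minimal Python sketch, not part of flo or genometools: the function name first_unsorted_pair is hypothetical, and it assumes that "sorted" means top-level features (those without a Parent attribute) never go backwards in start coordinate within a sequence. gt's own definition may differ in detail.

```python
def first_unsorted_pair(gff_lines):
    """Return (earlier_line_no, later_line_no) for the first out-of-order pair, or None.

    Assumption (not taken from gt's source): only top-level features, i.e.
    lines without a Parent attribute, are compared, and within each seqid
    their start coordinates must never decrease.
    """
    last = {}  # seqid -> (start, line number) of the latest top-level feature
    for line_no, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        if "Parent=" in cols[8]:
            continue  # child feature; ordering is judged on top-level features
        seqid, start = cols[0], int(cols[3])
        if seqid in last and start < last[seqid][0]:
            return (last[seqid][1], line_no)
        last[seqid] = (start, line_no)
    return None
```

Passing an open file works, for example `first_unsorted_pair(open("input.gff"))`. A result of None under these assumptions does not guarantee that gt will accept the file.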

yeban closed this as completed in 8b7372d on Oct 29, 2017
@yeban
Collaborator

yeban commented Oct 29, 2017

I modified the gff_compare.rb script to automatically sort the GFFs. Please reopen if the problem persists.

@14zac2

14zac2 commented Dec 2, 2020

Hi! I'm having the same error, even though I ran gt gff3 -tidy -sort -addids -retainids input.gff > input_sorted_and_tidied.gff on my GFF file.

...
Processing NW_015365749.1
Processing NW_015365750.1
Processing NW_015365751.1
mkdir run/genomic_transcripts_sorted_and_tidied_long
liftOver -gff /path/genomic_transcripts_sorted_and_tidied_long.gff run/liftover.chn run/genomic_transcripts_sorted_and_tidied_long/lifted.gff3 run/genomic_transcripts_sorted_and_tidied_long/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/path/flo/gff_recover.rb run/genomic_transcripts_sorted_and_tidied_long/lifted.gff3 2> run/genomic_transcripts_sorted_and_tidied_long/lifted_cleanup.log | gt gff3 -tidy -sort -addids -retainids - > run/genomic_transcripts_sorted_and_tidied_long/lifted_cleaned.gff 2>> run/genomic_transcripts_sorted_and_tidied_long/lifted_cleanup.log
rake aborted!
Command failed with status (1): [/path/flo/gff_recover.rb run/GCF_001...]
/path/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/path/flo/Rakefile:40:in `each'
/path/flo/Rakefile:40:in `block in <top (required)>'
/path/gems/gems/rake-13.0.1/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

Since I sorted and tidied my gffs as recommended, do you have any further advice?

@yeban
Collaborator

yeban commented Dec 3, 2020

@14zac2 I think your issue might be more similar to #34. If you run /path/flo/gff_recover.rb run/genomic_transcripts_sorted_and_tidied_long/lifted.gff3 do you get an error saying 'Please install the bio gem' first?

@14zac2

14zac2 commented Dec 4, 2020

@yeban thank you for pointing me there! I think I found the issue. I don't get the error you mentioned. After running the equivalent of:

gff_recover.rb run/ref_v5.6_exons3_chromosome_2/lifted.gff3 > processed.gff 2> unprocessed.gff

And then running:

gt gff3 -tidy -sort -addids -retainids processed.gff > further_processed.gff

I get the following error:

...
warning: the multi-feature with ID "cds-XP_015332030.1" on line 301782 in file "processed.gff" has a different strand than its counterpart on line 301780 (possible in rare cases)
warning: the multi-feature with ID "cds-XP_015332030.1" on line 301784 in file "processed.gff" has a different strand than its counterpart on line 301780 (possible in rare cases)
warning: the multi-feature with ID "cds-XP_015332297.1" on line 302166 in file "processed.gff" has a different strand than its counterpart on line 302164 (possible in rare cases)
warning: the multi-feature with ID "cds-XP_015332491.1" on line 302726 in file "processed.gff" has a different strand than its counterpart on line 302725 (possible in rare cases)
warning: the multi-feature with ID "cds-XP_015332597.1" on line 302984 in file "processed.gff" has a different strand than its counterpart on line 302982 (possible in rare cases)
warning: the multi-feature with ID "cds-XP_015332597.1" on line 302986 in file "processed.gff" has a different strand than its counterpart on line 302982 (possible in rare cases)
gt gff3: error: Parent "gene-Npbwr2" on line 47 in file "processed.gff" was not defined (via "ID=")

Sure enough, I believe the problem is that my mRNAs have parents that do not exist, because I filtered the genes out of the GFF file. Do you know of a fix for this? I can try using sed to get rid of all of those instances and see if that helps.
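Before editing the file by hand, it can help to list every Parent reference that has no matching ID. The following is a minimal Python sketch under stated assumptions: the names parse_attributes and orphan_parents are hypothetical, attributes are split on ";" and "=" without URL-decoding, and the check is only an approximation of what gt gff3 validates.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict (no URL-decoding)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def orphan_parents(gff_lines):
    """Return {missing_parent_id: [line numbers that reference it]}."""
    defined = set()   # every ID= value seen in the file
    references = []   # (line number, parent id) pairs
    for line_no, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            defined.add(attrs["ID"])
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                references.append((line_no, parent))
    missing = {}
    for line_no, parent in references:
        if parent not in defined:
            missing.setdefault(parent, []).append(line_no)
    return missing
```

Running `orphan_parents(open("processed.gff"))` would then show how many features point at filtered-out genes, which may help decide between dropping the Parent attribute and keeping the gene features in the input.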

EDIT:

I used sed to get rid of the problematic "Parent" instances (there were ~1,200 of them, perhaps because my original GFF file contained pseudogenes and various sorts of RNAs; I'm not sure). Then, when running gt gff3 -tidy -sort -addids -retainids processed.gff > further_processed.gff, a bunch of new warnings were introduced, e.g.:

...
warning: CDS feature on line 245716 in file "processed_sed.gff" has the wrong phase 2 -> correcting it to 0
warning: CDS feature on line 245717 in file "processed_sed.gff" has the wrong phase 0 -> correcting it to 1
warning: CDS feature on line 245724 in file "processed_sed.gff" has the wrong phase 0 -> correcting it to 2
warning: CDS feature on line 245726 in file "processed_sed.gff" has the wrong phase 1 -> correcting it to 0
warning: CDS feature on line 245728 in file "processed_sed.gff" has the wrong phase 2 -> correcting it to 1
warning: CDS feature on line 245730 in file "processed_sed.gff" has the wrong phase 0 -> correcting it to 2
warning: CDS feature on line 245732 in file "processed_sed.gff" has the wrong phase 0 -> correcting it to 2
warning: CDS feature on line 245734 in file "processed_sed.gff" has the wrong phase 2 -> correcting it to 1
warning: CDS feature on line 245735 in file "processed_sed.gff" has the wrong phase 0 -> correcting it to 2
...

but I ended up with further_processed.gff, which has transcript, exon, and CDS features. Do you think this is an okay final file? Or should I be wary because of the warnings?
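For context on the phase warnings quoted above: in GFF3, the phase of a CDS segment is the number of bases to skip before the first complete codon, so it follows from the lengths of the preceding segments. The sketch below recomputes expected phases for one CDS. It is only an illustration: the function name expected_phases is hypothetical, it assumes the first segment in translation order has phase 0 unless told otherwise, and it is not how gt is implemented.

```python
def expected_phases(segments, strand="+", first_phase=0):
    """Return the expected phase of each CDS segment, in input order.

    segments: list of (start, end) pairs, 1-based inclusive, for one CDS.
    Assumption: the first segment in translation order has first_phase.
    """
    # Translation order is ascending start on '+', descending start on '-'.
    order = sorted(range(len(segments)),
                   key=lambda i: segments[i][0],
                   reverse=(strand == "-"))
    phases = [None] * len(segments)
    phase = first_phase
    for i in order:
        start, end = segments[i]
        length = end - start + 1
        phases[i] = phase
        # Bases left over after whole codons determine how many bases
        # the next segment must skip to reach a codon boundary.
        phase = (3 - (length - phase) % 3) % 3
    return phases
```

If segment boundaries shift during lift over, stored phases can stop matching values computed this way, which would be consistent with gt correcting them.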

Also, my biggest apologies for making a mess of your "Issues" section by referencing the wrong post and closing my old one!!

@yeban
Collaborator

yeban commented Dec 4, 2020

The warning

CDS feature on line 245716 in file "processed_sed.gff" has the wrong phase 2 -> correcting it to 0

is okay. In fact, it is one of the reasons to use genometools (gt) after lift over: it fixes such issues in the lifted GFF, in addition to tidying it up.

The warning

the multi-feature with ID "cds-XP_015332030.1" on line 301782 in file "processed.gff" has a different strand than its counterpart on line 301780 (possible in rare cases)

is new to me. What you can do is run gt in the same way on the GFF file you are trying to lift over and see if it gives the same warning. That will tell us whether the features were on different strands to begin with, or whether the difference was introduced during lift over. If it is the latter, it might be worth investigating further.
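As a complement to comparing gt's warnings by eye, the same question can be asked of the two files directly. This is a minimal Python sketch, assuming plain tab-separated GFF3 with an ID= attribute on CDS lines; mixed_strand_ids and introduced_by_liftover are hypothetical helper names, not part of flo or gt.

```python
from collections import defaultdict


def mixed_strand_ids(gff_lines, feature_type="CDS"):
    """Return the IDs whose features of feature_type appear on more than one strand."""
    strands = defaultdict(set)  # ID -> set of strand symbols seen
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        for field in cols[8].split(";"):
            field = field.strip()
            if field.startswith("ID="):
                strands[field[3:]].add(cols[6])
    return {fid for fid, seen in strands.items() if len(seen) > 1}


def introduced_by_liftover(source_lines, lifted_lines):
    """IDs with mixed strands in the lifted GFF but not in the source GFF."""
    return mixed_strand_ids(lifted_lines) - mixed_strand_ids(source_lines)
```

An empty result from introduced_by_liftover would suggest the mixed strands were already present in the source annotation. A non-empty one points at features worth inspecting in the lifted file.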

If the above results in something worth discussing further, probably best to open a new issue so we don't end up annoying the original poster with emails any further :)
