Flo failed at the GenomeTools section #32

Closed
Homap opened this issue Feb 21, 2020 · 3 comments

Homap commented Feb 21, 2020

Hello,

I ran flo on my data to convert GFF coordinates from one assembly version to another. It produced both lifted.gff3 and unlifted.gff3, and lifted.gff3 looks fine: its size is comparable to that of the original GFF3.

However, at the end, I get the following error:

liftOver -gff GCF_000698965.1_ASM69896v1_genomic.flo.gff run/liftover.chn run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted.gff3 run/GCF_000698965.1_ASM69896v1_genomic.flo/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/gff_recover.rb run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "-" does not contain 9 tab (\t) separated fields
rake aborted!
Command failed with status (1): [/crex/proj/uppstore2017180/private/homap/o...]
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:40:in `each'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

I was wondering how I could resolve this issue?

Collaborator

yeban commented Feb 21, 2020

You are likely getting this error because the output of the gff_recover.rb script is empty. This script is run on lifted.gff3 to remove nonsensical annotations, such as genes mapped to different scaffolds. The filtered output is then piped to GenomeTools (gt) for validation.

If you can share the lifted.gff3 file, I might be able to guess why the output of gff_recover.rb is empty.
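
In the meantime, one way to narrow this down is to check the field count of each line yourself, since gt complains that line 1 "does not contain 9 tab (\t) separated fields". The snippet below is only a minimal sketch and is not part of flo; the function name `find_bad_lines` is made up for illustration. It reports every feature line that does not split into exactly nine tab-separated fields, skipping blank lines and `#` comment or directive lines.

```python
def find_bad_lines(lines):
    """Return (line_number, field_count) for each feature line that does
    not have exactly 9 tab-separated fields.

    Hypothetical helper for illustration; not part of flo or GenomeTools.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Blank lines and comment/directive lines are not feature lines.
        if not line or line.startswith("#"):
            continue
        count = len(line.split("\t"))
        if count != 9:
            bad.append((number, count))
    return bad
```

It can be called on an open file handle, for example `find_bad_lines(open("lifted.gff3"))`, and an empty result means every feature line has nine fields.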

Author

Homap commented Feb 21, 2020

Thanks a lot for your prompt reply. Please find the lifted GFF attached.
I am also trying to write some Python scripts to clean it myself, but of course it would be really wonderful if you could have a look as well.

Thank you,
Homa
lifted.gff3.gz

Collaborator

yeban commented Feb 25, 2020

So the gff_recover.rb script is failing because you have tabs in your 9th column. Tabs within a column must be escaped (see the GFF3 spec).
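
For reference, the GFF3 spec encodes a literal tab as `%09`. A rough Python sketch of such an escaping pass is shown here; it is not part of flo, and the name `escape_attribute_tabs` is hypothetical. It assumes the stray tabs occur only in the 9th column, so that the first eight tab-separated fields of each line are the real columns 1 to 8. If that does not hold for a file, the result will be wrong.

```python
def escape_attribute_tabs(line):
    """Fold any fields beyond the 9th back into column 9, encoding the
    tabs that separated them as %09.

    Assumption: extra tabs appear only inside column 9 (attributes).
    Hypothetical helper for illustration; not part of flo.
    """
    # Leave comments, directives and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= 9:
        # Nothing to repair on this line.
        return line
    attributes = "%09".join(fields[8:])
    return "\t".join(fields[:8] + [attributes]) + newline
```

Applying it line by line and writing the results to a new file leaves well-formed lines unchanged and rewrites only those with more than nine fields.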

Even without the tabs, gff_recover.rb would largely be unable to work with your GFF, as it contains many feature types that the script does not recognise. I wrote the script for our simpler use case: transcripts and their coding sequences. Writing your own script to clean up lifted.gff3 is thus a good idea.
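
As a possible starting point for such a script, here is a minimal Python sketch of one check: finding features whose children ended up on a different scaffold after liftover. This is not how gff_recover.rb is implemented; the function names are made up, attribute values are not URL-decoded, lines without exactly nine fields are skipped, and a feature that spans several lines with the same ID is recorded with the scaffold of its last line only.

```python
def parse_attributes(column9):
    """Parse 'key=value;key=value' into a dict (no URL-decoding)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def parents_with_split_children(lines):
    """Return the IDs of features that have at least one child located
    on a different seqid (scaffold) than the feature itself.

    Hypothetical helper for illustration; not part of flo.
    """
    seqid_by_id = {}  # feature ID -> seqid of the feature
    children = []     # (parent ID, seqid of the child)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # malformed line; ignored in this sketch
        attrs = parse_attributes(fields[8])
        if "ID" in attrs:
            seqid_by_id[attrs["ID"]] = fields[0]
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children.append((parent, fields[0]))
    return {
        parent
        for parent, seqid in children
        if parent in seqid_by_id and seqid_by_id[parent] != seqid
    }
```

The returned IDs could then be used to drop or flag the affected features before passing the file on to gt.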

yeban closed this as completed Nov 17, 2020