- Fixes `IndexError: list index out of range` when `--sqanti_output` is set (#186).
- Fixes `IndexError: list index out of range` when printing TPMs for grouped transcript models (#187).
- Reduced running time when `--sqanti_output` is set.
Major novelties and improvements:
- Significant speed-up on datasets containing regions with extremely high coverage, often encountered on mitochondrial chromosomes (#97).
- Added support for Illumina reads for spliced alignment correction (thanks to @rkpfeil).
- Added support for YAML files (thanks to @rkpfeil). The old options `--bam_list` and `--fastq_list` are still available, but deprecated as of this version.
Transcript discovery and GTF processing:
- Fixed missing genes in extended GTF (#140, #147, #151, #175).
- Fixed strand detection and output of transcripts with `.` strand (#107).
- Added `--report_canonical` and `--polya_requirement` options that control the level of filtering of output transcripts based on canonical splice sites and the presence of poly-A tails (#128).
- Added check for input GTFs (#155).
- Extract CDS, other features and attributes from reference GTF to the output GTFs (#176).
- Reworked novel gene merging procedure (#164).
- Revamped algorithm for assigning reads to novel transcripts and their quantification (#127).
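As an illustration of what filtering on canonical splice sites involves, here is a minimal Python sketch that checks whether an intron is flanked by the GT-AG dinucleotides. It is not IsoQuant's implementation: the function names, the 0-based half-open coordinates, and the choice to treat only GT-AG as canonical are all assumptions made for this example.

```python
# Hypothetical helper, not taken from IsoQuant.
# Assumption: only the GT-AG donor/acceptor pair is treated as canonical.
CANONICAL_PAIRS = {("GT", "AG")}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def intron_is_canonical(chrom_seq: str, start: int, end: int, strand: str) -> bool:
    """Check the splice site dinucleotides of one intron.

    start/end are 0-based, half-open intron coordinates on the forward strand.
    Introns shorter than 4 bp or with unknown strand ('.') are not canonical.
    """
    if end - start < 4:
        return False
    left = chrom_seq[start:start + 2].upper()
    right = chrom_seq[end - 2:end].upper()
    if strand == "+":
        donor, acceptor = left, right
    elif strand == "-":
        # On the minus strand the donor is at the right end of the intron.
        donor, acceptor = reverse_complement(right), reverse_complement(left)
    else:
        return False
    return (donor, acceptor) in CANONICAL_PAIRS
```

For example, `intron_is_canonical("AAAGTCCCCAGTTT", 3, 11, "+")` checks the intron `GTCCCCAG`. The same forward-strand coordinates on a minus-strand gene are canonical when the forward sequence reads `CT...AC`.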
Read assignment and quantification:
- Optimized read-to-isoform assignment algorithm.
- Added `gene_assignment_type` attribute to read assignments.
- Fixed duplicated records in `read_assignments.tsv` (#168).
- Improved gene and transcript quantification. Only unique assignments are now used for transcript quantification. Added more options for quantification strategies (`--gene_quantification` and `--transcript_quantification`).
- Improved consistency between `transcript_counts.tsv` and `transcript_model_counts.tsv` (#137).
- Introduced mapping quality filtering: `--min_mapq`, `--inconsistent_mapq_cutoff` and `--simple_alignments_mapq_cutoff` (#110).
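The idea of counting only unique assignments can be sketched as follows. This is an illustration under stated assumptions, not IsoQuant's code: the `(read_id, transcript_id, assignment_type)` record layout, the `"unique"` label, and the function name are hypothetical, and TPM is computed here simply by scaling counts to one million with no length normalization.

```python
from collections import Counter


def transcript_tpm(assignments):
    """Compute per-transcript TPM from uniquely assigned reads only.

    assignments: iterable of (read_id, transcript_id, assignment_type) tuples
    (hypothetical layout). Reads whose type is not "unique" are ignored.
    """
    counts = Counter(
        transcript_id
        for _read_id, transcript_id, assignment_type in assignments
        if assignment_type == "unique"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Assumption: plain scaling to one million, no transcript length correction.
    return {tid: 1_000_000 * n / total for tid, n in counts.items()}
```

With this scheme, a read assigned ambiguously to several transcripts contributes to none of them, so the reported values depend only on reads with a single supporting transcript.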
Minor fixes and improvements:
- Added `--bam_tags` option to import additional information from BAM files to read assignments output.
- Large output files are now gzipped by default; `--no_gzip` can be used to keep uncompressed output (#154).
- BAM stats are now printed to the log (#139).
- Various minor fixes and requests (#106, #141, #143, #146, #179).
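Since large outputs may now be either gzipped or plain text, downstream scripts may want to open both transparently. The sketch below is one way to do that using only the Python standard library; the helper name is hypothetical, and the file name in the usage note is an assumption for illustration.

```python
import gzip


def open_maybe_gzipped(path):
    """Open a text file, decompressing on the fly if it is gzip-compressed.

    Detection relies on the two-byte gzip magic number rather than on the
    file extension.
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")
```

A typical use would be `with open_maybe_gzipped("read_assignments.tsv.gz") as f:` followed by iterating over the lines, which works the same way whether or not `--no_gzip` was used.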
Special acknowledgement to @almiheenko for testing and reviewing PRs, and to @alexandrutomescu for supporting the project.
- Fixed `UnboundLocalError: local variable 'match' referenced before assignment` error in SQANTI-like output.
- Fixed read to novel models assignment.
- Improved command line options for providing multiple files, added `--prefix` option.
- Additional checks for various unusual cases in input GTFs.
- Do not output empty files when no GTF is provided.
- Unspliced novel transcripts are not reported by default for ONT data; use `--report_novel_unspliced` to generate them.
- When multiple BAM/FASTQ files are provided via `--bam`/`--fastq`, they are treated as different replicas/samples of the same experiment; a single GTF and per-sample counts are generated automatically.
- 10-15 times lower RAM consumption with the same running time.
- ~5 times lower disk consumption for temporary files.
- `--low_memory` option has no effect (used by default); `--high_memory` mimics old behavior by storing alignments in RAM.
- Read assignment reports transcript start and end (TSS/TES) matches.
- `--sqanti_output` generates SQANTI-like output for novel vs reference transcripts.
- Resulting annotation contains exon ids.
- Supplementary gene attributes are copied from the reference annotation to the output annotations.
- Improved `--resume` and `--force` behaviour.
- `--model_construction_strategy sensitive_pacbio` is now more sensitive.
- Fixed strand detection that caused lower precision for novel transcripts.
- Fixed known transcript filtering that caused lower recall.
- Fixed duplicate transcript entries in the output annotation.
- Fixed duplicate canonical attribute in extended annotation.
- Fix `--resume` option when relative paths were provided.
- Fixed error caused by introns of length 0 (strange corner case, but it does happen).
- Fixed error when using a read grouping file.
- Implement `--resume` option for resuming failed runs.
- Fix SQANTI-like output for raw reads.
- Fix read strand detection, which improves transcript discovery as well.
- Simplify transcript naming; IDs of known transcripts are preserved in the output.
- More information about novel transcripts in GTF.
- Fix GTF attributes, thanks to @rsalz.
- Fix `--check_canonical` option.
- Annotation-free mode for de novo transcript discovery.
- Significant speed-up.
- Extended annotation (all reference + novel transcripts) is now part of the output.
- Intermediate BAM files have nicer names.
- Proper single-thread mode without thread pool usage.
- New options for controlling quantification strategies. Default behaviour is changed as well.
- New option `--genedb_output` for providing a separate folder for the gene database in case the output directory is located on a shared disk.
- Possibility to provide read group tables in gzipped format.
- Fixed `--check_canonical` option.
- Improved running time for the read assignment step (noticeable only for genes with > 100 exons).
- Minor fixes and improvements in output files. Note that GTFs and some other files now have multiline headers.
- Parallel processing of transcript model construction phase.
- Minor improvements in quantification of reference transcripts.
- Fixed counts/TPM for novel transcript models.
- Fixed processing of BAM records without sequence data (e.g. secondary alignments).
- Fixed `list index out of range` bug in long read counter.
- Improved recall by introducing relative coverage cutoffs.
- More careful handling of transcript terminal positions.
- Fixed GTF to BED conversion.
- Completely new transcript discovery algorithm with significantly higher recall.
- Algorithm for read alignment correction.
- Support for technical replicas within a single sample.
- Significantly improved running time and RAM consumption;
- Annotation is now fed into minimap2;
- Extended output format.
- Support for GFF3 mRNA features.
- Support for BAM files with =/X in CIGAR strings;
- Fixed canonical splice site detection.
- Multi-threading;
- Intermediate results are saved to disk to enable quick restart via the `--read_assignments` option;
- Significantly improved precision for novel transcript detection;
- Secondary alignments are now used by default;
- Fixed several bugs in inconsistency detection algorithm;
- Reworked polyA detection and reporting once again;
- Slightly modified read assignment output format;
- More informative GTF output;
- Removed `--has_polya` option, `--polya_trimmed` is now used as the opposite;
- Added `--check_canonical` option.
- Significantly reworked polyA detection and reporting;
- Improved detection of inconsistencies, added several new event types;
- Better recall and precision for read assignment algorithm;
- Fixed several bugs and flaws;
- Added script for counting simple stats for GTF files (srt/gtf_stats.py).
- Initial release.