How to construe rna004 model [email protected]_DRACH bedMethyl output? #109

kir1to455 · 2024-01-09T13:25:57Z

Hi, I have successfully run Dorado with RNA004 m6A model and used modkit to convert my modbam file to bedmethyl output.
${DoradoDir}/dorado basecaller --min-qscore 7 --verbose --emit-moves -k 14 --secondary=no --reference ${indexDir}/gencode.v43.transcripts.fa -x cuda:0 ${ModelDir}/[email protected] ${pod5path}/HEK293T.pod5 --modified-bases m6A_DRACH > ${ModDir}/hac.pass.m6A.mod.pass.bam
samtools sort -o hac.pass.m6A.mod.sorted.bam -@ 40 hac.pass.m6A.mod.pass.bam
samtools index -@ 40 hac.pass.m6A.mod.sorted.bam
${modkitDir}/modkit pileup ${bamfile}/hac.pass.m6A.mod.sorted.bam ${bamfile}/hac.pass.m6A.bed --log-filepath ${bamfile}/hac.log --num-reads 20084 --max-depth 20000 -t 35
However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks.
And Question2 is that I found the Nvalid_cov of the adjacent position of the same transcripts is particularly large.

If I don't get it wrong, because the training is DRACH motif. If there are two adjacent A at the same time, one of them is fake.
And the the column Nnocall is equivalent to depth.
Here, it is clear that 232 is a fake m6A site.
But how to construe the Ncanonical = 1 in this fake position?

Best wishes,
Kirito

The text was updated successfully, but these errors were encountered:

ArtRand · 2024-01-09T14:57:48Z

Hello @kir1to455,

However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks.

When you have a reference FASTA with many short sequences (like transcripts) the parallelism in modkit doesn't work ideally. Improving this performance is a top priority for the next release. I'll let you know as soon as this work is completed.

To your second question:

The output default of modkit pileup as you have it will report all genomic positions with at least 1 modification call. To me it looks like position 231 is not a genomic DRACH motif, however 1 read has a DRACH motif aligned there, the majority of the reads align to position 232.

ArtRand · 2024-01-27T01:04:38Z

Hello @kir1to455,

The latest modkit release candidate v0.2.5-rc1 should have much better performance when using a transcriptome reference. Let me know if you have a chance to try it and or if you encounter any problems.

kir1to455 · 2024-07-22T02:16:32Z

Hi, @ArtRand
The latest modkit release is currently performing well in transcriptome reference.

Best wishes,
kirito

kir1to455 closed this as completed Jul 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to construe rna004 model [email protected]_DRACH bedMethyl output? #109

How to construe rna004 model [email protected]_DRACH bedMethyl output? #109

kir1to455 commented Jan 9, 2024 •

edited

Loading

ArtRand commented Jan 9, 2024 •

edited

Loading

ArtRand commented Jan 27, 2024

kir1to455 commented Jul 22, 2024

How to construe rna004 model [email protected]_DRACH bedMethyl output? #109

How to construe rna004 model [email protected]_DRACH bedMethyl output? #109

Comments

kir1to455 commented Jan 9, 2024 • edited Loading

ArtRand commented Jan 9, 2024 • edited Loading

ArtRand commented Jan 27, 2024

kir1to455 commented Jul 22, 2024

kir1to455 commented Jan 9, 2024 •

edited

Loading

ArtRand commented Jan 9, 2024 •

edited

Loading