Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to construe rna004 model [email protected]_DRACH bedMethyl output? #109

Closed
kir1to455 opened this issue Jan 9, 2024 · 3 comments
Closed

Comments

@kir1to455
Copy link

kir1to455 commented Jan 9, 2024

Hi, I have successfully run Dorado with RNA004 m6A model and used modkit to convert my modbam file to bedmethyl output.
${DoradoDir}/dorado basecaller --min-qscore 7 --verbose --emit-moves -k 14 --secondary=no --reference ${indexDir}/gencode.v43.transcripts.fa -x cuda:0 ${ModelDir}/[email protected] ${pod5path}/HEK293T.pod5 --modified-bases m6A_DRACH > ${ModDir}/hac.pass.m6A.mod.pass.bam
samtools sort -o hac.pass.m6A.mod.sorted.bam -@ 40 hac.pass.m6A.mod.pass.bam
samtools index -@ 40 hac.pass.m6A.mod.sorted.bam
${modkitDir}/modkit pileup ${bamfile}/hac.pass.m6A.mod.sorted.bam ${bamfile}/hac.pass.m6A.bed --log-filepath ${bamfile}/hac.log --num-reads 20084 --max-depth 20000 -t 35
However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks.
And Question2 is that I found the Nvalid_cov of the adjacent position of the same transcripts is particularly large.
image
If I don't get it wrong, because the training is DRACH motif. If there are two adjacent A at the same time, one of them is fake.
And the the column Nnocall is equivalent to depth.
Here, it is clear that 232 is a fake m6A site.
But how to construe the Ncanonical = 1 in this fake position?

Best wishes,
Kirito

@ArtRand
Copy link
Contributor

ArtRand commented Jan 9, 2024

Hello @kir1to455,

However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks.

When you have a reference FASTA with many short sequences (like transcripts) the parallelism in modkit doesn't work ideally. Improving this performance is a top priority for the next release. I'll let you know as soon as this work is completed.

To your second question:

The output default of modkit pileup as you have it will report all genomic positions with at least 1 modification call. To me it looks like position 231 is not a genomic DRACH motif, however 1 read has a DRACH motif aligned there, the majority of the reads align to position 232.

@ArtRand
Copy link
Contributor

ArtRand commented Jan 27, 2024

Hello @kir1to455,

The latest modkit release candidate v0.2.5-rc1 should have much better performance when using a transcriptome reference. Let me know if you have a chance to try it and or if you encounter any problems.

@kir1to455
Copy link
Author

Hi, @ArtRand
The latest modkit release is currently performing well in transcriptome reference.

Best wishes,
kirito

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants