pseudouridylation extracting #193

Papareddy · 2024-05-31T14:53:47Z

Hi,

Thanks for this nice piece of software.

With Dorado 0.7, pseudouridination can be predicted. I was wondering if you have any tips and manual on how to identify PseU from the bam file generated after basecalling Modkit?

Cheers,
Ranj

Theo-Nelson · 2024-06-02T21:18:03Z

Dear Ranj,

(Not the developer)

You can see how I've implemented Modkit for this question in this notebook: https://github.com/Theo-Nelson/SMS-colab/blob/main/C8/HCV_AND_METTL3_KO_ANALYSIS.ipynb

Unfortunately, it seems like there are places which call both pseudo-Uridine and m6a simultaneously - this indicates to me at least that there is some kind of bug. I have not had a chance to track this further.

Sincerely,
Theo

ArtRand · 2024-06-03T16:52:04Z

Hello @Papareddy,

It depends what you're trying to test with your experiment. If you want to find pseU positions, you could try using modkit pileup and looking for genome positions with high modification percentages. If you tell me what you're doing, maybe I can give you a little more advice.

@Theo-Nelson,

Do you mean that there are genomic positions with a bedmethyl record for pseU and m6A like this?

ctg_01 39      40      a       1       +       39      40      255,0,0 1       0.00    0       1       0       5       119     211     0
ctg_01 39      40      17802   209     +       39      40      255,0,0 209     22.49   47      162     0       5       119     3       0

and

ctg_01 79      80      a       285     +       79      80      255,0,0 285     18.25   52      233     0       6       32      10      0
ctg_01 79      80      17802   6       +       79      80      255,0,0 6       16.67   1       5       0       6       32      289     0

The explination for this is as follows, take the first example. The reference has a U (T in the FASTA) at position 39, 209 reads have valid pseU calls and 1 has a m6A call, the one read with the m6A call has a T->A mismatch and there is a single, canonical call at that position. In the second example, the reverse is true, there is a reference A and 285 reads reporting a potential m6A modification call, and 6 reads with a A->T mismatch. Usually this mismatch records have much lower valid coverage then the "correct" modification record. If you want to remove them, I'd recommend using awk either as a pipe from the output or as a secondary step (in case you want to count the mismatches later).

e.g.

$ modkit pileup ${mod_bam} - | awk '$4=="17802"' > pileup_pseU.bed

# or

$ modkit pileup ${mod_bam} pileup_all_mods.bed
$ awk '$4==17802' pileup_all_mods.bed > pileup_pseU.bed

Hope this helps, happy to answer any additional questions.

ArtRand · 2024-06-24T13:45:03Z

Hello @Papareddy,

Feel free to re-open this issue if you have any additional questions.

ArtRand added the question Further information is requested label Jun 3, 2024

ArtRand closed this as completed Jun 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pseudouridylation extracting #193

pseudouridylation extracting #193

Papareddy commented May 31, 2024

Theo-Nelson commented Jun 2, 2024

ArtRand commented Jun 3, 2024 •

edited

Loading

ArtRand commented Jun 24, 2024

pseudouridylation extracting #193

pseudouridylation extracting #193

Comments

Papareddy commented May 31, 2024

Theo-Nelson commented Jun 2, 2024

ArtRand commented Jun 3, 2024 • edited Loading

ArtRand commented Jun 24, 2024

ArtRand commented Jun 3, 2024 •

edited

Loading