Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pseudouridylation extracting #193

Closed
Papareddy opened this issue May 31, 2024 · 3 comments
Closed

pseudouridylation extracting #193

Papareddy opened this issue May 31, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Papareddy
Copy link

Hi,

Thanks for this nice piece of software.

With Dorado 0.7, pseudouridination can be predicted. I was wondering if you have any tips and manual on how to identify PseU from the bam file generated after basecalling Modkit?

Cheers,
Ranj

@Theo-Nelson
Copy link

Dear Ranj,

(Not the developer)

You can see how I've implemented Modkit for this question in this notebook: https://github.com/Theo-Nelson/SMS-colab/blob/main/C8/HCV_AND_METTL3_KO_ANALYSIS.ipynb

Unfortunately, it seems like there are places which call both pseudo-Uridine and m6a simultaneously - this indicates to me at least that there is some kind of bug. I have not had a chance to track this further.

Sincerely,
Theo

@ArtRand
Copy link
Contributor

ArtRand commented Jun 3, 2024

Hello @Papareddy,

It depends what you're trying to test with your experiment. If you want to find pseU positions, you could try using modkit pileup and looking for genome positions with high modification percentages. If you tell me what you're doing, maybe I can give you a little more advice.

@Theo-Nelson,

Do you mean that there are genomic positions with a bedmethyl record for pseU and m6A like this?

ctg_01 39      40      a       1       +       39      40      255,0,0 1       0.00    0       1       0       5       119     211     0
ctg_01 39      40      17802   209     +       39      40      255,0,0 209     22.49   47      162     0       5       119     3       0

and

ctg_01 79      80      a       285     +       79      80      255,0,0 285     18.25   52      233     0       6       32      10      0
ctg_01 79      80      17802   6       +       79      80      255,0,0 6       16.67   1       5       0       6       32      289     0

The explination for this is as follows, take the first example. The reference has a U (T in the FASTA) at position 39, 209 reads have valid pseU calls and 1 has a m6A call, the one read with the m6A call has a T->A mismatch and there is a single, canonical call at that position. In the second example, the reverse is true, there is a reference A and 285 reads reporting a potential m6A modification call, and 6 reads with a A->T mismatch. Usually this mismatch records have much lower valid coverage then the "correct" modification record. If you want to remove them, I'd recommend using awk either as a pipe from the output or as a secondary step (in case you want to count the mismatches later).

e.g.

$ modkit pileup ${mod_bam} - | awk '$4=="17802"' > pileup_pseU.bed

# or

$ modkit pileup ${mod_bam} pileup_all_mods.bed
$ awk '$4==17802' pileup_all_mods.bed > pileup_pseU.bed

Hope this helps, happy to answer any additional questions.

@ArtRand ArtRand added the question Further information is requested label Jun 3, 2024
@ArtRand
Copy link
Contributor

ArtRand commented Jun 24, 2024

Hello @Papareddy,

Feel free to re-open this issue if you have any additional questions.

@ArtRand ArtRand closed this as completed Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants