Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bedMethyl file has no motif information #216

Open
uuwuyi opened this issue Jun 20, 2024 · 7 comments
Open

bedMethyl file has no motif information #216

uuwuyi opened this issue Jun 20, 2024 · 7 comments
Labels
question Further information is requested

Comments

@uuwuyi
Copy link

uuwuyi commented Jun 20, 2024

Hi there,

Here is the whole process I run after dorado 5mC and 5hmC basecalling.

dorado aligner reference.fasta dorado_5mC_5hmC.bam > aligned.bam
samtools sort aligned.bam > aligned_sort.bam
samtools index aligned_sort.bam

I have tried
modkit pileup aligned_sort.bam pileup_test.bed --cpg --log-filepath pileup.log --ref reference.fasta and

modkit pileup aligned_sort.bam pileup.bed --log-filepath pileup.log --ref reference.fasta -t 8
modkit find-motifs --in-bedmethyl pileup.bed --ref reference.fasta -o motif.tsv -t 8

However, in the "modified base code and motif" column of the modbed file, it only contains the modified base code not motif. and the motif.tsv is empty. Do you have an idea what the problem would be? Thanks!

@ArtRand
Copy link
Contributor

ArtRand commented Jun 20, 2024

Hello @uuwuyi,

However, in the "modified base code and motif" column of the modbed file,

I'm not sure exactly which file you're referring to. Could you tell me the name of the file you're referring to and paste the first few rows of that file?

Also, could you be sure that you're using the latest release candidate?

You can check by doing

$ modkit --version
# should say mod_kit 0.3.1

Happy to help, I just need a little more information.

@ArtRand ArtRand added the troubleshooting workflow and data preparation questions label Jun 20, 2024
@uuwuyi
Copy link
Author

uuwuyi commented Jun 20, 2024

Hi @ArtRand,

my modkit version is 0.3.0. Below is part of the pileup.bed file. Sorry for my confusing way to call it. I mean the bedMethyl table. In the fourth column, it only contains the modified base code. I wonder if this is the reason that I didn't get the motif using find-motifs subcommand. Thanks again for your help!

2 11300 11301 h 1 - 11300 11301 255,0,0 1 0.00 0 1 0 0 0 00
2 11300 11301 m 1 - 11300 11301 255,0,0 1 0.00 0 1 0 0 0 00
2 11603 11604 h 1 - 11603 11604 255,0,0 1 0.00 0 1 0 0 0 00
2 11603 11604 m 1 - 11603 11604 255,0,0 1 0.00 0 1 0 0 0 00
2 11614 11615 h 1 - 11614 11615 255,0,0 1 0.00 0 1 0 0 0 00
2 11614 11615 m 1 - 11614 11615 255,0,0 1 0.00 0 1 0 0 0 00
2 11620 11621 h 1 - 11620 11621 255,0,0 1 0.00 0 1 0 0 0 00
2 11620 11621 m 1 - 11620 11621 255,0,0 1 0.00 0 1 0 0 0 00
2 11661 11662 h 1 - 11661 11662 255,0,0 1 0.00 0 1 0 0 0 00
2 11661 11662 m 1 - 11661 11662 255,0,0 1 0.00 0 1 0 0 0 00

@ArtRand
Copy link
Contributor

ArtRand commented Jun 20, 2024

Hello @uuwuyi,

The fourth column of the bedMethyl table should only have the modified base code, the snippet you have here is what I would expect. Version v0.3.0 had a bug where the output table would not be printed correctly. Could you try v0.3.1rc1? If you continue to see the issue, could you attach the log from the find-motifs run?

@uuwuyi
Copy link
Author

uuwuyi commented Jun 22, 2024

Hi @ArtRand,
find_motif_test.log
find_motif.log

I have updated the modkit to v0.3.1 and run find-motifs on my bacterial samples (~80 samples). As the log file shows, it is indeed searching for motifs. But I am a bit surprised that all of the motif.tsv files I have got have no motif found. I am unsure if I did something wrong. I tried to lower down the and but it didn't help. Many thanks again for your help!

@ArtRand
Copy link
Contributor

ArtRand commented Jun 24, 2024

Hello @uuwuyi,

Thanks for sending the logs over. It looks like the overall level of modification in this sample is low. From the logs you can trace the progression of the search. You start with 153 high-frequency contexts (m and h combined). You can see that the first stage finds some enriched sites but then they are discarded, for example:

[src/find_motifs/mod.rs::1893][2024-06-21 18:11:14][DEBUG] discarding TGAYC[m]GD, high-modified sites too low, 38

You could lower the --min-sites option even more (maybe 30), however I can't say whether this makes sense for your experiment. Another option is to take some of the motifs from the logs (e.g. YC[m]GD) and pass them as known motifs --known-motif YCCGD 2 m and the command should measure the methylation level for those sites. Hope this helps, happy to answer additional questions.

@ArtRand ArtRand added question Further information is requested and removed troubleshooting workflow and data preparation questions labels Jun 24, 2024
@uuwuyi
Copy link
Author

uuwuyi commented Jun 26, 2024

Hi @ArtRand,

I am using 30x sequencing data. I have tried lowering --min-sites to 30 and it still didn't find any motifs. Do you have any flags and thresholds that would recommend considering it's 30x depth? Also, I run the dorado modified basecalling using the command below, if you don't mind checking if this is ok for correct basecalling. I used fast5 files instead of the pod5. Could it be a reason for low 5mC modification detection? Thanks again for your patience and help!
dorado basecaller /path/to/[email protected] /path/to/fast5 --modified-bases-models /path/to/[email protected]_5mCG@v0 > dorado_calls.bam

@ArtRand
Copy link
Contributor

ArtRand commented Jun 27, 2024

Hello @uuwuyi,

Did you use --min-sites 30 and --min-frac-mod 0.5? Could you attach the log?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants