Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CpG Calling with Reference #196

Closed
SophieValk opened this issue Jun 3, 2024 · 2 comments
Closed

CpG Calling with Reference #196

SophieValk opened this issue Jun 3, 2024 · 2 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments

@SophieValk
Copy link

Am I understanding it correctly that when you call CpGs using modekit pileup with the -ref flag, only the CpGs that are present in the reference genome will get called? And any site that is a CpG in you .bam files but not in the reference genome will not be included in the modkit pileup .bedMethyl output file?

Thanks in advance,
Sophie

@ArtRand
Copy link
Contributor

ArtRand commented Jun 3, 2024

Hello @SophieValk,

If you use a command such as this

$ modkit pileup ${mod_bam} ${pileup_bedmethyl} --cpg --ref ${ref}

The bedmethyl records will only be for positions that are CG in the reference FASTA file. You are correct that if a read in the mod-BAM has a CpG methylation call that is not in the reference sequence (i.e. not aligned) it will not be present in the output. If you want to look at unaligned CpG calls you can first filter the mod-BAM to only unaligned records:

$ samtools -f 4 ${modbam} > unaligned.bam

then look at the calls with modkit extract

$ modkit extract unaligned.bam raw_probabilities.tsv --read-calls read_calls.tsv

If you want the positions that were CpG in the read but not CpG in the reference (reference CH), you can use dorado with the CpG modification model and then use

$ modkit pileup ${mod_bam} ${pileup_ch_bedmethyl} --motif CH 0 --ref ${ref}

I'm going to add functionality that will allow you to subset the modification calls in a set of reads to specific read sequences in the next release, right now you'll have to use the CpG modification model.

Hope this helps, happy to answer any more questions you have.

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Jun 3, 2024
@SophieValk
Copy link
Author

Thanks @ArtRand for your clear and quick reply.
Sophie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Looking for clarification on inputs and/or outputs
Projects
None yet
Development

No branches or pull requests

2 participants