Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent results when using call-mods + pileup or pileup directly #132

Open
ppapasaikas opened this issue Feb 16, 2024 · 7 comments
Open

Comments

@ppapasaikas
Copy link

ppapasaikas commented Feb 16, 2024

I noticed a discrepancy in the obtained calls when using in succession modkit pileup on a bam file obtained with modkit call-mods vs a pileup directly obtained from modkit pileup.

Using call-mods + pileup:

modkit call-mods -p 0.2 --mod-threshold a:0.8  input.bam called.bam
modkit pileup --no-filtering called.bam pileup_A.bed

Using pileup directly:

modkit pileup  -p 0.2 --mod-threshold a:0.8  --no-filtering input.bam pileup_B.bed

I would naively expect the two resulting pileups to be identical or at least close in terms of reported %mA and yet average methylation differs significantly (>10% on my data, ~65% vs 78%).
Perhaps I am missing something in the way modkit call-mods operates?

@ppapasaikas
Copy link
Author

ppapasaikas commented Feb 16, 2024

For completeness when using call-mods + pileup using the same filtering thresholds I get identical results as when using pileup with --no-filtering so:

modkit call-mods -p 0.2 --mod-threshold a:0.8  input.bam called.bam
modkit pileup -p 0.2 --mod-threshold a:0.8 called.bam pileup_C.bed

pileup_C is identical to pileup_A (but both are significantly different to pileup_B).

@ArtRand
Copy link
Contributor

ArtRand commented Feb 16, 2024

Hello @ppapasaikas,

I'm a little confused how you're generating pileup_B.bed. All of the most recent versions will not let you pass the --filter-percentile (-p) option along with the --no-filtering flag. Could you tell me what version of modkit you're using ($ modkit --version)? Also, are you using modkit summary to assess the %mA or pileup? (just so I can reproduce on my side).

@ppapasaikas
Copy link
Author

Thank you for the quick reply.
You are right, that was a copy pasting error on my side. For pileup_B (pileup only bed generation) it should be:

modkit pileup -p 0.2 --mod-threshold a:0.8 input.bam pileup_B.bed

I am using modkit v.0.2.3.

I am using directly the generated modkit pileup outputs in all cases to assess %mA.

@ArtRand
Copy link
Contributor

ArtRand commented Feb 16, 2024

Hello @ppapasaikas,

You may get different results from running modkit pileup with specified thresholds compared to first running modkit call-mods with specified thresholds then using pileup because the filter thresholds are estimated slightly differently. In pileup only mapped reads and mapped bases are used. Whereas in modkit call-mods unmapped reads and bases are used. You can see the difference in the logs:

src/command_utils.rs::117][2024-02-16 18:08:07][DEBUG] estimated pass threshold 0.8417969 for primary sequence base A

The exact line number might be different if you update to a more recent version of modkit. What I would suggest is running modkit sample-probs with --only-mapped to estimate the filter threshold then use this number for both other procedures (--filter-threshold ${sample_probs_output} --mod-threshold a:0.8).

I'll add an option to call-mods to make the threshold estimation exactly identical to pileup. If you confirm this is the case, (i.e. using user-provided thresholds makes the final outputs the same), I'll change the issue to reflect adding the flag to call-mods.

@ppapasaikas
Copy link
Author

I followed the suggested procedure:

# [src/command_utils.rs::117][2024-02-16 21:05:56][DEBUG] estimated pass threshold 0.7324219 for primary sequence base A
Then created the two pileups:

modkit call-mods  --filter-threshold 0.7324219 --mod-threshold a:0.8  input.bam called.bam
samtools index called.bam
modkit pileup  --filter-threshold 0.7324219 --mod-threshold a:0.8 called.bam --only-tabs pileup1.bed

(Again for the above case in the second step replacing --filter-threshold 0.7324219 --mod-threshold a:0.8 with --no-filtering yields identical results.)

And

modkit pileup --filter-threshold 0.7324219 --mod-threshold a:0.8 input.bam --only-tabs pileup2.bed

The two resulting pileups are definitely not identical. Not only in terms of fraction modified but also Nvalid_cov and all other Nxxx columns of the bedmethyl file. The difference is not subtle either. pileup1 (the version going through call-mods) reports consistently higher Nvalid_cov and lower fraction modified.

In case you cannot replicate with an in-house dataset I can share a sample bam file.

@ArtRand
Copy link
Contributor

ArtRand commented Feb 17, 2024

Hello @ppapasaikas,

If you can share a BAM that exposes the issue that would help a lot and I'd be more than happy to take a look. Thanks!

@ArtRand
Copy link
Contributor

ArtRand commented Mar 14, 2024

@ppapasaikas

Any update, keen to investigate the problem if you're willing to share a BAM that exposes it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants