Inconsistent results when using call-mods + pileup or pileup directly #132

ppapasaikas · 2024-02-16T11:54:42Z

I noticed a discrepancy in the obtained calls when using in succession modkit pileup on a bam file obtained with modkit call-mods vs a pileup directly obtained from modkit pileup.

Using call-mods + pileup:

modkit call-mods -p 0.2 --mod-threshold a:0.8  input.bam called.bam
modkit pileup --no-filtering called.bam pileup_A.bed

Using pileup directly:

modkit pileup  -p 0.2 --mod-threshold a:0.8  --no-filtering input.bam pileup_B.bed

I would naively expect the two resulting pileups to be identical or at least close in terms of reported %mA and yet average methylation differs significantly (>10% on my data, ~65% vs 78%).
Perhaps I am missing something in the way modkit call-mods operates?

The text was updated successfully, but these errors were encountered:

ppapasaikas · 2024-02-16T12:19:11Z

For completeness when using call-mods + pileup using the same filtering thresholds I get identical results as when using pileup with --no-filtering so:

modkit call-mods -p 0.2 --mod-threshold a:0.8  input.bam called.bam
modkit pileup -p 0.2 --mod-threshold a:0.8 called.bam pileup_C.bed

pileup_C is identical to pileup_A (but both are significantly different to pileup_B).

ArtRand · 2024-02-16T15:14:24Z

Hello @ppapasaikas,

I'm a little confused how you're generating pileup_B.bed. All of the most recent versions will not let you pass the --filter-percentile (-p) option along with the --no-filtering flag. Could you tell me what version of modkit you're using ($ modkit --version)? Also, are you using modkit summary to assess the %mA or pileup? (just so I can reproduce on my side).

ppapasaikas · 2024-02-16T15:36:01Z

Thank you for the quick reply.
You are right, that was a copy pasting error on my side. For pileup_B (pileup only bed generation) it should be:

modkit pileup -p 0.2 --mod-threshold a:0.8 input.bam pileup_B.bed

I am using modkit v.0.2.3.

I am using directly the generated modkit pileup outputs in all cases to assess %mA.

ArtRand · 2024-02-16T18:45:59Z

Hello @ppapasaikas,

You may get different results from running modkit pileup with specified thresholds compared to first running modkit call-mods with specified thresholds then using pileup because the filter thresholds are estimated slightly differently. In pileup only mapped reads and mapped bases are used. Whereas in modkit call-mods unmapped reads and bases are used. You can see the difference in the logs:

src/command_utils.rs::117][2024-02-16 18:08:07][DEBUG] estimated pass threshold 0.8417969 for primary sequence base A

The exact line number might be different if you update to a more recent version of modkit. What I would suggest is running modkit sample-probs with --only-mapped to estimate the filter threshold then use this number for both other procedures (--filter-threshold ${sample_probs_output} --mod-threshold a:0.8).

I'll add an option to call-mods to make the threshold estimation exactly identical to pileup. If you confirm this is the case, (i.e. using user-provided thresholds makes the final outputs the same), I'll change the issue to reflect adding the flag to call-mods.

ppapasaikas · 2024-02-16T20:50:00Z

I followed the suggested procedure:

# [src/command_utils.rs::117][2024-02-16 21:05:56][DEBUG] estimated pass threshold 0.7324219 for primary sequence base A
Then created the two pileups:

modkit call-mods  --filter-threshold 0.7324219 --mod-threshold a:0.8  input.bam called.bam
samtools index called.bam
modkit pileup  --filter-threshold 0.7324219 --mod-threshold a:0.8 called.bam --only-tabs pileup1.bed

(Again for the above case in the second step replacing --filter-threshold 0.7324219 --mod-threshold a:0.8 with --no-filtering yields identical results.)

And

modkit pileup --filter-threshold 0.7324219 --mod-threshold a:0.8 input.bam --only-tabs pileup2.bed

The two resulting pileups are definitely not identical. Not only in terms of fraction modified but also Nvalid_cov and all other Nxxx columns of the bedmethyl file. The difference is not subtle either. pileup1 (the version going through call-mods) reports consistently higher Nvalid_cov and lower fraction modified.

In case you cannot replicate with an in-house dataset I can share a sample bam file.

ArtRand · 2024-02-17T15:22:17Z

Hello @ppapasaikas,

If you can share a BAM that exposes the issue that would help a lot and I'd be more than happy to take a look. Thanks!

ArtRand · 2024-03-14T14:26:23Z

@ppapasaikas

Any update, keen to investigate the problem if you're willing to share a BAM that exposes it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent results when using call-mods + pileup or pileup directly #132

Inconsistent results when using call-mods + pileup or pileup directly #132

ppapasaikas commented Feb 16, 2024 •

edited

Loading

ppapasaikas commented Feb 16, 2024 •

edited

Loading

ArtRand commented Feb 16, 2024 •

edited

Loading

ppapasaikas commented Feb 16, 2024

ArtRand commented Feb 16, 2024 •

edited

Loading

ppapasaikas commented Feb 16, 2024

ArtRand commented Feb 17, 2024

ArtRand commented Mar 14, 2024

Inconsistent results when using call-mods + pileup or pileup directly #132

Inconsistent results when using call-mods + pileup or pileup directly #132

Comments

ppapasaikas commented Feb 16, 2024 • edited Loading

ppapasaikas commented Feb 16, 2024 • edited Loading

ArtRand commented Feb 16, 2024 • edited Loading

ppapasaikas commented Feb 16, 2024

ArtRand commented Feb 16, 2024 • edited Loading

ppapasaikas commented Feb 16, 2024

ArtRand commented Feb 17, 2024

ArtRand commented Mar 14, 2024

ppapasaikas commented Feb 16, 2024 •

edited

Loading

ppapasaikas commented Feb 16, 2024 •

edited

Loading

ArtRand commented Feb 16, 2024 •

edited

Loading

ArtRand commented Feb 16, 2024 •

edited

Loading