How to detect the background noise from control and test samples #172

xiangpingyu · 2024-04-24T02:57:00Z

Dear developers,

Attached are the results for two samples, 1# (control) and 2# (modified_test), generated with modkit summary and modkit sample-probs using default parameters. I am unsure how to adjust the settings in modkit extract to accurately assess the modification status of the modified_test sample, particularly since we have no appropriate reference sequence for our experiments.

I have reviewed the related issue #147, but I am still unclear on how to set the --filter-threshold and --mod-threshold parameters effectively based on the attached files. Is there a way to establish a threshold for background noise in these two samples? Looking forward to your reply. Thank you all!

Sophia

1.summary.csv
2.summary.csv
2_probabilities.txt
1_probabilities.txt
2_thresholds.csv
1_thresholds.csv

ArtRand · 2024-04-25T22:27:29Z

Hello @xiangpingyu,

In general, you don't have to do anything to adjust the thresholds. The estimated threshold values in both of your samples seem to be about the same. If you want to estimate the false positive rate, use a sample where you know there should not be any 6mA bases. For example, a PCR amplified sample or a genome where you know the organism does not have the enzymes necessary to make this modification. When you use a sample such as this, you know that all 6mA calls must be false positive calls. You can use modkit pileup or modkit summary to aggregate these data (use the --only-mapped flag if you decide to use modkit summary). Optionally, you could use modkit sample-probs on a sample where you know there is 6mA at a reasonable level and get the default threshold value from that sample. Then use that value when you calculate the false positive rate as I've just described. I would do it both ways as a sanity check, because I would not expect the results to be very different. Hope this helps.
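As a rough illustration of the calculation described above, here is a minimal Python sketch. It is not modkit's implementation. It assumes you have already obtained per-base 6mA probabilities as plain lists of floats (for example, parsed from per-read output), and the function names `estimate_threshold` and `false_positive_rate` are hypothetical. It also assumes a two-class call (A vs. 6mA), where the confidence of a call is max(p, 1 - p), and it takes the threshold as the 10th percentile of confidences, which is an assumption about the default rather than something stated in this thread.

```python
def call_confidence(p_mod):
    """Confidence of a two-class call: probability of the predicted class."""
    return max(p_mod, 1.0 - p_mod)


def estimate_threshold(mod_probs, percentile=0.1):
    """Return the confidence value at the given percentile.

    Calls with confidence below this value would be discarded as low quality.
    The 0.1 default is an assumption made for this sketch.
    """
    if not mod_probs:
        raise ValueError("no probabilities supplied")
    confidences = sorted(call_confidence(p) for p in mod_probs)
    index = min(int(percentile * len(confidences)), len(confidences) - 1)
    return confidences[index]


def false_positive_rate(control_probs, filter_threshold):
    """Fraction of confident calls in a negative control that are called 6mA.

    In a sample known to contain no 6mA (e.g. PCR amplified), every
    modified call that passes the filter is a false positive.
    """
    passing = [p for p in control_probs
               if call_confidence(p) >= filter_threshold]
    if not passing:
        raise ValueError("no calls pass the filter threshold")
    false_positives = sum(1 for p in passing if p > 0.5)
    return false_positives / len(passing)


# Made-up example values, only to show how the pieces fit together.
positive_sample = [0.99, 0.95, 0.05, 0.90, 0.55, 0.02, 0.85, 0.97, 0.10, 0.70]
negative_control = [0.02, 0.10, 0.45, 0.60, 0.97, 0.01]

threshold = estimate_threshold(positive_sample)
fpr = false_positive_rate(negative_control, threshold)
print(f"threshold={threshold:.2f} false_positive_rate={fpr:.2f}")
```

Estimating the threshold once from the control itself and once from a sample known to carry 6mA, then comparing the two resulting rates, gives the sanity check mentioned above.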

A

xiangpingyu · 2024-04-30T20:46:43Z

Hello @ArtRand ,

Because we lack a suitable reference for our libraries, our methylation BAM files are not aligned to any reference. I've found that modkit pileup and modkit summary (with the --only-mapped flag), which you suggested for aggregating these data, do not support further analysis of these unaligned BAM files.

Could there be something I'm misunderstanding?

I look forward to your assistance.

Thank you!

ArtRand added the question Looking for clarification on inputs and/or outputs label Apr 25, 2024
