How to detect the background noise from control and test samples #172

Open
xiangpingyu opened this issue Apr 24, 2024 · 2 comments
Labels
question Looking for clarification on inputs and/or outputs

xiangpingyu commented Apr 24, 2024

Dear developers,

Attached are the results for two samples, 1 (control) and 2 (modified_test), generated with modkit summary and modkit sample-probs using default parameters. I am unsure how to adjust the settings in modkit extract to accurately assess the modification status of the modified_test sample, particularly since we have no appropriate reference sequence for our experiments.

I have reviewed the related issue #147, but I am still unclear on how to set the --filter-threshold and --mod-threshold parameters effectively based on the attached files. Is there a way to establish a threshold for background noise in these two samples? I look forward to your reply. Thank you all!

Sophia

1.summary.csv
2.summary.csv
2_probabilities.txt
1_probabilities.txt
2_thresholds.csv
1_thresholds.csv

Contributor

ArtRand commented Apr 25, 2024

Hello @xiangpingyu,

In general, you don't have to do anything to adjust the thresholds. The estimated threshold values in both of your samples seem to be about the same.

If you want to estimate the false positive rate, use a sample where you know there should not be any 6mA bases, for example a PCR-amplified sample or a genome from an organism that lacks the enzymes necessary to make this modification. With such a sample, you know that all 6mA calls must be false positives. You can use modkit pileup or modkit summary to aggregate these data (use the --only-mapped flag if you decide to use modkit summary).

Optionally, you could run modkit sample-probs on a sample where you know there is 6mA at a reasonable level and take the default threshold value from that sample, then use that value when you calculate the false positive rate as just described. I would do it both ways as a sanity check, because I would not expect the results to be very different. Hope this helps.
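As a rough illustration of the arithmetic only, here is a minimal Python sketch of the false positive rate calculation on a negative control. It is not modkit's implementation. The function name, the list of per-call 6mA probabilities, and the threshold values are all hypothetical, and the filtering rule (drop a call when the probability of its most likely class is below the threshold) is a simplifying assumption.

```python
from typing import Sequence


def false_positive_rate(mod_probs: Sequence[float], threshold: float) -> float:
    """Fraction of passing calls in a negative control that are called 6mA.

    mod_probs: hypothetical per-call probabilities that a base is 6mA,
        taken from a sample that should contain no 6mA.
    threshold: pass threshold applied to the most likely class
        (assumed here to be either 6mA or canonical A).
    """
    mod_pass = 0
    canonical_pass = 0
    for p_mod in mod_probs:
        p_canonical = 1.0 - p_mod
        # Assumption: a call is filtered out when its most likely class
        # has a probability below the threshold.
        if max(p_mod, p_canonical) < threshold:
            continue
        if p_mod > p_canonical:
            mod_pass += 1
        else:
            canonical_pass += 1
    total_pass = mod_pass + canonical_pass
    if total_pass == 0:
        raise ValueError("no calls passed the threshold")
    # In a negative control every passing 6mA call is a false positive.
    return mod_pass / total_pass


# Made-up probabilities for illustration; not taken from the attached files.
control_probs = [0.02, 0.08, 0.95, 0.40, 0.05, 0.85, 0.01, 0.30]
for t in (0.8, 0.9):
    print(f"threshold={t}: FPR={false_positive_rate(control_probs, t):.3f}")
```

Running the same calculation once with the control's own estimated threshold and once with the threshold taken from a sample known to contain 6mA gives the two numbers to compare in the sanity check.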

A

ArtRand added the question label Apr 25, 2024
@xiangpingyu
Author

Hello @ArtRand ,

Because our libraries lack a suitable reference, our methylation BAM files are not aligned to a specific reference. I've found that neither modkit pileup nor modkit summary (with the --only-mapped flag) supports further analysis of these unaligned BAM files.

Could there be something I'm misunderstanding?

I look forward to your assistance.

Thank you!
