Validation with ground truth #177
Comments
At a high level I would note that this command is intended for the validation of samples where the modified base status at reference positions is known with absolute certainty. It is possible to run this command with any type of data, but the reported metrics represent the combination of the model performance and the validity of the ground truth data. If you have partial methylation at reference positions you may be more interested in comparing different samples or technologies with a heatmap/correlation analysis, but that is not the intended purpose of this command. Answers to your specific questions are below.
The mod code for modified bases is as specified in the BAM tag format, so for common modified bases this will be a single-letter code (…). For canonical positions in the reference the … Here is an example for two modified positions:
And for canonical positions:
The positions in the BED file specify where you know the modified base identity with 100% certainty for all reads covering the reference position. Any such positions can be added to the ground truth BED file, including CpG sites. You can generate several BED files if you wish to validate accuracy on different subsets of reference positions (e.g. CG, CHG, and CHH contexts).
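To illustrate the idea of splitting positions by context, here is a minimal Python sketch that groups forward-strand cytosines in a reference sequence into CG, CHG, and CHH sets (H meaning A, C, or T). The function name `cytosine_contexts` is hypothetical and not part of modkit. The sketch only finds positions: writing them out in the BED layout the command expects is left to the modkit documentation.

```python
from collections import defaultdict


def cytosine_contexts(seq):
    """Group 0-based forward-strand cytosine positions by context.

    Hypothetical helper for illustration only; not part of modkit.
    Cytosines too close to the end of the sequence to classify are skipped.
    """
    seq = seq.upper()
    groups = defaultdict(list)
    for i, base in enumerate(seq):
        if base != "C":
            continue
        following = seq[i + 1 : i + 3]  # up to two bases after the C
        if following[:1] == "G":
            groups["CG"].append(i)
        elif len(following) == 2 and following[0] in "ACT":
            if following[1] == "G":
                groups["CHG"].append(i)
            elif following[1] in "ACT":
                groups["CHH"].append(i)
    return dict(groups)


if __name__ == "__main__":
    for context, positions in cytosine_contexts("ACGTCAGCTTC").items():
        print(context, positions)
```

Each group could then be written to its own ground truth file, provided the modification status of every listed position is actually known.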
Generally this command is intended for positions where the modified base status at a reference position is known with absolute certainty. It is possible to set thresholds given an alternative technology (or even replicates of the same one), but in this case any differences due to biological variation will be represented as "errors" which may not accurately represent the performance of the modified base calls.
Yes. A modified base occurs on a single strand, not on both. Some sites show constitutive methylation on both strands within certain motifs (e.g. 5mC in CG contexts). In these cases both strands can be added to the ground truth BED file.
Thank you!
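As a rough sketch of adding both strands for CpG sites, the Python below emits one row per strand for every CG dinucleotide in a sequence. For a CG whose C is at 0-based position i on the forward strand, the reverse-strand cytosine pairs with the G at position i + 1. The helper names and the six-column layout (contig, start, end, mod code, placeholder, strand) are assumptions for illustration; check the modkit documentation for the exact columns the command expects. The code `m` is the SAM tags code for 5mC.

```python
def cpg_both_strands(contig, seq, mod_code="m"):
    """Return one row per strand for every CG dinucleotide in seq.

    Hypothetical helper; the column layout is an assumption, not
    taken from the modkit documentation.
    """
    seq = seq.upper()
    rows = []
    for i in range(len(seq) - 1):
        if seq[i : i + 2] == "CG":
            # Forward-strand cytosine at position i.
            rows.append((contig, i, i + 1, mod_code, ".", "+"))
            # Reverse-strand cytosine pairs with the G at position i + 1.
            rows.append((contig, i + 1, i + 2, mod_code, ".", "-"))
    return rows


def to_bed(rows):
    """Join rows into tab-separated text, one row per line."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


if __name__ == "__main__":
    print(to_bed(cpg_both_strands("chr1", "ACGT")))
```

This only makes sense for sites where methylation on both strands is known with certainty, as described above.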
Hi, I have a few questions regarding the use of modkit validate.
Thank you!