Perform mean of probability for each read position with modkit extract #163
Comments
Hello @OceaneMion, I think what you want is to use Maybe I'm not quite understanding what you need; could you provide a concrete example?
Thank you for your answer. Yes, I think pileup is indeed more suitable, but I don't fully understand how pileup works. Here is an example of the first line of my output: contig_1 2 3 h 411 . 2 3 255,0,0 411 12.90 53 16 342 0 0 0 3 Is column 11 (fraction modified) the percentage chance of having a methylation mark at this position? Also, I don't really understand the link with the mod_qual from modkit extract. I know that modkit extract gives me the probability for a base at a specific position in a read, but sometimes I have reads with different IDs that share the same ref position and chromosome, so I would like to take the mean over those.
I have another question: does the 4th column of the bedgraph give me the probability of methylation at a specific position? It looks like this:
Hello @OceaneMion, During pileup, each read's base modification probability (
These values are the same as the corresponding ones in the bedMethyl output. There are details and examples of how the pass thresholds affect the base modification calls in the documentation as well. I believe what you want is the
Thanks a lot! So if I want to plot the probability distribution of each modified base at each genomic position, I need to use the fraction_modified values, right? Or do I need something else?
Hello @OceaneMion, Basically, yes. If you have a model that predicts more than one modification at a base (e.g. 5hmC/5mC at cytosine bases), you'll have a categorical distribution where the empirical
You'd have: These probabilities define the categorical distribution at that site. There are, of course, a multitude of fancier things you can do, but this is a good place to start.
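To make the categorical distribution concrete, here is a minimal Python sketch using the bedMethyl row quoted earlier in the thread. It assumes the usual bedMethyl column order (column 10 valid coverage, column 11 percent modified, columns 12 to 14 the counts of modified, canonical, and other-modification calls); please check this against the modkit documentation for your version. The dictionary keys `other_mod` and `canonical` are labels chosen for illustration only.

```python
# Row copied from the example output quoted earlier in this thread.
row = "contig_1 2 3 h 411 . 2 3 255,0,0 411 12.90 53 16 342 0 0 0 3"
fields = row.split()

# ASSUMPTION: standard bedMethyl column order (0-based indices below).
mod_code = fields[3]                   # modification code, here "h"
n_valid_cov = int(fields[9])           # calls that passed filtering
fraction_modified = float(fields[10])  # reported as a percentage
n_mod, n_canonical, n_other_mod = (int(x) for x in fields[11:14])

# Empirical categorical distribution at this site: each category's
# count divided by the valid coverage.
distribution = {
    mod_code: n_mod / n_valid_cov,
    "other_mod": n_other_mod / n_valid_cov,
    "canonical": n_canonical / n_valid_cov,
}

for category, prob in distribution.items():
    print(f"{category}\t{prob:.4f}")

# The reported percentage should match N_mod / N_valid_cov.
print(round(100 * distribution[mod_code], 2), fraction_modified)
```

For this row the three counts (53, 16, 342) add up to the valid coverage of 411, so the three probabilities sum to 1.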
Hi all,
I would like to know whether it is possible to write some bash code, to run after modkit extract, that groups rows by ref_position and chrom. For example, if the same chromosome and ref_position appear multiple times (say contig_1 with ref_position 2 appears 4 times), how can I build a new table that contains one line instead of four, holding the mean of call_prob? Rows with a different call code should not be averaged together, so the grouping should check this criterion too. I'm not interested in the other columns of the table, as I only want to plot call_prob against ref_position.
I am fairly new to bioinformatics, so any help would be appreciated!
Thanks in advance
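
One possible way to do the grouping described above, sketched in Python rather than bash. It assumes the table is tab-separated with a header row containing columns named `chrom`, `ref_position`, `call_code`, and `call_prob`; adjust the names if your file differs. The function name `mean_call_prob` and the example rows are made up for illustration and are not real modkit output.

```python
import csv
import io
from collections import defaultdict


def mean_call_prob(rows):
    """Average call_prob per (chrom, ref_position, call_code) group."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        key = (row["chrom"], int(row["ref_position"]), row["call_code"])
        sums[key][0] += float(row["call_prob"])
        sums[key][1] += 1
    # One record per group, sorted by contig, position, then call code.
    return [
        (chrom, pos, code, total / n, n)
        for (chrom, pos, code), (total, n) in sorted(sums.items())
    ]


# Hypothetical rows: other columns are ignored by the function.
example_tsv = (
    "read_id\tchrom\tref_position\tcall_code\tcall_prob\n"
    "read_a\tcontig_1\t2\th\t0.90\n"
    "read_b\tcontig_1\t2\th\t0.70\n"
    "read_c\tcontig_1\t2\th\t0.80\n"
    "read_d\tcontig_1\t2\tm\t0.60\n"
    "read_a\tcontig_1\t5\th\t0.50\n"
    "read_b\tcontig_1\t5\th\t1.00\n"
)

reader = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
result = mean_call_prob(reader)

print("chrom\tref_position\tcall_code\tmean_call_prob\tn_reads")
for chrom, pos, code, mean_prob, n in result:
    print(f"{chrom}\t{pos}\t{code}\t{mean_prob:.4f}\t{n}")
```

To run this on a real file, open it and pass `csv.DictReader(fh, delimiter="\t")` to `mean_call_prob` in place of the in-memory example. Rows with different call codes at the same position stay in separate groups, which matches the requirement that they not be averaged together.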