Clarify input format (for Bismark output) #7
Hi @bug1303
they were only examples; I didn't mean for them to be taken literally. However, thank you for spotting this, I've changed it. I've attached 2 files showing what the different inputs look like.
You write in the README.md that you support the following formats:
However, these don’t entirely match what is described in the bismark_methylation_extractor help:
and
1. You call both "coverage2cytosine" format. The "coverage2cytosine" Bismark module can create a "genome-wide cytosine methylation output file" (which looks ALMOST like Input Type 5) from the coverage output (which looks ALMOST like your Input Type 6), but that file can also be created by bismark_methylation_extractor directly.
2. In the Input Type 5 example you show a start and an end position (8 columns in total), but the description below it lists only a start position and 7 columns in total. I assume it's just a typo in the example?
3. You write that the start/end positions are all in [0,4294967295]. Bismark uses 1-based coordinates by default; only when --zero-based is explicitly specified do they become 0-based and half-open. So by default everything is 1-based and start position == end position. Your example says '762 763', so should --zero-based indeed be specified?
4. Bismark clearly states "count methylated" and "count non-methylated" rather than "methylated C count" and "C count". "C count" sounds like a total count (methylated + non-methylated). What is actually expected here?
5. Input Type 6, "Column4: methylation percentage, which is calculated by Defiant." Why is this calculated by Defiant? And how? Shouldn't this be input to Defiant? It is part of the Bismark coverage output. However, your example is "chr1 762 763 0.265625 17 76". How would you get to 0.265625? It's neither 17/76 nor 17/(76+17), whichever you actually mean in point 4. (17/64 = 0.265625, assuming the 64 that you mention in the Input Type 5 example.)
However, from a Bismark run I got, for example, the following in the coverage output (test.deduplicated.bismark.cov.gz):
chr3 3008646 3008646 33.3333333333333 1 2
chr3 5620584 5620584 75 3 1
So, the methylation percentage is (100*col5/(col5+col6)) and not (col5/col6)
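To make the arithmetic concrete, here is a minimal Python sketch that recomputes column 4 from columns 5 and 6 for the two coverage lines quoted above. The function names are made up for illustration and are not part of Bismark or Defiant; the sketch assumes whitespace-separated columns in the order chromosome, start, end, percentage, count methylated, count non-methylated.

```python
import math


def methylation_percentage(count_methylated, count_unmethylated):
    """Return 100 * methylated / (methylated + unmethylated)."""
    total = count_methylated + count_unmethylated
    if total == 0:
        raise ValueError("no coverage at this position")
    return 100.0 * count_methylated / total


def parse_cov_line(line):
    """Split one whitespace-separated coverage line into typed fields."""
    chrom, start, end, pct, meth, unmeth = line.split()
    return chrom, int(start), int(end), float(pct), int(meth), int(unmeth)


# The two lines quoted from test.deduplicated.bismark.cov.gz
cov_lines = [
    "chr3 3008646 3008646 33.3333333333333 1 2",
    "chr3 5620584 5620584 75 3 1",
]

for line in cov_lines:
    chrom, start, end, pct, meth, unmeth = parse_cov_line(line)
    # Column 4 agrees with 100 * col5 / (col5 + col6) ...
    assert math.isclose(pct, methylation_percentage(meth, unmeth), rel_tol=1e-9)
    # ... and is not the plain ratio col5 / col6.
    assert not math.isclose(pct, meth / unmeth, rel_tol=1e-9)
```

Both lines pass the first check and fail the plain-ratio interpretation, which is the point made above.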
(Also, the start and end position are same (as stated in 3), unless --zero-based is used, but then it would not be valid input to the coverage2cytosine script.)
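The coordinate question can be sketched the same way. The helpers below are hypothetical and only encode the two conventions discussed in this issue: 1-based with start == end, and 0-based half-open with end == start + 1. They are a rough heuristic for inspecting a single line, not a description of how Defiant or Bismark actually parse positions.

```python
def one_based_to_half_open(pos):
    """Convert a 1-based position to a 0-based, half-open (start, end) pair."""
    if pos < 1:
        raise ValueError("1-based positions start at 1")
    return pos - 1, pos


def guess_convention(start, end):
    """Guess which convention a single-cytosine (start, end) pair follows."""
    if start == end:
        return "1-based"
    if end == start + 1:
        return "zero-based half-open"
    return "unknown"


# The README example '762 763' has the shape of a zero-based half-open interval,
# while the coverage lines quoted above have start == end.
print(guess_convention(762, 763))
print(guess_convention(3008646, 3008646))
print(one_based_to_half_open(763))
```

Under these assumptions, '762 763' would correspond to the 1-based position 763.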
Please consider providing an example call for bismark_methylation_extractor that will produce files of the type that Defiant will read and process as expected.
Looking forward to testing the program once this is clarified.