-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clearer A
and B
notation
#91
Comments
Hello @Ge0rges, For human-readable information such as the sample names, the place to look is the log file. The first line will always have the command itself, so any data provenance questions can be answered (hopefully) by inspecting that. Granted, the log file is optional, so maybe there should be some guard rails around that. I've considered adding flags that would add headers to the BED output files (pileup and dmr), just to make parsing with tabular tools easier. So I'll think about adding the sample name as you suggest. On the other hand, adding the sample name changes the schema. To your second question, I'm going to assume you mean nucleotide count is the number of bases with potential modifications (e.g. all Cs in a region). A |
Thank you for the prompt reply as usual. I would argue that the sample naming data should be within the file to make it more independent and less prone to errors when sharing, or coming back to the data after a while. I think log files are often thought of as temporary. |
Hello @Ge0rges, I tend to agree. I find myself relying on the "command" comments in VCF files and PG records in BAMs all the time. There is a lot going into the 0.2.3 release, so I may not get this feature in there - but I'll try. Thanks as always for the suggestion. |
In the output of
dmr multi
there are columns such assamplea_counts
andsampleb_counts
. Given that the command is given sample names it would be nice if the file itself contained pairings between A, B and the sample names. Currently this relies on the user ensuring they don't modify the sample name. Perhaps this is not very practical if we're following any standards though, and that would be understandable in which case perhaps a comment at the top of the file?PS: What is the difference between the nucleotide count and
Total number of base modification calls in the region, including unmodified, for sample A
for the output columnsamplea total
? Is it because some nucleotides don't pass the threshold to be confidently called as unmodified?The text was updated successfully, but these errors were encountered: