Clearer `A` and `B` notation #91

Ge0rges · 2023-11-28T08:26:30Z

In the output of `dmr multi` there are columns such as `samplea_counts` and `sampleb_counts`. Since the command is given sample names, it would be nice if the file itself recorded which sample name corresponds to A and which to B. Currently this relies on the user ensuring they don't modify the sample name. Perhaps this is not very practical if the output follows an existing standard, which would be understandable; in that case, perhaps a comment at the top of the file?

PS: The output column `samplea_total` is described as "Total number of base modification calls in the region, including unmodified, for sample A". How does that differ from the nucleotide count? Is it because some nucleotides don't pass the threshold to be confidently called as unmodified?


ArtRand · 2023-11-28T17:28:55Z

Hello @Ge0rges,

For human-readable information such as the sample names, the place to look is the log file. The first line will always have the command itself, so any data provenance questions can be answered (hopefully) by inspecting that. Granted, the log file is optional, so maybe there should be some guard rails around that. I've considered adding flags that would add headers to the BED output files (pileup and dmr), just to make parsing with tabular tools easier. So I'll think about adding the sample name as you suggest. On the other hand, adding the sample name changes the schema.
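To make the header idea concrete, here is a minimal sketch of how a downstream script might read a tab-separated output file that carries comment lines pairing A and B with sample names. The `#sample_a=` and `#sample_b=` keys, the sample names, and the function name are hypothetical; no such header format exists in the tool as discussed here.

```python
import io


def read_with_sample_header(handle):
    """Separate '#key=value' comment lines from tab-separated data rows.

    The '#key=value' layout is an assumption made for illustration only.
    """
    meta, rows = {}, []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Keep only comment lines that actually contain a key=value pair.
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
            continue
        rows.append(line.split("\t"))
    return meta, rows


# Hypothetical file content: two header comments followed by one data row.
example = io.StringIO(
    "#sample_a=control\n"
    "#sample_b=treated\n"
    "chr1\t100\t200\tregion1\n"
)
meta, rows = read_with_sample_header(example)
print(meta["sample_a"], meta["sample_b"], len(rows))
```

Tabular tools that do not skip comment lines would need to be told to ignore them, which is the schema concern raised above.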

To your second question, I'm going to assume that by nucleotide count you mean the number of bases with potential modifications (e.g. all Cs in a region). sample_a_total is the total number of calls in the region as reported by the bedMethyl. As you know, some calls are filtered out before making this table. Further, every counted base has coverage >= 1, and usually more than one read covers it. So the maximum this number could be, assuming no filtering, is the nucleotide count (# C bases) multiplied by the coverage. Hope this helps,

A
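As a worked example of the upper bound described in the reply above, the sketch below computes the largest possible total under the stated assumption of no filtering. The function names and the numbers are illustrative only; the second function is a small generalization for the case where coverage differs from position to position.

```python
def max_total_calls(nucleotide_count, coverage):
    """Upper bound on total calls with uniform coverage and no filtering."""
    return nucleotide_count * coverage


def max_total_calls_per_position(per_position_coverage):
    """Same bound when each candidate base has its own coverage."""
    return sum(per_position_coverage)


# Illustrative region: 4 cytosines, each covered by 30 reads.
n_cytosines = 4
print(max_total_calls(n_cytosines, 30))

# Same region with uneven coverage at each cytosine.
print(max_total_calls_per_position([30, 28, 31, 30]))
```

Any filtering of low-confidence calls can only lower the observed total relative to this bound, which is why the total and the nucleotide count are different quantities.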

Ge0rges · 2023-11-28T17:30:41Z

Thank you for the prompt reply as usual. I would argue that the sample naming data should be within the file to make it more independent and less prone to errors when sharing, or coming back to the data after a while. I think log files are often thought of as temporary.

ArtRand · 2023-11-28T23:47:22Z

Hello @Ge0rges,

I tend to agree. I find myself relying on the "command" comments in VCF files and PG records in BAMs all the time. There is a lot going into the 0.2.3 release, so I may not get this feature in there - but I'll try. Thanks as always for the suggestion.

ArtRand added this to the 0.2.3 milestone Nov 28, 2023

ArtRand added the `enhancement` (New feature or request) label Nov 28, 2023

ArtRand removed this from the 0.2.3 milestone Nov 28, 2023

ArtRand removed the `enhancement` (New feature or request) label Nov 28, 2023

ArtRand added this to the 0.2.4 milestone Dec 4, 2023

ArtRand modified the milestones: 0.2.4, 0.2.5 Dec 22, 2023

ArtRand removed this from the 0.2.5 milestone Oct 10, 2024

ArtRand added the `enhancement` (New feature or request) label Oct 10, 2024
