How does Modkit handle Large Genome Data? #190
Comments
For large genomes, memory usage is also alarming: the peak memory usage of |
I have read the Performance Considerations documentation, but it does not resolve my concerns. For 300 GB input files, memory is exhausted before the seed-searching step.
Hello @Yang990-sys,
I think a better option is to partition the analysis into genomic regions, for example chromosomes or Mbp-long regions. Differential methylation works on a genomic "column", so you can process each chromosome (or an interval of a chromosome) separately and then combine the results. You can also pipe the output of modkit pileup through a filter before compressing it:
modkit pileup ${modbam} - | awk '$5>5' | bgzip > ${out_filt_bedmethyl}
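The two suggestions above (filter the pileup stream, then work per chromosome) can be sketched in Python as a streaming pass that never holds the whole file in memory. This is only an illustration, not part of modkit: the function names `filter_by_score` and `split_by_chrom` and the `<chrom>.bed` output naming are hypothetical, and the filter assumes the 5th column is the numeric score used in the awk command.

```python
import os


def filter_by_score(lines, min_score=5.0):
    """Yield bedMethyl rows whose 5th column (score) exceeds min_score."""
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        try:
            score = float(fields[4])
        except ValueError:
            continue  # skip a header or any other non-numeric row
        if score > min_score:
            yield line


def split_by_chrom(lines, out_dir):
    """Write each row to <out_dir>/<chrom>.bed (hypothetical naming).

    Returns the number of rows written per chromosome. One file handle is
    kept open per chromosome, which assumes a modest number of contigs.
    """
    handles = {}
    counts = {}
    try:
        for line in lines:
            fields = line.split(None, 1)
            if not fields:
                continue  # skip blank lines
            chrom = fields[0]
            if chrom not in handles:
                path = os.path.join(out_dir, chrom + ".bed")
                handles[chrom] = open(path, "w")
                counts[chrom] = 0
            handles[chrom].write(line)
            counts[chrom] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Fed from standard input (for example the output of `modkit pileup ${modbam} -`), `split_by_chrom(filter_by_score(sys.stdin), "by_chrom")` would leave one smaller file per chromosome, each of which can then be analysed separately.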
How large is the genome you're using? (You previously mentioned studying human methylation.) I am working on decreasing the memory usage (and increasing the processing speed) of |
Hello,
I am using modkit to study human methylation. However, a BED file containing three types of methylation averages 300 GB, which is too large for my pipeline to analyze. Most methylation fractions in the BED files are 0, which complicates downstream analysis. Is it possible to delete all rows with a methylation fraction of 0? And when calculating DMRs, is the default methylation fraction for unmeasured positions 0?
I mainly use two programs: dmr pair and find motifs. Would deleting all rows with a fraction of 0 affect them?
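For concreteness, the row deletion being asked about could look like the sketch below. It assumes the modkit bedMethyl layout in which column 11 holds the percent modified; the function name `drop_zero_fraction` is hypothetical. Note that a row with valid coverage and 0% modified is still a measurement of an unmodified site, which is different from a position with no row at all, so whether this filter is safe for dmr pair is exactly the open question here.

```python
def drop_zero_fraction(lines, fraction_col=11):
    """Yield bedMethyl rows whose modified fraction is non-zero.

    fraction_col is 1-based; column 11 as percent modified is an assumption
    about the file layout. Rows that cannot be parsed are passed through
    unchanged so that nothing is dropped silently.
    """
    index = fraction_col - 1
    for line in lines:
        fields = line.split()
        try:
            fraction = float(fields[index])
        except (IndexError, ValueError):
            yield line  # keep headers or unexpected rows as they are
            continue
        if fraction != 0.0:
            yield line
```

Because it is a generator, it can be applied line by line to a 300 GB file without loading the file into memory.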