Source files created during the Master's Thesis
-
As an initial raw data file (reference human genome) is used GRCh38.p14 (from NCBI).
-
For obtaining the reverse complements is used the tool seqkit, command is given in the Appendix section.
-
For counting k-mers is used KMC3.2.1, commands are given in the Appendix section.
Here are the scripts for the generation of files with chromosome-wise CpG positions and lists of "marker" unique 29-mers, which are used in the main working script of the method. The scripts are numerated in order of their execution, additional usage of tools for obtaining the reverse complements and k-mers counting is required (in our case those are seqkit and KMC3.2.1), detailed execution example provided in the Appendix section.
Main working directory of the method. Contains the main working python script of the method, bash scripts for modes, and the Snakefile, responsible for the workflow settings. Execution example provided in the Appendix section.
Contains script for generating .bedGraph file from the resulting output file of the method, as well as the scripts for the comparison of the results of our tool and another tools.
Here is the script for the generation of the statistics regarding the amount of CpGs having a certain number of unique 29-mers in their +-50-window (chromosome-wise).