This project analyzes the overlap of transposable elements and enhancers within the human genome using machine learning algorithms such as random forest and SVM, together with dimensionality reduction and visualization using PCA and t-SNE.
Many README files refer to file paths on the Vanderbilt ACCRE cluster, which start with /dors/capra_lab/users/yand1/te_ml/. To find files within this GitHub repository, treat /dors/capra_lab/users/yand1/te_ml/ as equivalent to the root directory of the repository.
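As an illustration of that path convention, the following minimal sketch converts a cluster path into the matching path inside a local clone. The function name to_repo_path and the repo_root parameter are hypothetical and are not part of this repository.

```python
from pathlib import PurePosixPath

# Cluster prefix quoted in this README; it corresponds to the repository root.
CLUSTER_ROOT = PurePosixPath("/dors/capra_lab/users/yand1/te_ml")


def to_repo_path(cluster_path, repo_root="."):
    """Map an ACCRE cluster path to the matching path in a local clone.

    repo_root is a hypothetical parameter: the directory where the
    repository was cloned. Raises ValueError if cluster_path does not
    lie under CLUSTER_ROOT.
    """
    relative = PurePosixPath(cluster_path).relative_to(CLUSTER_ROOT)
    return PurePosixPath(repo_root) / relative


if __name__ == "__main__":
    print(to_repo_path("/dors/capra_lab/users/yand1/te_ml/bin"))
```

Paths outside the cluster prefix raise a ValueError rather than being passed through silently, so a mistyped prefix is noticed early.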
Source files are in the bin folder, which contains subdirectories named after the creation dates of the files in them. Detailed documentation is inside the bin folder.
The data files are too large to store on GitHub and are kept on the Vanderbilt ACCRE cluster at
/dors/capra_lab/users/yand1/te_ml/data
Note that the data files are compressed and must be decompressed with gunzip before use.
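For users who prefer to decompress from a script instead of calling gunzip on the command line, here is a minimal sketch using Python's standard gzip module. It assumes the data files are gzip archives with a .gz extension; the function name gunzip_file and the example file name are hypothetical. Unlike gunzip, this sketch keeps the compressed file by default.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path, keep_original=True):
    """Decompress a .gz file next to itself and return the output path.

    Assumes gz_path is a gzip archive whose name ends in ".gz".
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")  # drop the trailing ".gz"
    # Stream the data so large files are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if not keep_original:
        gz_path.unlink()  # mimic gunzip, which removes the archive
    return out_path


# Hypothetical usage; replace the name with a real file from the data folder:
# gunzip_file("data/example.bed.gz")
```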
The results folder was added on 2018-07-13 to make synchronizing between a local machine and the Vanderbilt ACCRE cluster easier. More detailed documentation is inside the results folder.
The results/2018_06_21_chromehmm_te folder is ignored in this repository due to the large size of its contents, but it is available on the Vanderbilt ACCRE cluster at /dors/capra_lab/users/yand1/te_ml/results/2018_06_21_chromehmm_te