This repository contains all the source files required to reproduce the results in the original DeLUCS paper (https://doi.org/10.1101/2021.05.13.444008), as well as a detailed guide for running the code.
python build_dp.py --data_path=<PATH_sequence_folder>
- Input: Folders with the sequences in FASTA format
- Output: pickle file in the form (label, sequence, accession)
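
The step above can be pictured with the minimal sketch below. It is not the code in `build_dp.py`. It assumes that each subfolder of the data path holds the FASTA files of one class, that the subfolder name serves as the label, and that the accession is the first token of each FASTA header. The function names `read_fasta`, `build_dataset`, and `save_dataset` are hypothetical.

```python
import os
import pickle


def read_fasta(path):
    """Yield (accession, sequence) for each record in a FASTA file."""
    accession, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if accession is not None:
                    yield accession, "".join(chunks)
                # Assumption: the accession is the first token of the header.
                fields = line[1:].split()
                accession = fields[0] if fields else ""
                chunks = []
            else:
                chunks.append(line.upper())
    if accession is not None:
        yield accession, "".join(chunks)


def build_dataset(data_path):
    """Collect (label, sequence, accession) tuples, one label per subfolder."""
    records = []
    for label in sorted(os.listdir(data_path)):
        folder = os.path.join(data_path, label)
        if not os.path.isdir(folder):
            continue  # Assumption: only subfolders define classes.
        for name in sorted(os.listdir(folder)):
            for accession, sequence in read_fasta(os.path.join(folder, name)):
                records.append((label, sequence, accession))
    return records


def save_dataset(records, out_path):
    """Serialize the list of tuples with pickle."""
    with open(out_path, "wb") as handle:
        pickle.dump(records, handle)
```

Under these assumptions, `save_dataset(build_dataset("<PATH_sequence_folder>"), "dataset.p")` would write one tuple per sequence. The output file name here is only a placeholder.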
python get_pairs.py --data_path=<PATH_pickle_dataset> --k=6 --modify='mutation' --output=<PATH_output_file> --n_mimics=<n mimics per sequence>
- Input: pickle file in the form (label, sequence, accession)
- Output: pickle file in the form (pairs, x_test, y_test)
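
To make the role of `--k` and `--n_mimics` concrete, the sketch below counts k-mers and builds (original, mimic) pairs by applying random point substitutions. It is an illustration under stated assumptions, not the logic of `get_pairs.py`: the uniform substitution model, the `rate` and `seed` parameters, and all function names are hypothetical, and the real script may represent sequences and mutations differently.

```python
import itertools
import random

ALPHABET = "ACGT"


def kmer_counts(sequence, k=6):
    """Return a list of 4**k counts; k-mers with non-ACGT symbols are skipped."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    for start in range(len(sequence) - k + 1):
        position = index.get(sequence[start:start + k])
        if position is not None:
            counts[position] += 1
    return counts


def mutate(sequence, rate, rng):
    """Return a copy with random point substitutions at a per-base rate."""
    bases = list(sequence)
    for i, base in enumerate(bases):
        if base in ALPHABET and rng.random() < rate:
            # Replace the base with one of the three other nucleotides.
            bases[i] = rng.choice([b for b in ALPHABET if b != base])
    return "".join(bases)


def make_pairs(sequence, n_mimics, k=6, rate=1e-3, seed=0):
    """Pair the original k-mer counts with the counts of each mutated mimic."""
    rng = random.Random(seed)
    original = kmer_counts(sequence, k)
    return [(original, kmer_counts(mutate(sequence, rate, rng), k))
            for _ in range(n_mimics)]
```

Each pair holds two count vectors of length 4**k, so with `k=6` every vector has 4096 entries.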
* For training DeLUCS and testing its performance:
```
python EvaluateDeLUCS.py --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
```
* Input: pickle file with the mimics in the form (pairs, x_test, y_test).
* Output: confusion matrix.
<!--* File with the misclassified sequences in the form (accession, true_label, predicted_label)-->
* For testing the performance of a single neural network trained in an unsupervised way (labels must be available):
```
python EvaluateSingleRun.py --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
```
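
Because the training is unsupervised, cluster indices are arbitrary, so comparing them with true labels in a confusion matrix first requires matching each cluster to a label. The sketch below shows one way to do this and is not taken from the evaluation scripts. It assumes integer labels and cluster indices in `0..n_classes-1`, and it searches all permutations by brute force, which is only practical for a handful of clusters. For larger problems an assignment solver such as SciPy's `linear_sum_assignment` would be the usual replacement. All function names are hypothetical.

```python
import itertools


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def best_cluster_mapping(y_true, y_pred, n_classes):
    """Return a list where entry c is the label assigned to cluster c."""
    raw = confusion_matrix(y_true, y_pred, n_classes)
    best, best_hits = None, -1
    for perm in itertools.permutations(range(n_classes)):
        # Count samples that would be correct if cluster c meant label perm[c].
        hits = sum(raw[perm[c]][c] for c in range(n_classes))
        if hits > best_hits:
            best, best_hits = perm, hits
    return list(best)


def evaluate(y_true, y_pred, n_classes):
    """Relabel the clusters, then return (confusion matrix, accuracy)."""
    mapping = best_cluster_mapping(y_true, y_pred, n_classes)
    relabeled = [mapping[c] for c in y_pred]
    matrix = confusion_matrix(y_true, relabeled, n_classes)
    accuracy = sum(matrix[i][i] for i in range(n_classes)) / len(y_true)
    return matrix, accuracy
```

With this sketch, a clustering that groups the samples perfectly but numbers the clusters differently from the labels still yields a diagonal confusion matrix.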
We recommend using the updated version of the code at https://github.com/Kari-Genomics-Lab for training on your own data.
If you find DeLUCS useful in your research, please consider citing:
@article{10.1371/journal.pone.0261531,
doi = {10.1371/journal.pone.0261531},
author = {Millán Arias, Pablo AND Alipour, Fatemeh AND Hill, Kathleen A. AND Kari, Lila},
journal = {PLOS ONE},
publisher = {Public Library of Science},
title = {DeLUCS: Deep learning for unsupervised clustering of DNA sequences},
year = {2022},
month = {01},
volume = {17},
url = {https://doi.org/10.1371/journal.pone.0261531},
pages = {1-25},
number = {1},
}