CRISPRclassify-CNN-Att is a deep learning-based method that utilizes Convolutional Neural Networks (CNNs) and self-attention mechanisms to classify CRISPR-Cas systems based on repeat sequences.
Repeats | Subtype |
---|---|
GTCGCGCCTTTACGGGCGCGTGGATTGAAAC | I-C |
CGGTTCATCCCCACCTGCGTGGGGTTAAT | I-E |
GATTGAAAGCTATGCGAATTTGCACAGTCTTAAAAC | VI-D |
TCCAGCCGCCTTCAGGCGGCTGGTGTGTTGAAAC | I-C |
ATAAGAGAGAATATAACTCCGATAGGAGACGGAAAC | III-A |
GTCTGCCCCGCGCATGCGGGGATGAACCC | I-E |
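
As a rough illustration of how repeat/subtype pairs like those in the table above might be turned into model inputs, the sketch below one-hot encodes a repeat over the A/C/G/T alphabet, pads it to a fixed length, and maps subtype labels to integer indices. The function names (`one_hot_encode`, `encode_subtypes`), the padding length of 48, and the all-zero rows used for padding are assumptions made for this example; they are not taken from `repeatsEncoder.py` or `typeEncoder.py`.

```python
# Illustrative sketch only: not the repository's actual encoders.
NUCLEOTIDES = "ACGT"


def one_hot_encode(repeat, max_len=48):
    """Return a max_len x 4 one-hot matrix; padding and unknown bases are all zeros."""
    if len(repeat) > max_len:
        raise ValueError(f"repeat longer than max_len={max_len}")
    rows = []
    for base in repeat.upper():
        row = [0] * len(NUCLEOTIDES)
        idx = NUCLEOTIDES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    # Pad with zero rows (assumed convention) so every input has the same shape.
    rows.extend([[0] * len(NUCLEOTIDES) for _ in range(max_len - len(repeat))])
    return rows


def encode_subtypes(subtypes):
    """Map each distinct subtype label to an integer index, in sorted order."""
    return {label: i for i, label in enumerate(sorted(set(subtypes)))}


examples = [
    ("GTCGCGCCTTTACGGGCGCGTGGATTGAAAC", "I-C"),
    ("CGGTTCATCCCCACCTGCGTGGGGTTAAT", "I-E"),
    ("GATTGAAAGCTATGCGAATTTGCACAGTCTTAAAAC", "VI-D"),
]
label_index = encode_subtypes(label for _, label in examples)
encoded = [(one_hot_encode(seq), label_index[lab]) for seq, lab in examples]
```

Each entry of `encoded` pairs a fixed-size matrix with an integer class label, which is the general shape of input a convolutional classifier expects.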
- data/: Contains data files.
  - repeats_all.csv: CSV file containing all repeats and their corresponding subtypes.
- source/: Contains all the script files.
  - CNN_Att.py: CNN and self-attention mechanism models.
  - dataselect.py: selecting datasets.
  - repeatsEncoder.py: encoding repeats.
  - structure_features.py: calculating additional features (including k-mer frequency, GC content, and sequence length).
  - transferlearning.py: fine-tuning for classifying less abundant subtypes.
  - typeEncoder.py: encoding subtypes.
  - stacking.py: model stacking.
- model/:
  - cnn_att_large.pth: pre-trained model-large
  - cnn_att_less.pth: pre-trained model-less
  - CRISPRclassify_CNN_Att.pkl: stacking model

    > [!NOTE]
    > Due to file size limitations, we have stored the files cnn_att_large.pth, cnn_att_less.pth, and CRISPRclassify_CNN_Att.pkl at the following link: Google Drive Folder.
  - test.py: model testing
  - test.xlsx: test data
- README.md: project description and instructions.
- requirements.txt: list of dependencies for the project.
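
The additional features listed for `structure_features.py` (k-mer frequency, GC content, and sequence length) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names, the choice of `k=3`, and the order in which the features are concatenated are all hypothetical.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer over A/C/G/T, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)  # avoid division by zero for short inputs
    return {"".join(p): counts["".join(p)] / total for p in product("ACGT", repeat=k)}


def structure_features(seq, k=3):
    """Concatenate k-mer frequencies, GC content, and length into one vector."""
    return list(kmer_frequencies(seq, k).values()) + [gc_content(seq), len(seq)]


repeat = "GTCGCGCCTTTACGGGCGCGTGGATTGAAAC"
features = structure_features(repeat)
```

With `k=3` the vector has 4^3 = 64 k-mer entries followed by the GC fraction and the raw length, so sequences of different lengths still yield vectors of the same size.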
- Clone the repository:

      git clone https://github.com/Xingyu-Liao/CRISPRclassify-CNN-Att.git
      cd CRISPRclassify-CNN-Att

- Create and activate a virtual environment:

      conda create --name crisprclassify-cnn-att
      conda activate crisprclassify-cnn-att

- Install dependencies:

      pip install -r requirements.txt

- Train model-large for subtypes with abundant samples:

      cd source
      python CNN_Att.py

- Train model-less for subtypes with fewer samples:

      cd source
      python transferlearning.py

- Model stacking:

      cd source
      python stacking.py

- Model testing:

      cd model
      python test.py
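
The training steps above treat subtypes with abundant samples and subtypes with fewer samples separately. As a hedged sketch of what such a partition could look like, the snippet below splits (repeat, subtype) records by per-subtype sample count. The function name `split_by_abundance` and the `min_samples` cutoff are hypothetical and chosen only for this example; the criterion actually used by `dataselect.py` is not described here.

```python
from collections import Counter


def split_by_abundance(records, min_samples=3):
    """Partition (repeat, subtype) records by how many samples each subtype has.

    min_samples is a hypothetical cutoff used for illustration only.
    """
    counts = Counter(subtype for _, subtype in records)
    abundant = [r for r in records if counts[r[1]] >= min_samples]
    scarce = [r for r in records if counts[r[1]] < min_samples]
    return abundant, scarce


records = [
    ("GTCGCGCCTTTACGGGCGCGTGGATTGAAAC", "I-C"),
    ("TCCAGCCGCCTTCAGGCGGCTGGTGTGTTGAAAC", "I-C"),
    ("CGGTTCATCCCCACCTGCGTGGGGTTAAT", "I-E"),
    ("GTCTGCCCCGCGCATGCGGGGATGAACCC", "I-E"),
    ("GATTGAAAGCTATGCGAATTTGCACAGTCTTAAAAC", "VI-D"),
]
abundant, scarce = split_by_abundance(records, min_samples=2)
```

In this toy input, the I-C and I-E records land in `abundant` and the single VI-D record lands in `scarce`.
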
    CRISPRclassify-CNN-Att/
    ├── data/
    │   ├── repeats_all.csv
    ├── source/
    │   ├── CNN_Att.py
    │   ├── dataselect.py
    │   ├── repeatsEncoder.py
    │   ├── structure_features.py
    │   ├── transferlearning.py
    │   ├── typeEncoder.py
    │   ├── stacking.py
    ├── model/
    │   ├── cnn_att_large.pth
    │   ├── cnn_att_less.pth
    │   ├── CRISPRclassify_CNN_Att.pkl
    │   ├── test.py
    │   ├── test.xlsx
    ├── README.md
    ├── requirements.txt