omar-csse/kmer2vec


kmer2vec model

This project explores the relationships between DNA kmers in sigma70 promoters taken from RegulonDB. Each DNA sequence is divided into kmers of length 6. TensorBoard is used to visualize the model's output and to view the nearest kmers in 3D.
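As an illustration of the kmer step, here is a minimal sketch in Python. It assumes an overlapping sliding window that moves one base at a time; the README does not say whether the project uses overlapping or non-overlapping kmers, so `stride` is a hypothetical parameter, and `split_into_kmers` is an illustrative name rather than a function from kmer2vec.py.

```python
def split_into_kmers(sequence, k=6, stride=1):
    """Return the kmers of `sequence`, advancing `stride` bases per step.

    stride=1 gives overlapping kmers; stride=k gives non-overlapping ones.
    Both are assumptions for illustration, not the project's confirmed choice.
    """
    sequence = sequence.upper()
    # The last valid start index is len(sequence) - k.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


if __name__ == "__main__":
    example = "TTGACATATAAT"  # made-up 12-base sequence, not RegulonDB data
    print(split_into_kmers(example))
    print(split_into_kmers(example, stride=6))
```

With `stride=1` a sequence of length n yields n - 5 kmers of length 6; with `stride=6` it yields only the non-overlapping ones.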

Prerequisites:

  • Python 3.x and pip
  • Node.js
  • npm
  • TensorFlow

To download TensorFlow run the following command in the terminal:

    pip install tensorflow

 

To change the hyperparameters of the kmer2vec model, go to line 323 of kmer2vec.py and edit the arguments of the following call as needed:

    kmer2vec = Kmer2vec(embedding_size=128, batch_size=128, num_sampled=16, learningRate=1, window_size=2)
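To give a feel for what `window_size` might control, the sketch below generates (center, context) training pairs the way a word2vec-style skip-gram model does. This is an assumption based on the project name and parameter names; the README does not describe the training procedure, and `skipgram_pairs` is a hypothetical helper, not part of kmer2vec.py.

```python
def skipgram_pairs(kmers, window_size=2):
    """Pair each kmer with its neighbours up to `window_size` positions away.

    Assumes a word2vec-style skip-gram setup, which the README does not confirm.
    """
    pairs = []
    for i, center in enumerate(kmers):
        start = max(0, i - window_size)
        end = min(len(kmers), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # a kmer is not its own context
                pairs.append((center, kmers[j]))
    return pairs


if __name__ == "__main__":
    kmers = ["TTGACA", "TGACAT", "GACATA", "ACATAT"]  # made-up example
    for center, context in skipgram_pairs(kmers, window_size=1):
        print(center, "->", context)
```

Under this reading, a larger `window_size` produces more pairs per kmer and lets more distant neighbours count as context.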

 

 

After installing the prerequisites, navigate to the kmer2vec folder and run the project with the following commands.

To train the model:

    npm run train

 

The following files will run in this order:

 

 

 

After the model runs, a data folder is generated inside lib. It contains the filtered RegulonDB data as JSON files.

A log.txt file is also generated so that you can check the output of the trained model. It is placed inside the model's logs folder. Note that the logs folder holds the TensorFlow output: each training run creates a unique, timestamped folder inside logs, which can be visualized in TensorBoard.
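One common way to get a unique folder per run is to name it after the current time. The sketch below shows that pattern; the timestamp format and the `make_run_dir` helper are assumptions for illustration, not the naming scheme kmer2vec.py actually uses.

```python
import os
import tempfile
from datetime import datetime


def make_run_dir(base_dir, now=None):
    """Create and return a timestamped sub-folder of `base_dir`.

    The "%Y%m%d-%H%M%S" format is an assumed convention, not the project's.
    """
    now = now or datetime.now()
    run_dir = os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(run_dir, exist_ok=True)
    return run_dir


if __name__ == "__main__":
    # Use a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as logs:
        print(make_run_dir(logs))
```

Because each run writes to its own folder, TensorBoard can show several runs side by side when pointed at the parent logs folder.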

 

To visualize the project with TensorBoard, make sure you are in the kmer2vec folder and run the following command:

    npm run visualize

 

The command prints a URL. The port is usually 6006, so you will most likely need to open the URL shown here:

TensorBoard 1.13.1 at http://localhost:6006

 

Finally, you will have three tabs:

  • Scalars
  • Graphs
  • Projector

 

To visualize the loss of the model in 2D graphs, navigate to SCALARS.

To visualize the graph of the model, navigate to GRAPHS.

To visualize the nearest kmers in 3D, navigate to PROJECTOR.

 

 

Hyper-parameters:

| Learning Rate | Window Size | Loss   |
|---------------|-------------|--------|
| 0.1           | 2           | 0.7192 |
| 0.1           | 4           | 1.0523 |
| 0.5           | 2           | 0.3683 |
| 0.5           | 4           | 0.8860 |
| 1             | 2           | 0.2668 |
| 1             | 4           | 0.9222 |
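
The reported runs can also be compared programmatically. The sketch below copies the six rows above into a list and picks the row with the lowest loss; the variable names are illustrative, and nothing here comes from kmer2vec.py.

```python
# (learning_rate, window_size, loss) rows copied from the results above
results = [
    (0.1, 2, 0.7192),
    (0.1, 4, 1.0523),
    (0.5, 2, 0.3683),
    (0.5, 4, 0.8860),
    (1.0, 2, 0.2668),
    (1.0, 4, 0.9222),
]

# Pick the configuration with the smallest reported loss.
best = min(results, key=lambda row: row[2])
print("learning rate:", best[0], "window size:", best[1], "loss:", best[2])
```

In these results the lowest loss comes from a learning rate of 1 with a window size of 2, and a window size of 2 gives a lower loss than 4 at every learning rate listed.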