Clean version of Protein Modification Repo "Frozen Branch"

vzg100/ptm_pred


Post-Translational Modification Prediction

Capstone project for my senior year at Tulane University.

A full write-up on using supervised learning and class imbalance methods can be found here: https://docs.google.com/document/d/1Yi3vMEq4l0SLw95HtiVRHsn010nrVaNZiZlV9pi7TjU/edit?usp=sharing
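The write-up describes the class imbalance methods that were actually tried. As one common example (not necessarily the one used in this project), random undersampling drops majority-class samples until each class is the same size. The sketch below assumes binary labels (1 = modified site, 0 = unmodified site) and uses hypothetical toy sequence windows.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly keep as many samples from each class as the smallest class has.

    Assumption: labels are hashable class ids, e.g. 1 = modified, 0 = unmodified.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Size of the minority class sets how many samples each class keeps.
    n_min = min(len(members) for members in by_class.values())

    balanced = []
    for label, members in by_class.items():
        for sample in rng.sample(members, n_min):
            balanced.append((sample, label))

    # Shuffle so classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


# Hypothetical toy data: one modified window, five unmodified ones.
windows = ["AKSPE", "LLSGA", "MKTVE", "GGSRR", "PQSDE", "AASLK"]
labels = [1, 0, 0, 0, 0, 0]
X, y = random_undersample(windows, labels)
print(sorted(y))  # [0, 1]
```

Undersampling is cheap and keeps only real samples, at the cost of discarding most negative sites; oversampling or class weighting are the usual alternatives.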

The supervised methods achieve precision and accuracy in the 80-90% range, but recall only in the 10-20% range.
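This pattern is typical of imbalanced data: a classifier that flags only a few sites, mostly correctly, scores well on precision and accuracy while missing most true sites. The sketch below uses hypothetical counts (200 modified and 800 unmodified sites, with 34 sites flagged) chosen only to illustrate the effect; they are not results from this project.

```python
def confusion_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical dataset: 200 modified sites followed by 800 unmodified sites.
y_true = [1] * 200 + [0] * 800
# Hypothetical predictions: 30 true positives, 170 misses,
# 4 false alarms, 796 correct rejections.
y_pred = [1] * 30 + [0] * 170 + [1] * 4 + [0] * 796

precision, recall, accuracy = confusion_metrics(y_true, y_pred)
print(f"{precision:.3f} {recall:.3f} {accuracy:.3f}")  # 0.882 0.150 0.826
```

Because recall only counts how many true sites were found, it stays low even when almost every flagged site is correct.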

Recently I have started using unsupervised learning methods, with interesting results. The word2vec implementations average around 75% in recall, precision, and accuracy on most of the post-translational modification tests. This suggests a possible solution to the low-recall problem that has plagued post-translational modification prediction for the last decade.
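word2vec needs the protein data expressed as "sentences" of "words". One common approach (assumed here; the repository's exact preprocessing may differ) is to cut a fixed window of residues around each candidate site and split it into overlapping k-mers. The window size, the `-` padding character, and `k=3` below are illustrative assumptions.

```python
def site_window(sequence, position, flank=7):
    """Return residues within `flank` of a 0-based site, padding ends with '-'."""
    padded = "-" * flank + sequence + "-" * flank
    center = position + flank
    return padded[center - flank:center + flank + 1]


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers, the 'words' fed to word2vec."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical example: serine at 0-based position 2, 3 residues on each side.
window = site_window("MKSPEAT", 2, flank=3)
words = kmers(window, k=3)
print(window)  # -MKSPEA
print(words)   # ['-MK', 'MKS', 'KSP', 'SPE', 'PEA']
```

A list of such word lists can then be passed as `sentences` to an embedding library such as gensim's `Word2Vec` (parameter names like `vector_size`, `window`, and `min_count` are from gensim 4.x), and the k-mer vectors averaged per window to get a fixed-length feature vector.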

TODO:

Write a FASTA -> CSV converter for the benchmark tests.

Integrate the benchmarks into the word2vec models.

Try prot2vec implementations.

Try using exon/intron information as an additional feature set.
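For the FASTA -> CSV converter item above, a minimal sketch might look like the following. It uses only the Python standard library; the output columns (`id`, `description`, `sequence`) and the rule that the ID is the first whitespace-separated token of the header are assumptions, not the format of any particular benchmark.

```python
import csv
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_csv(fasta_handle, csv_handle):
    """Write one CSV row per FASTA record (hypothetical column layout)."""
    writer = csv.writer(csv_handle)
    writer.writerow(["id", "description", "sequence"])
    for header, seq in parse_fasta(fasta_handle):
        seq_id, _, description = header.partition(" ")
        writer.writerow([seq_id, description, seq])


# Hypothetical input with one wrapped sequence and one header without a description.
fasta = io.StringIO(">P12345 Example protein\nMKSPE\nATLLS\n>Q67890\nGGSRR\n")
out = io.StringIO()
fasta_to_csv(fasta, out)
print(out.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.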

Notes:

The data in this repository comes from dbPTM (dbptm.mbc.nctu.edu.tw), which is a great resource for protein-related machine learning projects.
