Skip to content

Prediction and characterization of human ageing-related proteins by using machine learning. Related publication: https://doi.org/10.1038/s41598-018-22240-w

Notifications You must be signed in to change notification settings

kerepesi/aging_ml

Repository files navigation

*******************************************************************************************
Feature tables and main scripts of the paper 
"Prediction and characterization of human ageing-related proteins by using machine learning"
********************************************************************************************

Citation of the paper:

    Kerepesi C, Daróczy B, Sturm Á, Vellai T, Benczúr A
    Prediction and characterization of human ageing-related proteins by using machine learning
    Scientific Reports, Vol 8, 4094 (2018).

Requirements (versions used by us in parenthesis): 

   - Linux (Ubuntu 16.04.3 LTS) 
   - Python 3.5.2 
   - Python packages: pandas (0.20.3), numpy (1.13.1), sklearn (0.19.0)

Running commands: 

  - Running 20 experiments of XGBoost with 5 fold CV (predictions are averaged, parameters: max_depth=1, n_est=50, n_exp=20)
    $ python XGBoost_CV.py 1 50 20 Final_32_features.csv

  - Dumping trees of the XGBoost model (max_d=1, n_est=50, inputfile=Final_feature_table.csv):
    $ python TreeDumper.py 1 50 Final_32_features.csv

Description of the files (for more information please see Methods of the paper):
   
  - aging_labels.csv:
    Labels of the classification ('1' if the given protein included in GenAge database, 0 otherwise)
    
  - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv.zip: 
    Gene Ontology features with ancestors and with ageing GOs. 
    
  - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv.zip
    Gene Ontology features with ancestors and without ageing GOs ( called 'GO' in Table 4 of the paper). 
    
  - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv-f_sel_based_on_GO_stats.py-thr1000.csv.zip
    Feature set containing only the GO features that occur in least 1000 proteins (called 'Frequent GOs' in Table 5 of the paper).
    
  - uniprot_sprot_human.dat-InteractionDigger.py.txt-CreateNetworkFT_paired.py.csv-join_CytoscapeStats.csv:
    PPI Network features. 
   
  - RNA_CoEx_vs_GenAge_FT.csv: 
    Co-expression features.
    
  - Final_32_features.csv: 
    Table of the final 32 features (selected by XGBoost).
    
  - Final_32_features.csv-XGBoost_CV_preds-n_est50-exp20.csv: 
    Output file of the command 'python XGBoost_CV.py 1 50 20 Final_32_features.csv'.

  - Final_32_features.csv_Trees-n_est50-max_d1.txt:
    Output file of the command 'python TreeDumper.py 1 50 Final_32_features.csv'.
    
    


About

Prediction and characterization of human ageing-related proteins by using machine learning. Related publication: https://doi.org/10.1038/s41598-018-22240-w

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages