
Prediction of Henry’s Law Constants using descriptors calculated from simple molecular representations


jtd1g16/HLC_Prediction


HLC_Predictor

Computational Chemistry Master's Project, University of Southampton.

A set of Jupyter notebooks illustrating a Henry's law constant (HLC) predictive model that starts from a species' SMILES string.

The compilation of HLCs used in this project was created by R. Sander; the accompanying paper is available here.

The CAS Registry Numbers in the compilation were used to generate SMILES strings (via cirpy). These SMILES strings were then passed either to DRAGON or to a series of RDKit functions to calculate molecular descriptors.
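The CAS → SMILES → descriptor step could look roughly like the sketch below. It is not the project's own code. It assumes cirpy's `resolve(identifier, "smiles")` lookup, which needs network access, and it uses every descriptor in RDKit's `Descriptors.descList` as a stand-in for the unspecified "series of RDKit functions". The helper names `smiles_from_cas` and `rdkit_descriptors` are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def smiles_from_cas(cas_number):
    """Look up a SMILES string for a CAS number via the Chemical Identifier
    Resolver. Hypothetical helper; needs the cirpy package and network access."""
    import cirpy  # imported lazily so the rest of this sketch works offline
    return cirpy.resolve(cas_number, "smiles")


def rdkit_descriptors(smiles):
    """Return {descriptor name: value} for every descriptor in RDKit's descList.
    Assumption: the full descList stands in for the project's descriptor set."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    return {name: func(mol) for name, func in Descriptors.descList}


if __name__ == "__main__":
    # Ethanol SMILES written out directly so the example runs without a network
    # lookup; smiles_from_cas("64-17-5") would be the online equivalent.
    desc = rdkit_descriptors("CCO")
    print(len(desc), "descriptors; MolWt =", round(desc["MolWt"], 2))
```

In practice, each row of the resulting descriptor table would be paired with the molecule's HLC from the compilation before training.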

Supervised machine learning algorithms were trained on the calculated descriptors, with each molecule's HLC as its label, to predict the constants. The project compares:

  • 7 ML algorithms
  • 4 feature selection methods
  • 6 sets of descriptors
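A single algorithm / feature-selection combination might be wired up as in the sketch below. The README does not name the seven algorithms or four selection methods, so the `RandomForestRegressor` and `SelectKBest(f_regression)` used here are assumptions, not the project's choices. The data are random placeholders rather than HLCs, and predicting log10(HLC) is also an assumption. `build_hlc_model` and `evaluate` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_hlc_model(k_features=10, random_state=0):
    """Scale descriptors, keep the k with the highest univariate F-score
    against the target, then fit a random forest (assumed model choice)."""
    return make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_regression, k=k_features),
        RandomForestRegressor(n_estimators=100, random_state=random_state),
    )


def evaluate(X, y, k_features=10, n_splits=5):
    """Mean and standard deviation of R^2 over shuffled K-fold cross-validation."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(build_hlc_model(k_features), X, y, cv=cv, scoring="r2")
    return scores.mean(), scores.std()


if __name__ == "__main__":
    # Placeholder data: 200 "molecules" x 50 random "descriptors", with a target
    # that depends on the first three columns. Swap in the real descriptor
    # matrix and (assumed) log10(HLC) targets here.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=200)
    mean_r2, std_r2 = evaluate(X, y)
    print(f"R^2 = {mean_r2:.2f} +/- {std_r2:.2f}")
```

Putting scaling and selection inside the pipeline means they are refit on each training fold, so the cross-validation score is not inflated by selecting features on the test data. Other algorithms or selectors can be compared by swapping the corresponding pipeline step.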

Dependencies

  • Jupyter Notebook, with the following Python packages installed:
    • pandas (data structures)
    • numpy (maths)
    • statsmodels.api (stats)
    • cirpy (conversion between chemical identifiers)
    • ipywidgets and IPython.display (widgets and nicer outputs)
    • RDKit (descriptors)
    • matplotlib.pyplot (visualisation)
    • scikit-learn (models, feature selection, PCA)
    • joblib (saving python objects)
    • mpld3 (hover-over labels for plots)
  • DRAGON 6 (external, non-Python software for descriptor calculation)
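One way to check that the Python packages above are installed before opening the notebooks is sketched below. It is a convenience script that is not part of the repository. It uses the import names, which differ from some package names: scikit-learn imports as `sklearn`, and `statsmodels.api` and `matplotlib.pyplot` are checked via their top-level packages. DRAGON 6 cannot be checked this way. `check_dependencies` is a hypothetical name.

```python
from importlib.util import find_spec

# Top-level import names for the Python dependencies listed above.
# DRAGON 6 is external software and is not checked here.
REQUIRED_MODULES = (
    "pandas", "numpy", "statsmodels", "cirpy", "ipywidgets", "IPython",
    "rdkit", "matplotlib", "sklearn", "joblib", "mpld3",
)


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each module name to True if it can be found, without importing it."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_dependencies()
    missing = [name for name, found in status.items() if not found]
    print("All Python dependencies found." if not missing
          else "Missing: " + ", ".join(missing))
```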