rowansci/Rowan-pKa-SI

Rowan pKa - Supplementary Information

This repository contains the supporting information for Rowan's recent preprint on pKa prediction. We hope that this collection of test datasets can be useful for future work in pKa prediction.

Fitting

The fit dataset was adapted from Thapa and Raghavachari by filtering out any SMILES strings that could not be parsed by RDKit. This left 215 molecules and associated pKa values, which can be found in TR215.csv.
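The filtering step can be sketched roughly as follows. This is a minimal illustration, not the script used to build TR215.csv: the helper names `rdkit_parse` and `filter_parsable` and the `(smiles, pka)` tuple layout are assumptions made for the example. It relies only on the fact that RDKit's `Chem.MolFromSmiles` returns `None` for a string it cannot parse.

```python
from typing import Callable, Iterable, List, Tuple


def rdkit_parse(smiles: str):
    """Parse a SMILES string with RDKit; returns None if parsing fails."""
    # Imported lazily so the filtering logic can be exercised without RDKit.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


def filter_parsable(
    rows: Iterable[Tuple[str, float]],
    parse: Callable[[str], object] = rdkit_parse,
) -> List[Tuple[str, float]]:
    """Keep only (smiles, pka) rows whose SMILES string can be parsed.

    The (smiles, pka) layout is hypothetical; adapt it to the actual
    columns of the source data.
    """
    return [(smiles, pka) for smiles, pka in rows if parse(smiles) is not None]
```

With RDKit installed, calling `filter_parsable(rows)` on a list of `(smiles, pka)` pairs drops every row whose SMILES string RDKit rejects.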

Evaluation

Eight different datasets used to benchmark Rowan pKa are included in assays/, and the results can be visualized by running plot_assay.ipynb. Here's where the data comes from:

SAMPL6 (assays/SAMPL6.csv)

Data for SAMPL6 were obtained from the SAMPL6 GitHub repository. We compared the experimentally measured macroscopic pKa values to the microscopic pKa values computed by Rowan. Only the most acidic and most basic microscopic sites on each molecule were considered, and each was compared to the closest macroscopic value, consistent with the matching procedures detailed here. This had the effect of excluding doubly ionized microstates.
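The closest-value matching described above can be sketched as follows. This is an illustrative reading of the procedure, not the code used to produce assays/SAMPL6.csv: the function name `match_to_closest` is hypothetical, and the numbers in the usage lines are made up rather than taken from the dataset.

```python
from typing import List, Sequence, Tuple


def match_to_closest(
    computed: Sequence[float], experimental: Sequence[float]
) -> List[Tuple[float, float]]:
    """Pair each computed microscopic pKa with the nearest experimental value.

    `computed` would hold the pKa values of the most acidic and most basic
    microscopic sites of one molecule; `experimental` holds that molecule's
    measured macroscopic pKa values.
    """
    if not experimental:
        raise ValueError("at least one experimental pKa value is required")
    return [
        (value, min(experimental, key=lambda exp: abs(exp - value)))
        for value in computed
    ]


# Illustrative values only (not from SAMPL6).
pairs = match_to_closest([4.1, 9.3], [3.9, 9.0, 11.2])
print(pairs)  # [(4.1, 3.9), (9.3, 9.0)]
```

Note that under this sketch two computed values may be matched to the same experimental value; whether that can occur in the actual assignment is not stated here.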

SAMPL7 (assays/SAMPL7.csv)

Data for SAMPL7 were obtained from the SAMPL7 GitHub repository. Since only one pKa value was obtained for each molecule, assignment was straightforward.

Amine Oxetanes (assays/amine_oxetane.csv)

Data were obtained from this paper.

α-CF3 Bridged Bicyclic Amines (assays/enamine_aCF3_bicycles.csv)

Data were obtained from this paper.

Aromatic N-Heterocycles (assays/ArN.csv)

Data were obtained from this paper.

Folate Inhibitors (assays/di_kerns_folate.csv)

Data were obtained from Drug-Like Properties: Concepts, Structure Design and Methods from ADME to Toxicity Optimization, by Li Di and Edward Kerns.

BACE1 Inhibitors (assays/bace.csv)

Data were obtained from this paper.

Tricyclic Thrombin Inhibitors (assays/TCT.csv)

Data were obtained from this paper.

Corin Wagen, 2024