Rowan pKa - Supplementary Information

This repository contains the supporting information for Rowan's recent preprint on pKa prediction. We hope that this collection of test datasets can be useful for future work in pKa prediction.

Fitting

The fitting dataset was adapted from Thapa and Raghavachari by removing any SMILES strings that RDKit could not parse. This left 215 molecules with associated pKa values, which can be found in TR215.csv.
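The filtering step can be sketched roughly as follows. This is a minimal illustration, not the script used to build TR215.csv: the function names `rdkit_parser` and `filter_parsable` and the `(smiles, pKa)` row layout are assumptions made for the example. It relies on the fact that RDKit's `Chem.MolFromSmiles` returns `None` when a SMILES string cannot be parsed.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rdkit_parser(smiles: str) -> Optional[object]:
    """Parse a SMILES string with RDKit; the result is None when parsing fails."""
    # Imported lazily so the filter below can be reused with another parser.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


def filter_parsable(
    rows: Iterable[Tuple[str, float]],
    parse: Callable[[str], Optional[object]] = rdkit_parser,
) -> List[Tuple[str, float]]:
    """Keep only the (smiles, pKa) rows whose SMILES string the parser accepts.

    The row layout is hypothetical; adapt it to the actual columns of the data.
    """
    return [(smiles, pka) for smiles, pka in rows if parse(smiles) is not None]
```

Passing the parser as an argument keeps the filtering logic independent of RDKit, which makes it easy to check in isolation.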

Evaluation

Eight different datasets used to benchmark Rowan pKa are included in assays/, and the results can be visualized by running plot_assay.ipynb. Here's where the data comes from:

SAMPL6 (`assays/SAMPL6.csv`)

Data for SAMPL6 was obtained from the SAMPL6 GitHub repository. We compared the experimentally measured macroscopic pKa values to the microscopic pKa values computed by Rowan. Only the most acidic and most basic microscopic sites on each molecule were considered, and each was compared to the closest macroscopic value, consistent with the matching procedures detailed here. This had the effect of excluding doubly ionized microstates.
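A rough sketch of this matching, for a single molecule, is shown below. It is an illustration under stated assumptions rather than the code used for the benchmark: the function names are hypothetical, acidic and basic microscopic values are assumed to arrive as separate lists, "most acidic" is taken to mean the lowest computed pKa among acidic sites, and "most basic" is taken to mean the highest conjugate-acid pKa among basic sites.

```python
from typing import List, Sequence, Tuple


def closest(value: float, candidates: Sequence[float]) -> float:
    """Return the candidate with the smallest absolute difference from value."""
    return min(candidates, key=lambda c: abs(c - value))


def match_extreme_sites(
    acidic_micro: Sequence[float],
    basic_micro: Sequence[float],
    macro_exp: Sequence[float],
) -> List[Tuple[str, float, float]]:
    """Pair the extreme computed microscopic pKa values with experiment.

    Returns (kind, computed, experimental) tuples for one molecule.
    Assumption: the most acidic site has the lowest acidic pKa, and the
    most basic site has the highest conjugate-acid pKa.
    """
    pairs: List[Tuple[str, float, float]] = []
    if not macro_exp:
        return pairs  # nothing to compare against

    if acidic_micro:
        most_acidic = min(acidic_micro)
        pairs.append(("acid", most_acidic, closest(most_acidic, macro_exp)))

    if basic_micro:
        most_basic = max(basic_micro)
        pairs.append(("base", most_basic, closest(most_basic, macro_exp)))

    return pairs
```

Because only one acidic and one basic site are considered per molecule, at most two computed values are compared to experiment for each molecule.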

SAMPL7 (`assays/SAMPL7.csv`)

Data for SAMPL7 was obtained from the SAMPL7 GitHub repository. Since only one pKa value was obtained for each molecule, assignment was straightforward.