Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit valence issue with rdkit #60

Open
wutobias opened this issue May 12, 2021 · 5 comments
Open

Explicit valence issue with rdkit #60

wutobias opened this issue May 12, 2021 · 5 comments

Comments

@wutobias
Copy link

Hi folks,

please have a look at the following code snippet:

import qcportal as ptl
import cmiles

qc_client  = ptl.FractalClient("https://api.qcarchive.molssi.org:443/")
ds         = qc_client.get_collection('OptimizationDataset', "FDA Optimization Dataset 1")
qcrecord   = ds.get_record("C[NH2+]C[C@@H](c1ccc(c(c1)O)O)O-0", 
                           specification="default")
qcmol      = qcrecord.get_final_molecule()
oemol      = cmiles.utils.load_molecule(qcmol.dict(), toolkit="openeye")
rdmol      = cmiles.utils.load_molecule(qcmol.dict(), toolkit="rdkit")

The oemol is returned successfully, however rdmol cannot be created due to valence issues with N atoms: RDKit ERROR: [13:24:10] Explicit valence for atom # 9 N, 4, is greater than permitted. I think this is fixable with the following code block inserted after line 44 in cmiles/_cmiles_rd.py (not elegant though, won't cover general valence issues):

    # Fix N connectivity issue
    for atom_idx in range(em.GetNumAtoms()):
        atom = em.GetAtomWithIdx(atom_idx)
        if atom.GetAtomicNum() == 7:
            total_bo = 0.
            for bond in atom.GetBonds():
                total_bo += bond.GetBondTypeAsDouble()
            if total_bo > 3.:
                atom.SetFormalCharge(1)

Regards,
Tobias

@wutobias
Copy link
Author

Found this PR that seems to target the same issue: #37
Not sure about the details of this particular solution but it seems that the error message from rdkit is parsed in order to identify the offending N atom. This seems to be fragile, since rdkit might change the formatting of error messages in the future. The above solution explicitly counts the number of bonded neighbors, which should be more stable.

@j-wags
Copy link
Member

j-wags commented May 14, 2021

Sorry for the delay on this -- CMILES is effectively deprecated, though we've done a really bad job of communicating it. Its functionality has been migrated to the actively-supported OpenFF toolkit and QCSubmit packages.

QCA molecules submitted by OpenFF now all have CMILES records attached to them, which explicitly contain the connection graph (including things like bond order and formal charge). So we shouldn't need to do any guessing about nitrogens.

We need to put out a migration guide for users for how to do all their favorite CMILES workflows using these two other tools. In the meantime, here's a little code snippet I put together to pull down initial and final molecules using the OpenFF toolkit. I think that there's a more streamlined pathway either already available in QCSubmit, or that will be available in the next release.

from openff.toolkit.topology import Molecule
import pint
from simtk.openmm import unit 
import numpy as np

import qcportal as ptl
from openff.toolkit.topology import Molecule

punit = pint.UnitRegistry()

# Some stuff I scapped together form other code bits. I don't understand how this works. 
client= ptl.FractalClient()
ds = client.get_collection('OptimizationDataset', "FDA Optimization Dataset 1")
entry = ds.get_entry('C[NH2+]C[C@@H](c1ccc(c(c1)O)O)O-0')
qcrecord   = ds.get_record("C[NH2+]C[C@@H](c1ccc(c(c1)O)O)O-0", 
                           specification="default")

initial_qcmol = client.query_molecules(entry.initial_molecule)[0]
final_qcmol = qcrecord.get_final_molecule()

# set conformer qcmol geometry
initial_geometry = unit.Quantity(
        np.array(initial_qcmol.geometry, float), unit.bohr
    )
final_geometry = unit.Quantity(
        np.array(final_qcmol.geometry, float), unit.bohr
    )



# Make a single mol with both conformers so we can visualize it in nglview
combined_offmol = Molecule.from_qcschema(entry)
combined_offmol.add_conformer(initial_geometry.in_units_of(unit.angstrom))
combined_offmol.add_conformer(final_geometry.in_units_of(unit.angstrom))

# Make OE and RDKit mols from this combined offmol
oemol = combined_offmol.to_openeye()
rdmol = combined_offmol.to_rdkit()

combined_offmol.visualize(backend='nglview')

@wutobias
Copy link
Author

Thanks for looking into this. I am now changing my workflow towards Molecule.from_qcschema.

@jchodera
Copy link
Member

Sorry for the delay on this -- CMILES is effectively deprecated, though we've done a really bad job of communicating it

@j-wags : Would it make sense to add a note at the top of the https://github.com/openforcefield/cmiles repo README to communicate this?

@j-wags
Copy link
Member

j-wags commented May 14, 2021

Good call. I've just added a notice and some references to new API points to the top of the README.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants