Add universal identifier #5

Open
ChayaSt opened this issue Nov 11, 2018 · 3 comments
Labels
enhancement New feature or request

Comments

@ChayaSt
Collaborator

ChayaSt commented Nov 11, 2018

OpenEye now supports a universal identifier (canonical protomer). It will be added to the identifiers generated by cmiles so that all protomers of the same compound can be indexed under the same identifier.
https://www.eyesopen.com/news/openeye-toolkits-v2018.oct

ChayaSt added the enhancement label Nov 11, 2018
@ChayaSt
Collaborator Author

ChayaSt commented Nov 25, 2018

I added more extensive testing (#9) of the new OpenEye unique protomer and found that it does not capture some tautomers and mesomers.
Here is a Jupyter notebook that looks at some tautomer classes.
https://github.com/openforcefield/cmiles/blob/id/examples/Tautomers.ipynb

In addition, this requires an OpenEye license, so I was wondering whether we could use the InChI for this purpose instead.

The structure of InChI:

  1. The first layer is the chemical formula, so it is the same for all tautomers.
  2. The second layer is the connectivity. It can have up to 3 sublayers:
    1. All bonds other than those to non-bridging hydrogen atoms, i.e. the heavy-atom skeleton (the same for most tautomers).
    2. Bonds to immobile H-atoms (this differs between keto-enol tautomers).
    3. H-atoms that can be found in more than one location. This should technically be the same for all tautomers, but that is not always the case.

If we use only the chemical formula and the first sublayer of the connectivity, we should be able to capture more tautomers. It would be nice to use the first 14 characters of the InChIKey, but that block hashes the entire connectivity layer, so it differs for some tautomers.
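
A minimal sketch of that idea, using plain string handling on standard InChI strings. The helper names (`inchi_skeleton_key`, `inchikey_first_block`) are hypothetical and not part of cmiles; the two InChI strings for the acetone keto/enol pair were written out by hand and should be regenerated with a toolkit before relying on them. Edge cases such as InChIs with no formula layer or with reconnected layers are not handled.

```python
def inchi_skeleton_key(inchi: str) -> str:
    """Return '<formula>/<c-sublayer>' from an InChI string.

    Hypothetical helper: keeps only the chemical formula and the first
    connectivity sublayer (/c), dropping the hydrogen sublayer (/h) and
    every later layer.
    """
    prefix, sep, rest = inchi.partition("/")
    if not prefix.startswith("InChI=") or not sep:
        raise ValueError(f"not an InChI string: {inchi!r}")
    layers = rest.split("/")
    formula = layers[0]
    # The /c sublayer is optional (e.g. single-atom species have none).
    connections = next((lyr for lyr in layers[1:] if lyr.startswith("c")), "")
    return f"{formula}/{connections}" if connections else formula


def inchikey_first_block(inchikey: str) -> str:
    """Return the first 14 characters of an InChIKey (the connectivity hash)."""
    return inchikey.split("-")[0]


# Keto/enol pair, typed by hand for illustration (assumed values).
acetone = "InChI=1S/C3H6O/c1-3(2)4/h1-2H3"
propen_2_ol = "InChI=1S/C3H6O/c1-3(2)4/h4H,1H2,2H3"

# The full strings differ only in the /h sublayer, so the reduced key matches.
print(inchi_skeleton_key(acetone))
print(inchi_skeleton_key(acetone) == inchi_skeleton_key(propen_2_ol))
```

Because the reduced key ignores hydrogens entirely, it will also merge species that are not tautomers but share a formula and heavy-atom skeleton, so it is a coarser identifier than a true canonical tautomer.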

We might be able to use the new Mixture InChI; however, the problem would be enumerating all tautomers a priori.

@ChayaSt
Collaborator Author

ChayaSt commented Nov 27, 2018

I'm looking into the FICuS identifier, which can be accessed here.
Some slides I found on this identifier:

@ChayaSt
Collaborator Author

ChayaSt commented Nov 28, 2018

Partially addressed by #9
