Remember, chemistry is not just about reactions; it's about connections. Let's build those connections together! 💫
Table of Contents
Chemical-Converters serves as a foundational showcase of our technological capabilities within the chemical domain. The available models, which could be used in this library, represent our entry-level offerings, designed to provide a glimpse into the potential applications of our advanced solutions. For access to our comprehensive suite of larger and more precise models, we invite interested parties to e ngage directly with us.
Developed by the brilliant minds at Knowledgator, the library showcases the abilities of our chemical transformer models. Whether you're working on a research project, studying for an exam, or just exploring the chemical universe, Chemical-Converters is your go-to tool 🛠.
The models` architecture is based on Google MT5 with certain modification to support different inputs and outputs. All available models are presented in the table:
Model | Accuracy | Size(MB) | Task |
---|---|---|---|
SMILES2IUPAC-canonical-small | 75% | 24 | SMILES to IUPAC |
SMILES2IUPAC-canonical-base | 86.9% | 180 | SMILES to IUPAC |
IUPAC2SMILES-canonical-small | 88.9% | 24 | IUPAC to SMILES |
IUPAC2SMILES-canonical-base | 93.7% | 180 | IUPAC to SMILES |
also, you can check the most resent models within the library:
from chemicalconverters import NamesConverter
print(NamesConverter.available_models())
{'knowledgator/SMILES2IUPAC-canonical-small': 'Small model for converting canonical SMILES to IUPAC with accuracy 75%, does not support isomeric or isotopic SMILES', 'knowledgator/SMILES2IUPAC-canonical-base': 'Medium model for converting canonical SMILES to IUPAC with accuracy 87%, does not support isomeric or isotopic SMILES', 'knowledgator/IUPAC2SMILES-canonical-small': 'Small model for converting IUPAC to canonical SMILES with accuracy 89%, does not support isomeric or isotopic SMILES', 'knowledgator/IUPAC2SMILES-canonical-base': 'Medium model for converting IUPAC to canonical SMILES with accuracy 94%, does not support isomeric or isotopic SMILES'}
Firstly, install the library:
pip install chemical-converters
You can choose pretrained model from table in the section "Models", but we recommend to use model "knowledgator/SMILES2IUPAC-canonical-base".
To choose the preferred IUPAC style, place style tokens before your SMILES sequence.
Style Token | Description |
---|---|
<BASE> |
The most known name of the substance, sometimes is the mixture of traditional and systematic style |
<SYST> |
The totally systematic style without trivial names |
<TRAD> |
The style is based on trivial names of the parts of substances |
from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO'))
print(converter.smiles_to_iupac(['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']))
['ethanol']
['ethanol', 'ethanol', 'ethanol']
from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)], num_beams=1,
process_in_batch=True, batch_size=1000))
['buta-1,3-diene', 'buta-1,3-diene'...]
It's possible to validate the translations by reverse translation into IUPAC and calculating Tanimoto similarity of two molecules fingerprints.
from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO', validate=True))
['ethanol'] 1.0
The larger is Tanimoto similarity, the more is probability, that the prediction was correct.
You can also process validation manually:
from chemicalconverters import NamesConverter
validation_model = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(NamesConverter.validate_iupac(input_sequence='CCO', predicted_sequence='ethanol', validation_model=validation_model))
1.0
!Note validation was not implemented in processing in batches.
You can choose pretrained model from table in the section "Models", but we recommend to use model "knowledgator/IUPAC2SMILES-canonical-base".
from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(converter.iupac_to_smiles('ethanol'))
print(converter.iupac_to_smiles(['ethanol', 'ethanol', 'ethanol']))
['CCO']
['CCO', 'CCO', 'CCO']
from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(converter.iupac_to_smiles(["buta-1,3-diene" for _ in range(10)], num_beams=1,
process_in_batch=True, batch_size=1000))
['<SYST>C=CC=C', '<SYST>C=CC=C'...]
Our models also predict IUPAC styles from the table:
Style Token | Description |
---|---|
<BASE> |
The most known name of the substance, sometimes is the mixture of traditional and systematic style |
<SYST> |
The totally systematic style without trivial names |
<TRAD> |
The style is based on trivial names of the parts of substances |
Coming soon.