Chemical-Converters

Remember, chemistry is not just about reactions; it's about connections. Let's build those connections together! 💫

Library for translating chemical names

Table of Contents

Introduction
Models
Quickstart
Citation

Introduction

Chemical-Converters serves as a foundational showcase of our technological capabilities within the chemical domain. The models available in this library represent our entry-level offerings, designed to provide a glimpse into the potential applications of our advanced solutions. For access to our comprehensive suite of larger and more precise models, we invite interested parties to engage directly with us.

Developed by the brilliant minds at Knowledgator, the library showcases the abilities of our chemical transformer models. Whether you're working on a research project, studying for an exam, or just exploring the chemical universe, Chemical-Converters is your go-to tool 🛠.

Models

The models' architecture is based on Google MT5, with certain modifications to support different inputs and outputs. All available models are listed in the table:

Model	Accuracy	Size(MB)	Task
SMILES2IUPAC-canonical-small	75%	24	SMILES to IUPAC
SMILES2IUPAC-canonical-base	86.9%	180	SMILES to IUPAC
IUPAC2SMILES-canonical-small	88.9%	24	IUPAC to SMILES
IUPAC2SMILES-canonical-base	93.7%	180	IUPAC to SMILES

You can also list the most recent models from within the library:

from chemicalconverters import NamesConverter

print(NamesConverter.available_models())

{'knowledgator/SMILES2IUPAC-canonical-small': 'Small model for converting canonical SMILES to IUPAC with accuracy 75%, does not support isomeric or isotopic SMILES',
 'knowledgator/SMILES2IUPAC-canonical-base': 'Medium model for converting canonical SMILES to IUPAC with accuracy 87%, does not support isomeric or isotopic SMILES',
 'knowledgator/IUPAC2SMILES-canonical-small': 'Small model for converting IUPAC to canonical SMILES with accuracy 89%, does not support isomeric or isotopic SMILES',
 'knowledgator/IUPAC2SMILES-canonical-base': 'Medium model for converting IUPAC to canonical SMILES with accuracy 94%, does not support isomeric or isotopic SMILES'}
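Since the result is a plain dictionary of model names and descriptions, it can be filtered with ordinary Python. The following is a minimal sketch, assuming the keys keep the `knowledgator/<TASK>-canonical-<size>` naming shown above. The `models_for_task` helper is hypothetical and not part of the library, and the dictionary literal is a shortened copy of the output above rather than a live call.

```python
from typing import Dict, List

# Shortened copy of the output above; in practice this would come from
# NamesConverter.available_models().
available: Dict[str, str] = {
    "knowledgator/SMILES2IUPAC-canonical-small": "Small model, SMILES to IUPAC, accuracy 75%",
    "knowledgator/SMILES2IUPAC-canonical-base": "Medium model, SMILES to IUPAC, accuracy 87%",
    "knowledgator/IUPAC2SMILES-canonical-small": "Small model, IUPAC to SMILES, accuracy 89%",
    "knowledgator/IUPAC2SMILES-canonical-base": "Medium model, IUPAC to SMILES, accuracy 94%",
}


def models_for_task(models: Dict[str, str], task: str) -> List[str]:
    """Hypothetical helper: return model names whose short name starts with `task`."""
    selected = []
    for name in models:
        # Drop the organisation prefix ("knowledgator/") before matching.
        short_name = name.split("/", 1)[-1]
        if short_name.startswith(task):
            selected.append(name)
    return selected


smiles_to_iupac_models = models_for_task(available, "SMILES2IUPAC")
print(smiles_to_iupac_models)
```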

Quickstart

First, install the library:

pip install chemical-converters

SMILES to IUPAC

You can choose any pretrained model from the table in the "Models" section, but we recommend "knowledgator/SMILES2IUPAC-canonical-base".

! Preferred IUPAC style

To choose the preferred IUPAC style, place style tokens before your SMILES sequence.

Style Token	Description
`<BASE>`	The most common name of the substance; sometimes a mixture of the traditional and systematic styles
`<SYST>`	A fully systematic style without trivial names
`<TRAD>`	A style based on the trivial names of the parts of the substance
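Building the prefixed input is plain string concatenation. The sketch below shows one way to do it for the three tokens in the table. The `with_style` helper is hypothetical and not part of the library; it only prepares the input string and does not call any model.

```python
# Hypothetical helper, not part of chemicalconverters.
STYLE_TOKENS = ("BASE", "SYST", "TRAD")  # the styles listed in the table above


def with_style(smiles: str, style: str) -> str:
    """Prefix a SMILES string with a style token such as <SYST>."""
    style = style.upper()
    if style not in STYLE_TOKENS:
        raise ValueError(f"unknown style {style!r}; expected one of {STYLE_TOKENS}")
    return f"<{style}>{smiles}"


inputs = [with_style("CCO", style) for style in STYLE_TOKENS]
print(inputs)  # ['<BASE>CCO', '<SYST>CCO', '<TRAD>CCO']
```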

To perform simple translation, follow the example:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO'))
print(converter.smiles_to_iupac(['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']))

['ethanol']
['ethanol', 'ethanol', 'ethanol']

Processing in batches:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)], num_beams=1, 
                                process_in_batch=True, batch_size=1000))

['buta-1,3-diene', 'buta-1,3-diene'...]
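For intuition about what `batch_size` controls, the sketch below splits a list of inputs into consecutive chunks. It illustrates batching in general and is not the library's internal implementation; `chunked` is a hypothetical helper.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


smiles = ["<BASE>C=CC=C" for _ in range(10)]
batches = list(chunked(smiles, batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

With `batch_size=1000` and only 10 inputs, as in the example above, everything would fit in a single chunk.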

Validating SMILES to IUPAC translations

It's possible to validate a translation by translating the predicted IUPAC name back into SMILES and calculating the Tanimoto similarity between the fingerprints of the two molecules.

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO', validate=True))

['ethanol'] 1.0

The higher the Tanimoto similarity, the more likely it is that the prediction is correct.
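For reference, the Tanimoto similarity of two fingerprints is the number of on-bits they share divided by the number of on-bits set in either of them. Below is a minimal sketch in plain Python that represents each fingerprint as a set of bit indices. The bit sets are made up for illustration, and how the library computes its fingerprints is not shown here.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        # Two empty fingerprints are treated as identical (a convention chosen for this sketch).
        return 1.0
    return len(fp_a & fp_b) / len(union)


original = {1, 5, 9, 12}         # illustrative bits for the input molecule
round_trip_same = {1, 5, 9, 12}  # reverse translation gave the same molecule
round_trip_diff = {1, 5, 20}     # reverse translation gave a different molecule

print(tanimoto(original, round_trip_same))  # 1.0
print(tanimoto(original, round_trip_diff))  # 0.4
```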

You can also run the validation manually:

from chemicalconverters import NamesConverter

validation_model = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(NamesConverter.validate_iupac(input_sequence='CCO', predicted_sequence='ethanol', validation_model=validation_model))

1.0

Note: validation is not implemented for batch processing.

IUPAC to SMILES

You can choose any pretrained model from the table in the "Models" section, but we recommend "knowledgator/IUPAC2SMILES-canonical-base".

To perform simple translation, follow the example:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(converter.iupac_to_smiles('ethanol'))
print(converter.iupac_to_smiles(['ethanol', 'ethanol', 'ethanol']))

['CCO']
['CCO', 'CCO', 'CCO']

Processing in batches:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(converter.iupac_to_smiles(["buta-1,3-diene" for _ in range(10)], num_beams=1, 
                                process_in_batch=True, batch_size=1000))

['<SYST>C=CC=C', '<SYST>C=CC=C'...]

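Our models also predict the IUPAC style of the input name, returned as a token in front of the SMILES string. The possible styles are listed in the table: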

Style Token	Description
`<BASE>`	The most common name of the substance; sometimes a mixture of the traditional and systematic styles
`<SYST>`	A fully systematic style without trivial names
`<TRAD>`	A style based on the trivial names of the parts of the substance
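When a prediction carries a style token, as in the batch output above, the token can be separated from the SMILES string with a little string handling. This is a minimal sketch, assuming the token always takes the `<NAME>` form shown in the table; `split_style` is a hypothetical helper, not part of the library.

```python
import re
from typing import Optional, Tuple

# Matches a leading style token such as <SYST>, <BASE> or <TRAD>.
_STYLE_PREFIX = re.compile(r"^<(BASE|SYST|TRAD)>")


def split_style(prediction: str) -> Tuple[Optional[str], str]:
    """Return (style, smiles); style is None when no token is present."""
    match = _STYLE_PREFIX.match(prediction)
    if match is None:
        return None, prediction
    return match.group(1), prediction[match.end():]


print(split_style("<SYST>C=CC=C"))  # ('SYST', 'C=CC=C')
print(split_style("CCO"))           # (None, 'CCO')
```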

Citation

Coming soon.
