This project is a Variational Autoencoder (VAE)-based molecular SMILES string generator. It generates molecules composed of CHOH/CH2OH (referred to as A) and CH/CH2/CH3 (referred to as B) repeat units. The generated molecules are saturated and contain no rings.
The project consists of the following Python scripts:
- VAE.py: Defines the VAE model and includes functions for training and testing the model.
- generate.py: Generates new SMILES strings by perturbing the latent space of the trained VAE.
- interpolate.py: Generates interpolated SMILES strings between two given SMILES strings using the latent space of the trained VAE.
- synthetic_dataset.py: Generates a synthetic dataset of SMILES strings based on specified constraints.
- Generates over 100,000 synthetic SMILES strings.
- Only A and B repeat units are included.
- No molecule contains more than six consecutive A repeat units.
- All molecules in the dataset are saturated and contain no rings.
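The constraints above can be illustrated with a small sketch. It is not the code in synthetic_dataset.py: it assumes linear chains only (no CH branch points), writes each A unit as `C(O)` and each B unit as `C`, and the function names are hypothetical.

```python
import random

A_UNIT = "C(O)"  # hydroxylated carbon (CHOH, or CH2OH at a chain end)
B_UNIT = "C"     # plain carbon (CH2, or CH3 at a chain end)
MAX_CONSECUTIVE_A = 6


def random_unit_sequence(length, rng):
    """Draw a random A/B sequence with at most six consecutive A units."""
    units = []
    run_of_a = 0
    for _ in range(length):
        # Force a B unit once the current run of A units has hit the limit.
        if run_of_a == MAX_CONSECUTIVE_A:
            unit = "B"
        else:
            unit = rng.choice("AB")
        run_of_a = run_of_a + 1 if unit == "A" else 0
        units.append(unit)
    return "".join(units)


def sequence_to_smiles(sequence):
    """Write a linear A/B sequence as a SMILES string (single bonds, no rings)."""
    return "".join(A_UNIT if unit == "A" else B_UNIT for unit in sequence)


rng = random.Random(0)
sequences = [random_unit_sequence(rng.randint(2, 12), rng) for _ in range(5)]
for seq in sequences:
    print(seq, sequence_to_smiles(seq))
```

Because only single bonds and no ring-closure digits are ever written, every string produced this way is saturated and acyclic by construction.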
- Clone the repository:
    git clone https://github.com/DaoyuanLi2816/Molecule-Generator.git
    cd Molecule-Generator
- Install the required dependencies:
    pip install -r requirements.txt
- Ensure you have RDKit installed. RDKit is required for molecular operations. Installation instructions are available in the RDKit documentation.
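A quick way to confirm that RDKit is visible to the interpreter before running the scripts is sketched below. It only checks that a package named `rdkit` can be located; the helper name `rdkit_available` is hypothetical and not part of this repository.

```python
import importlib.util


def rdkit_available():
    """Return True if the rdkit package can be found by the current interpreter."""
    return importlib.util.find_spec("rdkit") is not None


if rdkit_available():
    print("RDKit found.")
else:
    print("RDKit not found; install it before running the scripts.")
```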
To generate a synthetic dataset of SMILES strings, run synthetic_dataset.py:
python synthetic_dataset.py
This will create a CSV file named molecules.csv containing the generated SMILES strings.
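To inspect the generated file, a minimal loader might look like the following. The layout of molecules.csv is not documented here, so this sketch assumes one SMILES string per row in the first column; the `column` and `has_header` parameters are hypothetical knobs for adjusting to the real layout.

```python
import csv
import os


def load_smiles(path, column=0, has_header=False):
    """Read one SMILES string per row from a CSV file, skipping empty rows."""
    smiles = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # discard the header row
        for row in reader:
            if len(row) > column and row[column].strip():
                smiles.append(row[column].strip())
    return smiles


# Only runs if the dataset has already been generated.
if os.path.exists("molecules.csv"):
    loaded = load_smiles("molecules.csv")
    print(len(loaded), "SMILES strings loaded")
```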
To train the VAE model, run VAE.py:
python VAE.py
This will train the VAE model on the generated dataset and save the trained model as beta_tc_vae_model.pth.
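For readers new to VAEs, the two building blocks of a standard VAE objective are sketched below in plain Python: the reparameterization step and the KL divergence to a standard normal prior. This is an illustration only. The objective actually used in VAE.py is not described here, and the saved file name suggests a beta-TC-VAE variant, which weights the KL term differently. The function names are hypothetical.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


rng = random.Random(0)
mu = [0.0, 1.0]        # hypothetical encoder mean
logvar = [0.0, 0.0]    # hypothetical encoder log-variance
z_sample = reparameterize(mu, logvar, rng)
print(z_sample, kl_to_standard_normal(mu, logvar))
```

The KL term is zero exactly when the encoder output matches the prior (zero mean, unit variance) and grows as the mean moves away from zero.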
To generate new SMILES strings using the trained VAE model, run generate.py:
python generate.py
This will output new SMILES strings generated by perturbing the latent space of the trained VAE.
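Perturbing the latent space typically means encoding a molecule to a latent vector, adding small random noise, and decoding the result. The noise step is sketched below under the assumption of isotropic Gaussian noise; the actual scheme in generate.py may differ, and `perturb_latent`, the latent vector, and the noise scale are all hypothetical.

```python
import random


def perturb_latent(z, scale, rng):
    """Add independent Gaussian noise with standard deviation `scale` to each coordinate."""
    return [value + scale * rng.gauss(0.0, 1.0) for value in z]


rng = random.Random(0)
z = [0.5, -1.2, 0.3]  # hypothetical latent vector produced by the encoder
neighbours = [perturb_latent(z, 0.1, rng) for _ in range(3)]
for candidate in neighbours:
    print(candidate)  # each candidate would then be passed to the decoder
```

A small scale keeps candidates close to the original molecule, while a larger scale explores more of the latent space at the cost of more invalid or unrelated decodings.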
To generate interpolated SMILES strings between two given SMILES strings, run interpolate.py:
python interpolate.py
This will output SMILES strings that are interpolations between the two input SMILES strings in the latent space of the trained VAE.
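Latent-space interpolation can be sketched as follows, assuming straight-line (linear) interpolation between the two encoded vectors. Whether interpolate.py uses linear or another scheme is not stated here, and `interpolate_latent` is a hypothetical name.

```python
def interpolate_latent(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced from z_start to z_end, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 at the start to 1.0 at the end
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical latent vectors for the two input molecules.
path = interpolate_latent([0.0, 2.0], [1.0, 0.0], 5)
for point in path:
    print(point)  # each point would then be decoded back to a SMILES string
```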
If you would like to contribute to this project, please open an issue or submit a pull request. We welcome contributions from the community.
This project is licensed under the MIT License. See the LICENSE file for details.