This research presents AlloyBERT, a transformer encoder model that predicts alloy properties such as elastic modulus and yield strength from textual inputs.
Clone the repository:
$ git clone https://github.com/cakshat/AlloyBERT.git
$ cd AlloyBERT
For this research, we utilized two primary datasets to explore the performance of transformer models compared to shallow machine learning models in predicting target property values with text inputs.
- Multi Principal Elemental Alloys (MPEA) dataset: This dataset, sourced from Citrine Informatics, contains mechanical properties of alloys and comprises 1546 entries. We focused on predicting the experimental Young’s modulus.
- Refractory Alloy Yield Strength (RAYS) dataset: This dataset includes experimental yield strength values for refractory alloys. With 813 entries, it provides alloy compositions and testing temperatures collected from previous literature, together with data from the MPEA dataset. The yield strength values are averages over the various processing methods.
Both datasets can be found in the data folder, at data/MPEA/MPEA.csv and data/ys_clean/ys_clean.csv.
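As a quick sanity check after cloning, the two CSV files can be inspected with Python's standard library. The sketch below is not part of the repository: `summarize_csv` is a hypothetical helper, and it assumes only that each file is a comma-separated, UTF-8 readable table with a header row. It makes no assumption about the column names.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names) for a CSV file with a header row.

    Hypothetical helper for illustration; not part of the AlloyBERT code.
    """
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    with Path(path).open(newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])                    # first line holds the column names
        row_count = sum(1 for row in reader if row)  # count non-empty data rows
    return row_count, header


if __name__ == "__main__":
    # Paths as given above, relative to the repository root.
    for csv_path in ("data/MPEA/MPEA.csv", "data/ys_clean/ys_clean.csv"):
        if Path(csv_path).exists():
            rows, columns = summarize_csv(csv_path)
            print(f"{csv_path}: {rows} rows, columns = {columns}")
        else:
            print(f"{csv_path}: not found (run from the repository root)")
```

The reported row counts can then be compared against the entry counts quoted above (1546 for MPEA, 813 for RAYS); a mismatch may simply mean the file contains rows that were filtered out before training.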
- Update the config.yaml file with the desired parameters.
- Run python main.py to train the model.
- While pretraining, make sure to set the configuration to pretrain.
- After pretraining, update the path of the pretrained model and change the mode to finetune.
- Our custom-trained tokenizer, which was used for training, can be found in the tokenizer folder and can be used if required.
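The two-stage workflow above (pretrain first, then point the configuration at the saved model and switch to finetune) can be sketched in Python as follows. This is only an illustration of the switch between stages: the keys `mode` and `pretrained_path` and the helper `to_finetune_config` are hypothetical, so check config.yaml in the repository for the actual parameter names.

```python
import copy


def to_finetune_config(pretrain_config, pretrained_model_path):
    """Return a copy of the config switched from pretraining to finetuning.

    The keys "mode" and "pretrained_path" are hypothetical placeholders;
    the real key names are whatever config.yaml in the repository uses.
    """
    if pretrain_config.get("mode") != "pretrain":
        raise ValueError("expected a config whose mode is 'pretrain'")
    finetune_config = copy.deepcopy(pretrain_config)  # leave the original untouched
    finetune_config["mode"] = "finetune"
    finetune_config["pretrained_path"] = pretrained_model_path
    return finetune_config


if __name__ == "__main__":
    # Stage one: configuration used for the pretraining run.
    stage_one = {"mode": "pretrain", "pretrained_path": None}
    # Stage two: same settings, now pointing at the pretrained model.
    stage_two = to_finetune_config(stage_one, "path/to/pretrained/model")
    print(stage_one)
    print(stage_two)
```

In practice the same change is made by editing config.yaml by hand between the two runs of python main.py; the sketch only makes explicit which two settings change.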