MSGO

This repository provides the data and methods for the paper:
Pseudodata-based molecular structure generator to reveal unknown chemicals (under review)

Authors: Nanyang Yu, Zheng Ma, Qi Shao, Laihui Li, Xuebing Wang, and Si Wei*

Setup

Environment

Python: 3.7
Torch: 1.7.1
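
A quick way to confirm that the local environment matches the versions listed above is sketched below. The helper names (`environment_problems`, `installed_torch_version`) are illustrative and not part of this repository; the check assumes a local build suffix such as `+cu110` on the Torch version can be ignored.

```python
import sys


def environment_problems(python_version, torch_version,
                         expected_python=(3, 7), expected_torch="1.7.1"):
    """Return a list of mismatches against the versions listed in this README."""
    problems = []
    found_python = tuple(python_version[:2])
    if found_python != expected_python:
        problems.append("Python %d.%d found, README lists %d.%d"
                        % (found_python + expected_python))
    if torch_version is None:
        problems.append("torch is not installed")
    elif torch_version.split("+")[0] != expected_torch:
        # Drop a local build tag such as "+cu110" before comparing.
        problems.append("torch %s found, README lists %s"
                        % (torch_version, expected_torch))
    return problems


def installed_torch_version():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    for problem in environment_problems(sys.version_info, installed_torch_version()):
        print("WARNING:", problem)
```

An empty list means both versions match; other versions may still work, but they are not the ones listed here.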

Data

For training, we use 30k+ pseudo SMILES-spectrum pairs generated by CFM-ID (you can download the raw SMILES list file here). For evaluation, we use 300+ real spectra to verify our method (download here). For evaluation on real samples, we use one LC–QTOF dataset of wastewater samples to verify our model (download here, code: gmas).

Model weights

We provide the MSGO models (pfas, code: 0bfg; lipid, code: 37it), trained on pseudo SMILES-spectrum pairs with all the methods described in the paper. You can also train your own model with other methods.

Training

You can replicate our experiment, including all the techniques:

python tools/train.py --id all_trick --user_precurso 1 --use_mask 1 --use_formual 1

More options are listed in opt.py.
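
If you launch training from a script rather than the shell, the command above can be assembled from a dictionary of options. This is only a sketch: `build_train_command` and `launch` are hypothetical helpers, not part of this repository, and the flag names are copied verbatim from the command above, so please confirm their spelling against opt.py before relying on them.

```python
import subprocess
import sys


def build_train_command(run_id, flags):
    """Build the argument list for tools/train.py from a dict of flag -> value."""
    cmd = [sys.executable, "tools/train.py", "--id", run_id]
    for name, value in flags.items():
        cmd += ["--" + name, str(value)]
    return cmd


def launch(cmd, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return None
    # Must be executed from the repository root so tools/train.py resolves.
    return subprocess.run(cmd, check=True)


# Flag names as written in the README command (assumption: they match opt.py).
ALL_TRICKS = {"user_precurso": 1, "use_mask": 1, "use_formual": 1}

if __name__ == "__main__":
    launch(build_train_command("all_trick", ALL_TRICKS), dry_run=True)
```

With `dry_run=True` the script only prints the command, which is a safe way to check the options before starting a long run.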

Evaluation

Download the model weights into ckpts/pfas or ckpts/lipid, then run:

python tools/eval.py --log_path [ckpts/pfas or ckpts/lipid]

Predict real data

We provide example data in data/example.

For pfas, run:

python tools/eval_standard.py --log_path ckpts/pfas --real_csv ./data/example/pfas.csv --out_csv ./pfas_results.csv --beam_size 500 --polar neg

For lipid, run:

python tools/eval_standard.py --log_path ckpts/lipid --real_csv ./data/example/lipid.csv --out_csv ./lipid_results.csv --beam_size 300 --polar pos

You then obtain a results CSV file containing the top 10 predictions.
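
To take a first look at the output, the results file can be previewed with the standard library. The sketch below makes no assumption about the column names, which are not documented here; `preview_csv` is an illustrative helper, not part of this repository.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        rows = list(islice(reader, n_rows)) # at most n_rows data rows
    return header, rows
```

For example, `header, rows = preview_csv("./pfas_results.csv")` returns the column names written by tools/eval_standard.py together with the first three result rows.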


Todos

  • Release model weights
  • Release pseudo and real data
  • Release training process
