A comprehensive repository dedicated to the collection and exploration of studies utilizing Large Language Models for molecular design, protein research, and material science.

HHW-zhou/LLM4Mol

LLM4Mol

LLM4Mol (LLM stands for Large Language Model) is a repository that collects and explores studies using large language models for molecular design, protein research, and materials science. It serves as a central hub for researchers, scientists, and enthusiasts interested in applying language models in these domains. The list gathers techniques, approaches, and research papers that apply language models to biomedical text, RNA/DNA, molecules, peptides, proteins, antibodies, and materials. Join our community and contribute to the field of LLM4Mol!

🔔Updating ...

Recommendations and references

Generative AI and Deep Learning for molecular/drug design
https://github.com/AspirinCode/papers-for-molecular-design-using-DL

List of papers about Proteins Design using Deep Learning
https://github.com/Peldom/papers_for_protein_design_using_DL

Large Language Models in Chemistry
https://github.com/alxfgh/Large-Language-Models-in-Chemistry

Menu

LLM4Biomedical Text

  • Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health [2023]
    Tian, Shubo, Qiao Jin, Lana Yeganova, Po-Ting Lai, Qingqing Zhu, Xiuying Chen, Yifan Yang et al.
    arXiv:2306.10070 (2023)

  • Large language models are universal biomedical simulators [2023]
    Schaefer, Moritz, Stephan Reichl, Rob ter Horst, Adele M. Nicolas, Thomas Krausgruber, Francesco Piras, Peter Stepper, Christoph Bock, and Matthias Samwald.
    bioRxiv (2023) | code

  • Fine-tuning large neural language models for biomedical natural language processing [2023]
    Tinn, Robert, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon.
    Patterns 4.4 (2023) | code

  • A Platform for the Biomedical Application of Large Language Models [2023]
    Lobentanzer, Sebastian, and Julio Saez-Rodriguez.
    arXiv:2305.06488v2 | code

  • Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations [2023]
    Chen, Qingyu, Jingcheng Du, Yan Hu, Vipina Kuttichi Keloth, Xueqing Peng, Kalpana Raja, Rui Zhang, Zhiyong Lu, and Hua Xu.
    arXiv:2305.16326v1 | code

  • BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks [2023]
    Zhang, K., Yu, J., Yan, Z., Liu, Y., Adhikarla, E., Fu, S., ... & Sun, L.
    arXiv:2305.17100v1 | code

  • BioMedLM: a Domain-Specific Large Language Model for Biomedical Text [2022]
    Paper | code

LLM4Small Molecule

  • Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [2023]
    Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, Qing Li
    arXiv:2306.06615 (2023) | code

  • Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language [2023]
    Philipp Seidl, Andreu Vall, Sepp Hochreiter, Günter Klambauer
    arXiv:2303.03363 (2023) | code

  • Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models [2023]
    Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen
    arXiv:2306.08018v1 | code

LLM4RNA/DNA

  • HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [2023]
    Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré.
    arXiv:2306.15794v1

  • DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome [2021]
    Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V. Davuluri.
    Bioinformatics 37.15 (2021) | code

LLM4Peptide

LLM4Protein

  • Protein-Protein Interaction Prediction is Achievable with Large Language Models [2023]
    Hallee, Logan, and Jason P. Gleghorn.
    bioRxiv (2023)

  • Prediction of virus-host association using protein language models and multiple instance learning [2023]
    Liu, Dan, Francesca Young, David L. Robertson, and Ke Yuan.
    bioRxiv (2023) | code

  • Large language models generate functional protein sequences across diverse families [2023]
    Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr et al.
    Nat Biotechnol (2023) | code

LLM4Antibody

  • On Pre-training Language Model for Antibody [2023]
    Wang, Danqing, Fei Ye, and Hao Zhou.
    ICLR (2023) | code

  • Efficient evolution of human antibodies from general protein language models [2023]
    Hie, Brian L., Varun R. Shanker, Duo Xu, Theodora UJ Bruun, Payton A. Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim.
    Nat Biotechnol (2023) | code

  • AbLang: an antibody language model for completing antibody sequences [2022]
    Olsen, Tobias H., Iain H. Moal, and Charlotte M. Deane.
    Bioinformatics Advances (2022) | code

LLM4Clinical

  • Matching Patients to Clinical Trials with Large Language Models [2023]
    Jin, Qiao, Zifeng Wang, Charalampos S. Floudas, Jimeng Sun, and Zhiyong Lu.
    arXiv:2307.15051 (2023)

  • ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation [2023]
    Wang, Guangyu, Guoxing Yang, Zongxin Du, Longjun Fan, and Xiaohu Li.
    arXiv:2306.09968v1

LLM4Chemistry

  • ChemCrow: Augmenting large-language models with chemistry tools [2023]
    Bran, Andres M., Sam Cox, Andrew D. White, and Philippe Schwaller.
    arXiv:2304.05376 (2023) | code

LLM4Material

  • Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT [2023]
    Xie, Tong, Yuwei Wan, Wei Huang, Yufei Zhou, Yixuan Liu, Qingyuan Linghu, Shaozhou Wang, Chunyu Kit, Clara Grazian, and Bram Hoex.
    arXiv:2304.02213v5

  • MatSciBERT: A materials domain language model for text mining and information extraction [2022]
    Gupta, Tanishq, Mohd Zaki, NM Anoop Krishnan, and Mausam.
    npj Comput Mater 8, 102 (2022) | code
