BioGPT

This repository contains the implementation of BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.

News!

The BioGPT-Large model with 1.5B parameters is coming. It is currently available for the PubMedQA task, where it reaches state-of-the-art performance of 81% accuracy. See Question Answering on PubMedQA for evaluation.

Requirements and Installation

PyTorch version == 1.12.0
Python version == 3.10
fairseq version == 0.12.0:

git clone https://github.com/pytorch/fairseq
cd fairseq
git checkout v0.12.0
pip install .
python setup.py build_ext --inplace
cd ..
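
Once the packages are installed, a quick sanity check can confirm that the pinned versions listed above are the ones actually present. The snippet below is a minimal sketch and not part of this repository: the helper names `matches` and `check_environment` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `fairseq`.

```python
import sys
from importlib import metadata

# Pinned versions taken from the requirements list above.
REQUIRED_PYTHON = (3, 10)
REQUIRED_PACKAGES = {"torch": "1.12.0", "fairseq": "0.12.0"}


def matches(installed, required):
    """True if the versions are equal, ignoring a local build tag such as '+cu113'."""
    return installed.split("+")[0] == required


def check_environment():
    """Return a list of human-readable problems; the list is empty if all pins match."""
    problems = []
    found_python = sys.version_info[:2]
    if found_python != REQUIRED_PYTHON:
        problems.append(
            "Python %d.%d found, %d.%d expected" % (found_python + REQUIRED_PYTHON)
        )
    for package, required in REQUIRED_PACKAGES.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append("%s is not installed" % package)
            continue
        if not matches(installed, required):
            problems.append("%s %s found, %s expected" % (package, installed, required))
    return problems
```

Calling `check_environment()` and printing each returned string gives a short report; an empty list means the environment matches the pins.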

Moses

git clone https://github.com/moses-smt/mosesdecoder.git
export MOSES=${PWD}/mosesdecoder

fastBPE

git clone https://github.com/glample/fastBPE.git
export FASTBPE=${PWD}/fastBPE
cd fastBPE
g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast
cd ..

sacremoses

pip install sacremoses

sklearn

pip install scikit-learn

Remember to set the environment variables MOSES and FASTBPE to the paths of Moses and fastBPE respectively, as they will be required later.
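
Because later steps depend on these two variables, it can help to fail early when one of them is missing. The following is a minimal sketch under that assumption; `resolve_tool_dirs` is a hypothetical helper, not something this repository provides.

```python
import os
from pathlib import Path


def resolve_tool_dirs(env=None):
    """Return {"MOSES": Path, "FASTBPE": Path}.

    Raises EnvironmentError if a variable is unset or does not point to a directory.
    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    resolved = {}
    for name in ("MOSES", "FASTBPE"):
        value = env.get(name)
        if not value:
            raise EnvironmentError("%s is not set; export it before continuing" % name)
        path = Path(value)
        if not path.is_dir():
            raise EnvironmentError("%s=%s is not a directory" % (name, value))
        resolved[name] = path
    return resolved
```

Running `resolve_tool_dirs()` in the same shell session where the `export` commands were issued should return both paths without raising.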

Getting Started

Pre-trained models

We provide our pre-trained BioGPT model checkpoint along with fine-tuned checkpoints for downstream tasks:

Model	Description	URL
BioGPT	Pre-trained BioGPT model checkpoint	link
BioGPT-Large	Pre-trained BioGPT-Large model checkpoint	link
BioGPT-QA-PubMedQA-BioGPT	Fine-tuned BioGPT for question answering task on PubMedQA	link
BioGPT-QA-PubMedQA-BioGPT-Large	Fine-tuned BioGPT-Large for question answering task on PubMedQA	link
BioGPT-RE-BC5CDR	Fine-tuned BioGPT for relation extraction task on BC5CDR	link
BioGPT-RE-DDI	Fine-tuned BioGPT for relation extraction task on DDI	link
BioGPT-RE-DTI	Fine-tuned BioGPT for relation extraction task on KD-DTI	link
BioGPT-DC-HoC	Fine-tuned BioGPT for document classification task on HoC	link

Download them and extract them to the checkpoints folder of this project.

For example:

mkdir checkpoints
cd checkpoints
wget https://msramllasc.blob.core.windows.net/modelrelease/BioGPT/checkpoints/Pre-trained-BioGPT.tgz
tar -zxvf Pre-trained-BioGPT.tgz
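
The same download-and-extract step can also be done from Python, which may be convenient on systems without `wget`. This is a minimal sketch using only the standard library; `fetch_and_extract` is a hypothetical helper name, and the code assumes the archive comes from a trusted source, since `extractall` writes whatever paths the archive contains.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL of the pre-trained checkpoint, as used in the wget example above.
PRETRAINED_URL = (
    "https://msramllasc.blob.core.windows.net/modelrelease/"
    "BioGPT/checkpoints/Pre-trained-BioGPT.tgz"
)


def fetch_and_extract(url, dest_dir="checkpoints"):
    """Download a .tgz archive into dest_dir (if not already there) and unpack it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        # Skip the download when the archive was fetched on a previous run.
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    return dest
```

Calling `fetch_and_extract(PRETRAINED_URL)` from the project root mirrors the shell commands above.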

Example Usage

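Use the pre-trained BioGPT model in your code:

import torch
from fairseq.models.transformer_lm import TransformerLanguageModel

# Load the pre-trained checkpoint together with its dictionary and BPE codes.
m = TransformerLanguageModel.from_pretrained(
        "checkpoints/Pre-trained-BioGPT",
        "checkpoint.pt",
        "data",
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="data/bpecodes",
        min_len=100,
        max_len_b=1024)
# Move the model to the GPU when one is available.
if torch.cuda.is_available():
    m.cuda()
src_tokens = m.encode("COVID-19 is")
# generate() returns a list of beam hypotheses for each input sequence.
generate = m.generate([src_tokens], beam=5)[0]
output = m.decode(generate[0]["tokens"])
print(output)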
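
The example above decodes only the first hypothesis returned for the prompt. To inspect several beam candidates, something like the following sketch could be used. `rank_hypotheses` is a hypothetical helper, and it assumes each hypothesis is a dict with a "score" entry next to "tokens", which is how fairseq's generator is understood to return results.

```python
def rank_hypotheses(hypotheses, decode, top_k=3):
    """Return up to top_k (score, text) pairs, highest score first.

    hypotheses: list of dicts, each with "tokens" and "score" entries
    decode: callable mapping tokens to a string, for example m.decode
    """
    ranked = sorted(hypotheses, key=lambda h: float(h["score"]), reverse=True)
    return [(float(h["score"]), decode(h["tokens"])) for h in ranked[:top_k]]
```

With the model loaded as above, `rank_hypotheses(m.generate([src_tokens], beam=5)[0], m.decode)` would give the three best continuations with their scores.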

Use the fine-tuned BioGPT model on KD-DTI for drug-target interaction in your code:

import torch
from src.transformer_lm_prompt import TransformerLanguageModelPrompt

# Load the checkpoint fine-tuned on KD-DTI.
m = TransformerLanguageModelPrompt.from_pretrained(
        "checkpoints/RE-DTI-BioGPT",
        "checkpoint_avg.pt",
        "data/KD-DTI/relis-bin",
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="data/bpecodes",
        max_len_b=1024,
        beam=1)
# Move the model to the GPU when one is available.
if torch.cuda.is_available():
    m.cuda()
src_text = ""  # input text, e.g., a PubMed abstract
src_tokens = m.encode(src_text)
# beam=1 matches the value passed to from_pretrained above.
generate = m.generate([src_tokens], beam=1)[0]
output = m.decode(generate[0]["tokens"])
print(output)

For more downstream tasks, please see below.

Downstream tasks

See the corresponding folder in examples:

Relation Extraction on BC5CDR
