PMC-CLIP

The dataset and checkpoint is available at Huggingface, Baidu Cloud(key: 3iqf).

PMC-CLIP

Usage

Repo Structure

src/:
    |--setup.py
    |--pmc_clip/
    |   |--loss/
    |   |--model/: PMC-CLIP model and variants
    |   |--model_configs/
    |   |--factory.py: Create model according to configs
    |   |--transform.py: data augmentation
    |--training/
    |   |--main.py
    |   |--scheduler.py: Learning rate scheduler
    |   |--train.py
    |   |--evaluate.py
    |   |--data.py
    |   |--params.py
docs/: project pages

1. Create Environment

conda create -n pmc_clip python=3.8
conda activate pmc_clip

pip install -r requirements.txt
# pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt

python setup.py develop  # install pmc_clip with dev mode

2. Prepare Dataset

Download from Huggingface, Baidu Cloud(key: 3iqf). Or follow the Pipeline of PMC-OA Development if you want to start from scratch.

3. Training

Single GPU

python -m training.main \
--dataset-type "csv" --csv-separator "," --save-frequency 5 \
--report-to tensorboard \
--train-data="path/to/train.csv" --val-data="path/to/valid.csv" \
--csv-img-key image --csv-caption-key caption \
--warmup 500 --batch-size=8 --lr=1e-4 --wd=0.1 --epochs=100 --workers=8 \
--model RN50_fusion4 --hugging-face --mlm --crop-scale 0.5

Multi GPU

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --rdzv_endpoint=$HOSTE_NODE_ADDR -m training.main \
--dataset-type "csv" --csv-separator "," --save-frequency 5 \
--report-to tensorboard \
--train-data="path/to/train.csv" --val-data="path/to/valid.csv" \
--csv-img-key image --csv-caption-key caption \
--warmup 500 --batch-size=128 --lr=1e-4 --wd=0.1 --epochs=100 --workers=8 \
--model RN50_fusion4 --hugging-face --mlm --crop-scale 0.5

4. Evaluation

Load checkpoint and eval on 2k samples from testset.

python -m training.main \
--dataset-type "csv" --csv-separator "," --report-to tensorboard \
--val-data="path/to/test.csv" \
--csv-img-key image --csv-caption-key caption \
--batch-size=32 --workers=8 \
--model RN50_fusion4 --hugging-face --mlm --crop-scale 0.1 \
--resume /path/to/checkpoint.pt \
--test-2000

Also we provide automatic ways to load model weights from huggingface repo.

Model	URL
PMC_CLIP:beta	https://huggingface.co/datasets/axiong/pmc_oa_beta/blob/main/checkpoint.pt

Take PMC_CLIP:beta checkpoint as an example:

python -m training.main \
--dataset-type "csv" --csv-separator "," --report-to tensorboard \
--val-data="path/to/test.csv" \
--csv-img-key image --csv-caption-key caption \
--batch-size=32 --workers=8 \
--model RN50_fusion4 --hugging-face --mlm --crop-scale 0.1 \
--resume "PMC_CLIP:beta" \
--test-2000

Acknowledgement

The code is based on OpenCLIP and M3AE. We thank the authors for their open-sourced code and encourage users to cite their works when applicable.

Note that our code don't supported tools like horovod, wandb in OpenCLIP. But we keep the code from OpenCLIP for consistency.

Contribution

Please raise an issue if you need help, any contributions are welcomed.

TODO

Compatibility testing on more env settings
Support for horovod, wandb

Cite

@article{lin2023pmc,
  title={PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents},
  author={Lin, Weixiong and Zhao, Ziheng and Zhang, Xiaoman and Wu, Chaoyi and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  journal={arXiv preprint arXiv:2303.07240},
  year={2023}
}

The paper has been accepted by MICCAI 2023.

@inproceedings{lin2023pmc,
  title={Pmc-clip: Contrastive language-image pre-training using biomedical documents},
  author={Lin, Weixiong and Zhao, Ziheng and Zhang, Xiaoman and Wu, Chaoyi and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  booktitle={MICCAI},
  year={2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
docs		docs
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PMC-CLIP

Usage

1. Create Environment

2. Prepare Dataset

3. Training

4. Evaluation

Acknowledgement

Contribution

TODO

Cite

About

Releases

Packages

Contributors 2

Languages

License

WeixiongLin/PMC-CLIP

Folders and files

Latest commit

History

Repository files navigation

PMC-CLIP

Usage

1. Create Environment

2. Prepare Dataset

3. Training

4. Evaluation

Acknowledgement

Contribution

TODO

Cite

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages