Explaining Medical Image Classifiers with Visual Question Answering Models:
a Visual Question Answering (VQA) Model trained on medical data
Deep learning has shown promising potential for medical image classification and diagnosis. However, in addition to the limited availability of annotated training data in the medical domain, explanations for the models' predictions are also desired in this field of application.
Using Flamingo, a Visual Language Model for Few-Shot Learning, we leverage large pre-trained language models and vision encoders to build a new VQA model that can answer questions about X-ray images.
You can find the available pre-trained models under the following link.
- Datasets
- Model Architecture
- Training and Testing
- Getting Started
- Demo and Deploy
- Future Work
- Contributing
- Acknowledgments
For backbone training, the following datasets are used:
- ROCO
- [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/)
For medical VQA:
- ImageCLEF 2019: you can download it from [here](https://github.com/Rodger-Huang/SYSU-HCP-at-ImageCLEF-VQA-Med-2021)
- [VQA-RAD](https://osf.io/89kps/)
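Regardless of the source, each sample used for VQA training boils down to an image plus a question/answer pair. The sketch below only illustrates that structure; the field names and file paths are assumptions, not the datasets' actual schemas or the repository's loaders.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class VQASample:
    # Illustrative field names; the real datasets use their own schemas.
    image_path: str
    question: str
    answer: str

def to_samples(records: List[Dict[str, str]]) -> List[VQASample]:
    """Wrap raw records (e.g. parsed from the downloaded files) into VQA samples."""
    return [VQASample(r["image_path"], r["question"], r["answer"]) for r in records]

samples = to_samples([
    {"image_path": "images/xray_0001.png",   # hypothetical path
     "question": "Is there a pneumothorax?",
     "answer": "no"},
])
print(samples[0].question, "->", samples[0].answer)
```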
Using elements of Flamingo's architecture, we built a model that takes an X-ray image and an arbitrary question as inputs and generates an answer to that question.
A simplified overview of our model architectures is given in the following figures:
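As a rough code-level companion to those figures, the sketch below shows the core idea in simplified form: visual features from a frozen vision encoder are injected into a frozen language decoder through a small trainable gated cross-attention block, and the decoder predicts the answer tokens. All module names and sizes here are illustrative stand-ins, not the exact CLIP/GPT-2 components used in the repository.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Flamingo-style block: text tokens attend to visual features, with a
    zero-initialized tanh gate so training starts from the unmodified frozen LM."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))   # tanh(0) = 0 -> identity at init
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, visual_tokens):
        attended, _ = self.attn(query=self.norm(text_tokens),
                                key=visual_tokens, value=visual_tokens)
        return text_tokens + torch.tanh(self.gate) * attended

class MedicalVQASketch(nn.Module):
    """Toy stand-in for the CLIP + GPT-2 Flamingo-style VQA model;
    in the real model the vision and language backbones are frozen pre-trained networks."""
    def __init__(self, dim: int = 768, vocab_size: int = 50257):
        super().__init__()
        self.vision_encoder = nn.Linear(512, dim)        # placeholder for the frozen vision encoder
        self.text_embedding = nn.Embedding(vocab_size, dim)
        self.cross_attention = GatedCrossAttentionBlock(dim)
        self.decoder_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab_size)        # placeholder for the frozen LM head

    def forward(self, image_features, question_ids):
        visual_tokens = self.vision_encoder(image_features)   # (B, N_img, dim)
        text_tokens = self.text_embedding(question_ids)       # (B, N_txt, dim)
        fused = self.cross_attention(text_tokens, visual_tokens)
        hidden = self.decoder_layer(fused)
        return self.lm_head(hidden)                            # next-token logits

# Example forward pass with dummy tensors
model = MedicalVQASketch()
logits = model(torch.randn(2, 49, 512), torch.randint(0, 50257, (2, 16)))
print(logits.shape)   # torch.Size([2, 16, 50257])
```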
- Hardware: 1 A40 GPU, 200 epochs with early stop on val loss (at around 140 in each experiment)
- Learning Rate (LR): 1e-4
- LR Warmup: 863 Steps
- Loss: Cross Entropy Loss
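A minimal sketch of how such a configuration is wired up in PyTorch is shown below; the optimizer choice (AdamW) and the linear warmup schedule are assumptions for illustration, while the actual setup lives in the run scripts.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Assumed illustration of the hyperparameters listed above.
model = torch.nn.Linear(768, 50257)            # stand-in for the trainable VQA parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
warmup_steps = 863

def warmup_lambda(step: int) -> float:
    # Linearly ramp the LR from 0 to the base LR over the warmup steps, then hold it.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda=warmup_lambda)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative optimization step on dummy data
logits = model(torch.randn(4, 768))                       # (batch, vocab)
loss = criterion(logits, torch.randint(0, 50257, (4,)))   # next-token targets
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```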
Check out flamingo_clip_gpt2_vqa_rad_run.py:
- Hardware: 1 A40 GPU, 80 epochs with early stop on val loss (at around 40 in each experiment)
- Duration: ~30 mins
- LR: 1e-5
- LR Warmup: 30 Steps
- Loss: Cross Entropy Loss
- Testing: check out vqaRAD_flamingo_clip_gpt2_infer.ipynb:
  - On identical answers (GT answer: “no”, predicted answer: “no” -> true positive)
  - On embeddings: uses the tokens before the last linear layer for the GT and predicted answer → cosine similarity (sketched below)
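A minimal sketch of this comparison is shown below. The `embed_fn` handle and the 0.9 threshold are placeholders, not the notebook's actual code; the real evaluation is in vqaRAD_flamingo_clip_gpt2_infer.ipynb.

```python
import torch
import torch.nn.functional as F

def answers_match(gt_answer: str, pred_answer: str,
                  embed_fn, threshold: float = 0.9) -> bool:
    """Compare a ground-truth and a predicted answer.

    `embed_fn` is a placeholder for a function returning the hidden states of the
    answer tokens just before the model's last linear layer, shaped (num_tokens, dim).
    The 0.9 threshold is an illustrative choice, not the repository's value.
    """
    # Exact string match ("no" vs. "no") counts as a true positive directly.
    if gt_answer.strip().lower() == pred_answer.strip().lower():
        return True
    # Otherwise mean-pool the token embeddings and compare with cosine similarity.
    gt_vec = embed_fn(gt_answer).mean(dim=0)
    pred_vec = embed_fn(pred_answer).mean(dim=0)
    similarity = F.cosine_similarity(gt_vec, pred_vec, dim=0).item()
    return similarity >= threshold

# Dummy embedder so the sketch runs stand-alone
dummy_embed = lambda text: torch.randn(len(text.split()) + 1, 768)
print(answers_match("no", "no", dummy_embed))   # True (identical answers)
```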
Check out flamingo_clip_gpt2_imageclef_run.py:
- Hardware: 1 A40 GPU, 200 epochs with early stop on val loss (at around 110 in each experiment)
- Duration: ~3 hours
- LR: 1e-4
- LR Warmup: 30 Steps
- Loss: Cross Entropy Loss
- Testing: check out Imageclef_flamingo_clip_gpt2_playground.ipynb:
  - On identical answers (Ground Truth answer: “no”, predicted answer: “no” -> true positive)
  - Classification Accuracy
- Evaluation: Accuracy, BLEU score (see the sketch below)
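A small sketch of how these two metrics can be computed is given below; NLTK's sentence-level BLEU (here BLEU-2 with smoothing) is an assumed stand-in for the exact scorer used in the notebook.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def evaluate(gt_answers, pred_answers):
    """Exact-match accuracy plus an average sentence-level BLEU score."""
    smoothing = SmoothingFunction().method1   # avoids zero scores on very short answers
    correct, bleu_total = 0, 0.0
    for gt, pred in zip(gt_answers, pred_answers):
        gt_tokens, pred_tokens = gt.lower().split(), pred.lower().split()
        correct += int(gt_tokens == pred_tokens)
        # BLEU-2 (unigram + bigram) is an illustrative choice for short clinical answers.
        bleu_total += sentence_bleu([gt_tokens], pred_tokens,
                                    weights=(0.5, 0.5), smoothing_function=smoothing)
    return correct / len(gt_answers), bleu_total / len(gt_answers)

accuracy, bleu = evaluate(["no", "left lower lobe"], ["no", "right lower lobe"])
print(f"accuracy={accuracy:.2f}  bleu={bleu:.2f}")
```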
To make it easy for you to get started with our model, here's a list of recommended next steps:
- Clone this repository into a local folder.
cd local/path
git clone https://gitlab.lrz.de/CAMP_IFL/diva/mlmi-vqa
- Set up the Python virtual environment using conda:
conda env create -f environment.yml
conda activate mlmi
- Check the playground notebooks for usage examples
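Conceptually, what the playground and inference notebooks do at answer time is autoregressive decoding: condition on the image and the question, then repeatedly pick the next answer token until an end-of-sequence token (or a length limit) is reached. The snippet below is a self-contained illustration of that loop with a random stand-in model, not the repository's actual inference code.

```python
import torch

EOS_ID, VOCAB = 0, 50257

def greedy_decode(model, image_feats, question_ids, max_new_tokens=20):
    """Greedy autoregressive decoding: append the most likely next token each step."""
    tokens = question_ids.clone()
    for _ in range(max_new_tokens):
        logits = model(image_feats, tokens)            # (1, seq_len, vocab)
        next_id = logits[0, -1].argmax().item()
        tokens = torch.cat([tokens, torch.tensor([[next_id]])], dim=1)
        if next_id == EOS_ID:
            break
    return tokens[0, question_ids.shape[1]:]           # the generated answer ids

# Random stand-in for the trained VQA model
dummy_model = lambda img, txt: torch.randn(1, txt.shape[1], VOCAB)
answer_ids = greedy_decode(dummy_model, torch.randn(1, 49, 512),
                           torch.randint(1, VOCAB, (1, 8)))
print(answer_ids)
```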
You can check out and try our model on our demo page using the QR code. To run the demo, check demo_imageclef.ipynb.
- Domain Specific Language Decoder
- Domain Specific Tokenizer
- Decoder with a similar number of parameters to the Chinchilla language model family
- Optimize current approach
- Qualitative evaluation and comparison with other works
- Visualization of Attention Maps
At the moment we are still closed for contributions.
Authors: Fabian Scherer - Andrei Mancu - Alaeddine Mellouli - Çağhan Köksal
We thank the MLMI team as well as Matthias Keicher and Kamilia Zaripova for their help and support.
Private Repository until further development.