MedCLIP is a deep learning model for medical image captioning based on the OpenAI CLIP architecture.
Run `main.ipynb` on a Colab instance.
Weights for the model are provided, so you don’t need to train again.
CLIP is, at its core, an elegant matching process. Through encodings and transformations, CLIP learns relationships between natural language and images. The underlying model can either caption an image by retrieving the best match from a set of known captions, or retrieve an image for a given caption. With appropriate encoders, the CLIP model can be adapted to domain-specific applications. Our hope with MedCLIP is to help radiologists produce diagnoses.
CLIP works by encoding an image and its related caption into tensors. The model then optimises the final layers of the (transfer-learnable) encoders so that the image and text encodings of a matching pair are as similar as possible (1. Contrastive Pretraining).
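As a rough illustration of that objective, here is a minimal PyTorch sketch of a CLIP-style symmetric contrastive loss. The function and argument names are illustrative only, not this repo's actual training code:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text embeddings.

    image_embeddings, text_embeddings: (batch_size, embed_dim) tensors produced
    by the image and text encoders (hypothetical names, not this repo's API).
    """
    # Normalise so the dot product becomes a cosine similarity
    image_embeddings = F.normalize(image_embeddings, dim=-1)
    text_embeddings = F.normalize(text_embeddings, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with caption j
    logits = image_embeddings @ text_embeddings.t() / temperature

    # The matching image/caption pairs lie on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image -> text and text -> image)
    loss_i = F.cross_entropy(logits, targets)
    loss_t = F.cross_entropy(logits.t(), targets)
    return (loss_i + loss_t) / 2
```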
After the model is successfully trained, we can query it with new information (2. Zero-shot). The lookup works as follows (a code sketch follows the list):

- Take an input.
- Encode it with the custom-trained encoders.
- Find a match (image or text) from the known data set:
  - Go through each entry of the data set.
  - Check its similarity with the current input.
  - Output the pairs with the highest similarity.
- [Optionally] Measure the similarity between the real caption and the guessed one.
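A minimal sketch of that zero-shot lookup, assuming a pre-encoded bank of caption embeddings; the helper below is hypothetical and not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def retrieve_best_captions(image_embedding, caption_embeddings, captions, top_k=5):
    """Return the top_k captions whose embeddings are most similar to the image.

    image_embedding: (embed_dim,) tensor from the image encoder.
    caption_embeddings: (num_captions, embed_dim) tensor of pre-encoded captions.
    captions: list of the corresponding caption strings.
    """
    image_embedding = F.normalize(image_embedding, dim=-1)
    caption_embeddings = F.normalize(caption_embeddings, dim=-1)

    # Cosine similarity between the query image and every known caption
    similarities = caption_embeddings @ image_embedding

    # Keep the most similar caption/score pairs
    top_scores, top_indices = similarities.topk(top_k)
    return [
        (captions[i], score)
        for score, i in zip(top_scores.tolist(), top_indices.tolist())
    ]
```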
The model was trained on a curated MedPix dataset focused on Magnetic Resonance, Computed Tomography and X-Ray scans. ClinicalBERT was used to encode the text and ResNet50 to encode the images.
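The encoder pairing could look roughly like the sketch below, built with `torchvision` (>= 0.13) and Hugging Face `transformers`. The checkpoint name `emilyalsentzer/Bio_ClinicalBERT`, the 512-dimensional projection, and all class/method names are assumptions rather than this repo's exact implementation:

```python
import torch.nn as nn
from torchvision import models
from transformers import AutoModel, AutoTokenizer

class MedClipEncoders(nn.Module):
    """Dual-encoder sketch: ResNet50 for images, ClinicalBERT for text,
    each followed by a linear projection into a shared embedding space.
    (Checkpoint name and projection size are assumptions.)"""

    def __init__(self, embed_dim=512, bert_name="emilyalsentzer/Bio_ClinicalBERT"):
        super().__init__()
        # Image branch: ResNet50 backbone with its classifier head removed
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.image_backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.image_proj = nn.Linear(2048, embed_dim)

        # Text branch: ClinicalBERT followed by a projection of the [CLS] token
        self.tokenizer = AutoTokenizer.from_pretrained(bert_name)
        self.text_backbone = AutoModel.from_pretrained(bert_name)
        self.text_proj = nn.Linear(self.text_backbone.config.hidden_size, embed_dim)

    def encode_image(self, pixel_values):
        features = self.image_backbone(pixel_values).flatten(1)  # (batch, 2048)
        return self.image_proj(features)

    def encode_text(self, captions):
        tokens = self.tokenizer(captions, padding=True, truncation=True,
                                return_tensors="pt")
        outputs = self.text_backbone(**tokens)
        cls = outputs.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.text_proj(cls)
```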
Similarity between captions was measured using ROUGE, BLEU, METEOR and CIDEr.
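For instance, BLEU and ROUGE-L can be computed with the `nltk` and `rouge-score` packages (a minimal sketch; METEOR and CIDEr need their own tooling, and the repo's actual evaluation pipeline may differ):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def caption_similarity(reference: str, candidate: str) -> dict:
    """Compare a ground-truth caption with a retrieved one using BLEU and ROUGE-L."""
    # Sentence-level BLEU over whitespace tokens, smoothed for short captions
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )

    # ROUGE-L F-measure from the rouge-score package
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = scorer.score(reference, candidate)["rougeL"].fmeasure

    return {"bleu": bleu, "rougeL": rouge_l}


print(caption_similarity(
    "axial mri shows a left frontal lobe lesion",
    "mri shows a lesion in the left frontal lobe",
))
```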
Possible next steps:

- Add new datasets; the more data the model has seen, the better the captioning performance (a larger pool of captions/images to choose from). Some relevant datasets:
  - IU Chest X-Ray
  - ChestX-Ray 14
  - PEIR Gross
  - BCIDR
  - CheXpert
  - MIMIC-CXR
  - PadChest
  - ICLEF caption
- Generate new captions instead of just looking them up. This would vastly improve accuracy.
Repo Owners

- Jorge Allan Gomez Mercado
- Luis Soenksen