EndoFinder was recently accepted to MICCAI 2024.
This work employs self-supervised contrastive learning combined with a polyp-aware masked reconstruction task to learn a universal representation of polyps for image retrieval.
We provide trained models from our original experiments to allow others to reproduce our evaluation results (https://).
Install conda, then create and activate a conda environment for EndoFinder as follows:
# Create conda environment
conda create --name EndoFinder -c pytorch -c conda-forge \
pytorch torchvision cudatoolkit=11.3 \
"pytorch-lightning>=1.5,<1.6" lightning-bolts \
faiss python-magic pandas numpy
# Activate environment
conda activate EndoFinder
# Install Classy Vision and AugLy from pip:
python -m pip install classy_vision augly
Alternatively, create a Python virtual environment and install the dependencies with pip:
# Create environment
python3 -m virtualenv ./venv
# Activate environment
source ./venv/bin/activate
# Install dependencies in this environment
python -m pip install -r ./requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
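Whichever environment you choose, a quick sanity check (illustrative, not part of the repository) confirms that PyTorch and CUDA are visible:
# Sanity check (hypothetical, not shipped with EndoFinder)
import torch
print(torch.__version__)          # expect a CUDA 11.3 build, e.g. x.y.z+cu113
print(torch.cuda.is_available())  # True if the GPU driver is configured correctly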
This section describes how to use pretrained EndoFinder models for inference.
We recommend preprocessing images for inference by either resizing the shorter edge to 224 or resizing the image to a square 224x224 tensor.
from torchvision import transforms
normalize = transforms.Normalize(
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
)
small_224 = transforms.Compose([
transforms.Resize(224),
transforms.ToTensor(),
normalize,
])
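For the square option, a sketch along the following lines should work; the exact transform used in the repository may differ:
square_224 = transforms.Compose([
    transforms.Resize((224, 224)),  # resize directly to a square 224x224 image
    transforms.ToTensor(),
    normalize,
])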
import torch
from EndoFinder.tools.img2jsonl import LoadModelSetting
from PIL import Image
# Load the pretrained ViT-L/16 backbone with the EndoFinder weights
model = LoadModelSetting.get_model(LoadModelSetting['vit_large_patch16'], "/path/to/EndoFinder.pth", use_hash=False)
img = Image.open("/path/to/image.png").convert('RGB')
batch = small_224(img).unsqueeze(0)  # add a batch dimension
embedding = model(batch)[0, :]       # one embedding vector per input image
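The embeddings are intended for similarity search. As a minimal illustration (not the repository's evaluation code; it reuses model and small_224 from above), two images can be compared by cosine similarity:
import torch
import torch.nn.functional as F
from PIL import Image

img_a = Image.open("/path/to/query.png").convert('RGB')
img_b = Image.open("/path/to/candidate.png").convert('RGB')
with torch.no_grad():  # no gradients needed at inference time
    emb_a = model(small_224(img_a).unsqueeze(0))[0]
    emb_b = model(small_224(img_b).unsqueeze(0))[0]
# Cosine similarity in [-1, 1]; higher means the polyps look more alike
score = F.cosine_similarity(emb_a, emb_b, dim=0).item()
print(score)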
To load model weight files, first construct the Model object, then load the weights using the standard torch.load and load_state_dict methods.
import torch
from EndoFinder.models import models_vit
model = models_vit.__dict__["vit_large_patch16"](use_hash=False)
checkpoint = torch.load("/path/to/EndoFinder.pth", map_location='cpu')
checkpoint_model = checkpoint['model']  # the weights are stored under the 'model' key
model.load_state_dict(checkpoint_model, strict=False)  # strict=False tolerates non-matching keys
model.eval()  # switch to inference mode
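Since the conda environment includes faiss, nearest-neighbor retrieval over a gallery of images can be sketched as follows. This is an assumption-laden illustration, not the repository's evaluation pipeline; the gallery paths are hypothetical, and model and small_224 come from the snippets above:
import faiss
import torch
from PIL import Image

gallery_paths = ["/path/to/gallery_1.png", "/path/to/gallery_2.png"]  # hypothetical gallery
with torch.no_grad():
    feats = [model(small_224(Image.open(p).convert('RGB')).unsqueeze(0))[0] for p in gallery_paths]
    gallery = torch.stack(feats).numpy().astype('float32')
    query_img = Image.open("/path/to/query.png").convert('RGB')
    query = model(small_224(query_img).unsqueeze(0)).numpy().astype('float32')

faiss.normalize_L2(gallery)  # L2-normalize so inner product equals cosine similarity
faiss.normalize_L2(query)
index = faiss.IndexFlatIP(gallery.shape[1])
index.add(gallery)
scores, ids = index.search(query, 2)  # ids of the most similar gallery images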
To reproduce evaluation results, see Evaluation.
For information on how to train EndoFinder models, see Training.
The code in this repository was inspired by and references the following works; their code has been immensely helpful to us:
Masked Autoencoders Are Scalable Vision Learners
A Self-Supervised Descriptor for Image Copy Detection
If you find our codebase useful, please consider giving it a star ⭐ and citing it as:
@Article{EndoFinder2024,
title={EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis},
author={Ruijie Yang and Yan Zhu and Peiyao Fu and Yizhe Zhang and Zhihua Wang and Quanlin Li and Pinghong Zhou and Xian Yang and Shuo Wang},
journal={MICCAI},
year={2024}
}