Self-Supervised-Learning-for-Medical-Image-Analysis

Self-Supervised Learning for Medical Image Analysis

About

Collaborators:

Installation

Clone this repository:

git clone https://github.com/KarahanS/Self-Supervised-Learning-for-Medical-Image-Analysis.git

We used Python 3.10 to obtain our results. The requirements should also work with higher (and some lower) Python versions, but for reproducibility we recommend a working Python 3.10 environment. Create a virtual environment and activate it:

python -m venv env
source env/bin/activate # linux
env\Scripts\activate    # windows

Install the required packages.

pip install -r requirements.txt

Usage & Configurations

Self-supervised training (or pretraining) is a machine learning technique in which a model is first trained on a large amount of unlabeled data to learn useful representations or features. The encoder part of the model is then used in downstream tasks such as classification. This repository provides a framework for self-supervised learning on medical image datasets. The framework is designed to be modular and flexible, allowing for easy experimentation with different self-supervised methods and downstream tasks. Our research focuses on medical image analysis using the MedMNIST dataset, but the framework can be used with any image dataset given suitable dataloaders.

(See: downstream methods README)

python main.py --cfg-path <configuration path>

Terminology

Supervised pretraining: Training a model on a labeled dataset such as ImageNet.
Self-supervised pretraining: Training a model either from scratch or using a supervised pretrained model on a large amount of unlabeled data. For this phase, the terms training and pretraining are used interchangeably in the literature. We use the term pretraining in this repository.
Downstream task: Training a simple classifier model on a labeled dataset using the representations learned from self-supervised pretraining. The encoder part of the self-supervised model is used to extract features from the labeled dataset, and the classifier is trained on these features.

Choosing running mode

The user is expected to run main.py in either Pretraining or Downstream mode. This is specified in configurations by the inclusion of either the Training.Pretraining or Training.Downstream field. Both cannot be provided at the same time, and one of them must be provided.
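The rule above can be illustrated with a minimal sketch. It assumes the configuration has already been loaded into a nested dictionary; the function name select_mode is hypothetical and is not taken from the repository.

```python
def select_mode(cfg: dict) -> str:
    """Return the running mode implied by the configuration.

    Assumes cfg is a nested dict whose "Training" entry may contain
    a "Pretraining" or a "Downstream" subfield (hypothetical helper).
    """
    training = cfg.get("Training", {})
    has_pretraining = "Pretraining" in training
    has_downstream = "Downstream" in training

    # Exactly one of the two fields must be present.
    if has_pretraining == has_downstream:
        raise ValueError(
            "Provide exactly one of Training.Pretraining or Training.Downstream."
        )
    return "pretraining" if has_pretraining else "downstream"
```

A configuration containing only Training.Pretraining selects pretraining mode; one containing both fields, or neither, is rejected.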

The default configuration provided in the repository shows and explains the expected fields. Whenever a required field is missing from a given configuration, its predetermined value is taken from the default configuration file. The exceptions are the Training.Pretraining and Training.Downstream fields, which are not retrieved from the defaults if the user does not provide them. Their subfields, however, are retrieved as normal as long as the user includes the parent field in their configuration.
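The fallback behaviour described above can be sketched as a recursive merge. This is an illustration under assumptions, not the repository's Config implementation: both configurations are treated as nested dictionaries, and fill_defaults, MODE_FIELDS, and the example field names (epochs, lr) are hypothetical.

```python
import copy

# The two mode fields that are never pulled in from the defaults.
MODE_FIELDS = {"Pretraining", "Downstream"}


def fill_defaults(user: dict, default: dict, path: tuple = ()) -> dict:
    """Return a copy of `user` with missing fields taken from `default`."""
    merged = copy.deepcopy(user)
    for key, default_value in default.items():
        # Skip a mode field the user did not provide.
        if path == ("Training",) and key in MODE_FIELDS and key not in user:
            continue
        if key not in merged:
            merged[key] = copy.deepcopy(default_value)
        elif isinstance(default_value, dict) and isinstance(merged[key], dict):
            # Recurse so that missing subfields are still filled in.
            merged[key] = fill_defaults(merged[key], default_value, path + (key,))
    return merged


# Hypothetical field names, for illustration only.
default_cfg = {
    "Training": {
        "Pretraining": {"epochs": 100, "lr": 0.1},
        "Downstream": {"epochs": 50},
    },
    "Logging": {"path": "logs/"},
}
user_cfg = {"Training": {"Pretraining": {"lr": 0.01}}}
merged_cfg = fill_defaults(user_cfg, default_cfg)
```

Here merged_cfg keeps the user's lr, takes epochs and Logging.path from the defaults, and contains no Training.Downstream entry.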

Datasets

Datasets fetched from MedMNIST and MIMeta are stored in the directory specified in Dataset.path. A subfolder for each source, medmnist/ and mimeta/, is created under the given path. The datasets are downloaded automatically into their subdirectories if Dataset.params.download is set to True.
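As a rough sketch of the resulting layout, assuming only what is stated above (a root directory with medmnist/ and mimeta/ subfolders), the directories could be prepared as follows. The helper name prepare_dataset_dirs is hypothetical and the actual download logic is not shown.

```python
from pathlib import Path


def prepare_dataset_dirs(dataset_path: str) -> dict:
    """Create the medmnist/ and mimeta/ subfolders under dataset_path.

    Hypothetical helper: returns a mapping from source name to its folder.
    """
    root = Path(dataset_path)
    dirs = {name: root / name for name in ("medmnist", "mimeta")}
    for folder in dirs.values():
        # parents=True also creates the root; exist_ok makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling prepare_dataset_dirs("data/") would create data/medmnist and data/mimeta if they do not already exist.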

Visualization

If logging is enabled, the log files will be stored under the path specified in Logging.path.
To visualize the logs using WandB, you will be prompted to enter your credentials at the beginning of a run. Then you can examine the logs on the official website of WandB.
To visualize the logs using TensorBoard, you can run the following command in a separate terminal (assuming you are in the same folder as main.py):

tensorboard --logdir=src/ssl/simclr/tensorboard

Then you can view the logs at http://localhost:6006/.

Adding a new configuration field

New fields can be introduced in configurations and retrieved in code without having to modify anything else. However, for practicality, the following should be done:

The default value for the new field should be provided in the default configuration.
If not covered by existing code, retrieval from the default should be handled in Config.__init__ before _sanitize_cfg is called.
The given values should be checked with assertions in Config._sanitize_cfg. Paths should end with the delimiter /.
The desired data type of new values should be cast in Config._cast_cfg.
If new paths are added, they may be included in src/utils/constants.py, and initialized in configure_paths from src/utils/setup.py.
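The checking and casting steps in the list above might look roughly like the sketch below. It is written as standalone functions rather than as the actual Config._sanitize_cfg and Config._cast_cfg methods, whose signatures are not shown here. The fields Logging.checkpoint_path and Training.warmup_epochs are hypothetical, and the order in which the two steps run is an assumption.

```python
def sanitize_new_fields(cfg: dict) -> None:
    """Check hypothetical new fields with assertions."""
    checkpoint_path = cfg["Logging"]["checkpoint_path"]
    # Paths are expected to end with the delimiter "/".
    assert isinstance(checkpoint_path, str), "checkpoint_path must be a string"
    assert checkpoint_path.endswith("/"), "checkpoint_path must end with '/'"

    # Values read from a file may arrive as strings, so convert before comparing.
    assert int(cfg["Training"]["warmup_epochs"]) >= 0, "warmup_epochs must be >= 0"


def cast_new_fields(cfg: dict) -> dict:
    """Cast hypothetical new fields to their desired data types (in place)."""
    cfg["Training"]["warmup_epochs"] = int(cfg["Training"]["warmup_epochs"])
    return cfg


# Hypothetical configuration fragment for illustration.
example_cfg = {
    "Logging": {"checkpoint_path": "checkpoints/"},
    "Training": {"warmup_epochs": "5"},
}
sanitize_new_fields(example_cfg)
example_cfg = cast_new_fields(example_cfg)
```

After these two calls, warmup_epochs is stored as an integer, and a checkpoint_path without a trailing "/" would have failed the assertion.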
