ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation

Uncertainty estimation is an essential and heavily-studied component for the reliable application of semantic segmentation methods. While various studies exist claiming methodological advances on the one hand, and successful application on the other hand, the field is currently hampered by a gap between theory and practice leaving fundamental questions unanswered: Can data-related and modelrelated uncertainty really be separated in practice? Which components of an uncertainty method are essential for real-world performance? Which uncertainty method works well for which application? In this work, we link this research gap to a lack of systematic and comprehensive evaluation of uncertainty methods. Specifically, we identify three key pitfalls in current literature and present an evaluation framework that bridges the research gap by providing 1) a controlled environment for studying data ambiguities as well as distribution shifts, 2) systematic ablations of relevant method components, and 3) test-beds for the five predominant uncertainty applications: OoD-detection, active learning, failure detection, calibration, and ambiguity modeling. Empirical results on simulated as well as real-world data demonstrate how the proposed framework is able to answer the predominant questions in the field revealing for instance that 1) separation of uncertainty types works on simulated data but does not necessarily translate to real-world data, 2) aggregation of scores is a crucial but currently neglected component of uncertainty methods, 3) While ensembles are performing most robustly across the different downstream tasks and settings, test-time augmentation often constitutes a light-weight alternative.

Framework for systematic validation of uncertainty methods in segmentation. With our framework, we aim to overcome pitfalls in the current validation of uncertainty methods for semantic segmentation by satisfying the three requirements (R1-R3) for a systematic validation: We explicitly control for aleatoric and epistemic uncertainty in the data and references (R1). We define and validate four individual components C0-C3 of uncertainty methods (R2): First, one or multiple segmentation outputs are generated by the segmentation backbone (C0) and the prediction model (C1). Next, an uncertainty measure is applied (C2) producing an uncertainty heatmap, which can be aggregated using an aggregation strategy (C3). Finally, the real-world capabilities of methods need to be validated on various downstream tasks (R3).

Reproducibility

The following sections contain instructions on how to reproduce our results

Setup

The code is tested with python version 3.10.

Clone this repository
Install the requirements
```
pip install -r requirements.txt
```

Running instructions

Each of the following three folders contains separate README files with instructions how to reproduce the datasets, network training and inference, as well as experiment analysis, respectively:

datasets: This folder contains all the code to set up the datasets. This includes the used toy dataset (toy_data_generation), the lung nodule dataset (lidc-idri) and the GTA5 / Cityscapes dataset (gta_cityscapes).
uncertainty_modeling: This folder contains the main part for training and inference of the various uncertainty methods.
evaluation: This folder contains the code for analysis of the experiments that is done after the inference, e.g. aggregation of uncertaintiy maps, OoD detection performance, AURC etc.

Acknowledgements

Parts of this repository, esp. concerning the U-Net model training in uncertainty_modeling are based on the Pytorch Lightning Segmentation Example.

The code for the HRNet training is based on the Semantic Segmentation Repository.

This work was funded by Helmholtz Imaging (HI), a platform of the Helmholtz Incubator on Information and Data Science. This work is supported by the Helmholtz Association Initiative and Networking Fund under the Helmholtz AI platform grant (ALEGRA (ZT-I-PF-5-121))

This library is developed and maintained by the Interactive Machine Learning Group of Helmholtz Imaging and the DKFZ.

Name		Name	Last commit message	Last commit date
Latest commit History 142 Commits
datasets		datasets
evaluation		evaluation
figures		figures
test		test
uncertainty_modeling		uncertainty_modeling
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation

Reproducibility

Setup

Running instructions

Acknowledgements

About

Releases

Packages

Languages

License

IML-DKFZ/values

Folders and files

Latest commit

History

Repository files navigation

ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation

Reproducibility

Setup

Running instructions

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages