Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Reed, Colorado J.; Gupta, Ritwik; Li, Shufan; Brockman, Sarah; Funk, Christopher; Clipp, Brian; Keutzer, Kurt; Candido, Salvatore; Uyttendaele, Matt; Darrell, Trevor

arXiv:2212.14532v3 (cs)

[Submitted on 30 Dec 2022 (v1), revised 6 Apr 2023 (this version, v3), latest version 22 Sep 2023 (v4)]

Abstract: Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image, not the image resolution, determines the scale of the ViT positional encoding. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average $2.4\%$ to $5.6\%$ improvement in non-parametric kNN classification across eight remote sensing datasets compared to the current state of the art and obtains a $0.9$ to $1.7$ mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
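
To illustrate the idea that ground coverage, rather than pixel resolution, sets the scale of the positional encoding, here is a minimal sketch. It assumes a standard 1-D sinusoidal encoding whose position index is multiplied by the ratio of the image's ground sample distance to a reference value. The function name `gsd_positional_encoding` and the parameters `gsd` and `reference_gsd` are hypothetical and chosen for illustration; the abstract does not give the exact formula, so this should not be read as the authors' implementation.

```python
import math


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Sinusoidal positional encoding with ground-distance-scaled positions.

    Assumption (not taken from the abstract): each position index is
    multiplied by gsd / reference_gsd before the usual sin/cos encoding,
    so positions are measured in ground units rather than in pixels.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scale = gsd / reference_gsd
    encoding = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            # Standard transformer frequency schedule, applied to the
            # scaled position instead of the raw index.
            angle = scale * pos / (10000.0 ** (i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        encoding.append(row)
    return encoding


# Position k at 2 m/pixel lies at the same ground offset as position 2k
# at 1 m/pixel, so under this sketch both receive the same encoding.
coarse = gsd_positional_encoding(4, 8, gsd=2.0)
fine = gsd_positional_encoding(8, 8, gsd=1.0)
```

Under this assumption, two images of the same ground area captured at different resolutions share encodings at matching ground locations, which is one way to read the scale-awareness described in the abstract.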

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2212.14532 [cs.CV] (or arXiv:2212.14532v3 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2212.14532

Submission history

From: Ritwik Gupta
[v1] Fri, 30 Dec 2022 03:15:34 UTC (2,185 KB)
[v2] Mon, 2 Jan 2023 23:50:15 UTC (2,185 KB)
[v3] Thu, 6 Apr 2023 10:15:47 UTC (1,341 KB)
[v4] Fri, 22 Sep 2023 02:34:12 UTC (1,977 KB)
