Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

Chen, Yunpeng; Fan, Haoqi; Xu, Bing; Yan, Zhicheng; Kalantidis, Yannis; Rohrbach, Marcus; Yan, Shuicheng; Feng, Jiashi

arXiv:1904.05049 (cs)

[Submitted on 10 Apr 2019 (v1), last revised 18 Aug 2019 (this version, v3)]

Abstract: In natural images, information is conveyed at different frequencies: higher frequencies usually encode fine details, and lower frequencies usually encode global structures. Similarly, the output feature maps of a convolution layer can be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation that stores and processes feature maps that vary spatially "slower" at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement for (vanilla) convolutions without any adjustments to the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy, such as group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.
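
The idea in the abstract, keeping the "slower" varying feature maps at a lower spatial resolution, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details below are assumptions that the abstract does not state: the channel split ratio `alpha`, the use of 1x1 (pointwise) kernels to keep the code short, 2x2 average pooling for the high-to-low path, nearest-neighbour upsampling for the low-to-high path, and every function and variable name.

```python
import numpy as np

def avg_pool2(x):
    """Halve the spatial size of a (C, H, W) array with 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Double the spatial size of a (C, H, W) array by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def pointwise(x, w):
    """1x1 convolution: w is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def oct_conv_1x1(x_h, x_l, w_hh, w_hl, w_lh, w_ll):
    """Hypothetical octave-style layer with four paths between the two groups.

    x_h: high-frequency group at full resolution (C_h, H, W)
    x_l: low-frequency group at half resolution (C_l, H/2, W/2)
    """
    # High-frequency output: high-to-high plus upsampled low-to-high.
    y_h = pointwise(x_h, w_hh) + upsample2(pointwise(x_l, w_lh))
    # Low-frequency output: low-to-low plus pooled high-to-low.
    y_l = pointwise(x_l, w_ll) + pointwise(avg_pool2(x_h), w_hl)
    return y_h, y_l

rng = np.random.default_rng(0)
alpha = 0.5                      # assumed fraction of channels kept at low resolution
c_in, c_out, h, w = 8, 16, 32, 32
c_in_l, c_out_l = int(alpha * c_in), int(alpha * c_out)
c_in_h, c_out_h = c_in - c_in_l, c_out - c_out_l

x_h = rng.standard_normal((c_in_h, h, w))
x_l = rng.standard_normal((c_in_l, h // 2, w // 2))
w_hh = rng.standard_normal((c_out_h, c_in_h))
w_hl = rng.standard_normal((c_out_l, c_in_h))
w_lh = rng.standard_normal((c_out_h, c_in_l))
w_ll = rng.standard_normal((c_out_l, c_in_l))

y_h, y_l = oct_conv_1x1(x_h, x_l, w_hh, w_hl, w_lh, w_ll)
print(y_h.shape, y_l.shape)      # (8, 32, 32) (8, 16, 16)

# Stored input activations: octave layout versus a full-resolution layout.
stored_octave = x_h.size + x_l.size
stored_vanilla = c_in * h * w
print(stored_octave, stored_vanilla)  # 5120 8192
```

Under these assumptions, each low-frequency channel holds a quarter of the values of a full-resolution channel, which is where the memory saving in this toy setting comes from. The accuracy and GFLOPs figures in the abstract come from the paper's experiments and are not reproduced by this sketch.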

Comments:	Accepted to ICCV 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1904.05049 [cs.CV]
	(or arXiv:1904.05049v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1904.05049

Submission history

From: Yunpeng Chen
[v1] Wed, 10 Apr 2019 08:15:00 UTC (387 KB)
[v2] Tue, 30 Apr 2019 11:25:14 UTC (392 KB)
[v3] Sun, 18 Aug 2019 08:21:46 UTC (511 KB)
