DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Fang, Hao-Shu; Xie, Yichen; Shao, Dian; Lu, Cewu

arXiv:2010.01005 (cs)

[Submitted on 2 Oct 2020 (v1), last revised 19 Jan 2021 (this version, v2)]

Abstract: In recent years, human-object interaction (HOI) detection has advanced impressively. However, conventional two-stage methods are usually slow at inference. Existing one-stage methods, on the other hand, mainly focus on the union regions of interactions, which introduce unnecessary visual information that disturbs HOI detection. To tackle these problems, we propose DIRV, a novel one-stage HOI detection approach based on a new concept for the HOI problem called the interaction region. Unlike previous methods, our approach concentrates on densely sampled interaction regions across different scales for each human-object pair, so as to capture the subtle visual features that are most essential to the interaction. Moreover, to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy that makes full use of the overlapping interaction regions in place of conventional Non-Maximal Suppression (NMS). Extensive experiments on two popular benchmarks, V-COCO and HICO-DET, show that our approach outperforms existing state-of-the-art methods by a large margin with the highest inference speed and the lightest network architecture. We achieve 56.1 mAP on V-COCO without additional input. Our code is publicly available at: this https URL
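
The abstract contrasts voting over overlapping interaction regions with NMS but gives no formula, so the following is only a generic illustration of that contrast, not the paper's formulation. It assumes each region yields a box and a confidence score for the same target, and it uses a plain confidence-weighted average as the vote. The names `nms_pick` and `vote`, and the sample numbers, are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Candidate = Tuple[Box, float]            # (predicted box, confidence score)


def nms_pick(candidates: List[Candidate]) -> Box:
    """NMS-style selection: keep only the highest-scoring candidate."""
    return max(candidates, key=lambda c: c[1])[0]


def vote(candidates: List[Candidate]) -> Box:
    """Assumed voting rule: confidence-weighted average of all candidate boxes."""
    total = sum(score for _, score in candidates)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return tuple(
        sum(box[i] * score for box, score in candidates) / total
        for i in range(4)
    )


# Hypothetical predictions of one object box from three overlapping regions.
candidates = [
    ((10.0, 10.0, 50.0, 50.0), 0.5),
    ((14.0, 10.0, 54.0, 50.0), 0.25),
    ((14.0, 10.0, 54.0, 50.0), 0.25),
]

print("NMS-style pick:", nms_pick(candidates))
print("Voted box:     ", vote(candidates))
```

In this sketch the NMS-style pick discards the two lower-scoring predictions, while the vote lets them shift the final box, which is the kind of compensation for a single region's flaws that the abstract describes.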

Comments:	Paper is accepted. Code available at: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2010.01005 [cs.CV]
	(or arXiv:2010.01005v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2010.01005

Submission history

From: Haoshu Fang
[v1] Fri, 2 Oct 2020 13:57:58 UTC (32,432 KB)
[v2] Tue, 19 Jan 2021 16:48:22 UTC (32,432 KB)
