Author: Bangguo Yu, Fengyu Zhou, Ke Chen, Zhiyong Yang
Affiliation: Shandong University
Visual target navigation, in which an autonomous robot must find a specified object in an indoor environment, is a vital and non-trivial task closely tied to cognitive reasoning ability. Both classical and learning-based approaches have been investigated thoroughly as fundamental components of navigation. However, performance in unknown scenes remains the core challenge, owing to the difficulty of learning a navigational policy and of representing scene memory and priors in large indoor scenes. We therefore propose a novel framework for visual target navigation that uses Relational Graph Convolutional Networks (RGCNs) to incorporate 3D knowledge graphs. In the proposed framework, a semantic map is built as scene memory and a 3D scene graph is encoded as scene priors. Based on the real-time map and the explicit scene priors, a navigational policy that samples goals is learned with deep reinforcement learning on a new task dataset. Experimental results demonstrate that our framework significantly outperforms existing state-of-the-art methods on the target navigation task. We also compare our model's decisions with those of human subjects.
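To make the scene-prior encoding concrete, the standard RGCN propagation rule (Schlichtkrull et al.) aggregates neighbor features separately per edge relation and normalizes by the per-relation neighbor count. The sketch below is a minimal, illustrative NumPy implementation of one such layer; the function name, weight layout, and toy graph are our own assumptions, not the paper's actual architecture or code.

```python
import numpy as np

def rgcn_layer(h, edges, num_relations, W_rel, W_self):
    """One illustrative RGCN propagation step (not the paper's code):
    h_i' = ReLU( W_self h_i + sum_r sum_{j in N_r(i)} (1 / c_{i,r}) W_r h_j ),
    where c_{i,r} is the number of relation-r neighbors of node i.

    h            : (n, d_in) node feature matrix
    edges        : list of length num_relations; edges[r] is a list of
                   (src, dst) pairs for relation r
    W_rel        : list of (d_out, d_in) per-relation weight matrices
    W_self       : (d_out, d_in) self-loop weight matrix
    """
    n = h.shape[0]
    out = h @ W_self.T  # self-loop term W_self h_i
    for r in range(num_relations):
        msgs = np.zeros_like(out)
        counts = np.zeros(n)
        for src, dst in edges[r]:
            # message from relation-r neighbor j to node i
            msgs[dst] += h[src] @ W_rel[r].T
            counts[dst] += 1
        counts[counts == 0] = 1.0  # avoid division by zero for isolated nodes
        out += msgs / counts[:, None]
    return np.maximum(out, 0.0)  # ReLU
```

With identity weights and a single edge (0 -> 1) under relation 0, node 1's output is its own feature plus node 0's, so the update is easy to verify by hand. In a scene-graph setting, nodes would carry object/region embeddings and relations would encode spatial or semantic links such as "on" or "near".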