Publication
Learning to Fuse: A Deep Learning Approach to Visual-Inertial Camera Pose Estimation
Jason Raphael Rambach; Aditya Tewari; Alain Pagani; Didier Stricker
In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2016), September 19-23, Merida, Mexico, IEEE, 9/2016.
Abstract
Camera pose estimation is the cornerstone of Augmented Reality
applications. Pose tracking based exclusively on camera images has
been shown to be sensitive to motion blur, occlusions, and illumination
changes. Thus, considerable work has been conducted in recent
years on visual-inertial pose tracking, which uses acceleration and angular
velocity measurements from inertial sensors to improve
the visual tracking. Most proposed systems approach the sensor fusion
problem with statistical filtering techniques, which require
complex system modelling and calibration to perform adequately.
In this work we present a novel approach to sensor fusion
using a deep learning method to learn the relation between camera
poses and inertial sensor measurements. A long short-term memory
(LSTM) model is trained to provide an estimate of the current pose
based on previous poses and inertial measurements. This estimate
is then appropriately combined with the output of a visual tracking
system using a linear Kalman Filter to provide a robust final
pose estimate. Our experimental results confirm the applicability
of the proposed sensor fusion system and the tracking performance
improvement it provides.
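
The sketch below illustrates the fusion idea described in the abstract: an LSTM predicts the current pose from a window of previous poses and inertial readings, and that prediction is combined with the visual tracker's pose through a linear Kalman filter update. This is not the authors' implementation; the network size, state dimensions, noise covariances, and class/function names (InertialPoseLSTM, kalman_fuse) are illustrative assumptions.

    # Minimal sketch of LSTM-based inertial pose prediction fused with a
    # visual pose estimate via a linear Kalman filter. All shapes and noise
    # values are assumed for illustration only.
    import numpy as np
    import torch
    import torch.nn as nn

    class InertialPoseLSTM(nn.Module):
        """Predicts a 7-DoF pose (3D position + quaternion) from a window
        of previous poses concatenated with 6-DoF inertial measurements."""

        def __init__(self, pose_dim=7, imu_dim=6, hidden_dim=128):
            super().__init__()
            self.lstm = nn.LSTM(pose_dim + imu_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, pose_dim)

        def forward(self, poses, imu):
            # poses: (batch, T, 7), imu: (batch, T, 6)
            x = torch.cat([poses, imu], dim=-1)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])  # pose estimate for the current frame

    def kalman_fuse(x_pred, P_pred, z_visual, R_visual):
        """One linear Kalman update: the LSTM output acts as the predicted
        state, the visual tracker's pose as the measurement (identity model)."""
        n = len(x_pred)
        H = np.eye(n)
        S = H @ P_pred @ H.T + R_visual                  # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)              # Kalman gain
        x_fused = x_pred + K @ (z_visual - H @ x_pred)   # fused pose
        P_fused = (np.eye(n) - K @ H) @ P_pred
        return x_fused, P_fused

    if __name__ == "__main__":
        model = InertialPoseLSTM()
        poses = torch.zeros(1, 10, 7)   # dummy window of 10 previous poses
        imu = torch.zeros(1, 10, 6)     # dummy synchronized IMU readings
        x_pred = model(poses, imu).detach().numpy().squeeze()

        P_pred = np.eye(7) * 0.05       # assumed prediction covariance
        z_visual = np.zeros(7)          # pose reported by the visual tracker
        R_visual = np.eye(7) * 0.01     # assumed visual measurement noise
        x_fused, _ = kalman_fuse(x_pred, P_pred, z_visual, R_visual)
        print("fused pose:", x_fused)

In this reading, the filter weights the learned inertial prediction against the visual measurement according to their covariances, so the inertial branch can carry the pose through frames where visual tracking degrades (e.g. motion blur or occlusion).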