Publication

Visual Search Target Inference Using Bag of Deep Visual Words

Sven Stauden; Michael Barz; Daniel Sonntag

In: Frank Trollmann; Anni-Yasmin Turhan (eds.). KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI. German Conference on Artificial Intelligence (KI-2018), September 24-28, Berlin, Germany, Springer, 8/2018.

Abstract

Visual search target inference subsumes methods for predicting the target object of a visual search through eye tracking. A person intends to find an object in a visual scene, and we predict that object from the person's fixation behavior. Knowing the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work is based on a recent approach from the literature that uses Bag of Visual Words, an encoding common in computer vision applications. We evaluate our method on a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular when fixations on the target are excluded.
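As a rough illustration of the general bag-of-words idea named in the abstract, the sketch below clusters feature vectors into a codebook and encodes one search trial as a normalized histogram of codewords. It is not the authors' implementation. The abstract does not state the network, the layer, the patch extraction, the codebook size, or the classifier, so everything here is an assumption: random vectors stand in for CNN features of image patches around fixations, and the names build_codebook and encode, the codebook size of 8, and the feature dimension of 16 are hypothetical.

```python
import numpy as np


def build_codebook(features, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k centroids ("visual words")."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct training vectors.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(iters):
        # Distance of every feature vector to every centroid.
        dists = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def encode(fixation_features, codebook):
    """Normalized histogram of nearest-codeword indices for one trial."""
    dists = np.linalg.norm(
        fixation_features[:, None, :] - codebook[None, :, :], axis=2
    )
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


# ASSUMPTION: random vectors stand in for deep features of fixated patches.
rng = np.random.default_rng(42)
train_features = rng.normal(size=(200, 16))  # features pooled over many trials
codebook = build_codebook(train_features, k=8)

trial_features = rng.normal(size=(12, 16))   # one trial with 12 fixations
histogram = encode(trial_features, codebook)
```

In a setup like this, the histogram would serve as the fixed-length input to a classifier that predicts the search target; which classifier the paper uses is not stated in the abstract.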
