Automatic Judgement of Neural Network-Generated Image Captions

Rajarshi Biswas, Aditya Mogadala, Michael Barz, Daniel Sonntag, Dietrich Klakow

In: Carlos Martin-Vide, Matthew Purver, Senja Pollak (Eds.). Statistical Language and Speech Processing - 7th International Conference, Proceedings. International Conference on Statistical Language and Speech Processing (SLSP-2019), October 14-16, Ljubljana, Slovenia. Pages 261-272, Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence (LNCS) 11816, ISBN 978-3-030-31372-2, Springer, Heidelberg, 9/2019. SLSP is a yearly conference series aimed at promoting and displaying excellent research on the wide spectrum of statistical methods currently in use in computational language or speech processing.


Manual evaluation of individual results of natural language generation tasks is a major bottleneck. It is very time-consuming, and expensive if it is, for example, crowdsourced. In this work, we address this problem for the specific task of automatic image captioning. We automatically generate human-like judgements on grammatical correctness, image relevance, and diversity of the captions obtained from a neural image caption generator. For this purpose, we use pool-based active learning with uncertainty sampling and represent the captions using fixed-size vectors from Google’s Universal Sentence Encoder. In addition, we test common metrics such as BLEU, ROUGE, METEOR, Levenshtein distance, and n-gram counts, and report the F1 score of the classifiers used under the active learning scheme for this task. To the best of our knowledge, our work is the first in this direction and promises to reduce time, cost, and human effort.
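
Pool-based active learning with uncertainty sampling, as named in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the classifiers or the exact uncertainty criterion, so the nearest-centroid model (`fit_centroids`), the least-confident criterion (`least_confident`), the `oracle` callback standing in for a human annotator, and the one-dimensional toy vectors standing in for Universal Sentence Encoder embeddings are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]


def fit_centroids(labeled: Sequence[Example]) -> Callable[[Vector], List[float]]:
    """Toy stand-in classifier (assumption): one centroid per label.

    Returns a function mapping a vector to class probabilities, ordered by
    sorted label, computed from exp(-distance) to each centroid.
    """
    grouped: Dict[int, List[Vector]] = {}
    for x, y in labeled:
        grouped.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()
    }
    labels = sorted(centroids)

    def predict_proba(x: Vector) -> List[float]:
        weights = [math.exp(-math.dist(x, centroids[y])) for y in labels]
        total = sum(weights)
        return [w / total for w in weights]

    return predict_proba


def least_confident(probabilities: Sequence[Sequence[float]]) -> int:
    """Index of the item whose most likely class has the lowest probability."""
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))


def active_learning_loop(
    labeled: Sequence[Example],
    pool: Sequence[Vector],
    oracle: Callable[[Vector], int],
    n_queries: int,
) -> Tuple[List[Example], List[Vector]]:
    """Repeatedly retrain, query the most uncertain pool item, and label it."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_queries):
        if not pool:
            break
        predict_proba = fit_centroids(labeled)        # retrain on current labels
        probs = [predict_proba(x) for x in pool]      # score the unlabeled pool
        x = pool.pop(least_confident(probs))          # pick the most uncertain item
        labeled.append((x, oracle(x)))                # ask the (simulated) annotator
    return labeled, pool


# Toy data: 1-D vectors in place of sentence embeddings (assumption).
seed = [([0.0], 0), ([1.0], 1)]
unlabeled = [[0.1], [0.5], [0.9]]
labeled, remaining = active_learning_loop(
    seed, unlabeled, oracle=lambda x: int(x[0] > 0.5), n_queries=1
)
```

In this toy run the first query goes to `[0.5]`, the point equidistant from both centroids, which is the behaviour uncertainty sampling is meant to produce: annotation effort is spent where the current model is least sure.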


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence