
Publication

Synthetic Data and Active Learning for Efficient Object Detection

Hooman Tavakoli Ghinani; Nimesh Singh; Tatjana Legler; Achim Wagner; Martin Ruskowski
In: Janis Grabis; Yves Wautelet (Eds.). Advanced Information Systems Engineering Workshops - CAiSE 2025 Workshops - Proceedings. Workshop on Generation of Synthetic Datasets for Information Systems (GenSyn-2025), located at CAiSE 2025, June 16, Vienna, Austria, Pages 338-350, Lecture Notes in Business Information Processing (LNBIP), Vol. 556, ISBN 978-3-031-94931-9, Springer Nature, Cham, Switzerland, 2025.

Abstract

With advancements in data availability and computational power, AI adoption has surged across science and technology. Vision-based methodologies have expanded, particularly in industrial applications, from assembly lines to human-robot interaction. Synthetic data generation in controlled environments enables dataset creation and mitigates challenges like labor-intensive labeling. While synthetic datasets are crucial for object detection training, the domain gap remains a key challenge. This paper explores a two-phase training strategy: first, pretraining with synthetic data, followed by fine-tuning using Active Learning (AL) with uncertainty sampling. We extensively evaluate this approach using well-established benchmarks and YOLOv11 as the detection framework. Additionally, we introduce an industrial Truck dataset, featuring CAD-generated and 3D-printed components of a Truck and Glue-gun. Our findings show that combining synthetic data with AL significantly reduces real data requirements while achieving superior precision, especially when real data is limited.
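The uncertainty-sampling step of the fine-tuning phase can be illustrated with a minimal sketch. The abstract does not state how uncertainty is scored, so the least-confidence score used here (1 minus the mean detection confidence per image), the treatment of images without detections as maximally uncertain, and the names `image_uncertainty`, `select_for_labeling`, and `pool` are all assumptions made for illustration, not the authors' method. The confidences would in practice come from the detector pretrained on synthetic data; here they are hard-coded example values.

```python
from typing import Dict, List


def image_uncertainty(confidences: List[float]) -> float:
    """Least-confidence score for one image: 1 minus the mean detection confidence.

    Assumption: an image with no detections is treated as maximally uncertain.
    """
    if not confidences:
        return 1.0
    return 1.0 - sum(confidences) / len(confidences)


def select_for_labeling(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` image ids with the highest uncertainty score."""
    ranked = sorted(pool, key=lambda img: image_uncertainty(pool[img]), reverse=True)
    return ranked[:budget]


# Hypothetical unlabeled real images mapped to the detection confidences
# produced by a model pretrained on synthetic data (values are made up).
pool = {
    "img_a.jpg": [0.95, 0.91],
    "img_b.jpg": [0.40, 0.55],
    "img_c.jpg": [],
    "img_d.jpg": [0.70],
}

# Pick the two images the model is least sure about for manual labeling.
selected = select_for_labeling(pool, budget=2)
print(selected)
```

In an active-learning loop, the selected images would be labeled, added to the real training set, and the detector fine-tuned again before the next selection round; the per-round budget is a design choice that trades labeling cost against how quickly the domain gap closes.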