
Publication

Optimization strategies for neural network deployment on FPGA: An energy-efficient real-time face detection use case

Mhd Rashed Al Koutayni; Gerd Reis; Didier Stricker
In: Internet of Things, Vol. 33, Pages 1-16, Elsevier, 9/2025.

Abstract

Field programmable gate arrays (FPGAs) are considered promising platforms for accelerating deep neural networks (DNNs) due to their parallel processing capabilities and energy efficiency. However, deploying DNNs on FPGA platforms for computer vision tasks presents unique challenges, such as limited computational resources, constrained power budgets, and the need for real-time performance. This work presents a set of optimization methodologies to enhance the efficiency of real-time DNN inference on FPGA system-on-a-chip (SoC) platforms. These optimizations include architectural modifications, fixed-point quantization, computation reordering, and parallelization. Additionally, hardware/software partitioning is employed to optimize task allocation between the processing system (PS) and programmable logic (PL), along with system integration and interface configuration. To validate these strategies, we apply them to a baseline face detection DNN (FaceBoxes) as a use case. The proposed techniques not only improve the efficiency of FaceBoxes on FPGA but also provide a roadmap for optimizing other DNN-based applications for resource-constrained platforms. Experimental results on the AMD Xilinx ZCU102 board with VGA resolution (480 × 640 × 3) input demonstrate a significant increase in efficiency, achieving real-time performance while substantially reducing dynamic energy consumption.
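To illustrate one of the techniques named above, fixed-point quantization, the following is a minimal sketch of mapping floating-point weights to signed fixed-point values. The bit widths, Q-format, and rounding mode here are assumptions for illustration; the paper's actual quantization scheme is not reproduced in this abstract.

```python
import numpy as np

def quantize_fixed_point(weights, total_bits=8, frac_bits=6):
    """Quantize float weights to a signed fixed-point format with
    `frac_bits` fractional bits (illustrative sketch only; the paper's
    exact bit widths and rounding mode are assumptions here).
    """
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))       # most negative representable code
    qmax = 2 ** (total_bits - 1) - 1      # most positive representable code
    # Scale, round to nearest integer, and saturate to the representable range.
    codes = np.clip(np.round(weights * scale), qmin, qmax).astype(np.int32)
    dequantized = codes.astype(np.float32) / scale
    return codes, dequantized

w = np.array([0.5, -0.26, 1.9, -3.0], dtype=np.float32)
codes, deq = quantize_fixed_point(w)
# -3.0 saturates to the minimum code (-128), dequantizing to -2.0
```

On FPGA, the integer codes replace floating-point multiplies with cheap integer arithmetic; the accumulated error from rounding and saturation is what the chosen Q-format must keep acceptably small.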
