CNN Inference Accelerator Optimized for AI Object Detection Applications
Abstract
The advancement of state-of-the-art technology has dramatically impacted the field of Deep Neural Networks (DNNs), especially Convolutional Neural Networks (CNNs). As demand for AI applications on mobile devices grows, power-hungry GPUs are no longer viable for on-device inference. Instead, research is increasingly turning toward compact, low-power Neural Processing Units (NPUs) to meet these constraints. This article presents an efficient architecture for a CNN inference accelerator optimized for AI applications on mobile devices. We propose two architectural enhancements and two optimization methods that improve on an existing CNN accelerator [17]. To evaluate our work, we implemented the design on a Zynq UltraScale+ MPSoC ZCU102 Evaluation Board and verified it with the YOLOv5-nano object detection model. Experimental results show that resource utilization is reduced by 7.31% for LUTs, 22.29% for FFs, and 3.90% for DSPs, and in our Vivado simulation inference time is reduced by 33.5%. With these savings, the proposed accelerator achieves lower resource usage and faster inference with no loss of accuracy.