CNN Inference Accelerator Optimized for AI Object Detection Applications
Abstract
The advancement of state-of-the-art technology has dramatically impacted the field of Deep Neural Networks (DNNs), especially Convolutional Neural Networks (CNNs). As demand for AI applications on mobile devices grows, power-hungry GPUs are no longer viable for on-device inference. Instead, research is increasingly turning toward compact, low-power Neural Processing Units (NPUs) to meet these constraints. This article presents an efficient architecture for a CNN inference accelerator optimized for AI applications on mobile devices. We propose two architectural enhancements and two optimization methods that improve on an existing CNN accelerator [17]. To evaluate our work, we implemented the design on a Zynq UltraScale+ MPSoC ZCU102 Evaluation Board and verified it with the YOLOv5-nano object detection model. Experimental results show that resource utilization is reduced by 7.31% for LUTs, 22.29% for FFs, and 3.90% for DSPs, and in our Vivado simulation inference time is reduced by 33.5%. With these savings, the proposed accelerator achieves lower resource usage and faster inference with no loss of accuracy.