Optimizing half precision Winograd convolution on ARM many-core processors

Dedong Xie; Zhen Jia; Zili Zhang; Xin Jin

2022

Convolutional Neural Networks (CNNs) are widely used in real-world applications, e.g., computer vision. Winograd-based convolution is commonly applied because of its low computational complexity. As the underlying hardware, ARM many-core CPUs are favored by cloud providers such as Amazon Web Services (AWS) for their price-performance. However, existing Winograd convolution implementations for the ARM architecture are mostly optimized for mobile devices and usually cannot fully utilize the hardware resources of many-core processors.

In this paper, we propose HAWC, an optimized half-precision floating-point (FP16) Winograd convolution implementation for ARM many-core processors. HAWC combines a series of optimization methods suited to the ARM NEON architecture into a complete solution to improve performance. Our evaluations show that HAWC achieves on average 10.74× and up to 27.56× speedup on representative convolution layers over state-of-the-art solutions.
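To illustrate the low computational complexity that motivates Winograd-based convolution, here is a minimal sketch of the textbook one-dimensional F(2,3) Winograd algorithm, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. This is not HAWC's implementation: it runs in ordinary Python floats rather than FP16, uses no NEON instructions, and the function names `direct_conv_1d` and `winograd_f23` are hypothetical and chosen only for this example.

```python
def direct_conv_1d(d, g):
    """Direct method: 2 outputs from 4 inputs and a 3-tap filter (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): same 2 outputs using 4 element-wise multiplications."""
    # Filter transform; in a CNN this can be computed once per filter and reused.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The 4 multiplications that replace the 6 of the direct method.
    m0 = v0 * u0
    m1 = v1 * u1
    m2 = v2 * u2
    m3 = v3 * u3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]   # input tile
    g = [0.5, 1.0, 2.0]        # filter taps
    print(direct_conv_1d(d, g))
    print(winograd_f23(d, g))
```

The saving comes from trading multiplications for cheaper additions in the transforms. The division by 2 in the filter transform is one reason reduced-precision formats such as FP16 need care in practice, since the transforms can amplify rounding error.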
