Convolutional Neural Networks (CNNs) are widely used in real world applications, e.g, computer vision. Winograd based convolution are usually applied due to its low computation complexity. For the underling hardware, ARM many-core CPUs, by their price performance, are favored by cloud providers like Amazon Web Services (AWS). However, existing Winograd convolution implementations for ARM architecture are mostly optimized for mobile devices, and usually can not fully utilize hardware resources of many-core processors.
In this paper, we propose HAWC, an optimized half precision floating-point (FP16) Winograd convolution implementation for ARM many-core processors. HAWC employs a series of optimization methods, which are suitable for ARM NEON architecture, and assembles them as a whole solution to improve performance. Our evaluations show that the HAWC achieves on average 10.74× and up to 27.56× speedup on representative convolution layers over state-of-the-art solutions.
Optimizing half precision Winograd convolution on ARM many-core processors
2022
Research areas