ProxQuant: Quantized neural networks via proximal operators
2018
Deep neural networks are often desired in environments with limited memory and computational power (such as mobile devices), where it is beneficial to perform model quantization – training networks with low-precision weights. A key mechanism commonly used in training quantized nets is the straight-through gradient method, which enables back-propagation through the quantization mapping. Despite its success, little is understood about why the straight-through gradient method works, especially in low-bit scenarios such as training binary networks. We propose an alternative approach, ProxQuant, which instead formulates quantized network training as a regularized learning problem and optimizes it via the prox-gradient method. ProxQuant performs back-propagation on the underlying full-precision vector and applies an efficient prox operator between stochastic gradient steps to encourage quantizedness. For quantizing ResNets and LSTMs, ProxQuant outperforms state-of-the-art results on binary quantization and is on par with the state of the art on multi-bit quantization. We further show, through a sign-change experiment, that ProxQuant suffers from less optimization instability in the binary case. Our results challenge the indispensability of the straight-through gradient method and demonstrate that ProxQuant is a powerful alternative.
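To make the update rule concrete, below is a minimal sketch of one prox-gradient step for the binary case, assuming the W-shaped regularizer R(θ) = Σ_j min(|θ_j − 1|, |θ_j + 1|), whose prox soft-thresholds each weight toward its nearest binary value. The function names (prox_binary, prox_gradient_step) and toy values are illustrative, not the authors' reference implementation.

```python
import numpy as np

def prox_binary(theta, lam):
    """Prox of lam * R(theta) for the (assumed) W-shaped binary regularizer:
    shrink each coordinate toward its nearest point in {-1, +1}."""
    target = np.sign(theta)                     # nearest binary value per weight
    residual = theta - target
    shrunk = np.sign(residual) * np.maximum(np.abs(residual) - lam, 0.0)
    return target + shrunk

def prox_gradient_step(theta, grad, lr, lam):
    """One stochastic prox-gradient update: an SGD step on the full-precision
    weights, followed by the prox operator to encourage quantizedness."""
    return prox_binary(theta - lr * grad, lam)

# Toy usage; in practice `grad` comes from back-propagation on the
# full-precision weights.
theta = np.array([0.3, -0.8, 1.4, 0.05])
grad = np.array([0.1, -0.2, 0.05, 0.3])
theta = prox_gradient_step(theta, grad, lr=0.1, lam=0.2)
print(theta)
```

In the paper the regularization strength is increased along a schedule during training so that the weights become (nearly) binary by the end; that schedule, and the final hard quantization, are omitted from this sketch.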