Herring: Rethinking the parameter server at scale for the cloud

Indu Thangakrishnan; Derya Cavdar; Can Karakus; Piyush Ghai; Yauheni Selivonchyk; Cory Pruce



2020


Training large deep neural networks is time-consuming and can take days or even weeks to complete. Although parameter-server-based approaches were initially popular in distributed training, scalability issues led the field to move toward all-reduce-based approaches. However, recent developments in cloud networking technologies, such as the Elastic Fabric Adapter (EFA) and Scalable Reliable Datagram (SRD), motivate rethinking the parameter-server approach to address its fundamental inefficiencies. To this end, we introduce Herring, a novel communication library designed to alleviate the performance bottlenecks in parameter-server-based training. We show that gradient reduction with Herring is twice as fast as with all-reduce-based methods. We further demonstrate that training deep learning models such as BERT-large with Herring outperforms all-reduce-based training, achieving 85% scaling efficiency on large clusters of up to 2048 NVIDIA V100 GPUs without a drop in accuracy.
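To make the scaling-efficiency figure concrete, the following minimal sketch computes it under a common definition: measured throughput divided by the throughput that ideal linear scaling from a smaller baseline would give. The abstract does not state the exact definition or baseline used, so the function name `scaling_efficiency`, the 8-GPU baseline, and all throughput numbers below are assumptions chosen only for illustration.

```python
def scaling_efficiency(throughput_n, throughput_base, n_workers, base_workers=1):
    """Return measured throughput as a fraction of ideal linear scaling.

    Assumed definition (not taken from the paper): the ideal throughput on
    n_workers is the baseline throughput multiplied by n_workers / base_workers.
    """
    if min(throughput_n, throughput_base, n_workers, base_workers) <= 0:
        raise ValueError("throughputs and worker counts must be positive")
    ideal = throughput_base * (n_workers / base_workers)
    return throughput_n / ideal


# Hypothetical measurements, in samples per second (illustrative only).
base_throughput = 100.0      # assumed throughput on an 8-GPU baseline
large_throughput = 21760.0   # assumed throughput on 2048 GPUs

efficiency = scaling_efficiency(
    throughput_n=large_throughput,
    throughput_base=base_throughput,
    n_workers=2048,
    base_workers=8,
)
print(f"scaling efficiency: {efficiency:.0%}")
```

With these made-up numbers, ideal linear scaling from 8 to 2048 GPUs would multiply throughput by 256, so reaching 85% efficiency corresponds to roughly a 218x speedup over the baseline.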
