Learning when to trust which teacher for weakly supervised ASR

Aakriti Agrawal; Milind Rao; Anit Kumar Sahu; Gopinath (Nath) Chennupati; Andreas Stolcke

2023

Automatic speech recognition (ASR) training can use multiple expert teacher models, each trained on a specific domain or accent. The teachers may be opaque: their architecture may not be known, or their training cadence may differ from that of the student ASR model. Even so, the student model can be updated incrementally using the pseudo-labels that each expert teacher generates independently. In this paper, we exploit supervision from multiple domain experts to train student ASR models, a strategy that is especially useful when few or no human transcriptions are available. To that end, we propose a Smart-Weighter mechanism that selects an appropriate expert based on the input audio and then trains the student model in an unsupervised setting. We demonstrate the efficacy of our approach on the LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that weight all experts uniformly, use a single expert model, or combine experts using ROVER.
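The contrast between input-dependent weighting and the uniform-weighting baseline can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how the Smart-Weighter is built, so the softmax gate, the function names (`softmax`, `weighted_student_loss`, `select_expert`), and all numeric values below are assumptions chosen purely for illustration. In practice the gate scores would presumably come from a learned model over the input audio; here they are hard-coded.

```python
import math


def softmax(scores):
    """Turn raw per-teacher scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_student_loss(per_teacher_losses, weights):
    """Combine the student's loss against each teacher's pseudo-label."""
    return sum(w * l for w, l in zip(weights, per_teacher_losses))


def select_expert(weights):
    """Index of the teacher with the largest weight (hard selection)."""
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical values for one utterance and three teachers.
# gate_scores: assumed output of a weighting model given the audio.
# per_teacher_losses: assumed student loss against each teacher's pseudo-label.
gate_scores = [2.0, 0.5, -1.0]
per_teacher_losses = [0.8, 1.6, 2.4]

smart_weights = softmax(gate_scores)
uniform_weights = [1.0 / len(gate_scores)] * len(gate_scores)

smart_loss = weighted_student_loss(per_teacher_losses, smart_weights)
uniform_loss = weighted_student_loss(per_teacher_losses, uniform_weights)
chosen = select_expert(smart_weights)

print("smart weights:", smart_weights)
print("chosen teacher index:", chosen)
print("smart vs uniform loss:", smart_loss, uniform_loss)
```

With uniform weights, every teacher's pseudo-label contributes equally regardless of the utterance. With input-dependent weights, the training signal is dominated by whichever teacher the gate favors for that utterance; `select_expert` shows the hard-selection limit of the same idea.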
