Towards accurate and real-time end-of-speech estimation

By Yifeng Fan, Colin Vaz, Di He, Jahn Heymann, Viet Anh Trinh, Zhe Zhang, Venkatesh Ravichandran
2023
We introduce a variant of the endpoint (EP) detection problem in automatic speech recognition (ASR), which we call end-of-speech (EOS) estimation. Given an utterance, EOS estimation aims to identify the timestamp at which the utterance waveform has fully decayed; this timestamp is then used to measure EP latency. Accurate EOS estimation is difficult in large-scale streaming audio scenarios because of heavy traffic and hardware limitations. To this end, we develop an efficient and accurate framework that performs forced alignment on the 1-best ASR hypothesis. In particular, we propose using binarized state sequences for alignment, which yields an EOS estimate that is robust to errors in the ASR hypothesis and reduces the estimation error by 28% compared to aligning on phoneme states. We observe a further 30% error reduction when the intermediate-stage embeddings of the encoder are used as additional features to compute the binary probabilities.
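To make the alignment idea concrete, here is a minimal sketch of forced alignment over a binarized state sequence. It is not the authors' implementation. It assumes the 1-best hypothesis has already been mapped to a left-to-right sequence of binary states (0 for non-speech, 1 for speech), that a model supplies per-frame log-probabilities for both classes, and that each frame either stays in the current state or advances to the next one. The function names `forced_align` and `estimate_eos` and the 10 ms frame shift are hypothetical choices made for illustration.

```python
import math


def forced_align(log_probs, states):
    """Viterbi forced alignment of frames to a left-to-right state sequence.

    log_probs: one (logp_nonspeech, logp_speech) pair per frame.
    states:    binarized state sequence, e.g. [0, 1, 0] (assumed given).
    Returns the index into `states` assigned to each frame.
    """
    n_frames, n_states = len(log_probs), len(states)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    # The path must start in the first state.
    score[0][0] = log_probs[0][states[0]]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                best, prev = move, s - 1
            else:
                best, prev = stay, s
            if best > neg_inf:  # skip states not yet reachable
                score[t][s] = best + log_probs[t][states[s]]
                back[t][s] = prev
    # The path must end in the last state; trace it backwards.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def estimate_eos(log_probs, states, frame_shift_s=0.01):
    """Return the end time (seconds) of the last frame aligned to speech."""
    path = forced_align(log_probs, states)
    speech_frames = [t for t, s in enumerate(path) if states[s] == 1]
    if not speech_frames:
        return None
    return (speech_frames[-1] + 1) * frame_shift_s


if __name__ == "__main__":
    # Toy per-frame speech probabilities (made up for illustration).
    p_speech = [0.1, 0.2, 0.9, 0.8, 0.9, 0.2, 0.1, 0.1]
    lp = [(math.log(1 - p), math.log(p)) for p in p_speech]
    print(estimate_eos(lp, [0, 1, 0]))
```

Under these assumptions, EP latency for an utterance would be the time at which the endpointer fired minus the estimated EOS. Because the alignment only distinguishes speech from non-speech, a misrecognized word that still falls in the right segment does not shift the boundary, which is one plausible reading of the robustness the abstract reports.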