- ACL 2023: Large language models trained on code have shown great potential to increase the productivity of software developers. Several execution-based benchmarks have been proposed to evaluate the functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform the same evaluation on complex real-world projects, considering the execution cost. On the contrary, static…
- WACV 2023 Workshop on Pretraining Large Vision and Multimodal Models: Scaling up weakly supervised datasets has been shown to be highly effective in the image-text domain and has contributed to most of the recent state-of-the-art computer vision and multimodal neural networks. However, existing large-scale video-text datasets and mining techniques suffer from several limitations, such as the scarcity of aligned data, the lack of diversity in the data, and the difficulty of collecting…
- Interspeech 2023: Neural transducer ASR models achieve state-of-the-art accuracy on many tasks; however, rare word recognition poses a particular challenge, as models often fail to recognise words that occur rarely, or not at all, in the training data. Methods of contextual biasing, where models are dynamically adapted to bias their outputs towards a given list of relevant words and phrases, have been shown to be effective…
- Interspeech 2023: Conformer is an extension of transformer-based neural ASR models whose fundamental component is the self-attention module. In this paper, we show that we can remove the self-attention module from Conformer and achieve the same or even better recognition performance for utterances whose length is up to around 10 seconds. This is particularly important for streaming interactive voice assistants, as input is…
- Interspeech 2023: Contextual biasing (CB) is an effective approach for contextualising the hidden features of neural transducer ASR models to improve rare word recognition. CB relies on relatively large quantities of relevant, human-annotated natural speech during training, limiting its effectiveness in low-resource scenarios. In this work, we propose a novel approach that reduces the reliance on real speech by using synthesised…
Related content
- October 02, 2020: Scientist leads team in London focused on improving voice-shopping experiences with Alexa.
- September 28, 2020: Hear Tur discuss his experience working on DARPA programs, how he’s seen the field of conversational AI evolve, and more.
- September 24, 2020: A combination of audio and visual signals guides the device’s movement, so the screen is always in view.
- September 24, 2020: Adjusting prosody and speaking style to conversational context is a first step toward “concept-to-speech”.
- September 24, 2020: Natural turn-taking uses multiple cues — acoustic, linguistic, and visual — to help Alexa interact more naturally, without the need to repeat the wake word.
- September 24, 2020: Deep learning and reasoning enable customers to explicitly teach Alexa how to interpret their novel requests.