Search - Amazon Science

Continual learning for multi-dialect acoustic models

Brady Houston, Katrin Kirchhoff

Interspeech 2020

2020

Using data from multiple dialects has shown promise in improving neural network acoustic models. While such training can improve the performance of an acoustic model on a single dialect, it can also produce a model capable of good performance on multiple dialects. However, training an acoustic model on pooled data from multiple dialects takes a significant amount of time and computing resources, and it

Machine learning
Dynamic prosody generation for speech synthesis using linguistics-driven acoustic embedding selection

Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo Trueba

Interspeech 2020

2020

Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human capabilities. But something which is still lacking in order to achieve human-like communication is the dynamic variations and adaptability of human speech in more complex scenarios. This work attempts to solve the problem of achieving a more dynamic and natural intonation in TTS systems, particularly for stylistic

Related: More-natural prosody for synthesized speech

Conversational AI
Incremental few-shot meta-learning via indirect discriminant alignment

Qing Liu, Orchid Majumder, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

ECCV 2020

2020

We propose a method to train a model so it can learn new classification tasks while improving with each task solved. This amounts to combining meta-learning with incremental learning. Different tasks can have disjoint classes, so one cannot directly align different classifiers as done in model distillation. On the other hand, simply aligning features shared by all classes does not allow the base model sufficient

Computer vision
Can you read me now? Content aware rectification using angle supervision

Amir Markovitz, Inbal Lavi, Or Perel, Shai Mazor, Roee Litman

ECCV 2020

2020

The ubiquity of smartphone cameras has led to more and more documents being captured by cameras rather than scanned. Unlike flatbed scanners, photographed documents are often folded and crumpled, resulting in large local variance in text structure. The problem of document rectification is fundamental to the Optical Character Recognition (OCR) process on documents, and its ability to overcome geometric distortions

Computer vision
European Conference on Computer Vision (ECCV)
Building product graphs automatically

Xin Luna Dong

August 13, 2020

Automated system tripled the number of facts in a product graph.

Information and knowledge management
Amazon's Machine Learning University is making its online courses available to the public

Douglas Gantenbein

August 12, 2020

Classes previously only available to Amazon employees will now be available to the community.

Machine learning
Interspeech

Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing.
How Marinus Analytics uses knowledge graphs powered by Amazon Neptune to combat human trafficking

Staff writer

August 11, 2020

Traffic Jam leverages machine learning technologies from Amazon Web Services to find patterns in ads posted by sexual traffickers on the internet every day.

Cloud and systems
Human Trafficking
Teaching computers to recognize humor

David Carmel

August 10, 2020

Detecting comic product-related questions could improve customer engagement and Amazon recommendations.

Conversational AI
Yaroslav Nechaev

Applied Scientist
Towards an ASR error robust spoken language understanding system

Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su, Imre Kiss

Interspeech 2020

2020

A modern Spoken Language Understanding (SLU) system usually contains two sub-systems, Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU), where ASR transforms voice signal to text form and NLU provides intent classification and slot filling from the text. In practice, such decoupled ASR/NLU design facilitates fast model iteration for both components. However, this makes downstream NLU

Conversational AI
Arnaud Joly

Applied Scientist
CopyCat: Many-to-many fine-grained prosody transfer for neural text-to-speech

Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman

Interspeech 2020

2020

Prosody Transfer (PT) is a technique that aims to use the prosody from a source audio as a reference while synthesizing speech. Fine-grained PT aims at capturing prosodic aspects like rhythm, emphasis, melody, duration, and loudness, from a source audio at a very granular level and transferring them when synthesizing speech in a different target speaker’s voice. Current approaches for fine-grained PT suffer

Related: More-natural prosody for synthesized speech

Conversational AI
Sri Karlapati

Applied Scientist
Improved training strategies for end-to-end speech recognition in digital voice assistants

Hitesh Tulsiani, Ashtosh Sapru, Harish Arsikere, Surabhi Punjabi, Sri Garimella

Interspeech 2020

2020

The speech recognition training data corresponding to digital voice assistants is dominated by wake-words. Training end-to-end (E2E) speech recognition models without careful attention to such data results in sub-optimal performance as models prioritize learning wake-words. To address this problem, we propose a novel discriminative initialization strategy by introducing a regularization term to penalize

Conversational AI
Metadata-aware end-to-end keyword spotting

Hongyi Liu, Apurva Abhyankar, Yuriy Mishchenko, Thibaud Sénéchal, Gengshen Fu, Brian Kulis, Noah Stein, Anish Shah, Shiv Naga Prasad Vitaladevuni

Interspeech 2020

2020

As a crucial part of Alexa products, our on-device keyword spotting system detects the wakeword in conversation and initiates subsequent user-device interactions. Convolutional neural networks (CNNs) have been widely used to model the relationship between time and frequency in the audio spectrum. However, it is not obvious how to appropriately leverage the rich descriptive information from device state

Related: Amazon Alexa’s new wake word research at Interspeech

Conversational AI
Apurva Abhyankar
Acoustic scene analysis with multi-head attention networks

Weimin Wang, Weiran Wang, Ming Sun, Chao Wang

Interspeech 2020

2020

Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns. For example, a cooking scene may contain several sound sources including silverware clinking, chopping, frying, etc. What complicates ASC more is that classes of different activities could have overlapping sound patterns (e.g. both cooking and dishwashing could have

Conversational AI
