Customer-obsessed science
Research areas
- December 1, 2025 | 8 min read: “Network language models” will coordinate complex interactions among intelligent components, computational infrastructure, access points, data centers, and more.
- November 20, 2025 | 4 min read
- October 20, 2025 | 4 min read
- October 14, 2025 | 7 min read
Featured news
- 2025 — Recent advancements in language-guided diffusion models for image editing are often bottlenecked by cumbersome prompt engineering to precisely articulate desired changes. An intuitive alternative calls on guidance from in-the-wild image exemplars to help users bring their imagined edits to life. Contemporary exemplar-based editing methods shy away from leveraging the rich latent space learnt by pre-existing…
- 2025 — One common approach for question answering over speech data is to first transcribe speech using automatic speech recognition (ASR) and then employ text-based retrieval-augmented generation (RAG) on the transcriptions. While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. To overcome this limitation, we introduce SpeechRAG… (a sketch of the cascaded baseline appears after this list)
- ICSE 2025 — Pricing agreements at AWS define how customers are billed for usage of services and resources. A pricing agreement consists of a complex sequence of terms that can include free tiers, savings plans, credits, volume discounts, and other similar features. To ensure that pricing agreements reflect the customers’ intentions, we employ a protocol that runs a set of validations that check all pricing agreements…
- 2025 — Following the great progress in text-conditioned image generation, there is a dire need for establishing clear comparison benchmarks. Unfortunately, assessing the performance of such models is highly subjective and notoriously difficult. Current automatic assessments of generated image quality and text alignment are approximate at best, while human assessment is subjective, poorly calibrated, and not…
- 2025 — Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a multilingual… (a sketch of this encoder-to-LLM coupling appears after this list)
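
The SpeechRAG item above describes the cascaded baseline it sets out to improve: transcribe speech with ASR, then run text-based retrieval-augmented generation over the transcriptions. The sketch below illustrates that generic cascade only, not the paper's method; `asr_transcribe`, `llm_generate`, and the TF-IDF retriever are illustrative stand-ins.

```python
# Minimal sketch of a cascaded speech-QA baseline: ASR -> text retrieval -> generation.
# ASR errors introduced in step 1 propagate into retrieval and generation, which is
# the limitation the SpeechRAG teaser highlights.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def asr_transcribe(audio_path: str) -> str:
    """Hypothetical ASR stub; in practice this would call a speech recognizer."""
    return "transcript of " + audio_path


def llm_generate(prompt: str) -> str:
    """Hypothetical text-generation stub; in practice this would call an LLM."""
    return "answer conditioned on: " + prompt


def cascaded_speech_qa(question: str, audio_paths: list[str], top_k: int = 3) -> str:
    # 1) ASR: turn each spoken document into text.
    transcripts = [asr_transcribe(p) for p in audio_paths]

    # 2) Retrieval: rank transcripts against the question (TF-IDF for illustration).
    vectorizer = TfidfVectorizer()
    vecs = vectorizer.fit_transform(transcripts + [question])
    sims = cosine_similarity(vecs[-1], vecs[:-1]).ravel()
    retrieved = [transcripts[i] for i in sims.argsort()[::-1][:top_k]]

    # 3) Generation: answer from the retrieved (possibly noisy) transcriptions.
    prompt = f"Question: {question}\nContext:\n" + "\n".join(retrieved)
    return llm_generate(prompt)


print(cascaded_speech_qa("What was announced?", ["meeting_1.wav", "meeting_2.wav"]))
```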
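
The zero-resource ST/ASR item describes pairing a pre-trained multilingual speech encoder with a multilingual LLM. One common way to couple the two is a small adapter that projects speech-encoder features into the LLM's embedding space; the sketch below assumes that design, and the class name, dimensions, and linear projection are illustrative assumptions rather than the paper's architecture.

```python
# Hedged sketch: map frozen speech-encoder frames into an LLM's embedding space
# so the LLM can decode a transcript or translation from them.
import torch
import torch.nn as nn


class SpeechToLLMBridge(nn.Module):
    def __init__(self, speech_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Lightweight adapter from speech-encoder features to LLM embeddings.
        self.proj = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_features: torch.Tensor) -> torch.Tensor:
        # speech_features: (batch, frames, speech_dim) from a frozen encoder.
        return self.proj(speech_features)  # (batch, frames, llm_dim)


# Illustrative usage: the projected frames would be prepended to the LLM's text
# prompt embeddings before decoding.
bridge = SpeechToLLMBridge()
dummy_frames = torch.randn(1, 200, 1024)
print(bridge(dummy_frames).shape)  # torch.Size([1, 200, 4096])
```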
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.