Customer-obsessed science
Research areas
-
January 14, 2025: Key exchange protocols and authentication mechanisms solve distinct problems and must be integrated in a secure communication system.
-
December 24, 2024
Featured news
-
ICASSP 2025: Audio-Visual Speech-to-Speech Translation (AVS2S) typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony (ensuring that the movements of the lips match the spoken content), which is essential for maintaining realism in dubbed videos. Despite its importance, the inclusion of lip-synchrony constraints in AVS2S models has been largely
-
ICASSP 2025: We propose a lightweight neural front-end framework for on-device speech generation and highlight its benefits towards low-resource language scaling. While data-driven models have shown potential in front-end literature, especially since they can enable fast language expansion, they are often extremely large and of high latency. There is limited work focusing on their usability in real-time settings, and
-
ICASSP 2025: Self-supervised pretraining has transformed speech representation learning, enabling models to generalize across various downstream tasks. However, empirical studies have highlighted two notable gaps. First, different speech tasks require varying levels of acoustic and semantic information, which are encoded at different layers within the model. This adds the extra complexity of layer selection on downstream
-
ICASSP 2025: Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce speaker errors, particularly during speaker transitions and overlapping speech. Recently, language models including fine-tuned large language models (LLMs) have been shown to be effective as a second-pass speaker error corrector
-
2025: The field of indoor monocular 3D object detection is gaining significant attention, fueled by the increasing demand in VR/AR and robotic applications. However, its advancement is impeded by the limited availability and diversity of 3D training data, owing to the labor-intensive nature of 3D data collection and annotation processes. In this paper, we present V-MIND (Versatile Monocular INdoor Detector),
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.