Customer-obsessed science
![Amazon Science homepage.jpeg](https://assets.amazon.science/dims4/default/d4c4a52/2147483647/strip/true/crop/1385x1200+208+0/resize/435x377!/quality/90/?url=http%3A%2F%2Famazon-topics-brightspot.s3.amazonaws.com%2Fscience%2F48%2F0f%2F1db2f1004b82a99a0175ff391d53%2Famazon-science-homepage.jpeg)
![EchoFrame_Animated_121124 (1).gif](https://assets.amazon.science/dims4/default/9262b5a/2147483647/strip/true/crop/646x563+177+0/resize/218x190!/quality/90/?url=http%3A%2F%2Famazon-topics-brightspot.s3.amazonaws.com%2Fscience%2F8a%2F79%2Fad103fb544aaa2fbdf0745c366f1%2Fechoframe-animated-121124-1.gif)
Research areas
- February 06, 2025: Novel training procedure and decoding mechanism enable model to outperform much larger foundation model prompted to perform the same task.
Featured news
- 2025: In various video-language learning tasks, the challenge of achieving cross-modality alignment with multi-grained data persists. We propose a method to tackle this challenge from two crucial perspectives: data and modeling. Given the absence of a multi-grained video-text pretraining dataset, we introduce a Granularity EXpansion (GEX) method with Integration and Compression operations to expand the granularity
- 3DV 2025, 2025: Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. We introduce a depth branch into U-Net for efficient and high quality
- 2025: Invoices and receipts submitted by employees are visually rich documents (VRDs) with textual, visual and layout information. To protect against the risk of fraud and abuse, it is crucial for organizations to efficiently extract desired information from submitted receipts. This helps in the assessment of key factors such as appropriateness of the expense claim, adherence to spending and transaction policies
- 2025: Computing a comprehensive and robust visual representation of an arbitrary object or category of objects is a complex problem. The difficulty increases when one starts from a set of uncalibrated images obtained from different sources. We propose a self-supervised approach, Multi-Image Latent Embedding (MILE), which computes a single representation from such an image set. MILE operates incrementally, considering
- Findings of EMNLP 2024, 2024: In a plethora of recent work, large language models (LLMs) demonstrated impressive reasoning ability, but many proposed downstream reasoning tasks only focus on final answers. Two fundamental questions persist: 1) how consistent is the reasoning, and 2) can models detect unreliable reasoning? In this paper, we investigate self-contradictory (SELF-CONTRA) reasoning, where the model reasoning does not support
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.