Computer vision

Helping devices see and understand our visual world.

Structured human assessment of text-to-image generative models

Ciprian Corneanu, Qianli Feng, Aleix Martinez

WACV 2025

2025

Following the great progress in text-conditioned image generation, there is a dire need for establishing clear comparison benchmarks. Unfortunately, assessing the performance of such models is highly subjective and notoriously difficult. Current automatic assessment of generated image quality and alignment to text is approximate at best, while human assessment is subjective, poorly calibrated and not

Computer vision
Ultra-low complexity neural networks for next generation video decoding

Kiran Misra, Shashwat Ranjan Chaurasia, Andrew Segall, Byeongdoo Choi

DCC 2025

2025

Video compression enables the transmission of video content at low rates and high qualities to our customers. In this paper, we consider the problem of embedding a neural network directly into a video decoder. This requires a design capable of operating at latencies low enough to decode tens to hundreds of high-resolution images per second. And, additionally, a network with a complexity suitable for implementation

Computer vision
LLaVA-RE: Binary image-text relevancy evaluation with multimodal large language model

Tao Sun, Oliver Liu, JinJin Li, Lan Ma

COLING 2025 Workshop on Evaluation of Multi-Modal Generation

2025

Multimodal generative AI usually involves generating image or text responses given inputs in another modality. The evaluation of image-text relevancy is essential for measuring response quality or ranking candidate responses. In particular, binary relevancy evaluation, i.e., “Relevant” vs. “Not Relevant”, is a fundamental problem. However, this is a challenging task considering that texts have diverse formats

Computer vision
MVAD: A multiple visual artifact detector for video streaming

Chen Feng, Duolikun Danier, Fan Zhang, Alex Mackin, Andy Collins, David Bull

WACV 2025

2025

Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and delivery. Since these can degrade the quality of the user’s experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/or determine

Computer vision
DreamBlend: Advancing personalized fine-tuning of text-to-image diffusion models

Shwetha Ram, Tal Neiman, Qianli Feng, Andrew Stuart, Son Tran, Trishul Chilimbi

WACV 2025

2025

Given a small number of images of a subject, personalized image generation techniques can fine-tune large pre-trained text-to-image diffusion models to generate images of the subject in novel contexts, conditioned on text prompts. In doing so, a trade-off is made between prompt fidelity, subject fidelity and diversity. As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low

Computer vision
How a passion for reinforcement learning guided Alexander Long’s trajectory

Mariana Lenharo

June 24, 2022

The field motivated him to pursue a PhD, which eventually led him to Amazon.

Computer vision
Former Amazon intern Karsten Roth wins EMVA young professional award

Staff writer

June 23, 2022

EMVA Young Professional Award honors “outstanding and innovative work of a student or a young professional in the field of machine vision or image processing.”

Computer vision
Prime Video's work on 3-D scene reconstruction, image representation

Raffay Hamid

June 22, 2022

CVPR papers examine the recovery of 3-D information from camera movement and learning general representations from weakly annotated data.

Computer vision
Olga Moskvyak’s journey into the world of science

Mariana Lenharo

June 21, 2022

How she moved across the world to discover a passion for (and a career in) machine learning.

Computer vision
Anton van den Hengel’s journey from intellectual property law to computer vision pioneer

Sean O'Neill

June 20, 2022

Amazon’s director of applied science in Adelaide, Australia, believes the economic value of computer vision has “gone through the roof.”

Computer vision
CVPR: Understanding images means understanding the world

Larry Hardesty

June 16, 2022

Senior principal scientist Aleix M. Martinez on why computer vision research has only begun to scratch the surface.
