Computer vision

A quick guide to Amazon's 20-plus papers at CVPR 2023

Image segmentation, multimodal models, and innovative machine learning methods are among the Amazon researchers' areas of focus.

By Staff writer

June 20, 2023

Amazon's papers at this year's Computer Vision and Pattern Recognition Conference (CVPR), sorted by research topic.

3-D perception

Implicit surface contrastive clustering for LiDAR point clouds
Zaiwei Zhang, Min Bai, Erran Li

Anomaly classification

WinCLIP: Zero-/few-shot anomaly classification and segmentation
Jongheon Jeong, Yang Zou, Taewan Kim, Dongqing Zhang, Avinash Ravichandran, Onkar Dabeer

Data annotation

CVPR highlight*:
HandsOff: Labeled dataset generation with no additional human annotations
Austin Xu, Mariya Vasileva, Achal Dave, Arjun Seshadri

Knowledge combination to learn rotated detection without rotated annotation
Tianyu Zhu, Bryce Ferenczi, Pulak Purkait, Tom Drummond, Hamid Rezatofighi, Anton van den Hengel

Rotated bounding boxes.png — From "Knowledge combination to learn rotated detection without rotated annotation".

Image generation

FlexNeRF: Photorealistic free-viewpoint rendering of moving humans from sparse views
Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, Larry Davis

LEMaRT: Label-efficient masked region transform for image harmonization
Sheng Liu, Cong Phuoc Huynh, Cong Chen, Maxim Arap, Raffay Hamid

Image segmentation

Network-free, unsupervised semantic segmentation with synthetic images
Qianli Feng, Raghudeep Gadde, Wentong Liao, Eduard Ramon Maldonado, Aleix Martinez

PolyFormer: Referring image segmentation as sequential polygon generation
Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

Spatio-Temporal Pixel-Level contrastive learning-based source-free domain adaptation for video semantic segmentation
Shao-Yuan Lo, Poojankumar Oza, Sumanth Chennupati, Alejandro Galindo, Vishal M. Patel

Video segmentation.png — From "Spatio-temporal pixel-level contrastive learning-based source-free domain adaptation for video semantic segmentation".

Machine learning

A meta-learning approach to predicting performance and data requirements
Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

Leveraging inter-rater agreement for classification in the presence of noisy labels
Maria Sofia Bucarelli, Lucas Cassano, Federico Siciliano, Amin Mantrach, Fabrizio Silvestri

Train/test-time adaptation with retrieval
Luca Zancato, Alessandro Achille, Tian Yu Liu, Matthew Trager, Pramuditha Perera, Stefano Soatto

Multimodal models

Dynamic inference with grounding based vision and language models
Burak uzkent, Amanmeet Garg, Wentao Zhu, Keval Doshi, Jingru Yi, Andy Wang, Mohamed Omar

GIVL: Improving geographical inclusivity of vision-language models with pre-training methods
Da Yin, Feng Gao, Govind Thattai, Michael Johnston, Kai-Wei Chang

Grounding counterfactual explanation of image classifier to textual concept space
Siwon Kim, Jinoh Oh, Sungjin Lee, Seunghak Yu, Jae Do, Tara Taghavi

Understanding and constructing latent modality structures in multi-modal representation learning
Qian Jiang, Changyou Chen, Han Zhao, Liqun Chen, Qing Ping, Son Tran, Yi Xu, Belinda Zeng, Trishul Chilimbi

Latent modality structures.png — From "Understanding and constructing latent modality structures in multi-modal representation learning".

Object detection

ScaleDet: A scalable multi-dataset object detector
Yanbei Chen, Manchen Wang, Abhay Mittal, Zhenlin Xu, Paolo Favaro, Joe Tighe, Davide Modolo

Product characterization

Learning attribute and class-specific representation duet for fine-grained fashion analysis
Yang (Andrew) Jiao, Yan Gao, Jingjing Meng, Jin Shang, Yi Sun

SkiLL: Skipping Color and Label Landscape: self supervised design representations for products in e-commerce
Vinay Kumar Verma, Dween Rabius Sanny, Prateek Sircar, Shreyas Sunil Kulkarni, Deepak Gupta, Abhishek Singh

Video understanding

Movies2Scenes: Using movie metadata to learn scene representation
Shixing Chen, Chun-Hao Liu, Xiang Hao, Xiaohan Nie, Maxim Arap, Raffay Hamid

Selective structured state-spaces for long-form video understanding
Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid

* distinction accorded to the top 10% of papers accepted to the conference

About the Author

Staff writer

A quick guide to Amazon's 20-plus papers at CVPR 2023

Image segmentation, multimodal models, and innovative machine learning methods are among the Amazon researchers' areas of focus.

3-D perception

Anomaly classification

Data annotation

Image generation

Image segmentation

Machine learning

Multimodal models

Object detection

Product characterization

Video understanding

Related content

Work with us