Prompting foundational models for omni-supervised instance segmentation
2024
Pixel-level mask annotation costs are a major bottleneck in training deep neural networks for instance segmentation. Recent promptable foundation models such as the Segment Anything Model (SAM) and GroundedDINO (GDino) have shown impressive zero-shot performance on segmentation and object detection benchmarks. While these models cannot perform inference without prompts, they are well suited to omni-supervised learning, where weak labels are used to derive supervisory signals for complex tasks. In our work, we use SAM and GDino as teacher models and prompt them with weak annotations to create high-quality pseudomasks. These pseudomasks are then used to train student instance segmentation models, which do not require prompts at inference time. We explore various weak annotations, such as bounding boxes, points, and image-level class labels, and show that a student model can achieve roughly 95% of a fully-supervised model’s performance while reducing annotation costs by 7×. We demonstrate the effectiveness of our approach on challenging instance segmentation benchmarks such as COCO [15], ADE20K [30], and Cityscapes [9]. Our approach reduces the annotation cost of training instance segmentation models, making them accessible to a wider range of applications.
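The teacher-student pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `box_to_mask_teacher` is a hypothetical stand-in for a promptable model such as SAM, and the confidence threshold and data layout are assumptions for the sake of the example.

```python
import numpy as np

def box_to_mask_teacher(image, box):
    """Placeholder teacher standing in for a promptable model such as SAM.

    Given a weak box annotation, return a binary mask and a confidence
    score. Here we simply fill the box; a real teacher would predict the
    object's actual silhouette inside the prompt.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    score = 0.9  # a real model returns a predicted mask-quality score
    return mask, score

def generate_pseudomasks(image, boxes, teacher, score_thresh=0.5):
    """Prompt the teacher with each weak annotation and keep only
    confident predictions as pseudomasks for student training."""
    pseudomasks = []
    for box in boxes:
        mask, score = teacher(image, box)
        if score >= score_thresh:
            pseudomasks.append({"box": box, "mask": mask, "score": score})
    return pseudomasks

# Example: one image with two weakly (box-) annotated objects.
image = np.zeros((64, 64, 3), dtype=np.uint8)
boxes = [(5, 5, 20, 30), (32, 10, 60, 50)]
labels = generate_pseudomasks(image, boxes, box_to_mask_teacher)
print(len(labels))
```

The resulting pseudomasks would then replace ground-truth masks in a standard instance segmentation training loop, so the student needs no prompts at test time.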