Authors: Dylan Li, Gyungin Shin
Abstract: Unsupervised instance segmentation aims to segment distinct object instances
in an image without relying on human-labeled data. This field has recently seen
significant advancements, partly due to the strong local correspondences
afforded by rich visual feature representations from self-supervised models
(e.g., DINO). Recent state-of-the-art approaches use self-supervised features
to represent images as graphs and solve a generalized eigenvalue system (i.e.,
normalized-cut) to generate foreground masks. While effective, this strategy is
limited by its attendant computational demands, leading to slow inference
speeds. In this paper, we propose Prompt and Merge (ProMerge), which leverages
self-supervised visual features to obtain initial groupings of patches and
strategically merges these segments, aided by a
background-based mask pruning technique. ProMerge not only yields competitive
results but also offers a significant reduction in inference time compared to
state-of-the-art normalized-cut-based approaches. Furthermore, when training an
object detector using our mask predictions as pseudo-labels, the resulting
detector surpasses the current leading unsupervised model on various
challenging instance segmentation benchmarks.
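
For context, the normalized-cut step used by the graph-based approaches mentioned above can be sketched roughly as follows. This is a minimal illustration of the generic technique, not ProMerge itself and not any particular prior implementation. The function name `ncut_bipartition`, the thresholded cosine affinity, and the values of `tau` and `eps` are assumptions chosen for the example; `features` stands in for per-patch descriptors from a self-supervised backbone.

```python
import numpy as np

def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split N patches into two groups with a normalized cut.

    features: (N, d) array of patch descriptors (hypothetical input).
    tau, eps: illustrative affinity threshold and floor, not values from the paper.
    """
    # Cosine similarity between L2-normalized patch features.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    # Binarized affinities; eps keeps the graph connected.
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)
    # Solve (D - W) y = lambda * D y via the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]      # second-smallest generalized eigenvector
    return y > y.mean()              # boolean partition of the patches
```

The dense eigendecomposition in this sketch scales cubically with the number of patches, which gives a sense of why repeated normalized-cut solves can dominate inference time.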
Source: http://arxiv.org/abs/2409.18961v1