Authors: Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
Abstract: In this paper, we argue that iterative computation with diffusion models
offers a powerful paradigm for not only generation but also visual perception
tasks. We unify tasks such as depth estimation, optical flow, and segmentation
under image-to-image translation, and show how diffusion models benefit from
scaling training and test-time compute for these perception tasks. Through a
careful analysis of these scaling behaviors, we present various techniques to
efficiently train diffusion models for visual perception tasks. Our models
match or exceed state-of-the-art methods while using significantly less data
and compute. Code and models are available at
https://scaling-diffusion-perception.github.io.
Source: http://arxiv.org/abs/2411.08034v1
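
The abstract's core recipe, as we read it, can be sketched in a few lines: cast a perception target (here, depth) as the output of a diffusion model conditioned on the RGB input, then scale test-time compute by running more denoising steps or by averaging an ensemble of independent samples. The sketch below is a hypothetical illustration, not the authors' code: `PerceptionDenoiser`, `predict_depth`, the toy network, and the crude Euler-style update are all stand-ins for the paper's actual backbone and sampler.

```python
# Minimal sketch (assumed names, not the paper's implementation):
# perception as conditional image-to-image diffusion, with test-time
# compute scaled via denoising steps and ensemble size.
import torch
import torch.nn as nn


class PerceptionDenoiser(nn.Module):
    """Toy stand-in for a diffusion backbone that denoises a target
    map (e.g. depth) conditioned on the input RGB image."""

    def __init__(self):
        super().__init__()
        # 1 noisy-depth + 3 RGB + 1 timestep plane = 5 input channels
        self.net = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, noisy_depth, rgb, t):
        # Condition on RGB by channel concatenation; broadcast the
        # scalar timestep t as an extra feature plane.
        t_plane = t.view(-1, 1, 1, 1).expand(-1, 1, *rgb.shape[-2:])
        return self.net(torch.cat([noisy_depth, rgb, t_plane], dim=1))


@torch.no_grad()
def predict_depth(model, rgb, steps=50, ensemble=4):
    """Test-time compute scales two ways here: more denoising steps
    per sample, and averaging more independently drawn samples."""
    b, _, h, w = rgb.shape
    preds = []
    for _ in range(ensemble):
        x = torch.randn(b, 1, h, w)          # start from pure noise
        for i in reversed(range(steps)):
            t = torch.full((b,), i / steps)  # normalized timestep
            eps = model(x, rgb, t)           # predicted noise
            x = x - eps / steps              # crude Euler-style update
        preds.append(x)
    return torch.stack(preds).mean(0)        # ensemble average


model = PerceptionDenoiser()
rgb = torch.rand(1, 3, 64, 64)
depth = predict_depth(model, rgb, steps=20, ensemble=2)
print(depth.shape)  # torch.Size([1, 1, 64, 64])
```

The same loop covers the other tasks the abstract names (optical flow, segmentation) by changing the number of target channels; only the conditioning-by-concatenation and the sample-more/denoise-longer knobs matter for the scaling argument.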