Authors: Susung Hong
Abstract: Conditional diffusion models have shown remarkable success in visual content
generation, producing high-quality samples across various domains, largely due
to classifier-free guidance (CFG). Recent attempts to extend guidance to
unconditional models have relied on heuristic techniques, resulting in
suboptimal generation quality and unintended effects. In this work, we propose
Smoothed Energy Guidance (SEG), a novel training- and condition-free approach
that leverages the energy-based perspective of the self-attention mechanism to
enhance image generation. By defining the energy of self-attention, we
introduce a method to reduce the curvature of the energy landscape of attention
and use the output as the unconditional prediction. Practically, we control the
curvature of the energy landscape by adjusting the Gaussian kernel parameter
while keeping the guidance scale parameter fixed. Additionally, we present a
query blurring method that is equivalent to blurring the entire attention
weights without incurring quadratic complexity in the number of tokens. In our
experiments, SEG achieves a Pareto improvement in both quality and the
reduction of side effects. The code is available at
\url{https://github.com/SusungHong/SEG-SDXL}.
Source: http://arxiv.org/abs/2408.00760v1