Theoretical Benefit and Limitation of Diffusion Language Model

Authors: Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

Abstract: Diffusion language models have emerged as a promising approach for text
generation. One would naturally expect this method to be an efficient
replacement for autoregressive models since multiple tokens can be sampled in
parallel during each diffusion step. However, its efficiency-accuracy trade-off
is not yet well understood. In this paper, we present a rigorous theoretical
analysis of a widely used type of diffusion language model, the Masked
Diffusion Model (MDM), and find that its effectiveness heavily depends on the
target evaluation metric. Under mild conditions, we prove that when perplexity is the metric, MDMs can achieve near-optimal perplexity with a number of sampling steps that does not grow with sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when the metric is the sequence error rate, which is important for assessing the "correctness" of an entire sequence, such as a reasoning chain, we show that the required number of sampling steps must scale linearly with sequence length to obtain "correct" sequences, thereby eliminating the MDM's efficiency advantage over autoregressive models. Our analysis
establishes the first theoretical foundation for understanding the benefits and
limitations of MDMs. All theoretical findings are supported by empirical
studies.
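
For readers who want the contrast made concrete, one standard way to formalize the two metrics is sketched below (an illustrative reconstruction; the paper's exact definitions may differ in detail). Here q is the model's distribution over length-L sequences, p_data is the data distribution, and C is the set of sequences counted as "correct":

\mathrm{PPL}(q) \;=\; \exp\!\Big(-\tfrac{1}{L}\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log q(x)\big]\Big),
\qquad
\mathrm{SER}(q) \;=\; \Pr_{x \sim q}\big[\,x \notin \mathcal{C}\,\big]

Perplexity averages per-token log-likelihood, so it can be near-optimal even when individual sampled sequences contain occasional errors, whereas the sequence error rate penalizes an error anywhere in the sequence; this asymmetry is what drives the different step-count requirements in the paper's results.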

Source: http://arxiv.org/abs/2502.09622v1
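
To illustrate the mechanism the abstract alludes to, here is a minimal sketch of a masked-diffusion sampler that reveals several tokens in parallel at each step. The model interface (a callable returning per-position logits), MASK_ID, and the uniform revelation schedule are illustrative assumptions, not the paper's construction:

# Minimal masked-diffusion sampling loop (illustrative; hypothetical model interface).
import torch

MASK_ID = 0  # hypothetical id of the [MASK] token

def mdm_sample(model, seq_len: int, num_steps: int) -> torch.Tensor:
    # Start from a fully masked sequence.
    tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)
    masked = torch.ones(seq_len, dtype=torch.bool)
    per_step = -(-seq_len // num_steps)  # ceil(seq_len / num_steps) tokens revealed per step
    for _ in range(num_steps):
        if not masked.any():
            break
        logits = model(tokens)                    # assumed shape: (seq_len, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        # Sample a candidate token for every position; positions revealed in the
        # same step are therefore sampled independently given the current context.
        proposal = torch.multinomial(probs, num_samples=1).squeeze(-1)
        # Reveal a random subset of the still-masked positions in parallel.
        idx = masked.nonzero(as_tuple=False).squeeze(-1)
        chosen = idx[torch.randperm(idx.numel())[:per_step]]
        tokens[chosen] = proposal[chosen]
        masked[chosen] = False
    return tokens

Setting num_steps equal to seq_len reveals one token per step, matching the sequential cost of an autoregressive decoder. The paper's results say that a constant num_steps already suffices for near-optimal perplexity, but that num_steps must grow linearly in seq_len to keep the sequence error rate low, which is intuitive here because tokens revealed within the same step are sampled independently of one another given the current partial sequence.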
