Authors: Yifan Sun, Yuhang Li, Yue Zhang, Yuchen Jin, Huan Zhang
Abstract: Open-source Large Language Models (LLMs) have recently demonstrated
remarkable capabilities in natural language understanding and generation,
leading to widespread adoption across various domains. However, their
increasing sizes render local deployment impractical for individual
users, pushing many to rely on computing service providers for inference
through a black-box API. This reliance introduces a new risk: a computing
provider may stealthily substitute the requested LLM with a smaller, less
capable model without consent from users, thereby delivering inferior outputs
while benefiting from cost savings. In this paper, we formalize the problem of
verifiable inference for LLMs. Existing verifiable computing solutions based on
cryptographic or game-theoretic techniques are either computationally
uneconomical or rest on strong assumptions. We introduce SVIP, a secret-based
verifiable LLM inference protocol that leverages intermediate outputs of the
LLM as unique model identifiers. By training a proxy task on these outputs and
requiring the computing provider to return both the generated text and the
processed intermediate outputs, users can reliably verify whether the computing
provider is acting honestly. In addition, the integration of a secret mechanism
further enhances the security of our protocol. We thoroughly analyze our
protocol under multiple strong and adaptive adversarial scenarios. Our
extensive experiments demonstrate that SVIP is accurate, generalizable,
computationally efficient, and resistant to various attacks. Notably, SVIP
achieves false negative rates below 5% and false positive rates below 3%, while
requiring less than 0.01 seconds per query for verification.
Source: http://arxiv.org/abs/2410.22307v1
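
As a rough, hypothetical illustration of the verification flow the abstract describes: the sketch below models the proxy head as a linear map fit by least squares, simulates each model's intermediate activations as a model-specific linear image of a prompt embedding, and omits the secret mechanism entirely. All names, dimensions, and thresholds are illustrative assumptions, not the paper's actual construction.

```python
# Hypothetical sketch of an SVIP-style verification flow; NOT the paper's
# actual construction. The proxy head is a linear map fit by least squares,
# "hidden states" are simulated, and the secret mechanism is omitted.
import hashlib
import numpy as np

DIM, LABEL_DIM, TAU = 64, 8, 1e-6  # assumed sizes and verification tolerance
rng = np.random.default_rng(0)

# Each (simulated) model mixes a prompt embedding with its own weights.
A_REQUESTED = rng.normal(size=(LABEL_DIM, DIM))   # the model the user asked for
A_SUBSTITUTE = rng.normal(size=(LABEL_DIM, DIM))  # a cheaper substitute

def embed(prompt: str) -> np.ndarray:
    """A proxy-task label the user can compute locally from the prompt alone."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).normal(size=LABEL_DIM)

def hidden_states(prompt: str, A: np.ndarray) -> np.ndarray:
    """Stand-in for the LLM's intermediate output on this prompt."""
    return embed(prompt) @ A

# Offline "training" of the proxy head on the requested model's activations:
# fit W so that (activations @ W) recovers the user-computable label.
train = [f"train-{i}" for i in range(256)]
X = np.stack([hidden_states(p, A_REQUESTED) for p in train])
Y = np.stack([embed(p) for p in train])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def verify(prompt: str, returned_states: np.ndarray) -> bool:
    """User-side check on the provider's returned intermediate output."""
    return float(np.linalg.norm(returned_states @ W - embed(prompt))) < TAU

q = "What is verifiable inference?"
print(verify(q, hidden_states(q, A_REQUESTED)))   # True: honest provider
print(verify(q, hidden_states(q, A_SUBSTITUTE)))  # False: substituted model
```

In the full protocol, per the abstract, the proxy task is trained on the LLM's real intermediate outputs, and the secret mechanism further hardens verification against strong and adaptive providers.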