Authors: Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen
Abstract: Existing preference alignment is a one-size-fits-all alignment mechanism,
where the part of the large language model (LLM) parametric knowledge with
non-preferred features is uniformly blocked to all the users. However, this
part of knowledge can be useful to advanced users whose expertise qualifies
them to handle these information. The one-size-fits-all alignment mechanism
undermines LLM’s utility for these qualified users. To address this problem, we
propose SudoLM, a framework that lets LLMs learn access control over specific
parametric knowledge for users with different credentials via authorization
alignment. SudoLM allows authorized users to unlock their access to all the
parametric knowledge with an assigned SUDO key while blocking access to
non-qualified users. Experiments on two application scenarios demonstrate that
SudoLM effectively controls the user’s access to the parametric knowledge and
maintains its general utility.
Source: http://arxiv.org/abs/2410.14676v1