Authors: Haoyang Li, Yuchen Hu, Chen Chen, Eng Siong Chng
Abstract: High-fidelity speech enhancement often requires sophisticated modeling to
capture intricate, multiscale patterns. Standard activation functions, while
introducing nonlinearity, lack the flexibility to fully address this
complexity. Kolmogorov-Arnold Networks (KANs), an emerging class of models
that place learnable activation functions on graph edges, present a promising
alternative. This work investigates two novel KAN variants based on rational
and radial basis functions for speech enhancement. We integrate the rational
variant into the 1D CNN blocks of Demucs and the GRU-Transformer blocks of
MP-SENet, while the radial variant is adapted to the 2D CNN-based decoders of
MP-SENet. Experiments on the VoiceBank-DEMAND dataset show that replacing
standard activations with KAN-based activations improves speech quality in
both time-domain and time-frequency-domain methods with minimal impact on
model size and FLOPs, underscoring KAN’s potential to improve speech
enhancement models.
Source: http://arxiv.org/abs/2412.17778v1
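The two activation families mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the pole-safe denominator, the Gaussian basis, and all parameter names here are illustrative assumptions.

```python
import numpy as np

def rational_activation(x, p, q):
    """Rational-function activation R(x) = P(x) / D(x), with learnable
    numerator coefficients p and denominator coefficients q.
    The denominator 1 + |Q(x)| avoids poles (a common safeguard,
    assumed here rather than taken from the paper)."""
    num = np.polyval(p, x)
    den = 1.0 + np.abs(np.polyval(q, x))
    return num / den

def rbf_activation(x, centers, widths, weights):
    """Radial-basis-function activation: a learnable weighted sum of
    Gaussian bumps, one per center."""
    diff = np.asarray(x)[..., None] - centers      # (..., K)
    phi = np.exp(-(diff / widths) ** 2)            # Gaussian basis values
    return phi @ weights                           # weighted combination
```

In a KAN layer these scalar functions replace the fixed nonlinearity, with the coefficients (`p`, `q`, `centers`, `widths`, `weights`) trained jointly with the rest of the network.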