Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks

Authors: Florian Grötschla, Luca Strässle, Luca A. Lanzendörfer, Roger Wattenhofer

Abstract: Music recommender systems frequently utilize network-based models to capture
relationships between music pieces, artists, and users. Although these
relationships provide valuable insights for predictions, new music pieces or
artists often face the cold-start problem due to insufficient initial
information. To address this, one can extract content-based information
directly from the music to enhance collaborative-filtering-based methods. While
previous approaches have relied on hand-crafted audio features for this
purpose, we explore the use of contrastively pretrained neural audio embedding
models, which offer a richer and more nuanced representation of music. Our
experiments demonstrate that neural embeddings, particularly those generated
with the Contrastive Language-Audio Pretraining (CLAP) model, present a
promising approach to enhancing music recommendation tasks within graph-based
frameworks.

Source: http://arxiv.org/abs/2409.09026v1