Redefining Music AI: The Power of Sony’s SoniDo as a Versatile Foundation Model
A foundation model is a model pre-trained on extensive datasets and designed to be versatile and adaptable across a range of downstream tasks. These models have garnered widespread attention and are increasingly integrated into everyday applications. The field of music production, however, has lacked a powerful foundation model capable of addressing diverse downstream music tasks.
In a new paper, "Music Foundation Model as Generic Booster for Music Downstream Tasks," a Sony research team presents SoniDo, a music foundation model (MFM). SoniDo is designed to extract hierarchical features from target music samples, with the aim of making music processing both more effective and more accessible.
SoniDo employs a generative architecture based on a multi-level transformer coupled with a hierarchical encoder. Through careful preprocessing and data augmentation, its intermediate representations are used as input features for task-specific models across various music-related tasks.
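To make the feature-extraction idea concrete, here is a minimal toy sketch in plain Python. It is not SoniDo's code or API. The "foundation model" is replaced by simple average pooling at several time scales, and all function names (`extract_level_features`, `align_levels`, `task_head`), the pooling factors, and the weights are hypothetical choices for illustration. It shows only the general pattern: a fixed model produces multi-level intermediate features, which are aligned in time and passed to a small task-specific model.

```python
from typing import List, Sequence


def average_pool(signal: Sequence[float], factor: int) -> List[float]:
    """Downsample by averaging non-overlapping windows of `factor` samples."""
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


def extract_level_features(signal: Sequence[float],
                           factors: Sequence[int]) -> List[List[float]]:
    """Hypothetical stand-in for a frozen foundation model.

    Returns one feature sequence per level, ordered from fine to coarse.
    A real model would expose learned intermediate representations instead.
    """
    return [average_pool(signal, f) for f in factors]


def align_levels(levels: Sequence[Sequence[float]],
                 factors: Sequence[int],
                 length: int) -> List[List[float]]:
    """Repeat each level's frames to `length` steps and stack them per step."""
    frames = []
    for t in range(length):
        frame = [level[min(t // f, len(level) - 1)]
                 for level, f in zip(levels, factors)]
        frames.append(frame)
    return frames


def task_head(frames: Sequence[Sequence[float]],
              weights: Sequence[float],
              bias: float = 0.0) -> List[float]:
    """Tiny task-specific model: one linear score per frame."""
    return [sum(w * x for w, x in zip(weights, frame)) + bias
            for frame in frames]


# Toy input: a 16-sample "audio" ramp (assumption, for illustration only).
signal = [float(i) for i in range(16)]
factors = (2, 4, 8)

levels = extract_level_features(signal, factors)    # multi-level features
frames = align_levels(levels, factors, len(signal)) # shared time resolution
scores = task_head(frames, weights=[0.5, 0.3, 0.2]) # downstream prediction
```

In this sketch only `task_head` would be trained for a given task, while the feature extractor stays fixed. That is the sense in which a pre-trained model can act as a reusable booster for many downstream models.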