Most music generation models produce a single music mixture. To allow more flexible and controllable generation, the Multi-Source Diffusion Model (MSDM) was proposed to model a music mixture as a set of instrument sources (piano, drums, bass, and guitar), so that a composition can be generated instrument by instrument. Despite these capabilities, MSDM is unable to generate songs with rich melodies and often produces empty sounds. In addition, diffusion on raw waveforms introduces significant Gaussian noise artifacts, which compromise audio quality. In response, we introduce a multi-source latent diffusion model (MSLDM) that employs Variational Autoencoders (VAEs) to encode each instrumental source into a distinct latent representation. By training a VAE on all music sources, we efficiently capture each source's unique characteristics in a "source latent", and our diffusion model then models these source latents jointly. This approach significantly improves both total and partial generation of music by leveraging the VAE's latent compression and noise robustness. The compressed source latents also make generation more efficient. Subjective listening tests and Fréchet Audio Distance (FAD) scores confirm that our model outperforms MSDM, demonstrating its practical applicability in music generation systems.
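The data flow described above can be illustrated with a small toy sketch. It is not the model from this work: `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained VAE (average pooling and sample repetition), `HOP` is an invented compression factor, and summing the decoded sources to obtain the mixture is an assumption. The sketch only shows how one shared encoder yields one latent per source and how those latents are stacked into the joint representation that a diffusion model would operate on.

```python
from typing import Dict, List

SOURCES = ["bass", "drums", "piano", "guitar"]  # the four sources named in the text
HOP = 4  # hypothetical compression factor of the toy encoder


def toy_encode(wave: List[float], hop: int = HOP) -> List[float]:
    """Stand-in for a VAE encoder: average-pool every `hop` samples into one value."""
    return [sum(wave[i:i + hop]) / hop for i in range(0, len(wave) - hop + 1, hop)]


def toy_decode(latent: List[float], hop: int = HOP) -> List[float]:
    """Stand-in for a VAE decoder: repeat each latent value `hop` times."""
    return [z for z in latent for _ in range(hop)]


def encode_sources(stems: Dict[str, List[float]]) -> List[List[float]]:
    """Encode every source with the same encoder and stack the results.

    The returned nested list has shape (num_sources, latent_len); this stacked
    "source latent" is what a joint diffusion model would be trained on.
    """
    return [toy_encode(stems[name]) for name in SOURCES]


def decode_mixture(joint_latent: List[List[float]]) -> List[float]:
    """Decode each source latent and sum sample-wise (assumed mixing rule)."""
    decoded = [toy_decode(z) for z in joint_latent]
    return [sum(samples) for samples in zip(*decoded)]


if __name__ == "__main__":
    # Constant dummy waveforms: bass = 1.0, drums = 2.0, piano = 3.0, guitar = 4.0.
    stems = {name: [float(k)] * 8 for k, name in enumerate(SOURCES, start=1)}
    joint = encode_sources(stems)
    print(len(joint), len(joint[0]))  # number of sources, latent length
    print(decode_mixture(joint))
```

Because each source keeps its own row in the stacked latent, individual rows can in principle be held fixed while the others are generated, which is the setting the partial generation samples below correspond to.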
Here are some audio samples; the model configurations are shown below:
Total Generation Sample 1
Methods | Bass | Drums | Piano | Guitar | Mixture |
---|---|---|---|---|---|
MSDM [1] | |||||
ISLDM | |||||
MSLDM (ours) | |||||
MSLDM-Large (ours) | |||||
MixLDM |
Total Generation Sample 2
Methods | Bass | Drums | Piano | Guitar | Mixture |
---|---|---|---|---|---|
MSDM [1] | |||||
ISLDM | |||||
MSLDM (ours) | |||||
MSLDM-Large (ours) | |||||
MixLDM |
Partial Generation: conditioned on piano and guitar, generating bass and drums
Methods | Condition | Generated | Mixture |
---|---|---|---|
MSDM [1] | |||
ISLDM | |||
MSLDM | |||
MSLDM-Large |
Partial Generation: conditioned on drums and guitar, generating bass and piano
Methods | Condition | Generated | Mixture |
---|---|---|---|
MSDM [1] | |||
ISLDM | |||
MSLDM | |||
MSLDM-Large |
Partial Generation: conditioned on bass and guitar, generating drums and piano
Methods | Condition | Generated | Mixture |
---|---|---|---|
MSDM [1] | |||
ISLDM | |||
MSLDM | |||
MSLDM-Large |
[1] Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà, “Multi-Source Diffusion Models for Simultaneous Music Generation and Separation”, 2024.