Total Generation
Audio Conditioning Generation










Text Conditioning Generation
Arrangement Generation
Arrangement Generation B: Bass




Arrangement Generation D: Drums




Arrangement Generation G: Guitar




Arrangement Generation P: Piano




Arrangement Generation BD: Bass and Drums




Arrangement Generation BG: Bass and Guitar




Arrangement Generation BP: Bass and Piano




Arrangement Generation DG: Drums and Guitar




Arrangement Generation DP: Drums and Piano




Arrangement Generation GP: Guitar and Piano




Arrangement Generation BDG: Bass, Drums and Guitar




Arrangement Generation BDP: Bass, Drums and Piano




Arrangement Generation BGP: Bass, Guitar and Piano




Arrangement Generation DGP: Drums, Guitar and Piano




P.S. Source Separation
Separations with the Dirac algorithm
Separations with the Gaussian algorithm
Source Separation
Audio source separation refers to the process of isolating individual sound elements from a mixture of sounds. This method is critical in numerous areas, particularly in music production, where it facilitates the isolation of single instruments from a complete mix. The primary challenge involves accurately identifying and extracting the intended source without introducing noise or degrading the audio quality.
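A common formalization, assumed here, treats the mixture as the sample-wise sum of its stems. The toy NumPy sketch below uses synthetic sine "stems" (hypothetical signals, not real audio) to show why separation is hard without prior knowledge: moving any signal from one stem to another leaves the mixture unchanged, so the mixture alone cannot tell the two decompositions apart.

```python
import numpy as np

def mix(stems: np.ndarray) -> np.ndarray:
    """Additive mixing model: the mixture is the sample-wise sum of the stems.

    stems has shape (num_sources, num_samples).
    """
    return stems.sum(axis=0)

sr = 8000  # hypothetical sample rate for the toy signals
t = np.arange(sr) / sr
bass = 0.5 * np.sin(2 * np.pi * 55.0 * t)
lead = 0.3 * np.sin(2 * np.pi * 440.0 * t)
stems = np.stack([bass, lead])
y = mix(stems)

# Shift an arbitrary signal from one stem to the other: the stems change,
# but their sum (the observed mixture) does not.
leak = 0.1 * np.random.default_rng(0).standard_normal(sr)
alt_stems = np.stack([bass + leak, lead - leak])
print(np.allclose(mix(alt_stems), y))  # True
```

A separation model therefore needs a prior over what plausible stems sound like in order to pick one decomposition among the infinitely many that explain the mixture.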
MSDM[1] introduced a diffusion-based multi-source generative model that performs both music synthesis and source separation within a single framework, operating directly on raw waveforms.
This model employs a diffusion-based generative approach, trained via denoising score matching[2], to learn the joint prior of stems that share a musical context. The fundamental principle of score matching is to approximate the score function of the target distribution, that is, the gradient of its log-density with respect to the data, $\nabla_{\mathbf{x}} \log p(\mathbf{x})$.
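The denoising score-matching idea can be checked numerically in one dimension. The sketch below is an illustration under assumed toy settings (1-D Gaussian data, a single noise level $\sigma$, and a least-squares line standing in for the neural network), not MSDM's training code. Regressing the target $-(\tilde{x} - x)/\sigma^2$, where $\tilde{x} = x + \sigma\epsilon$ is a noisy copy of a clean sample $x$, onto $\tilde{x}$ recovers the score of the noise-perturbed distribution, which for Gaussian data is known in closed form.

```python
import numpy as np

def gaussian_score(x, mu, var):
    """Closed-form score of N(mu, var): d/dx log p(x) = -(x - mu) / var."""
    return -(x - mu) / var

rng = np.random.default_rng(0)
mu, s2, sigma = 1.0, 0.25, 0.5  # toy data mean, data variance, noise level (assumed)
n = 200_000

x = rng.normal(mu, np.sqrt(s2), n)      # clean samples from the toy "data" distribution
eps = rng.standard_normal(n)
x_noisy = x + sigma * eps               # perturbed samples
dsm_target = -(x_noisy - x) / sigma**2  # denoising score-matching regression target

# For Gaussian data the optimal regressor of the target on x_noisy is linear,
# so a least-squares line is enough to recover it.
slope, intercept = np.polyfit(x_noisy, dsm_target, deg=1)

# Compare with the exact score of the noised marginal N(mu, s2 + sigma**2).
grid = np.linspace(-1.0, 3.0, 5)
learned = slope * grid + intercept
exact = gaussian_score(grid, mu, s2 + sigma**2)
print(np.max(np.abs(learned - exact)))  # Monte Carlo error only; shrinks as n grows
```

The model never sees the true score: it only regresses on noisy/clean pairs, and the minimizer of that regression is the score of the smoothed distribution at noise level $\sigma$.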
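The MSDM framework treats source separation as a special case of conditional generation: given a mixture $\mathbf{y} = \sum_{n=1}^{N} \mathbf{x}_n$ of $N$ sources, the model must estimate the score of the posterior distribution of the sources given the mixture, $\nabla_{\mathbf{x}} \log p(\mathbf{x} \mid \mathbf{y})$.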
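To make the posterior score concrete, the sketch below follows our reading of the two likelihood choices in MSDM[1] that correspond to the Dirac and Gaussian rows of the table. With a Gaussian likelihood $p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y}; \sum_n \mathbf{x}_n, \gamma^2 I)$, Bayes' rule adds $(\mathbf{y} - \sum_n \mathbf{x}_n)/\gamma^2$ to every source's prior score. With a Dirac likelihood $\delta(\mathbf{y} - \sum_n \mathbf{x}_n)$, the last source is fixed to $\mathbf{y}$ minus the others, and the chain rule gives $S_n - S_N$ for each free source $n$. Assumptions: the learned score network is replaced by independent Gaussian priors with hand-picked means and variances (so the answer can be checked in closed form), and plain gradient ascent to the MAP estimate stands in for MSDM's diffusion sampler, which anneals the noise level.

```python
import numpy as np

# Toy setting (hypothetical): 3 independent Gaussian "stems", x_n ~ N(mu_n, var_n)
# per sample, and a mixture y = x_1 + x_2 + x_3.
mu = np.array([0.0, 1.0, -0.5])
var = np.array([1.0, 0.5, 2.0])

def prior_score(x):
    """Per-source prior score; a stand-in for a learned score network S_theta."""
    return -(x - mu[:, None]) / var[:, None]

def gaussian_posterior_score(x, y, gamma):
    """Likelihood N(y; sum_n x_n, gamma^2): prior score + (y - sum_n x_n) / gamma^2."""
    return prior_score(x) + (y - x.sum(axis=0)) / gamma**2

def dirac_posterior_score(x_free, y):
    """Likelihood delta(y - sum_n x_n): the last source is y minus the free ones,
    so by the chain rule each free source n gets S_n - S_N."""
    x_last = y - x_free.sum(axis=0, keepdims=True)
    s = prior_score(np.concatenate([x_free, x_last]))
    return s[:-1] - s[-1:]

rng = np.random.default_rng(0)
truth = mu[:, None] + np.sqrt(var)[:, None] * rng.standard_normal((3, 4))
y = truth.sum(axis=0)  # 4 mixture samples

# Gradient ascent on log p(x | y): a MAP estimate, not diffusion sampling.
x_free = np.zeros((2, 4))
for _ in range(2000):
    x_free += 0.1 * dirac_posterior_score(x_free, y)
x_dirac = np.concatenate([x_free, y - x_free.sum(axis=0, keepdims=True)])

gamma = 0.1
x_gauss = np.zeros((3, 4))
for _ in range(5000):
    x_gauss += 5e-3 * gaussian_posterior_score(x_gauss, y, gamma)

# Closed-form MAP estimates for this Gaussian toy, for comparison.
x_map_dirac = mu[:, None] + var[:, None] * (y - mu.sum()) / var.sum()
x_map_gauss = mu[:, None] + var[:, None] * (y - mu.sum()) / (var.sum() + gamma**2)
print(np.abs(x_dirac - x_map_dirac).max(), np.abs(x_gauss - x_map_gauss).max())
```

Both estimates converge to their closed-form MAP. The Dirac version satisfies the mixture constraint exactly by construction, while the Gaussian version only approaches it as $\gamma$ shrinks, which is one way to see why the two variants can yield different separation quality.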
Source separation results per stem (SI-SDRi in dB, higher is better):

Algorithm | Bass | Drums | Guitar | Piano |
---|---|---|---|---|
MT-MusicLDM (Dirac) | 3.26 | 3.63 | 2.79 | 2.58 |
MT-MusicLDM (Gaussian) | 3.23 | 3.07 | 1.97 | 2.53 |
MSDM (Dirac) | 17.12 | 18.68 | 15.38 | 14.73 |
MSDM (Gaussian) | 13.93 | 17.92 | 14.19 | 12.11 |
References
- Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà: Multi-Source Diffusion Models for Simultaneous Music Generation and Separation, arXiv:2302.02257, 2024.
- Yang Song, Stefano Ermon: Generative Modeling by Estimating Gradients of the Data Distribution, NIPS, 2019.
- Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole: Score-Based Generative Modeling through Stochastic Differential Equations, ICLR, 2021.
- Tero Karras, Miika Aittala, Timo Aila, Samuli Laine: Elucidating the Design Space of Diffusion-Based Generative Models, arXiv:2206.00364, 2022.
- Jiaming Song, Chenlin Meng, Stefano Ermon: Denoising Diffusion Implicit Models, ICLR, 2021.