
Ronny_Jotten t1_j0cizez wrote

Your theories are somewhat naive. Large companies like Google have no problem getting access to all the music they want. And nobody tries to "hide the fact that they trained their model with copyrighted material". The current state of AI training seems to be that copyright is simply ignored, on the theory that it's fair use - though we'll see whether that holds up in court. Nearly all of the images in LAION are copyrighted works scraped from the web, used for training without permission. Furthermore, anyone can use the Million Song Dataset, and get access to the actual tracks through an API:

Million-song dataset: take it, it’s free | Ars Technica
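
For what it's worth, here's a minimal sketch of poking at one MSD track file with h5py. The group and field names follow the dataset's documented HDF5 layout (worth double-checking against the files you actually download); the audio itself isn't in the dataset - it was served separately, via the 7digital API.

```python
import h5py

# TRAXLZU12903D05F94 is the example track ID used in the MSD docs.
with h5py.File("TRAXLZU12903D05F94.h5", "r") as f:
    meta = f["metadata"]["songs"][0]     # compound row of song metadata
    feats = f["analysis"]["songs"][0]    # compound row of audio features
    print(meta["title"].decode(), "by", meta["artist_name"].decode())
    print("tempo:", feats["tempo"], "duration:", feats["duration"])
```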

On the other hand, the idea of turning audio into a 2D spectrogram image and using the same tools as image-generating AIs is also naive. Music generation requires a very different approach. There are a multitude of AI music-generation projects, some using GANs. So far, the results have not been as astonishing as those of the image generators. But that's only a matter of degree, and probably a matter of time.
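
To make "turning audio into a 2D spectrogram image" concrete, here's a minimal round-trip sketch in Python with librosa: audio to a log-mel spectrogram saved as a greyscale image, then back to audio. It's a generic illustration, not any particular project's pipeline, and the filenames are placeholders.

```python
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

# Audio -> log-mel spectrogram, scaled into an 8-bit greyscale image.
y, sr = librosa.load("clip.wav", sr=22050, mono=True)   # placeholder input
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=256)
S_db = librosa.power_to_db(S, ref=np.max)               # roughly [-80, 0] dB
img = ((S_db + 80.0) / 80.0 * 255.0).clip(0, 255).astype(np.uint8)
Image.fromarray(np.ascontiguousarray(img[::-1])).save("spectrogram.png")

# Image -> audio: undo the scaling, then invert the mel spectrogram.
# Phase was discarded above, so mel_to_audio has to estimate it with
# Griffin-Lim; a real pipeline would fix a reference level instead of
# reusing S.max() from the encoding step.
arr = np.asarray(Image.open("spectrogram.png"), dtype=np.float32)[::-1]
S_rec = librosa.db_to_power(arr / 255.0 * 80.0 - 80.0) * S.max()
y_rec = librosa.feature.inverse.mel_to_audio(S_rec, sr=sr, n_fft=2048,
                                             hop_length=512)
sf.write("reconstructed.wav", y_rec, sr)
```

Note that phase is what gets thrown away in the image, which is one reason audio resynthesized from spectrogram images tends to come back sounding smeared.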


happyhammy OP t1_j0ee3fz wrote

I was very pleasantly surprised to see the release of https://www.riffusion.com/ today. I'd say it's the best music generation to date, and they're using the 2D spectrogram approach.

What's also interesting is that they're not telling us what dataset the model was trained on.


Ronny_Jotten t1_j0hgi63 wrote

It depends on what you mean by "AI", but there are already generative music systems that produce far better music than that.

Spectral analysis/resynthesis is certainly important. There have long been tools like MetaSynth that let you do image processing of spectrograms. It's interesting that the "riffusion" project works at all, and it's a valuable piece of research. I can imagine the technique being useful for musicians as a way to generate novel sounds to be incorporated in larger compositions.
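
As a toy version of that MetaSynth-style workflow - treat the spectrogram as an image, edit it with ordinary image operations, resynthesize - here's a sketch in Python with librosa and scipy. Griffin-Lim stands in for phase reconstruction (MetaSynth's own resynthesis works differently), and the filenames are placeholders.

```python
import numpy as np
import librosa
import soundfile as sf
from scipy.ndimage import gaussian_filter

y, sr = librosa.load("clip.wav", sr=22050, mono=True)   # placeholder input
mag = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Two "image edits" on the magnitude spectrogram:
smeared = gaussian_filter(mag, sigma=(0.0, 8.0))  # blur along time: washy pads
flipped = mag[::-1, :]                            # mirror the frequency axis

for name, m in [("smeared", smeared), ("flipped", flipped)]:
    audio = librosa.griffinlim(m, n_iter=32, hop_length=512)  # estimate phase
    sf.write(name + ".wav", audio, sr)
```

Edits like these are exactly the sort of novel-sound generation I mean - useful as raw material for a musician, rather than as finished music.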

But it's difficult to see how it can be used successfully on entire, already-mixed-down pieces to generate a complete piece of music. Although it can produce some interesting and strange loops, it's hard to call riffusion's output "music" in the sense of an overall composition, and I'm skeptical that this basic technique can be tweaked to do so. I could be wrong, but I still think it's a naive approach, and any actually listenable music-generation system will be based on rather different principles.
