Single_Instruction45 t1_j0dbqyb wrote on December 15, 2022 at 8:37 PM

I've been researching how to make good music with AI, and the following points come up constantly.

Firstly, music generation is very different from image generation. When generating an image, you have one idea or concept to generate (I'm simplifying, but bear with me), whereas when generating music you have many ideas generated at the same time (melody, rhythm tracks, harmony), all progressing over time.

Secondly, music is mostly a symbolic language that translates to sound. When working from an audio file, most of this symbolic data is hard to recover in its original form. There are good algorithms for transcribing audio to MIDI, but they are still far from perfect.

Compare this with text generation: in text, the context of a word in a sentence only involves the elements before and after it in a single sequence. In music, the context is much more complex, since we also have to consider polyphony, other instruments, and harmony. This makes it a much harder problem to tackle.
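To make the difference concrete, here's a toy Python sketch. The representation is my own simplification, not a standard format: each voice holds one MIDI pitch number per time step, and the helper names `text_context` and `music_context` are hypothetical. A word's context is just its neighbours in one sequence, while a note's context is its neighbours in its own voice plus whatever the other voices are playing at the same moment.

```python
# Toy illustration only: a simplified representation, not a real music format.

# Text: a single sequence, one token per position.
sentence = ["the", "cat", "sat", "on", "the", "mat"]

# Music: several voices progressing in parallel, one MIDI pitch per time step.
# This is a toy I-IV-V-I progression in C major (C, F, G, C chords).
voices = {
    "melody": [72, 69, 71, 72],   # C5, A4, B4, C5
    "harmony": [64, 65, 62, 64],  # E4, F4, D4, E4
    "bass": [48, 53, 55, 48],     # C3, F3, G3, C3
}


def text_context(tokens, i, window=1):
    """Hypothetical helper: context of token i is just its neighbours in the sequence."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def music_context(voices, voice, step, window=1):
    """Hypothetical helper: context of one note is its neighbours in its own voice
    (horizontal) plus the notes sounding at the same step in every other voice (vertical)."""
    line = voices[voice]
    horizontal = line[max(0, step - window):step] + line[step + 1:step + 1 + window]
    vertical = {name: notes[step] for name, notes in voices.items() if name != voice}
    return horizontal, vertical


if __name__ == "__main__":
    print(text_context(sentence, 2))             # neighbours of "sat"
    print(music_context(voices, "melody", 1))    # neighbours of A4 plus the F chord under it
```

Even in this stripped-down version, the melody note at step 1 has to fit both the melody notes around it and the chord sounding underneath it, while the word only has to fit its neighbours. Real systems also have to deal with rhythm, note durations, and more instruments, so the gap only grows.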

Finally, as many others have pointed out, measuring how good a piece of music is remains a very subjective task.

Taking all that into account, I feel that research on music generation in the AI world is lacking. This is a very hard problem, but tackling it might produce AI architectures even better than what we have now. That's why I believe a lot more research should be done on this subject.