In March, we saw the launch of a “ChatGPT for music” called Suno, which uses generative AI to produce realistic songs on demand from short text prompts.
A few weeks later, a similar competitor—Udio—arrived on the scene.
The view that AI systems will never make “real” music like humans do should be understood more as a claim about social context than technical capability.
Generating audio from text prompts is, in itself, nothing new. However, Suno and Udio represent a clear step forward: from a simple text prompt, they generate song lyrics (using a ChatGPT-like text generator), feed them into a generative voice model, and integrate the synthesized “vocals” with generated music to produce a coherent song segment.
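Conceptually, that is a simple chain of generative stages. The sketch below is a purely illustrative Python outline with stubbed model calls; the function names are hypothetical and nothing here reflects Suno’s or Udio’s unpublished internals.

```python
# Illustrative sketch of a prompt-to-song pipeline as described above.
# All model calls are stubbed; this is not Suno's or Udio's actual code.

def generate_lyrics(prompt: str) -> str:
    """Stage 1: a ChatGPT-like text model turns the prompt into lyrics (stubbed)."""
    return f"[verse about {prompt}]\n[chorus about {prompt}]"

def synthesize_vocals(lyrics: str, style: str) -> bytes:
    """Stage 2: a generative voice model 'sings' the lyrics (stubbed as audio bytes)."""
    return f"<vocal audio: {style} rendition of {len(lyrics)} characters>".encode()

def generate_backing_track(prompt: str, style: str) -> bytes:
    """Stage 3: a music model produces an instrumental to match the prompt (stubbed)."""
    return f"<instrumental audio: {style} track for '{prompt}'>".encode()

def make_song_segment(prompt: str, style: str = "indie pop") -> bytes:
    """Chain the stages, then combine vocals with the backing track."""
    lyrics = generate_lyrics(prompt)
    vocals = synthesize_vocals(lyrics, style)
    backing = generate_backing_track(prompt, style)
    return vocals + backing  # stand-in for aligning and mixing the two stems

if __name__ == "__main__":
    print(make_song_segment("a rainy night in a small town"))
```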
The effect can be uncanny. It’s AI, but the voice can still cut through with emotional impact. When the music performs a perfectly executed end-of-bar pirouette into a new section, my brain gets some of those little sparks of pattern-processing joy that I might get listening to a great band.
This highlights something often overlooked about musical expression: an AI does not need to experience emotions and life events itself in order to express them convincingly in music that resonates with people.
Like other generative AI products, Suno and Udio were trained on vast amounts of existing work by real humans—and there is much debate about those humans’ intellectual property rights.