
Google’s AI music generator is like ChatGPT for audio

It can write 5-minute songs based on short text prompts.
April 17, 2023

Google has unveiled an advanced AI music generator that can turn a snippet of text into a song — but legal concerns might prevent the tech giant from ever sharing it with the public.

The AI revolution: ChatGPT, DALL-E 2, and other advanced AIs capable of generating impressive text or images in response to user prompts exploded in popularity in 2022, but they weren’t the first generative AIs, nor the only examples of what such neural networks can do.

Several companies have also trained AIs to generate music in response to text, audio, or image prompts — OpenAI, the research firm behind ChatGPT and DALL-E 2, even released an AI music generator called "Jukebox" back in 2020.

These systems haven’t been as enthusiastically embraced as their text- and image-generating counterparts, though, mainly because their outputs aren’t as impressive — most are low-fidelity, simplistic, and lacking in traditional song structures, such as repeating choruses.

What’s new? Music-making AIs are getting better, and perhaps the most impressive example of the technology is MusicLM, an AI music generator unveiled by Google in January 2023.

The system can generate clips up to 5 minutes long based on text descriptions, and while the music isn’t going to win any Grammys, the audio does sound more like something a human might record than the clips generated by other AIs.

How it works: Google trained MusicLM on more than 280,000 hours of music, relying on MuLan, a model trained to link music to descriptions written in natural language, to connect that audio with text prompts.

They then created MusicCaps, a publicly accessible dataset of more than 5,500 music clips for evaluating the AI music generator. Expert musicians wrote captions for each of these clips, as well as lists of aspects describing them, such as their genre or mood.
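
Because MusicCaps is publicly accessible, the evaluation data is easy to inspect. The snippet below is a minimal sketch that assumes the dataset is mirrored on Hugging Face under the id "google/MusicCaps" with "caption" and "aspect_list" fields; those identifiers are assumptions based on the public release, not details from this article.

```python
# Minimal sketch of inspecting MusicCaps with the Hugging Face `datasets` library.
# The dataset id "google/MusicCaps" and the field names below are assumptions
# based on the public release, not details taken from the article.
from datasets import load_dataset

musiccaps = load_dataset("google/MusicCaps", split="train")

example = musiccaps[0]
print(example["caption"])      # free-text description written by an expert musician
print(example["aspect_list"])  # short aspect tags, e.g. genre, mood, instrumentation
```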

During the evaluation stage, Google pitted MusicLM against two other text-to-music AIs — Mubert and Riffusion — using several quantitative metrics for assessing a clip’s audio quality and adherence to a text description.
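
One common way to score how well audio matches a text prompt is to embed both in a shared music-text space, such as the one MuLan learns, and measure how close the two vectors are. The function below is an illustrative cosine-similarity sketch of that idea, not the paper’s exact metric; the embedding vectors are assumed to come from some joint music-text model.

```python
# Illustrative sketch of a text-adherence score: cosine similarity between a
# prompt embedding and an audio embedding from a shared music-text space
# (e.g. a MuLan-style model). A simplified stand-in, not the paper's exact metric.
import numpy as np

def adherence_score(prompt_emb: np.ndarray, audio_emb: np.ndarray) -> float:
    """Cosine similarity between a text-prompt embedding and an audio embedding."""
    return float(
        np.dot(prompt_emb, audio_emb)
        / (np.linalg.norm(prompt_emb) * np.linalg.norm(audio_emb))
    )
```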

They also presented human evaluators with MusicCaps’ descriptions and two audio clips — these might be two clips produced by AIs or one AI-generated clip and the music upon which the MusicCaps description was based. The evaluators then chose which of the clips they thought best matched the description.

According to a paper Google shared on the preprint server arXiv, MusicLM outperformed the other AIs across the board.

"We strongly emphasize the need for more future work in tackling these risks associated to music generation."

Agostinelli et al.

Looking ahead: Google’s AI music generator may be able to produce audio that sounds closer to human-written music, but it still can’t replicate traditional song structures, and the vocals it creates are of particularly poor quality, with unintelligible lyrics.

Google says future work on the system could focus on those issues, improving the overall quality of the audio, and addressing the problem that’s preventing it from releasing MusicLM to the public: about 1% of its output can be approximately matched to audio in its training data.
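
That 1% figure comes from checking whether generated clips approximately match clips in the training set. One way to picture such a check is a nearest-neighbor search over audio embeddings with a similarity threshold; the sketch below is illustrative only, and the 0.85 threshold and embedding source are assumptions rather than the paper’s procedure.

```python
# Illustrative sketch of an approximate-match check: flag a generated clip whose
# embedding is too similar to any training-set embedding. The 0.85 threshold and
# the embedding source are assumptions for illustration, not the paper's method.
import numpy as np

def is_approximate_match(gen_emb: np.ndarray,
                         train_embs: np.ndarray,
                         threshold: float = 0.85) -> bool:
    """Return True if the generated clip is a near-duplicate of any training clip."""
    gen = gen_emb / np.linalg.norm(gen_emb)
    train = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    similarities = train @ gen  # cosine similarity against every training clip
    return bool(similarities.max() >= threshold)
```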

"We acknowledge the risk of potential misappropriation of creative content associated to the use case … We strongly emphasize the need for more future work in tackling these risks associated to music generation," the researchers wrote.
