Google has developed MusicLM, an AI system that generates musical pieces up to several minutes long from text prompts, much as DALL-E creates images from text. It can also take a hummed or whistled melody and render it in the sound of other instruments.

“We introduce MusicLM, a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes,” the researchers write in a paper posted to arXiv, the preprint server hosted by Cornell University.

While the technology is not yet available for public use, the company has uploaded a number of samples to demonstrate the kind of music MusicLM can generate from text.

The samples page features more than 30 pieces of music generated from rich text descriptions. MusicLM can generate music across genres such as jazz, pop, rock, death metal, and more. It also supports painting caption conditioning, in which the AI generates music from the description of a painting.

These snippets are created from text prompts that specify a genre, mood, and instruments. There are also five-minute pieces generated from simple phrases like “melodic techno.” A standout demo is the “story mode,” in which the AI is given a script made up of several prompts and transitions from one to the next as the music plays.
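As a rough illustration of the story mode idea, a script can be thought of as a list of timed prompts, with the prompt in effect changing as playback moves forward. The Python sketch below shows only that scheduling step. It is not Google's implementation, and since the model is not publicly available it calls no MusicLM code. The names `STORY` and `active_prompt`, the timings, and the prompts themselves are hypothetical.

```python
from bisect import bisect_right

# Hypothetical story-mode script: (start time in seconds, text prompt).
# Entries must be sorted by start time.
STORY = [
    (0, "calm piano"),
    (15, "melodic techno"),
    (30, "death metal"),
]

def active_prompt(script, t):
    """Return the prompt in effect at time t (in seconds)."""
    starts = [start for start, _ in script]
    if t < starts[0]:
        raise ValueError("t is before the first prompt starts")
    # bisect_right counts the entries whose start time is <= t,
    # so the entry just before that position is the active one.
    return script[bisect_right(starts, t) - 1][1]

if __name__ == "__main__":
    for t in (0, 14.9, 15, 42):
        print(t, active_prompt(STORY, t))
```

A generator driven this way would look up the active prompt for each stretch of audio it produces, which is one simple way to picture the music shifting style at each transition.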

MusicLM is a neural network-based system trained on more than 280,000 hours of music, which allows it to generate original tracks across a wide range of instruments, genres, and themes from text input.

