Google's Latest AI Breakthrough: Generating Music from Text Prompts
The last twelve months have seen a constant stream of new types of generative AI tools coming to the internet. Some, like DALL-E 2 and ChatGPT, have genuinely caught the internet’s imagination and blown people’s minds. Others, like Meta’s Make-A-Video tool and OpenAI’s 3D model-building tool, seem more like works in progress that have failed to impress or make a splash. The latest generative AI tool from Google seems to combine both aspects: it is still rough around the edges, but there is already enough there to impress people. Let's check it out:
The new tool from Google is called MusicLM. It works much like the other generative tools we’ve seen so far in that it responds to text prompts. However, instead of producing text, images, or videos, MusicLM creates music from text prompts alone. Not just sounds, but music.
In the research paper explaining how they developed the tool, Google describes MusicLM as:
“a model for generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.”
The results are startling and quite impressive, with lengthy beats composed from simple text prompts such as “melodic techno” and “relaxing jazz”. Things get really weird, though, when the music includes what seem to be AI-generated vocals, which can come across as rather chilling. You can listen to some of the music created from various prompts below, and check out all the available sound clips here:
“The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.”
“A rising synth is playing an arpeggio with a lot of reverb. It is backed by pads, sub bass line and soft drums. This song is full of synth sounds creating a soothing and adventurous atmosphere. It may be playing at a festival during two songs for a buildup.”
“This is an r&b/hip-hop music piece. There is a male vocal rapping and a female vocal singing in a rap-like manner. The beat is comprised of a piano playing the chords of the tune with an electronic drum backing. The atmosphere of the piece is playful and energetic. This piece could be used in the soundtrack of a high school drama movie/TV show. It could also be played at birthday parties or beach parties.”
Clearly, the models still have a long way to go before they create high-quality music that attracts many users. However, Google claims that MusicLM is head and shoulders above other generative music models, and it has to be said that the outputs showcased are pretty compelling when you consider that they came from just a few lines of text. Just don't tell Nick Cave.
At this point, Google has said that it doesn’t have any plans to release these models to the public, which means you’ll have to stick to ChatGPT if you want to scratch your generative AI itch.