Stable Audio, the AI music creator developed by Stability AI, has earned a place on TIME’s Best Inventions of 2023 list, demonstrating the potential of generative AI in music and sound generation. Stability AI, a pioneer in open generative AI, launched Stable Audio in September 2023.
This unique product utilizes state-of-the-art generative AI techniques to generate high-quality music and sound effects swiftly and efficiently, all via a user-friendly web interface. Stable Audio offers a basic free version that can generate and download tracks up to 45 seconds long, along with a ‘Pro’ subscription that delivers 90-second tracks suitable for commercial projects.
This innovative product is a boon for musicians looking for unique samples for their work, but its potential extends far beyond that. Stable Audio generates audio tracks in response to descriptive text prompts supplied by the user, along with a specified length of audio. This flexibility opens up limitless opportunities for creators across various fields.
AI audio and music generation
At the heart of Stable Audio is a diffusion-based generative model, specifically a latent diffusion model. These models have substantially advanced generative AI, especially in the creation of images, video, and audio. By operating in the latent encoding space of a pre-trained autoencoder, latent diffusion models offer significant speed improvements in training and inference of diffusion models.
Stable Audio, a product of Stability AI’s generative audio research lab, Harmonai, leverages this technology to generate high-quality, 44.1 kHz music for commercial use. The model is conditioned on text metadata, audio file duration, and start time, allowing for control over the content and length of the generated audio.
One of the challenges with audio diffusion models is that they are typically trained to generate output of a fixed size, which is a problem when audio of varying lengths is needed. Stable Audio addresses this by conditioning the model on the start time and total duration of each training file, so the length of the generated audio can be controlled. The model was trained on a dataset of over 800,000 audio files, equating to over 19,500 hours of audio. Together, the timing conditioning and the extensive dataset are credited with improving output quality, controllability, inference speed, and output length.
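To put the dataset figures in perspective, the short Python sketch below works out the average clip length they imply. It uses only the two totals quoted above, both of which are lower bounds ("over"), so the result is a rough estimate rather than a published statistic. The function name is purely illustrative.

```python
# Figures quoted in the article; both are stated as "over", so treat them as lower bounds.
NUM_FILES = 800_000
TOTAL_HOURS = 19_500


def average_clip_seconds(total_hours: float, num_files: int) -> float:
    """Mean clip length in seconds implied by the dataset totals."""
    return total_hours * 3600 / num_files


avg_seconds = average_clip_seconds(TOTAL_HOURS, NUM_FILES)
print(f"Average clip length: {avg_seconds:.2f} s")  # prints "Average clip length: 87.75 s"
```

An average of roughly a minute and a half per file suggests why a model limited to one fixed output size would be a poor fit for this material.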
Stable Audio example
As an example, entering “Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM” for a 95-second track will create the results in the YouTube video below. What are your thoughts? Leave your comments below.
Stable Audio’s model architecture consists of a variational autoencoder (VAE), a text encoder, and a U-Net-based conditioned diffusion model. The VAE compresses stereo audio into a compact, noise-resistant latent encoding that is lossy but invertible, which speeds up both generation and training.
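To give a rough sense of why working in a compressed latent space is faster, the Python sketch below compares the number of values in raw stereo audio with the number in a latent sequence. The 44.1 kHz sample rate and stereo format come from the article. The downsampling factor and latent channel count are assumptions chosen only for illustration, since the article does not state them.

```python
import math

SAMPLE_RATE = 44_100   # Hz, from the article
CHANNELS = 2           # stereo, from the article
DOWNSAMPLE = 1024      # ASSUMPTION: hypothetical time-axis compression factor
LATENT_CHANNELS = 64   # ASSUMPTION: hypothetical number of latent channels


def raw_values(seconds: int) -> int:
    """Number of sample values in raw stereo audio of the given length."""
    return seconds * SAMPLE_RATE * CHANNELS


def latent_values(seconds: int) -> int:
    """Number of values in the (assumed) latent encoding of the same audio."""
    frames = math.ceil(seconds * SAMPLE_RATE / DOWNSAMPLE)
    return frames * LATENT_CHANNELS


seconds = 95
ratio = raw_values(seconds) / latent_values(seconds)
print(f"raw: {raw_values(seconds):,} values")
print(f"latent: {latent_values(seconds):,} values")
print(f"approximate reduction: {ratio:.1f}x")
```

Under these assumed settings the diffusion model would process far fewer values per track than it would on raw waveforms. The actual ratio depends on the real autoencoder configuration.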
The model is conditioned on text prompts using the frozen text encoder of a CLAP model trained from scratch on the dataset. Timing embeddings are calculated during training time, providing information about the start time and overall duration of the original audio file. These values are translated into per-second discrete learned embeddings and concatenated with the prompt tokens before being passed into the U-Net’s cross-attention layers.
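The timing conditioning described above can be sketched in a few lines of Python. This is a toy illustration, not Stability AI's code: the table sizes, the embedding width, and every function name are assumptions, and random numbers stand in for both the learned embeddings and the text encoder output.

```python
import random

EMBED_DIM = 8       # ASSUMPTION: toy embedding width
MAX_SECONDS = 512   # ASSUMPTION: largest second index the lookup tables cover


def make_table(rows: int, dim: int, seed: int) -> list[list[float]]:
    """Random stand-in for a table of learned embeddings (one row per index)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


# One row per whole second; in a real model these rows would be trained.
START_TABLE = make_table(MAX_SECONDS + 1, EMBED_DIM, seed=0)
TOTAL_TABLE = make_table(MAX_SECONDS + 1, EMBED_DIM, seed=1)


def timing_indices(crop_start_s: float, file_duration_s: float) -> tuple[int, int]:
    """Quantise both timing values to whole seconds and clamp to the table range."""
    start = min(max(int(crop_start_s), 0), MAX_SECONDS)
    total = min(max(int(file_duration_s), 0), MAX_SECONDS)
    return start, total


def build_conditioning(prompt_tokens, crop_start_s, file_duration_s):
    """Append the two timing embeddings to the prompt tokens along the sequence axis."""
    start_idx, total_idx = timing_indices(crop_start_s, file_duration_s)
    return prompt_tokens + [START_TABLE[start_idx], TOTAL_TABLE[total_idx]]


# Stand-in for five prompt token embeddings produced by a text encoder.
prompt_tokens = make_table(5, EMBED_DIM, seed=2)
cond = build_conditioning(prompt_tokens, crop_start_s=14.7, file_duration_s=80.0)
print(len(cond), len(cond[0]))  # prints "7 8": five prompt tokens plus two timing tokens
```

The resulting sequence is what the cross-attention layers would attend to. At generation time, one would presumably pass a start time of zero and the requested track length, although the article does not spell out that step.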
The diffusion model for Stable Audio is a 907M parameter U-Net based on the model used in Moûsai, using a combination of residual layers, self-attention layers, and cross-attention layers to denoise the input conditioned on text and timing embeddings.
The future of generative audio
Stable Audio’s recognition by TIME as one of the best inventions of 2023 is a testament to the potential of generative AI in music and sound generation. As Emad Mostaque, CEO of Stability AI, expressed, the company is excited to use their expertise to support music creators. With Stable Audio, music enthusiasts and creative professionals can generate new content with the help of AI, leading to endless innovations in the field.
Stable Audio is not just an AI music creator; it is a symbol of the transformative power of generative AI. Its recognition on TIME’s Best Inventions of 2023 list is a significant milestone, marking the dawn of a new era in music and sound generation.
“As the only independent, open and multimodal generative AI company, we are thrilled to use our expertise to develop a product in support of music creators,” said Emad Mostaque, CEO of Stability AI. “Our hope is that Stable Audio will empower music enthusiasts and creative professionals to generate new content with the help of AI, and we look forward to the endless innovations it will inspire.”
Try out Stable Audio for yourself and create music using AI by simply entering prompts such as “Trance, Ibiza, Beach, Sun, 4 AM, Progressive, Synthesizer, 909, Dramatic Chords, Choir, Euphoric, Nostalgic, Dynamic, Flowing”.