AI Arts Forum :: Talking about Generative AI Art Tools

Stability AI Update: Stable Audio Open Research Paper Released

Key Takeaways:
  • Stable Audio Open research paper details the architecture and training of Stability AI’s text-to-audio model.
  • The model weights are openly available on Hugging Face under the Stability AI Community License.
  • Generates high-quality stereo audio at 44.1kHz from text prompts.
  • Runs on consumer-grade GPUs, making it practical for academic and artistic use.
  • Trained on nearly 500,000 Creative Commons recordings.

Following the release of Stable Audio Open, Stability AI has published the research paper describing the model's technical details. Trained on Creative Commons data, the model generates high-quality stereo audio at 44.1kHz using an autoencoder, a T5-based text embedding, and a transformer-based diffusion model. The weights are available on Hugging Face and can be used for applications ranging from sound design to academic projects. With its emphasis on data transparency and accessibility, the release marks a significant milestone in open-source audio AI.

Architecture

Stable Audio Open combines the following components to generate variable-length stereo audio at 44.1kHz:
  • An autoencoder for compressing waveforms.
  • A T5-based text embedding for text conditioning.
  • A transformer-based diffusion model operating in the autoencoder's latent space.
The architecture is similar to that of Stable Audio 2.0, but the model is trained on a different dataset and uses T5 for text conditioning.
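The diffusion model works in the autoencoder's latent space rather than on raw samples, and a quick size calculation shows why that matters. This is a minimal sketch: the 44.1kHz stereo format comes from the post, while the downsampling factor of 2048 and the 64 latent channels are assumed values chosen for illustration, not figures stated here.

```python
import math

SAMPLE_RATE = 44_100   # Hz, as stated in the post
AUDIO_CHANNELS = 2     # stereo

# ASSUMED autoencoder settings (not given in the post): each latent frame
# summarizes DOWNSAMPLE audio samples and holds LATENT_CHANNELS values.
DOWNSAMPLE = 2048
LATENT_CHANNELS = 64


def latent_shape(seconds: float) -> tuple[int, int]:
    """Return (channels, frames) of the latent a clip of this length maps to."""
    samples = round(seconds * SAMPLE_RATE)
    frames = math.ceil(samples / DOWNSAMPLE)
    return (LATENT_CHANNELS, frames)


def compression_ratio(seconds: float) -> float:
    """Number of raw waveform values divided by number of latent values."""
    raw_values = round(seconds * SAMPLE_RATE) * AUDIO_CHANNELS
    channels, frames = latent_shape(seconds)
    return raw_values / (channels * frames)


if __name__ == "__main__":
    # Variable-length clips simply produce latents with more or fewer frames.
    for secs in (1.0, 10.0, 30.0):
        c, f = latent_shape(secs)
        print(f"{secs:>5.1f}s -> latent {c}x{f}, ~{compression_ratio(secs):.1f}x fewer values")
```

Under these assumptions each second of audio maps to about 21.5 latent frames (44,100 / 2048), so the transformer handles roughly 64 times fewer values than the raw stereo waveform. The exact ratio depends on the real autoencoder configuration, so treat the numbers as an order-of-magnitude illustration.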

Training Data

The model was trained on approximately 500,000 recordings from Freesound and the Free Music Archive, all licensed under CC-0, CC-BY, or CC-Sampling+. The dataset was also curated to exclude copyrighted material.
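The license restriction on the training data amounts to a filter over each recording's metadata. This is a minimal illustration, not Stability AI's actual curation pipeline: the `Recording` record and its fields are hypothetical, and the license strings simply mirror the three licenses named in this post.

```python
from dataclasses import dataclass

# License identifiers as written in the post. A real pipeline would need to map
# each source's own license strings onto these (an assumption of this sketch).
ALLOWED_LICENSES = {"CC-0", "CC-BY", "CC-Sampling+"}


@dataclass
class Recording:
    """Hypothetical metadata record for one candidate training clip."""
    source: str     # e.g. "freesound" or "fma" (illustrative values)
    track_id: str
    license: str


def filter_by_license(records):
    """Split records into (kept, rejected) based on their declared license."""
    kept, rejected = [], []
    for rec in records:
        if rec.license in ALLOWED_LICENSES:
            kept.append(rec)
        else:
            rejected.append(rec)
    return kept, rejected


if __name__ == "__main__":
    sample = [
        Recording("freesound", "a1", "CC-0"),
        Recording("fma", "b2", "CC-BY-NC"),  # not in the allowed set: rejected
        Recording("freesound", "c3", "CC-Sampling+"),
    ]
    kept, rejected = filter_by_license(sample)
    print([r.track_id for r in kept])      # ['a1', 'c3']
    print([r.track_id for r in rejected])  # ['b2']
```

A check like this only trusts each uploader's declared license; on its own it cannot detect copyrighted content that was uploaded under a permissive license, which is why curation has to go beyond metadata filtering.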

Use Cases

Stable Audio Open supports a range of applications and can be customized and integrated into existing workflows. Here are a few key examples:

Sound Design
  • Sound Effects and Foley: Generating sound effects for film, TV, and games.
  • Ambient Sounds: Crafting background textures for different scenes.
  • Sample Creation: Producing drum loops and music samples.

Commercial and Marketing Applications
  • Audio Branding: Creating unique sound logos and effects for branding.

Education and Research
  • Academic Projects: Facilitating research in audio synthesis and machine learning.

Conclusion

The release of Stable Audio Open is a significant development for open-source audio AI, providing accessible, high-performance audio generation tools and encouraging exploration and innovation among researchers, developers, and artists. The model weights are available on Hugging Face, and Stability AI welcomes further discussion and community contributions through its social media channels and Discord community.