Akshitha0118/Prompt-to-Audio-Generative-AI-Sound-Creator

Prompt-to-Audio-Generative-AI-Sound-Creator

Generate realistic sound effects from simple text prompts using Stable Audio, Hugging Face Diffusers, and Gradio.

Prompt2Audio is a Generative AI project that converts natural language descriptions into high-quality audio clips. By leveraging diffusion-based audio models, users can create sound effects such as rain, hammer strikes, environmental sounds, and more with just a text input.


🚀 Project Overview

Prompt2Audio demonstrates how Generative AI can transform text into audio using the Stable Audio Open 1.0 diffusion model, loaded through Hugging Face Diffusers.

Users simply enter a description like:

"Sound of rain falling on a metal roof during a storm."

The model generates a realistic audio waveform based on the prompt.

The project also includes an interactive Gradio interface, allowing users to easily experiment with different sound prompts and durations.


🛠 Tech Stack

  • Python
  • PyTorch
  • Hugging Face Diffusers
  • Stable Audio Open 1.0
  • Gradio
  • Google Colab (GPU)
  • SoundFile

⚙️ How It Works

  1. Mount Google Drive in Google Colab
  2. Authenticate with Hugging Face
  3. Install required dependencies
  4. Load the StableAudioPipeline model
  5. Provide a text prompt describing a sound
  6. The diffusion model generates an audio waveform
  7. Save the generated output as a .wav file
  8. Play the audio using the Gradio interface
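
The steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual notebook: the function names `setup_colab` and `generate_sound`, the default of 200 inference steps, the seed, the output path, and the package list in the comment are all assumptions. The pipeline call itself follows the usage documented for `StableAudioPipeline` in Hugging Face Diffusers. Heavy imports are deferred into the functions so the file can be imported on a machine without the GPU libraries installed.

```python
MODEL_ID = "stabilityai/stable-audio-open-1.0"

# Step 3 (assumed package list, run in a Colab cell):
#   pip install torch diffusers transformers accelerate soundfile gradio


def setup_colab(mount_point="/content/drive"):
    """Steps 1-2: mount Google Drive and log in to Hugging Face (Colab only)."""
    from google.colab import drive
    from huggingface_hub import login

    drive.mount(mount_point)
    login()  # prompts for a Hugging Face access token


def generate_sound(prompt, negative_prompt="Low quality", seconds=10.0,
                   steps=200, seed=0, out_path="output.wav"):
    """Steps 4-7: load the pipeline, run diffusion, save a .wav file."""
    import soundfile as sf
    import torch
    from diffusers import StableAudioPipeline

    # Use half precision on GPU; fall back to float32 on CPU (much slower).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableAudioPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A fixed seed makes repeated runs with the same prompt comparable.
    generator = torch.Generator(device).manual_seed(seed)
    audio = pipe(
        prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        audio_end_in_s=seconds,
        num_waveforms_per_prompt=1,
        generator=generator,
    ).audios

    # The pipeline returns (channels, samples); SoundFile expects (samples, channels).
    waveform = audio[0].T.float().cpu().numpy()
    sf.write(out_path, waveform, pipe.vae.sampling_rate)
    return out_path
```

Calling `generate_sound("The sound of a hammer hitting a wooden surface.")` would write `output.wav` in the working directory. The sketch reloads the model on every call for simplicity; an interactive app would more likely load the pipeline once and reuse it.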

🎧 Example Prompt

Prompt:
"The sound of a hammer hitting a wooden surface."

Negative Prompt:
"Low quality"

Output → A 10-second realistic hammer sound effect.
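
One way to confirm that a generated clip has the requested length is to read the file header back. The helper below is a hypothetical addition, not part of the project, and it assumes the clip was saved as PCM WAV, which is the only encoding Python's standard `wave` module reads. The file name `hammer.wav` in the usage note is likewise an assumption.

```python
import wave


def wav_info(path):
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()  # frames per channel
        channels = wav.getnchannels()
    return {"sample_rate": rate, "channels": channels, "seconds": frames / rate}
```

For a 10-second request, `wav_info("hammer.wav")["seconds"]` should come out close to 10. Stable Audio Open 1.0 is documented as producing 44.1 kHz stereo audio, which works out to 441,000 frames per channel for a clip of that length.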


▢️ Run the Application

Start the Gradio interface:

python app.py

This will launch a local web interface where you can generate sounds from text prompts.
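
The contents of `app.py` are not shown here, so the following is only a sketch of how such an interface could be wired up with Gradio. The names `build_interface` and `silent_clip`, the widget labels, and the 44.1 kHz sample rate are assumptions. `silent_clip` is a stand-in that writes silence, so the interface can be tried without downloading the model; a real app would pass a function that calls the Stable Audio pipeline instead.

```python
import tempfile
import wave

SAMPLE_RATE = 44100  # assumed; used only by the silent stand-in below


def silent_clip(prompt, negative_prompt, seconds):
    """Stand-in generator: writes a silent mono WAV of the requested length."""
    frames = int(SAMPLE_RATE * float(seconds))
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * frames)
    return tmp.name


def build_interface(generate_fn):
    """Build a Gradio UI whose inputs map to (prompt, negative_prompt, seconds)."""
    import gradio as gr  # deferred so the module imports without Gradio installed

    return gr.Interface(
        fn=generate_fn,
        inputs=[
            gr.Textbox(label="Sound description"),
            gr.Textbox(label="Negative prompt", value="Low quality"),
            gr.Slider(minimum=1, maximum=20, value=10, step=1,
                      label="Duration (seconds)"),
        ],
        outputs=gr.Audio(label="Generated audio", type="filepath"),
        title="Prompt2Audio",
    )

# To try the UI locally with the stand-in generator:
# build_interface(silent_clip).launch()
```

Because the generator is passed in as an argument, the same interface works with the stand-in during development and with the real model function once a GPU is available.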


🎚 Features

  • Generate realistic audio from text
  • Adjustable audio duration (1–20 seconds)
  • Negative prompts for better output control
  • Interactive web interface using Gradio
  • GPU-accelerated inference
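
The 1–20 second range can also be enforced in code before a request reaches the model, not only by the slider in the UI. The sketch below is a hypothetical illustration: `GenerationRequest` and `make_request` are not names from the project, and clamping out-of-range durations (instead of rejecting them) is one design choice among several.

```python
from dataclasses import dataclass

MIN_SECONDS = 1.0
MAX_SECONDS = 20.0


@dataclass(frozen=True)
class GenerationRequest:
    prompt: str
    negative_prompt: str = ""
    seconds: float = 10.0


def make_request(prompt, negative_prompt="", seconds=10.0):
    """Validate user input and clamp the duration to the supported range."""
    text = prompt.strip()
    if not text:
        raise ValueError("prompt must not be empty")
    clamped = min(MAX_SECONDS, max(MIN_SECONDS, float(seconds)))
    return GenerationRequest(text, negative_prompt.strip(), clamped)
```

With this helper, a request for 45 seconds is reduced to 20, a request for 0 seconds is raised to 1, and an empty prompt is rejected before any GPU time is spent.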

📸 Demo Interface

Users can:

  • Enter a sound description
  • Add a negative prompt
  • Adjust audio duration
  • Generate and listen to the sound instantly

📚 Key Learnings

  • Diffusion models for audio generation
  • Using Hugging Face pipelines
  • Building AI interfaces with Gradio
  • Running generative models on GPU

🀝 Contributing

Contributions are welcome!

If you'd like to improve this project:

  1. Fork the repository
  2. Create a new branch
  3. Submit a pull request

👨‍💻 Author

AKSHITHA HIRAKARI

AI / Machine Learning enthusiast, passionate about building Generative AI applications.
