
What it does
An all-in-one Gradio WebUI for voice/audio work. Combines multiple AI models into a single interface for creators and developers.
Key capabilities
- TTS: Edge-TTS (uses Microsoft Edge's free online TTS service) and Kokoro (runs locally)
- Zero-shot voice cloning: E2/F5-TTS and CosyVoice clone a voice from a short reference sample
- Speech-to-text: Whisper-based transcription
- Vocal isolation: Demucs for separating vocals from music/background
- YouTube download: Pull audio from YouTube for processing
- Translation: Multilingual translation support
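For orientation, here is a rough sketch of how three of these capabilities are commonly driven from Python using the public edge-tts, openai-whisper, and demucs packages. It is not taken from this project's code: the function names, the default voice, and the "base" model size are illustrative assumptions. Third-party imports are deferred into the functions so the module loads even when those packages are missing.

```python
import asyncio
import subprocess
import sys


def demucs_command(audio_path: str, out_dir: str = "separated") -> list[str]:
    """Build a Demucs CLI call that splits a track into vocals and the rest."""
    return [
        sys.executable, "-m", "demucs",
        "--two-stems=vocals",  # separate vocals from everything else
        "-o", out_dir,         # folder where the stems are written
        audio_path,
    ]


def isolate_vocals(audio_path: str, out_dir: str = "separated") -> None:
    """Run Demucs as a subprocess (requires `pip install demucs`)."""
    subprocess.run(demucs_command(audio_path, out_dir), check=True)


def speak(text: str, out_path: str = "speech.mp3",
          voice: str = "en-US-AriaNeural") -> str:
    """Synthesize text with Edge-TTS (requires `pip install edge-tts`)."""
    import edge_tts  # deferred: only needed when this function is called

    async def _run() -> None:
        await edge_tts.Communicate(text, voice).save(out_path)

    asyncio.run(_run())
    return out_path


def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Transcribe audio with Whisper (requires `pip install openai-whisper`)."""
    import whisper  # deferred: only needed when this function is called

    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)["text"].strip()
```

A typical chain would be isolate_vocals on a downloaded track, then transcribe on the resulting vocals stem, then speak on the (optionally translated) text.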
Language
Python. Uses Gradio for the web interface.
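To illustrate what a multi-tool Gradio interface can look like, the sketch below builds one tab per tool from a small registry. The TABS registry and the two stub handlers are hypothetical stand-ins, not this project's layout; a real handler would call a TTS or translation model instead of returning a placeholder string.

```python
def tts_stub(text: str) -> str:
    """Placeholder handler; a real one would return synthesized audio."""
    return f"[would synthesize {len(text)} characters]"


def translate_stub(text: str) -> str:
    """Placeholder handler; a real one would call a translation model."""
    return f"[would translate: {text}]"


# Hypothetical registry: tab label -> handler function.
TABS = {
    "Text-to-speech": tts_stub,
    "Translation": translate_stub,
}


def build_ui():
    """Create one Gradio tab per registered tool (requires `pip install gradio`)."""
    import gradio as gr  # deferred so the stubs can be used without Gradio

    with gr.Blocks(title="Voice toolkit (sketch)") as demo:
        for label, handler in TABS.items():
            with gr.Tab(label):
                inp = gr.Textbox(label="Input")
                out = gr.Textbox(label="Output")
                gr.Button("Run").click(handler, inputs=inp, outputs=out)
    return demo
```

Calling build_ui().launch() starts a local web server and prints its URL. Keeping handlers in a registry means a new tool only needs one extra dictionary entry.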
Install
Clone the repo, install the dependencies, and run the Gradio app. It needs Python plus the usual ML stack (torch, transformers, etc.). A GPU is recommended for real-time voice cloning.
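Before launching, it can help to confirm that the ML stack is present and whether a GPU is visible. The check below is a minimal sketch using only the standard library plus torch when it is installed. The default package list is an assumption based on the dependencies named above, not the project's actual requirements file.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "transformers", "gradio")) -> dict:
    """Report the Python version, which packages import, and CUDA visibility."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None

    report["cuda"] = False
    if report.get("torch"):
        import torch  # only imported when it is actually available

        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda comes back False, the models should still run on CPU, only more slowly.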
Value
Good for experimenting with voice cloning and TTS without cobbling together separate tools. The zero-shot cloning (E2/F5-TTS) is the most interesting part - you feed it a reference audio clip and it generates speech in that voice. Useful for content creators doing voiceover work, localization demos, or building voice-enabled prototypes.
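Since cloning quality depends on the reference clip, a quick sanity check on the clip before handing it to a model can save time. The sketch below measures the length of an uncompressed WAV file with the standard library. The 3 to 15 second bounds are illustrative assumptions, not limits documented by E2/F5-TTS or CosyVoice, and the function names are hypothetical.

```python
import wave


def clip_duration_seconds(wav_path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(wav_path: str,
                        min_s: float = 3.0,
                        max_s: float = 15.0) -> bool:
    """Check that a reference clip falls inside the assumed length window."""
    return min_s <= clip_duration_seconds(wav_path) <= max_s
```

Other formats such as MP3 would need to be converted to WAV first, since the wave module only reads uncompressed PCM.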