Voice Pro

Gradio WebUI combining TTS, voice cloning, Whisper transcription, and vocal isolation.

What it does

An all-in-one Gradio WebUI for voice and audio work. It combines several AI models (TTS, voice cloning, transcription, and source separation) in a single interface aimed at creators and developers.

Key capabilities

  • TTS: Edge-TTS (Microsoft's free cloud TTS) and Kokoro (local)
  • Zero-shot voice cloning: E2-TTS, F5-TTS, and CosyVoice - clone a voice from a short sample
  • Speech-to-text: Whisper-based transcription
  • Vocal isolation: Demucs for separating vocals from music/background
  • YouTube download: Pull audio from YouTube for processing
  • Translation: Multilingual translation support
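
To give a feel for what the WebUI bundles, here is a minimal sketch of chaining two of the listed pieces by hand: Demucs to isolate the vocals, then Whisper to transcribe them. This is not Voice Pro's own code. It assumes the `demucs` command-line tool and the `openai-whisper` package are installed; the helper names (`demucs_command`, `vocals_path`, `isolate_and_transcribe`) and the choice of the `htdemucs` and `base` models are illustrative assumptions.

```python
import subprocess
from pathlib import Path

# Assumption: the "htdemucs" model is used; Demucs writes its stems to
# <out_dir>/<model name>/<track name>/.
DEMUCS_MODEL = "htdemucs"


def demucs_command(audio: Path, out_dir: Path) -> list[str]:
    """Build a Demucs CLI call that splits a track into vocals and the rest."""
    return [
        "demucs",
        "-n", DEMUCS_MODEL,
        "--two-stems", "vocals",  # produce vocals.wav and no_vocals.wav only
        "-o", str(out_dir),
        str(audio),
    ]


def vocals_path(audio: Path, out_dir: Path) -> Path:
    """Location where Demucs is expected to write the isolated vocals."""
    return out_dir / DEMUCS_MODEL / audio.stem / "vocals.wav"


def isolate_and_transcribe(audio: Path, out_dir: Path, model_size: str = "base") -> str:
    """Run Demucs, then transcribe the isolated vocals with Whisper."""
    import whisper  # imported lazily so the helpers above work without it

    subprocess.run(demucs_command(audio, out_dir), check=True)
    model = whisper.load_model(model_size)
    result = model.transcribe(str(vocals_path(audio, out_dir)))
    return result["text"]
```

Calling `isolate_and_transcribe(Path("song.mp3"), Path("separated"))` would run both steps on a hypothetical `song.mp3`. The point of a tool like Voice Pro is to wrap this kind of glue behind one interface.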

Language

Python. Uses Gradio for the web interface.

Install

Clone the repo and run the Gradio app. Requires Python with the usual ML dependencies (torch, transformers, etc.). GPU recommended for real-time voice cloning.
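
Before launching, it can help to confirm that the dependencies mentioned above are importable and whether a GPU is visible. The following is a rough sketch and not part of Voice Pro: the function name `check_environment` and the default package list are assumptions, and the project's actual requirements may differ.

```python
from importlib.util import find_spec


def check_environment(packages=("torch", "transformers", "gradio")) -> dict[str, bool]:
    """Report which packages are installed and whether CUDA is usable."""
    # find_spec returns None for a top-level package that is not installed.
    report = {name: find_spec(name) is not None for name in packages}

    # Only ask torch about CUDA if torch itself is present.
    report["cuda"] = False
    if report.get("torch"):
        import torch

        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `cuda` is reported as missing, the models can typically still run on CPU, only more slowly.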

Value

Good for experimenting with voice cloning and TTS without cobbling together separate tools. The zero-shot cloning (E2/F5-TTS) is the most interesting part - you feed it a reference audio clip and it generates speech in that voice. Useful for content creators doing voiceover work, localization demos, or building voice-enabled prototypes.
