Voice Pro | Nic's notes

Table of Contents

What it does
Key capabilities
Language
Install
Value

abus-aikorea/voice-pro

Gradio WebUI for TTS, voice cloning, Whisper, and vocal isolation

https://github.com/abus-aikorea/voice-pro

What it does

An all-in-one Gradio WebUI for voice and audio work. It combines several AI models (TTS, voice cloning, speech-to-text, vocal isolation) into a single interface aimed at creators and developers.

Key capabilities

TTS: Edge-TTS (uses Microsoft Edge's free online TTS service, so it needs an internet connection) and Kokoro (runs locally)
Zero-shot voice cloning: E2 TTS, F5-TTS, and CosyVoice - clone a voice from a short reference sample
Speech-to-text: Whisper-based transcription
Vocal isolation: Demucs for separating vocals from music/background
YouTube download: Pull audio from YouTube for processing
Translation: Multilingual translation support
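
To give a feel for what two of these building blocks look like when used on their own, here is a minimal sketch that calls the edge-tts and openai-whisper Python packages directly. It is not Voice Pro's own code, and how the project wires these models together internally is not covered in these notes. The function names, the voice "en-US-AriaNeural", and the "base" Whisper model are my own illustrative choices.

```python
from pathlib import Path


async def synthesize(text: str, out_path: Path, voice: str = "en-US-AriaNeural") -> Path:
    """Write `text` as speech to `out_path` using the edge-tts package.

    Needs network access, because Edge-TTS calls Microsoft's online service.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    # Imported lazily so this module still loads when edge-tts is not installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(str(out_path))
    return out_path


def transcribe(audio_path: Path, model_name: str = "base") -> str:
    """Return the transcript of `audio_path` using the openai-whisper package."""
    if not audio_path.exists():
        raise FileNotFoundError(audio_path)
    # Imported lazily for the same reason as above; the first call downloads the model.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path))
    return result["text"].strip()
```

A round trip would be asyncio.run(synthesize("Hello there", Path("hello.mp3"))) followed by transcribe(Path("hello.mp3")). The imports live inside the functions so that each optional dependency is only required when that function is actually called.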

Language

Python. Uses Gradio for the web interface.

Install

Clone the repo and run the Gradio app. Requires Python plus the usual ML dependencies (torch, transformers, etc.). A GPU is recommended if you want voice cloning to run at anything close to real-time speed.
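
Before installing, it can help to check what is already present on the machine. The sketch below is a generic pre-flight check, not part of Voice Pro: the function name check_environment and the default package list are assumptions of mine, based only on the dependencies mentioned above.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "transformers", "gradio")):
    """Report which packages are importable and whether CUDA is usable."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None

    report["cuda"] = False
    if report.get("torch"):
        try:
            import torch

            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch install should not crash the check itself.
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If "cuda" comes back False, the local models should still run on CPU, just more slowly.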

Value

Good for experimenting with voice cloning and TTS without cobbling together separate tools. The zero-shot cloning (E2/F5-TTS) is the most interesting part - you feed it a reference audio clip and it generates speech in that voice. Useful for content creators doing voiceover work, localization demos, or building voice-enabled prototypes.