Author: jamiepine
Stars: 802 stars today
Description: The open-source voice synthesis studio
The open-source voice synthesis studio.
Clone voices. Generate speech. Apply effects. Build voice-powered apps.
All running locally on your machine.
Voicebox is a local-first voice cloning studio — a free and open-source alternative to ElevenLabs. Clone voices from a few seconds of audio, generate speech in 23 languages across 5 TTS engines, apply post-processing effects, and compose multi-voice projects with a timeline editor.
[laugh], [sigh], [gasp] via Chatterbox Turbo

| Platform | Download |
| --------------------- | ------------------------------------------------------ |
| macOS (Apple Silicon) | Download DMG |
| macOS (Intel) | Download DMG |
| Windows | Download MSI |
| Docker | docker compose up |
Linux — Pre-built binaries are not yet available. See voicebox.sh/linux-install for build-from-source instructions.
Five TTS engines with different strengths, switchable per-generation:
| Engine | Languages | Strengths |
| --- | --- | --- |
| Qwen3-TTS (0.6B / 1.7B) | 10 | High-quality multilingual cloning, delivery instructions ("speak slowly", "whisper") |
| LuxTTS | English | Lightweight (~1GB VRAM), 48kHz output, 150x realtime on CPU |
| Chatterbox Multilingual | 23 | Broadest language coverage: Arabic, Danish, Finnish, Greek, Hebrew, Hindi, Malay, Norwegian, Polish, Swahili, Swedish, Turkish and more |
| Chatterbox Turbo | English | Fast 350M model with paralinguistic emotion/sound tags |
| TADA (1B / 3B) | 10 | HumeAI speech-language model: 700s+ coherent audio, text-acoustic dual alignment |
Type / in the text input to insert expressive tags that the model synthesizes inline with speech (Chatterbox Turbo):
[laugh] [chuckle] [gasp] [cough] [sigh] [groan] [sniff] [shush] [clear throat]
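As a rough illustration of working with these tags from your own scripts, the sketch below checks a piece of text for bracketed tags that are not in the list above. The function name `find_unsupported_tags` and the matching rules (case-insensitive, whitespace-trimmed) are assumptions made for this example; how Voicebox itself handles unknown tags is not documented here.

```python
import re

# Tags listed above for Chatterbox Turbo.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "gasp", "cough", "sigh",
    "groan", "sniff", "shush", "clear throat",
}

# Matches any "[...]" span that contains no nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS (hypothetical helper)."""
    found = TAG_PATTERN.findall(text)
    return [tag for tag in found if tag.strip().lower() not in SUPPORTED_TAGS]
```

Calling `find_unsupported_tags("Oh no [scream] [clear throat]")` flags only `scream`, since `clear throat` is in the supported list.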
8 audio effects powered by Spotify's pedalboard library. Apply after generation, preview in real time, build reusable presets.
| Effect | Description |
| --- | --- |
| Pitch Shift | Up or down by up to 12 semitones |
| Reverb | Configurable room size, damping, wet/dry mix |
| Delay | Echo with adjustable time, feedback, and mix |
| Chorus / Flanger | Modulated delay for metallic or lush textures |
| Compressor | Dynamic range compression |
| Gain | Volume adjustment (-40 to +40 dB) |
| High-Pass Filter | Remove low frequencies |
| Low-Pass Filter | Remove high frequencies |
Ships with 4 built-in presets (Robotic, Radio, Echo Chamber, Deep Voice) and supports custom presets. Effects can be assigned per-profile as defaults.
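If you assemble effect chains programmatically, it can help to check parameters against the documented ranges before applying them. The sketch below is only an illustration: the preset layout (a list of dicts with a `type` key) and the parameter names `semitones` and `gain_db` are hypothetical and not Voicebox's actual preset format. Only the numeric limits come from the effects table.

```python
# Limits taken from the effects table; the keys and parameter names are hypothetical.
LIMITS = {
    "pitch_shift": ("semitones", -12.0, 12.0),
    "gain": ("gain_db", -40.0, 40.0),
}


def validate_preset(preset: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means no documented limit is violated."""
    problems = []
    for index, effect in enumerate(preset):
        kind = effect.get("type")
        if kind not in LIMITS:
            continue  # no documented range for this effect, so nothing to check
        param, low, high = LIMITS[kind]
        value = effect.get(param)
        if value is None:
            problems.append(f"effect {index} ({kind}): missing {param}")
        elif not low <= value <= high:
            problems.append(
                f"effect {index} ({kind}): {param}={value} outside [{low}, {high}]"
            )
    return problems
```

For example, a chain containing `{"type": "pitch_shift", "semitones": 14}` yields one problem, because 14 exceeds the 12-semitone limit.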
Text is automatically split at sentence boundaries and each chunk is generated independently, then crossfaded together. Works with all engines.
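The idea can be sketched in a few lines of plain Python. This is not Voicebox's implementation: the sentence splitter is a simple punctuation heuristic, the crossfade is linear, and audio is modeled as a list of float samples purely for illustration. The function names are assumptions for this example.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (a simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample sequences, linearly blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        # Fade-in weight for b; strictly between 0 and 1 across the overlap.
        weight = (i + 1) / (overlap + 1)
        blended.append(a[len(a) - overlap + i] * (1 - weight) + b[i] * weight)
    return a[: len(a) - overlap] + blended + b[overlap:]
```

Each sentence from `split_sentences` would be synthesized separately, and consecutive results joined with `crossfade` so the seams do not click. The joined output is shorter than the two inputs combined by `overlap` samples.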
Every generation supports multiple versions with provenance tracking.
Generation is non-blocking. Submit and immediately start typing the next one.
Multi-voice timeline editor for conversations, podcasts, and narratives.
VOICEBOX_MODELS_DIR

| Platform | Backend | Notes |
| --- | --- | --- |
| macOS (Apple Silicon) | MLX (Metal) | 4-5x faster via Neural Engine |
| Windows / Linux (NVIDIA) | PyTorch (CUDA) | Auto-downloads CUDA binary from within the app |
| Linux (AMD) | PyTorch (ROCm) | Auto-configures HSA_OVERRIDE_GFX_VERSION |
| Windows (any GPU) | DirectML | Universal Windows GPU support |
| Intel Arc | IPEX/XPU | Intel discrete GPU acceleration |
| Any | CPU | Works everywhere, just slower |
Voicebox exposes a full REST API for integrating voice synthesis into your own apps.
```bash
# Generate speech
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123", "language": "en"}'

# List voice profiles
curl http://localhost:17493/profiles

# Create a voice profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{"name": "My Voice", "language": "en"}'
```
Use cases: game dialogue, podcast production, accessibility tools, voice assistants, content automation.
Full API documentation available at http://localhost:17493/docs.
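The same generate call can be made from Python with only the standard library. The sketch below mirrors the curl example: the endpoint, port, and JSON fields are taken from it, while the helper names `build_generate_request` and `send` are hypothetical. The response format is not described in this README, so the body is returned as raw bytes; check the API docs for the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:17493"  # port taken from the curl examples


def build_generate_request(
    text: str, profile_id: str, language: str = "en"
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request with a JSON body."""
    body = json.dumps({"text": text, "profile_id": profile_id, "language": language})
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> bytes:
    """Send the request and return the raw, undecoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the Voicebox server running locally, `send(build_generate_request("Hello world", "abc123"))` performs the request. Here `abc123` is the same placeholder profile ID used in the curl example.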
| Layer | Technology |
| --- | --- |
| Desktop App | Tauri (Rust) |
| Frontend | React, TypeScript, Tailwind CSS |
| State | Zustand, React Query |
| Backend | FastAPI (Python) |
| TTS Engines | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |
| Effects | Pedalboard (Spotify) |
| Transcription | Whisper / Whisper Turbo (PyTorch or MLX) |
| Inference | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU) |
| Database | SQLite |
| Audio | WaveSurfer.js, librosa |
| Feature | Description |
| --- | --- |
| Real-time Streaming | Stream audio as it generates, word by word |
| Voice Design | Create new voices from text descriptions |
| More Models | XTTS, Bark, and other open-source voice models |
| Plugin Architecture | Extend with custom models and effects |
| Mobile Companion | Control Voicebox from your phone |
See CONTRIBUTING.md for detailed setup and contribution guidelines.
```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

just setup   # creates Python venv, installs all deps
just dev     # starts backend + desktop app
```
Install just: brew install just or cargo install just. Run just --list to see all commands.
Prerequisites: Bun, Rust, Python 3.11+, Tauri Prerequisites, and Xcode on macOS.
```bash
just build          # Build CPU server binary + Tauri app
just build-local    # (Windows) Build CPU + CUDA server binaries + Tauri app
```
The multi-engine architecture makes adding new TTS engines straightforward. A step-by-step guide covers the full process: dependency research, backend protocol implementation, frontend wiring, and PyInstaller bundling.
The guide is optimized for AI coding agents. An agent skill can pick up a model name and handle the entire integration autonomously — you just test the build locally.
```
voicebox/
├── app/          # Shared React frontend
├── tauri/        # Desktop app (Tauri + Rust)
├── web/          # Web deployment
├── backend/      # Python FastAPI server
├── landing/      # Marketing website
└── scripts/      # Build & release scripts
```
Contributions welcome! See CONTRIBUTING.md for guidelines.
Found a security vulnerability? Please report it responsibly. See SECURITY.md for details.
MIT License — see LICENSE for details.