GitHub Trending Repositories

jamiepine/voicebox

Author: jamiepine

Stars: 802 stars today

Description: The open-source voice synthesis studio

README

Voicebox

The open-source voice synthesis studio.
Clone voices. Generate speech. Apply effects. Build voice-powered apps.
All running locally on your machine.

Watch the demo video on voicebox.sh.



What is Voicebox?

Voicebox is a local-first voice cloning studio — a free and open-source alternative to ElevenLabs. Clone voices from a few seconds of audio, generate speech in 23 languages across 5 TTS engines, apply post-processing effects, and compose multi-voice projects with a timeline editor.


Download

| Platform              | Download          |
| --------------------- | ----------------- |
| macOS (Apple Silicon) | Download DMG      |
| macOS (Intel)         | Download DMG      |
| Windows               | Download MSI      |
| Docker                | docker compose up |

View all binaries →

Linux — Pre-built binaries are not yet available. See voicebox.sh/linux-install for build-from-source instructions.


Features

Multi-Engine Voice Cloning

Five TTS engines with different strengths, switchable per-generation:

| Engine                  | Languages | Strengths |
| ----------------------- | --------- | --------- |
| Qwen3-TTS (0.6B / 1.7B) | 10        | High-quality multilingual cloning, delivery instructions ("speak slowly", "whisper") |
| LuxTTS                  | English   | Lightweight (~1 GB VRAM), 48 kHz output, 150x realtime on CPU |
| Chatterbox Multilingual | 23        | Broadest language coverage: Arabic, Danish, Finnish, Greek, Hebrew, Hindi, Malay, Norwegian, Polish, Swahili, Swedish, Turkish and more |
| Chatterbox Turbo        | English   | Fast 350M model with paralinguistic emotion/sound tags |
| TADA (1B / 3B)          | 10        | HumeAI speech-language model with 700s+ coherent audio and text-acoustic dual alignment |

Emotions & Paralinguistic Tags

Type / in the text input to insert expressive tags that the model synthesizes inline with speech (Chatterbox Turbo):

[laugh] [chuckle] [gasp] [cough] [sigh] [groan] [sniff] [shush] [clear throat]
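
As a rough illustration of how tagged input could be checked before it is sent to Chatterbox Turbo, the sketch below extracts bracketed tags from a string and separates those in the list above from any others. The function name `find_tags` and the validation step itself are assumptions made for this example; the README does not describe how Voicebox parses tags internally.

```python
import re

# Tags listed above for Chatterbox Turbo.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "gasp", "cough", "sigh",
    "groan", "sniff", "shush", "clear throat",
}

# Matches "[...]" containing lowercase letters and spaces only.
TAG_PATTERN = re.compile(r"\[([a-z ]+)\]")


def find_tags(text: str) -> tuple[list[str], list[str]]:
    """Return (supported, unsupported) tag names found in text, in order.

    Hypothetical helper for illustration, not part of Voicebox.
    """
    supported: list[str] = []
    unsupported: list[str] = []
    for name in TAG_PATTERN.findall(text):
        if name in SUPPORTED_TAGS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

A caller might warn the user about anything in the second list rather than passing it to the engine unchanged.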

Post-Processing Effects

8 audio effects powered by Spotify's pedalboard library. Apply after generation, preview in real time, build reusable presets.

| Effect           | Description                                   |
| ---------------- | --------------------------------------------- |
| Pitch Shift      | Up or down by up to 12 semitones              |
| Reverb           | Configurable room size, damping, wet/dry mix  |
| Delay            | Echo with adjustable time, feedback, and mix  |
| Chorus / Flanger | Modulated delay for metallic or lush textures |
| Compressor       | Dynamic range compression                     |
| Gain             | Volume adjustment (-40 to +40 dB)             |
| High-Pass Filter | Remove low frequencies                        |
| Low-Pass Filter  | Remove high frequencies                       |

Ships with 4 built-in presets (Robotic, Radio, Echo Chamber, Deep Voice) and supports custom presets. Effects can be assigned per-profile as defaults.

Unlimited Generation Length

Text is automatically split at sentence boundaries and each chunk is generated independently, then crossfaded together. Works with all engines.
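
The split-then-crossfade idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Voicebox's implementation: the regex-based splitter, the linear fade curve, and the names `split_sentences` and `crossfade` are all hypothetical, and audio is modelled as plain lists of float samples.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample lists, blending the end of `a` into the start of `b`.

    The last `overlap` samples of `a` are mixed with the first `overlap`
    samples of `b` using a linear ramp (an assumed fade shape).
    """
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(tail[i] * (1 - w) + b[i] * w)
    return head + blended + b[overlap:]
```

Each sentence from `split_sentences` would be synthesized on its own, and the resulting chunks folded together pairwise with `crossfade`. A real splitter would also need to handle abbreviations and decimal numbers, which this regex does not.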

Generation Versions

Every generation supports multiple versions with provenance tracking.

Async Generation Queue

Generation is non-blocking. Submit and immediately start typing the next one.
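
One common way to get this kind of non-blocking behaviour is a producer/consumer queue. The sketch below uses `asyncio.Queue` with a single worker. It is only an assumption about the general pattern, since the README does not say how the queue is built, and `worker` and `run_demo` are hypothetical names, with a placeholder standing in for the synthesis call.

```python
import asyncio


async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    """Process queued jobs one at a time, in submission order."""
    while True:
        text = await queue.get()
        await asyncio.sleep(0)  # placeholder for the real synthesis call
        results.append(f"audio for: {text}")
        queue.task_done()


async def run_demo(texts: list[str]) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    task = asyncio.create_task(worker(queue, results))
    for text in texts:
        queue.put_nowait(text)  # returns immediately; the caller is not blocked
    await queue.join()  # the demo waits here only so it can return the results
    task.cancel()
    return results
```

Running `asyncio.run(run_demo(["one", "two"]))` submits both jobs before either is processed, then collects the results in order.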

Voice Profile Management

Stories Editor

Multi-voice timeline editor for conversations, podcasts, and narratives.

Recording & Transcription

Model Management

GPU Support

| Platform                 | Backend        | Notes                                          |
| ------------------------ | -------------- | ---------------------------------------------- |
| macOS (Apple Silicon)    | MLX (Metal)    | 4-5x faster via Neural Engine                  |
| Windows / Linux (NVIDIA) | PyTorch (CUDA) | Auto-downloads CUDA binary from within the app |
| Linux (AMD)              | PyTorch (ROCm) | Auto-configures HSA_OVERRIDE_GFX_VERSION       |
| Windows (any GPU)        | DirectML       | Universal Windows GPU support                  |
| Intel Arc                | IPEX/XPU       | Intel discrete GPU acceleration                |
| Any                      | CPU            | Works everywhere, just slower                  |


API

Voicebox exposes a full REST API for integrating voice synthesis into your own apps.

```bash
# Generate speech
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123", "language": "en"}'

# List voice profiles
curl http://localhost:17493/profiles

# Create a profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{"name": "My Voice", "language": "en"}'
```

Use cases: game dialogue, podcast production, accessibility tools, voice assistants, content automation.

Full API documentation is available at http://localhost:17493/docs.
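
For callers who prefer Python to curl, the generate request above could be built with the standard library alone. This is a sketch under assumptions: the endpoint, port, and JSON fields come from the curl example, but the helper names are hypothetical and the response format is not documented in this README, so the body is returned as raw bytes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:17493"  # port taken from the curl examples


def build_generate_request(
    text: str, profile_id: str, language: str = "en"
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request."""
    body = json.dumps(
        {"text": text, "profile_id": profile_id, "language": language}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(text: str, profile_id: str, language: str = "en") -> bytes:
    """Send the request and return the raw response body.

    The response schema is an unknown here; consult the /docs page
    of a running instance before parsing it.
    """
    req = build_generate_request(text, profile_id, language)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

Calling `generate("Hello world", "abc123")` requires a running Voicebox instance and an existing profile with that ID.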


Tech Stack

| Layer         | Technology                                            |
| ------------- | ----------------------------------------------------- |
| Desktop App   | Tauri (Rust)                                          |
| Frontend      | React, TypeScript, Tailwind CSS                       |
| State         | Zustand, React Query                                  |
| Backend       | FastAPI (Python)                                      |
| TTS Engines   | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |
| Effects       | Pedalboard (Spotify)                                  |
| Transcription | Whisper / Whisper Turbo (PyTorch or MLX)              |
| Inference     | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU)     |
| Database      | SQLite                                                |
| Audio         | WaveSurfer.js, librosa                                |


Roadmap

| Feature             | Description                                    |
| ------------------- | ---------------------------------------------- |
| Real-time Streaming | Stream audio as it generates, word by word     |
| Voice Design        | Create new voices from text descriptions       |
| More Models         | XTTS, Bark, and other open-source voice models |
| Plugin Architecture | Extend with custom models and effects          |
| Mobile Companion    | Control Voicebox from your phone               |


Development

See CONTRIBUTING.md for detailed setup and contribution guidelines.

Quick Start

```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

just setup   # creates Python venv, installs all deps
just dev     # starts backend + desktop app
```

Install just: `brew install just` or `cargo install just`. Run `just --list` to see all commands.

Prerequisites: Bun, Rust, Python 3.11+, Tauri Prerequisites, and Xcode on macOS.

Building Locally

```bash
just build         # Build CPU server binary + Tauri app
just build-local   # (Windows) Build CPU + CUDA server binaries + Tauri app
```

Adding New Voice Models

The multi-engine architecture makes adding new TTS engines straightforward. A step-by-step guide covers the full process: dependency research, backend protocol implementation, frontend wiring, and PyInstaller bundling.

The guide is optimized for AI coding agents. An agent skill can pick up a model name and handle the entire integration autonomously — you just test the build locally.

Project Structure

```
voicebox/
├── app/        # Shared React frontend
├── tauri/      # Desktop app (Tauri + Rust)
├── web/        # Web deployment
├── backend/    # Python FastAPI server
├── landing/    # Marketing website
└── scripts/    # Build & release scripts
```


Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

  1. Fork the repo
  2. Create a feature branch
  3. Make your changes
  4. Submit a PR

Security

Found a security vulnerability? Please report it responsibly. See SECURITY.md for details.


License

MIT License — see LICENSE for details.


voicebox.sh
