#qwen3_tts — Public Fediverse posts on home.social

hasamba @[email protected] · 2026-03-10 · 11:26 UTC

----------------

🎛️ Tool — Voicebox

Voicebox is an open-source, local-first desktop studio for voice cloning and speech synthesis. The project centers on voice models that are downloaded to and run on the device, rapid voice cloning from a few seconds of audio, and a digital-audio-workstation (DAW) style editor for composing multi-voice projects. The release discloses three key implementation details: the primary synthesis backend is Qwen3-TTS, the application is built with Tauri (Rust) rather than Electron, and an MLX backend provides native Metal acceleration to speed up inference on Apple Silicon.

Architecture and capabilities
• The application exposes an API-first surface for integration with other projects while retaining a full desktop UI for studio-style workflows.
• Voice modeling: Qwen3-TTS is used for high-fidelity cloning; the project notes planned support for additional models such as XTTS and Bark.
• Editing features: multi-track timeline, audio trimming, conversation mixing, and per-voice profiles derived from short audio samples.
• Platform builds: packaged releases were provided for macOS (Apple Silicon and Intel) and Windows; Linux builds are noted as forthcoming but blocked by CI runner disk space constraints.

Performance and privacy
• Local-first design keeps models and audio data on-device, avoiding cloud-based storage or inference by default.
• On Apple Silicon, the MLX backend with Metal acceleration is reported to run inference several times faster than the generic (non-MLX) path, which makes generation and cloning workflows more responsive.
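
The platform-dependent backend choice described above can be sketched roughly as follows. This is a minimal illustration, not Voicebox's actual code: the function name pick_backend and the labels "mlx" and "generic" are hypothetical, and the only assumption about the real project is what the post states, namely that MLX is used on Apple Silicon.

```python
import platform


def pick_backend(system=None, machine=None):
    """Return a hypothetical backend label for the given platform.

    "mlx" and "generic" are illustrative names, not identifiers
    taken from the Voicebox codebase.
    """
    # Fall back to the running interpreter's platform when no
    # explicit values are passed in.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    # A native Python build on Apple Silicon reports Darwin/arm64.
    # Under Rosetta it reports x86_64 and would take the generic path.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "generic"


if __name__ == "__main__":
    print(pick_backend())
```

Passing the platform values as arguments keeps the decision testable on any machine, for example pick_backend("Darwin", "arm64") versus pick_backend("Linux", "x86_64").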

Limitations and scope
• Current model support centers on Qwen3-TTS; multi-model support is listed as planned but not yet available.
• Linux availability is pending; CI limitations were explicitly cited as the blocker for those builds.
• No cloud collaboration or managed hosting is part of the announced feature set; the project emphasizes offline, local operation.

This release documents a desktop-focused, privacy-oriented approach to voice cloning with clear statements on model choice, runtime acceleration, and editing capabilities. #voicebox #qwen3_tts #TTS #voice_cloning

🔗 Source: https://github.com/jamiepine/voicebox?tab=readme-ov-file
