LocalAI - Models

voxcpm-1.5

VoxCPM 1.5 is an end-to-end text-to-speech (TTS) model from ModelBest. It supports zero-shot voice cloning and high-quality speech synthesis.

Links

https://huggingface.co/openbmb/VoxCPM1.5

Tags

neutts-air

NeuTTS Air is the world's first super-realistic, on-device TTS speech language model with instant voice cloning. Built on a 0.5B LLM backbone, it brings natural-sounding speech, real-time performance, and speaker cloning to local devices.

Links

https://github.com/neuphonic/neutts-air

Tags

vllm-omni-qwen3-tts-custom-voice

Qwen3-TTS-12Hz-1.7B-CustomVoice, served via vLLM-Omni, is a text-to-speech model from the Alibaba Qwen team with custom voice cloning capabilities. It generates natural-sounding speech with voice personalization.

Links

https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

Tags

qwen3-tts-1.7b-custom-voice

Qwen3-TTS is a high-quality text-to-speech model supporting custom voice, voice design, and voice cloning.

Links

https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

Tags

qwen3-tts-0.6b-custom-voice

Qwen3-TTS is a high-quality text-to-speech model supporting custom voice, voice design, and voice cloning.

Links

https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice

Tags

vibevoice

Links

https://github.com/microsoft/VibeVoice

Tags

pocket-tts

Links

https://github.com/kyutai-labs/pocket-tts

Tags

kokoro

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Links

https://github.com/hexgrad/kokoro

Tags

kitten-tts

Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.

Links

https://github.com/KittenML/KittenTTS

Tags

chatterbox

Chatterbox is Resemble AI's first production-grade open-source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.

Links

https://github.com/resemble-ai/chatterbox

Tags

dia

Links

Tags

outetts

Links

https://github.com/edwko/OuteTTS

Tags

parler-tts-mini-v0.1

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). It is a reproduction of work from the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.

Links

https://github.com/huggingface/parler-tts

Tags