#wavtokenizer — Public Fediverse posts on home.social

michabbb @[email protected] · 2024-11-06 · 14:29 UTC

🎯 #OuteTTS takes a novel approach to text-to-speech synthesis, treating it as pure #languagemodeling
🔧 Built on the #LLaMa architecture with just 350M parameters, it features:

Zero-shot #voicecloning capability
Audio tokenization via #WavTokenizer (75 tokens per second of audio)
Local deployment via #llamacpp
#GGUF format compatibility
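
The 75 tokens/sec rate sets how many audio tokens the model must generate for a given length of speech. Below is a rough Python sketch of that arithmetic. It assumes a partial frame still costs a whole token (hence the rounding up), and the 4096-token context window in the example is a hypothetical value, not a figure from the OuteTTS release.

```python
import math

# WavTokenizer rate quoted in the post: 75 audio tokens per second of audio.
TOKENS_PER_SECOND = 75


def audio_tokens_for(seconds: float, rate: int = TOKENS_PER_SECOND) -> int:
    """Audio tokens needed to cover `seconds` of speech (a partial frame rounds up)."""
    return math.ceil(seconds * rate)


def max_audio_seconds(context_tokens: int, prompt_tokens: int,
                      rate: int = TOKENS_PER_SECOND) -> float:
    """Seconds of audio that fit in whatever context is left after the prompt.

    `context_tokens` is a hypothetical window size chosen for illustration.
    """
    remaining = max(context_tokens - prompt_tokens, 0)
    return remaining / rate


if __name__ == "__main__":
    print(audio_tokens_for(10))          # 750
    print(max_audio_seconds(4096, 346))  # 50.0
```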

🔍 Technical Implementation:

Audio tokenization process
CTC forced alignment
Structured prompt system
Temperature-adjustable outputs
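
CTC forced alignment finds the most likely monotonic path through a known transcript, given per-frame predictions from an acoustic model. This assigns each transcript token a start frame and an end frame. The post does not say which aligner or model OuteTTS uses, so the following is a generic Viterbi sketch over a log-probability matrix. The function name `ctc_forced_align` and the use of index 0 as the blank symbol are assumptions for illustration.

```python
import math
from typing import List, Tuple

NEG_INF = -math.inf


def ctc_forced_align(log_probs: List[List[float]], targets: List[int],
                     blank: int = 0) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) for each token in `targets`.

    log_probs[t][v] is the log-probability of vocabulary item v at frame t.
    Generic CTC Viterbi alignment; `blank` = 0 is an assumed convention.
    """
    T = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., tN, blank].
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)

    dp = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first token.
    dp[0][0] = log_probs[0][ext[0]]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            candidates = [(dp[t - 1][s], s)]           # stay on the same state
            if s >= 1:
                candidates.append((dp[t - 1][s - 1], s - 1))  # advance by one
            # Skipping a blank is only allowed between two different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append((dp[t - 1][s - 2], s - 2))
            best, prev = max(candidates)
            if best > NEG_INF:
                dp[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev

    # A valid path ends on the trailing blank or on the last token.
    finals = [S - 1] + ([S - 2] if S > 1 else [])
    s = max(finals, key=lambda k: dp[T - 1][k])
    if dp[T - 1][s] == NEG_INF:
        raise ValueError("transcript does not fit in the available frames")

    # Backtrack to recover the state visited at every frame.
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]

    spans = [[-1, -1] for _ in targets]
    for t, state in enumerate(path):
        if state % 2 == 1:  # odd states are transcript tokens
            i = state // 2
            if spans[i][0] < 0:
                spans[i][0] = t
            spans[i][1] = t
    return [(start, end) for start, end in spans]
```

Dividing a span's frame count by the frame rate gives a duration in seconds. How OuteTTS turns alignments into prompt content is not covered by this sketch.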

⚠️ Current Limitations:

Limited vocabulary range
String-only input support
Best performance with shorter sentences
Output quality sensitive to the temperature setting
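
Temperature divides the logits before the softmax. This is why one model can sound quite different across settings. The following is a generic sketch of temperature sampling, not OuteTTS's own sampler. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Probabilities from logits scaled by 1/temperature (numerically stable)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    if rng is None:
        rng = random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

As the temperature approaches 0, sampling approaches greedy decoding. Higher values flatten the distribution, so unlikely audio tokens get picked more often.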

https://github.com/edwko/OuteTTS
https://huggingface.co/OuteAI/OuteTTS-0.1-350M

#outetts #languagemodeling #llama #voicecloning #wavtokenizer #llamacpp