#deepspeech — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #deepspeech, aggregated by home.social.
-
#Mozilla discontinues #DeepSpeech.
DeepSpeech is an open-source, local #speech-to-text engine built with #Machine #Learning using the #TensorFlow framework. In combination with the Common Voice Corpus, the voice dataset likewise assembled by #Mozilla, it already runs in real time on a Raspberry Pi 4.
-
#Mozilla Formally Discontinues Its #DeepSpeech Project
#MozillaDeepSpeech was a #speechtotext engine with great performance for real-time communication even when running on #RaspberryPi and other low-power systems.
Mozilla discontinuing DeepSpeech sadly doesn't come as a surprise. The last tagged release was 0.9.3 back in December 2020, and there hadn't been any Git activity since 2021.
Even in 2020 DeepSpeech was considered at risk of ceasing development following Mozilla layoffs.
https://www.phoronix.com/news/Mozilla-DeepSpeech-Discontinued
-
#Mozilla Formally Discontinues Its #DeepSpeech speech-to-text Project - Slashdot 😱 😭
-
Mozilla halts development of DeepSpeech
https://www.marcosbox.com/2025/06/25/mozilla-interrompe-lo-sviluppo-di-deepspeech/
-
🌟✨BREAKING NEWS✨🌟 Mozilla's #DeepSpeech is so cutting-edge that it's been cut entirely! 😂 Now you can enjoy the sound of silence on your Raspberry Pi 4 without the distraction of real-time speech-to-text. Maybe next time they'll invent something that doesn't get #discontinued faster than you can say "GitHub Copilot"! 🚀
https://github.com/mozilla/DeepSpeech #Mozilla #RaspberryPi #TechNews #SpeechToText #HackerNews #ngated
-
"Coqui, a conversational AI startup, on Wednesday (January 3, 2023), announced that it is shutting down its operation ...
[It] specialises in building open source models and applications in the area of quick voice cloning, text-to-voice, etc. The former employees of Mozilla, left the company after it stopped developing their own Speech-to-text engine, DeepSpeech to begin Coqui."
#KLKrithika, 2024
#translation #MachineTranslation #ASR #TTS #Coqui #Mozilla #DeepSpeech
-
Common Voice is a project by Mozilla to build an extensive, ethically sourced dataset of spoken word in various languages to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).
I just recorded a dozen or so sentences :)
https://commonvoice.mozilla.org/
#OpenSource #SpeechRecognition #SpeechToText #Mozilla #DeepSpeech
-
I'm getting back into my investigations of speech-to-text and text-to-speech. Oddly enough, it's something that keeps coming back. Does anyone know what became of #DeepSpeech and whether #MozillaVoice is still maintained?
For my part, I'm relying on #mycroft, but I have the feeling that the project is somewhat stalled, though I may be wrong.
-
If you need to do speech-to-text and text-to-speech tasks and don't want to send your data over the Internet, I recommend trying Speech Note (a Linux desktop app).
It is easy to use, works offline and supports 57 languages!
Speech Note works thanks to powerful #STT and #TTS engines underneath: #DeepSpeech #Coqui #Vosk #Whisper #Piper #eSpeak #MBROLA #RHVoice
You can download #SpeechNote from #Flathub: https://flathub.org/apps/net.mkiol.SpeechNote
Video demo: https://youtu.be/EhUPvaHvssw
-
It's good that other people are also bringing up the elephant in the room: why do you need to pay money for one more electronic gadget that listens to you 24/7, when voice assistants aren't supposed to be rocket science in 2023 anymore? https://news.ycombinator.com/item?id=35857631
I wrote two articles on how to build custom #VoiceAssistants using just a Raspberry Pi and a microphone, one in 2019 https://blog.platypush.tech/article/Build-your-customizable-voice-assistant-with-Platypush and one in 2020 https://blog.platypush.tech/article/Build-custom-voice-assistants.
It's definitely doable, and I still have my own custom assistants in the house. However, I had to make do with a #Snowboy model for hotword detection (and Snowboy is now basically abandoned), a Mozilla #DeepSpeech model for speech-to-text (and that's quite heavy), and #Mycroft's mimic3 text-to-speech model (and Mycroft is now basically bankrupt). Then writing the integration is relatively easy - I used #Platypush, but it can definitely be done with Home Assistant and OpenHAB too.
Compared to 3-4 years ago, I think we're now in a state where the content is no longer the issue (just plug into an LLM, and all of your text requests will get an answer), nor are integrations a problem (just write a Platypush event hook on speech detected, and you can connect it to everything, no need for "Works with Google/Alexa" labels). Text-to-speech synthesis has also become cheap and ubiquitous.
But the hotword detection and speech-to-text models are still IMHO the bottleneck. Hotword detection is a field where you need a very small and lightweight model that only detects a specific word or phrase in a very reliable way. Snowboy was an amazing FOSS project - which also came with this cool idea of "crowd-funded models", where in order to download a model for a certain hotword you were first supposed to provide three audio tracks where you say that word in order to improve the model. But it's now discontinued because it cost the volunteers too much to run the infra.
And Mozilla DeepSpeech is a relatively good choice for general-purpose speech-to-text, but it's heavy (it takes 100% of the CPU when it runs on a Raspberry Pi) and it's mostly optimized for English - even support for other Western languages is patchy. OpenAI's recent Whisper model seems like a solid alternative, but it's also plagued by the 100% CPU issue - also, I no longer trust anything that comes from OpenAI, no matter how noble some of their efforts may look.
If there are other open-source alternatives that solve these problems, I'd be very happy to learn about them. Once these blockers are removed, there should be really no reason for anyone to feed their audio streams to Google or Amazon.
In the meantime, I'm planning to spend some time playing with a self-hosted LLM to see if I can replace the Google Assistant library on the last Raspberry Pi that runs it in my home.
-