home.social

#deepspeech — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #deepspeech, aggregated by home.social.

  1. #Mozilla discontinues #DeepSpeech.

    DeepSpeech is an open-source, local #speech-to-text engine built with #Machine #Learning using the #TensorFlow framework. Combined with the Common Voice Corpus voice database, also compiled by #Mozilla, it can already run in real time on a Raspberry Pi 4.

    linuxnews.de/mozilla-stellt-de

  6. #Mozilla Formally Discontinues Its #DeepSpeech Project
    #MozillaDeepSpeech was a #speechtotext engine with great performance for real-time communication even when running on #RaspberryPi and other low-power systems.
    Mozilla discontinuing DeepSpeech sadly doesn't come as a surprise. The last tagged release was 0.9.3 back in December 2020 and there hadn't been any Git activity since 2021.
    Even in 2020 DeepSpeech was considered at risk of ceasing development following Mozilla layoffs.
    phoronix.com/news/Mozilla-Deep

  11. 🌟✨BREAKING NEWS✨🌟 Mozilla's #DeepSpeech is so cutting-edge that it's been cut entirely! 😂 Now you can enjoy the sound of silence on your Raspberry Pi 4 without the distraction of real-time speech-to-text. Maybe next time they'll invent something that doesn't get #discontinued faster than you can say "GitHub Copilot"! 🚀
    github.com/mozilla/DeepSpeech #Mozilla #RaspberryPi #TechNews #SpeechToText #HackerNews #ngated

  16. "Coqui, a conversational AI startup, on Wednesday (January 3, 2024), announced that it is shutting down its operation ...

    [It] specialises in building open source models and applications in the area of quick voice cloning, text-to-voice, etc. The former employees of Mozilla, left the company after it stopped developing their own Speech-to-text engine, DeepSpeech to begin Coqui."

    #KLKrithika, 2024

    analyticsindiamag.com/industry

    #translation #MachineTranslation #ASR #TTS #Coqui #Mozilla #DeepSpeech

  20. Common Voice is a project by Mozilla to build an extensive, ethically sourced dataset of spoken word in various languages to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).

    I just recorded a dozen or so sentences :)

    commonvoice.mozilla.org/

    #OpenSource #SpeechRecognition #SpeechToText #Mozilla #DeepSpeech

  25. I'm getting back into my speech-to-text and text-to-speech investigations. Oddly enough, it's something that keeps coming back. Does anyone know what became of #DeepSpeech, and whether #MozillaVoice is still maintained?
    For my part, I'm relying on #mycroft, but I get the feeling the project has somewhat stalled, though I may be wrong.

  26. If you have to do Speech-to-Text and Text-to-Speech tasks and don't want to send your data to the Internet, I recommend trying Speech Note (Linux desktop app).

    It is easy to use, works offline and supports 57 languages!

    Speech Note works thanks to powerful #STT and #TTS engines underneath: #DeepSpeech #Coqui #Vosk #Whisper #Piper #eSpeak #MBROLA #RHVoice

    You can download #SpeechNote from #Flathub: flathub.org/apps/net.mkiol.Spe

    Video demo: youtu.be/EhUPvaHvssw

  31. It's good that other people are also bringing up the elephant in the room: why do you need to pay money for one more electronic gadget that listens to you 24/7, when voice assistants aren't supposed to be rocket science in 2023 anymore? news.ycombinator.com/item?id=3

    I wrote two articles on how to build custom #VoiceAssistants using just a Raspberry Pi and a microphone, one in 2019 blog.platypush.tech/article/Bu and one in 2020 blog.platypush.tech/article/Bu.
    It's definitely doable and I still have my own custom assistants in the house. However, I had to make do with a #Snowboy model for hotword detection (and Snowboy is now basically abandoned), a Mozilla #DeepSpeech model for speech-to-text (and that's quite heavy), and #Mycroft's mimic3 text-to-speech model (and Mycroft is now basically bankrupt). Then writing the integration is relatively easy - I used #Platypush, but it can definitely be done with Home Assistant and OpenHAB too.

    Compared to 3-4 years ago, I think we're now in a state where the content is no longer the issue (just plug into an LLM, and all of your text requests will get an answer), nor are integrations a problem (just write a Platypush event hook on speech detected, and you can connect it to everything, no need for "Works with Google/Alexa" labels). Text-to-speech synthesis has also become cheap and ubiquitous.

    But the hotword detection and speech-to-text models are still IMHO the bottleneck. Hotword detection is a field where you need a very small and lightweight model that only detects a specific word or phrase in a very reliable way. Snowboy was an amazing FOSS project - which also came with this cool idea of "crowd-funded models", where in order to download a model for a certain hotword you were first supposed to provide three audio tracks where you say that word in order to improve the model. But it's now discontinued because it cost the volunteers too much to run the infra.

    And Mozilla DeepSpeech is a relatively good choice for general-purpose speech-to-text, but it's heavy (it takes 100% of the CPU when it runs on a Raspberry Pi) and it's mostly optimized for English - even support for other Western languages is patchy. OpenAI's recent Whisper model seems like a solid alternative, but it's also plagued by the 100% CPU issue - also, I no longer trust anything that comes from OpenAI, no matter how noble some of their efforts may look.

    If there are other open-source alternatives that solve these problems, I'd be very happy to learn about them. Once these blockers are removed, there should be really no reason for anyone to feed their audio streams to Google or Amazon.

    In the meantime, I'm planning to spend some time playing with a self-hosted LLM to see if I can replace the Google Assistant library on the last Raspberry Pi that runs it in my home.
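
The chain described in this post (hotword detection, then speech-to-text, then an LLM or automation hook, then text-to-speech) can be pictured roughly as below. This is only a sketch under assumptions: `VoicePipeline`, `demo` and the four callables are hypothetical stand-ins invented for illustration, not the Platypush, Snowboy, DeepSpeech or Whisper APIs, and each "frame" is simplified to one already-segmented chunk of audio.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class VoicePipeline:
    """Hypothetical glue between the four stages named in the post."""

    detect_hotword: Callable[[bytes], bool]  # stand-in for a small hotword model
    transcribe: Callable[[bytes], str]       # stand-in for a speech-to-text model
    answer: Callable[[str], str]             # stand-in for an LLM or automation hook
    speak: Callable[[str], None]             # stand-in for a text-to-speech engine

    def run(self, frames: Iterable[bytes]) -> List[str]:
        """After each detected hotword, treat the next frame as the spoken request."""
        replies: List[str] = []
        armed = False
        for frame in frames:
            if not armed:
                # Only the cheap hotword check runs while idle.
                armed = self.detect_hotword(frame)
                continue
            # The heavy speech-to-text step runs only after the hotword fired.
            text = self.transcribe(frame)
            armed = False
            if not text:
                continue
            reply = self.answer(text)
            self.speak(reply)
            replies.append(reply)
        return replies


def demo() -> List[str]:
    """Run the pipeline with toy stand-ins in which the 'audio' is just text bytes."""
    spoken: List[str] = []
    pipeline = VoicePipeline(
        detect_hotword=lambda frame: frame == b"hey-pi",
        transcribe=lambda frame: frame.decode("utf-8"),
        answer=lambda text: f"You said: {text}",
        speak=spoken.append,
    )
    return pipeline.run([b"noise", b"hey-pi", b"lights on", b"ignored"])
```

Keeping the hotword check separate from transcription reflects the point made above: the always-on part has to be small and reliable, while the expensive model only needs to wake up for the request itself.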
