home.social

#speech-recognition — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #speech-recognition, aggregated by home.social.

  1. #UnplugBigTech Tip 5: Open-source voice assistant

    Say goodbye to Alexa and other voice assistants that listen in on your conversations and analyze them. Instead, use a privacy-friendly alternative such as OpenVoiceOS, an open-source voice assistant that is developed by an active community and runs on a Raspberry Pi. That way you keep control of your data.

    openvoiceos.org/

    #Alexa #OpenVoiceOS #Sprachassistent #VoiceControl #SpeechRecognition #datenschutz #privacy

  6. Govorun PC: porting offline dictation from Android to Windows in one evening (with Claude)

    On Android I use Govorun Lite, an offline dictation app for Russian. Press a button, speak, and the text is inserted. No cloud, no voice sent to servers. It runs on GigaAM v2 from Sber. There is one problem: nothing like it exists on PC. The built-in Windows dictation is online. Whisper is either slow or needs a GPU. Third-party services mean the cloud again. I decided to port Govorun to Windows and brought in Claude as a pair programmer to speed things up. This article covers how it went.

    habr.com/ru/articles/1031240/

    #python #speechrecognition #onnx #windows #llm #голосовой_ввод

  10. Amical - Open-source AI dictation app

    Cossmology Profile: dub.sh/Vk7tPkn

    Key People: Haritabh Singh, Naomi Chopra

    #SpeechRecognition #OpenSource #OSS #COSS

  11. Xiaomi Unleashes MiMo-V2.5-Pro, Claiming Frontier Model Performance At Reduced Cost

    Xiaomi's new MiMo-V2.5-Pro and MiMo-V2.5 AI models offer strong performance, with Pro version matching top AI models at a lower token cost. Learn about MiMo-V2.5-ASR speech recognition.

    #XiaomiAI, #MiMoV25Pro, #AICost, #SpeechRecognition, #AIModels

    newsletter.tf/xiaomi-mimo-v2-5

  12. Xiaomi's new MiMo-V2.5-Pro AI model is now available, offering performance similar to top AI models but at a lower cost. The MiMo-V2.5-ASR speech model also shows advanced capabilities.

    #XiaomiAI, #MiMoV25Pro, #AICost, #SpeechRecognition, #AIModels
    newsletter.tf/xiaomi-mimo-v2-5

  13. Deepgram released Flux Multilingual, a speech recognition model that handles 10 languages with real-time switching during conversations. The system detects language changes mid-call and processes conversational turns in under 400ms. Available as cloud API or self-hosted at the same price as English-only versions. Could simplify multilingual voice applications that previously required separate detection and routing systems.

    #SpeechRecognition #MultilingualAI #VoiceTech

    implicator.ai/deepgram-launche

  15. Non-lexical sounds impact ASR in clinical documentation.

    🔊 NLCS: 2.4% of total words, conveying key clinical info
    😷 Google's WER: 40.8%, Amazon's: 57.2% (all NLCS)
    ❌ Error rates for clinically relevant NLCS: Google 94.7%, Amazon 98.7%
    📝 Total words: 135,647; NLCS: 3,284; 76 conveyed critical data
    🗣️ Described implications on documentation accuracy

    #ASR #ClinicalDocumentation #SpeechRecognition #AI #NLPSolutions #Pub2Post tnyp.me/Npmiz0F4/m

  20. Whisper was too slow. Vosk was inconsistent. The answer was embarrassingly simple: Android speech recognition over local WiFi, and 80 lines of Python. hackernoon.com/the-embarrassin #speechrecognition

  25. RE: mastodon.social/@zugaldia/1163

    The "Speed of Sound" app by @zugaldia, once you set up a custom global keyboard shortcut that doesn't conflict with GNOME's, is pretty amazing: flathub.org/en/apps/io.speedof

    This is the first time I have experienced reliable speech recognition for #dictation on the desktop, particularly on #Linux! Until now I had given up on that being a possibility.

    Works really well in English. It struggles with French, but who doesn't?!

    #Whisper #speechrecognition #GNOME #accessibility #a11y

  30. 🎤🤖 Behold, the latest in buzzword bingo: a speech recognition model that promises to transcribe your every "um" and "uh" with state-of-the-art accuracy! Because clearly, what the modern workplace needs is yet another AI tool to misinterpret your business jargon and turn it into garbled nonsense. 🚀✨
    cohere.com/blog/transcribe #speechrecognition #AItools #buzzwordbingo #workplaceinnovation #transcriptiontechnology #HackerNews #ngated

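
For context on the WER figures quoted in post 15: word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below is a minimal illustration of that standard definition, not the scoring pipeline used in the study. The function name `word_error_rate` and the sample sentences are made up for the example. It also rechecks the post's 2.4% NLCS share from the counts the post itself gives.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer drops one non-lexical sound ("mm-hm").
reference = "the patient said mm-hm to chest pain"
hypothesis = "the patient said to chest pain"
wer = word_error_rate(reference, hypothesis)  # one deletion out of seven words

# Share of NLCS among all words, using the counts quoted in the post.
nlcs_share = 3284 / 135_647 * 100

print(f"WER: {wer:.3f}")
print(f"NLCS share: {nlcs_share:.1f}%")
```

In this made-up sentence, a single dropped "mm-hm" barely moves the overall WER, which is in line with the post's point that errors on rare but clinically relevant sounds can hide behind an aggregate score.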