#speech-recognition — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #speech-recognition, aggregated by home.social.
-
#UnplugBigTech Tip 5: Open-source voice assistant
Say goodbye to Alexa and other voice assistants that listen in on your conversations and analyze them. Use a privacy-friendly alternative instead, such as OpenVoiceOS, an open-source voice assistant that is developed by an active community and runs on a Raspberry Pi. That way you stay in control of your data.
#Alexa #OpenVoiceOS #Sprachassistent #VoiceControl #SpeechRecognition #datenschutz #privacy
-
Govorun PC: porting offline dictation from Android to Windows in one evening (with Claude)
On Android I use Govorun Lite, an offline dictation app for Russian. Press a button, speak, and the text is inserted. No cloud, no sending your voice to servers. It runs on GigaAM v2 from Sber. There is one problem: nothing like it exists on the PC. The built-in Windows dictation is online. Whisper is either slow or needs a GPU. Third-party services mean the cloud again. I decided to port Govorun to Windows and brought in Claude as a pair programmer to speed things up. This article covers how it turned out.
https://habr.com/ru/articles/1031240/
#python #speechrecognition #onnx #windows #llm #голосовой_ввод
-
Amical - Open-source AI dictation app
Cossmology Profile: https://dub.sh/Vk7tPkn
Key People: Haritabh Singh, Naomi Chopra
-
Xiaomi Unleashes MiMo-V2.5-Pro, Claiming Frontier Model Performance At Reduced Cost
Xiaomi's new MiMo-V2.5-Pro and MiMo-V2.5 AI models offer strong performance, with the Pro version matching top AI models at a lower token cost. Learn about MiMo-V2.5-ASR speech recognition.
#XiaomiAI, #MiMoV25Pro, #AICost, #SpeechRecognition, #AIModels
https://newsletter.tf/xiaomi-mimo-v2-5-pro-ai-performance-cost/
-
Xiaomi's new MiMo-V2.5-Pro AI model is now available, offering performance similar to top AI models but at a lower cost. The MiMo-V2.5-ASR speech model also shows advanced capabilities.
#XiaomiAI, #MiMoV25Pro, #AICost, #SpeechRecognition, #AIModels
https://newsletter.tf/xiaomi-mimo-v2-5-pro-ai-performance-cost/
-
Deepgram released Flux Multilingual, a speech recognition model that handles 10 languages with real-time switching during conversations. The system detects language changes mid-call and processes conversational turns in under 400ms. Available as cloud API or self-hosted at the same price as English-only versions. Could simplify multilingual voice applications that previously required separate detection and routing systems.
-
Non-lexical sounds impact ASR in clinical documentation.
🔊 NLCS: 2.4% of total words, conveying key clinical info
😷 Google's WER: 40.8%, Amazon's: 57.2% (all NLCS)
❌ Error rates for clinically relevant NLCS: Google 94.7%, Amazon 98.7%
📝 Total words: 135,647; 3284 NLCS; 76 conveyed critical data
🗣️ Described implications for documentation accuracy #ASR #ClinicalDocumentation #SpeechRecognition #AI #NLPSolutions #Pub2Post https://tnyp.me/Npmiz0F4/m
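For readers unfamiliar with the metric quoted above, here is a minimal sketch of how word error rate (WER) is conventionally defined: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The post does not say how the study computed its figures, so this is the textbook definition rather than the authors' method. The function name and the sample sentences are invented for illustration. The last lines only re-derive the quoted 2.4% share from the post's own counts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: the recognizer drops a non-lexical "mm-hm".
reference = "patient reports pain mm-hm in the left knee"
hypothesis = "patient reports pain in the left knee"
print(word_error_rate(reference, hypothesis))

# Share of non-lexical sounds, using the counts quoted in the post.
nlcs_share = 3284 / 135647
print(f"{nlcs_share:.1%}")
```

A single dropped sound barely moves the WER of a whole transcript, which is one way to see why the post reports error rates restricted to the NLCS tokens themselves.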
-
Learn the basics of neural networks and backpropagation: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
#video #tutorial #deepLearning #LLMs #recognition #speechRecognition #visualRecognition #neuralNeworks #machineLearning
-
Understanding, not correction.
#conversationalcontext #speechrecognition #designphilosophy #Downsyndrome #evaluation
-
Whisper was too slow. Vosk was inconsistent. The answer was embarrassingly simple: Android speech recognition over local WiFi, and 80 lines of Python. https://hackernoon.com/the-embarrassingly-simple-voice-input-system-running-my-home-server-workflow #speechrecognition
-
RE: https://mastodon.social/@zugaldia/116351933343098498
The "Speed of Sound" app by @zugaldia, once you set up a custom global keyboard shortcut that doesn't conflict with GNOME's, is pretty amazing: https://flathub.org/en/apps/io.speedofsound.SpeedOfSound
This is the first time I have experienced reliable speech recognition for #dictation on the desktop, particularly on #Linux! Until now I had given up on that being a possibility.
Works really well in English. It struggles with French, but who doesn't?!
-
🎤🤖 Behold, the latest in buzzword bingo: a speech recognition model that promises to transcribe your every "um" and "uh" with state-of-the-art accuracy! Because clearly, what the modern workplace needs is yet another AI tool to misinterpret your business jargon and turn it into garbled nonsense. 🚀✨
https://cohere.com/blog/transcribe #speechrecognition #AItools #buzzwordbingo #workplaceinnovation #transcriptiontechnology #HackerNews #ngated
-
https://winbuzzer.com/2026/03/27/cohere-open-source-transcribe-model-tops-asr-leaderboard-xcxwbn/
Cohere's Open-Source Transcribe Model Tops ASR Leaderboard
#AI #Cohere #CohereTranscribe #SpeechRecognition #AITranscription #OpenSourceAI #HuggingFace #MultimodalAI
-
https://www.europesays.com/dk/43411/ Amsterdam’s Reson8 raises €5 million to build speech AI infrastructure that resonates across Europe #Amsterdam #BaldertonCapital #JarnoVerhagen #Netherlands #NPHard #RaoulRitter #Reson8 #SpeechAI #SpeechRecognition #ThomasKluiters
-
Chrome extension adjusts video speed based on how fast the speaker is talking
https://github.com/ywong137/speech-speed
#HackerNews #ChromeExtension #VideoSpeed #SpeechRecognition #TechInnovation #OpenSource
-
https://winbuzzer.com/2026/03/16/ibm-granite-4-1b-speech-tops-openasr-leaderboard-xcxwbn/
IBM Granite 4.0 1B Speech Tops OpenASR Leaderboard
#AI #AIModels #IBM #SpeechRecognition #OpenSourceAI #EnterpriseAI #EdgeComputing #AITranslation #OpenASRLeaderboard