home.social

#platypush — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #platypush, aggregated by home.social.

  1. #ActivityPub support in #Madblog

    https://blog.fabiomanganiello.com/article/Madblog-federated-blogging-from-markdown

    I am glad to announce that Madblog has now officially joined the #Fediverse family.

    If you want to test it out, search for this URL on your Fediverse client.

    Madblog has already supported #Webmentions for the past couple of weeks, allowing your blog posts to be mentioned by other sites with Webmentions support (WordPress, Lemmy, HackerNews…) and get those mentions directly rendered on your page.

    It now adds ActivityPub support too, using #Pubby, another little Python library that I’ve put together myself (just like Webmentions) as a means to quickly plug ActivityPub support into any Python Web app.

    Webmentions and Pubby follow similar principles and implement a similar API, and you can easily use them to add federation support to your existing Web applications - a single bind_webmentions or bind_activitypub call to your existing Flask/FastAPI/Tornado application should suffice in most cases.
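
    For example, a rough sketch of what this looks like on a Flask app (the import paths are indicative - check the two projects’ READMEs for the exact module layout):

    from flask import Flask
    
    # Indicative import paths - adjust to the actual packages
    from webmentions import bind_webmentions
    from pubby import bind_activitypub
    
    app = Flask(__name__)
    
    # One call per protocol should be enough to expose the federation endpoints
    bind_webmentions(app)
    bind_activitypub(app)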

    Madblog may have now become the easiest way to publish a federated blog - and perhaps the only one that doesn’t require a database: everything is based on plain Markdown files.

    If you have a registered domain and a certificate, then hosting your federated blog is now just a matter of:

    # Create the Markdown folder and a sample post
    mkdir -p ~/madblog/markdown
    cat <<EOF > ~/madblog/markdown/hello-world.md
    # My first post
    
    This is my first post on [Madblog](https://git.fabiomanganiello.com/madblog)!
    EOF
    
    # Run the Madblog container, mounting the Markdown folder as its data dir
    docker run -it \
      -p 8000:8000 \
      -v "$HOME/madblog:/data" \
      quay.io/blacklight/madblog

    And Markdown files can be hosted wherever you like - a Git folder, an Obsidian Vault, a Nextcloud Notes installation, a folder on your phone synchronized over SyncThing…

    Federation support is also in quite an advanced state compared to e.g. #WriteFreely. It currently supports:

    • Interactions rendered on the articles: if you like, boost, quote or reply to an article, the interaction is rendered directly at the bottom of the article (by contrast, interactions with WriteFreely through federated accounts were kind of lost in the void)

    • Guestbook support (optional): mentions to the federated Madblog handle that are not in response to articles are now rendered on a separate /guestbook route

    • Email notifications: all interactions can trigger email notifications

    • Support for quotes, also on Mastodon

    • Support for mentions: just drop a @joe@example.com in your Markdown file and Joe will get a notification

    • Support for hashtag federation

    • Support for split-domain configurations: you can host your blog on blog.example.com but have a Fediverse handle like @blog@example.com. Searching by direct post URL on Mastodon will work in both cases

    • Support for custom profile fields, all rendered on Mastodon, with verification support

    • Support for moderation, either through blocklist or allowlist, with support for rules on handles/usernames, URLs, domains or regular expressions

    • A partial (but comprehensive for the provided features) implementation of the Mastodon API

    If you want, you can follow the profiles of both my blogs - they are now both federated:

    • My personal blog: @fabio (it used to run WriteFreely, so if you were following it you may need to unfollow and re-follow it)

    • The #Platypush blog: @blog

  2. 📰 New blog article

    Self-host your own multi-service #music server on #Android

    How to replace your music streaming apps with a setup that supports multiple streaming services, multiple devices and multiple outputs from a single Webapp.

    #mopidy #platypush #termux #ntfy #Tasker #python

    @Selfhosted @Android @python

    https://blog.platypush.tech/article/Self-host-your-music-experience-on-mobile

  3. 📰 New blog article

    A self-hosted solution to create unlimited private email aliases. Featuring:

    • #Postfix (mail server)
    • #ntfy (pub/sub over HTTP)
    • #Platypush (to listen to alias requests, create them and notify the clients)
    • #Tasker (to conveniently wrap the service on Android into a simple app)
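
    The glue logic on the Platypush side can be as simple as an event hook that reacts to alias requests published over ntfy (a rough sketch of the idea - the event class, topic, paths and addresses here are illustrative, not the exact implementation):

    from platypush import hook, run
    from platypush.message.event.ntfy import NotifyEvent
    
    # Fires when a message is published on the (illustrative) alias-request topic
    @hook(NotifyEvent, topic="mail-aliases")
    def on_alias_request(event, **_):
        alias = event.message.strip()
    
        # Append the new alias to Postfix's virtual alias map (illustrative paths)
        with open("/etc/postfix/virtual", "a") as f:
            f.write(f"{alias}@example.com me@example.com\n")
    
        run("shell.exec", cmd="postmap /etc/postfix/virtual && postfix reload")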

  4. 📦 #Platypush 1.3.5 is out!

    The main feature of this release is the support for multiple backends in the youtube plugin.

    It allows you to watch ad-free YouTube videos on any supported media player and manage your playlists and subscriptions through multiple YouTube implementations.

    Support for multiple YouTube backends

    Previously only #Piped was supported but, given the state of the project and of most of its instances (all the ones I’ve tested, including my own, are still blocked by #YouTube’s new restrictions), I’ve added support for #Invidious too, and that’s currently the recommended backend.

    The in-browser YouTube player now plays videos using the Invidious embedded player if you configure the invidious backend, so the UI can be used as a full alternative frontend for Invidious.

    I’ve also added a new google backend that leverages the official YouTube Data API to search and fetch your playlists and subscriptions, but keep in mind that:

    • It requires you to register a project on the Google Cloud developers console.

    • It doesn’t support the get_feed() action (YouTube has removed the endpoint in v3), so you won’t be able to get the latest videos published by your subscribed channels.

    • All searches and activities will be logged on your Google account, so it’s probably not the best option if you are looking for a privacy-aware experience (but video streaming will still be ad-free).
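
    Since the backend is selected per call, trying the google backend side by side with the others is trivial (a rough sketch - I’m assuming a search action here, modeled on the playlist actions used in the migration script below):

    from platypush import run
    
    # Same action, different backends - only the backend parameter changes
    google_results = run("youtube.search", query="platypush", backend="google")
    invidious_results = run("youtube.search", query="platypush", backend="invidious")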

    State of YouTube media support

    The youtube plugin should work in tandem with any supported Platypush media integration (tested with media.mpv, media.vlc, media.gstreamer, media.kodi and media.chromecast), but media.mpv is recommended. The reason is that mpv provides the --ytdl option out of the box to leverage yt-dlp and stream videos on the fly, while the other media plugins first have to download the full video locally before streaming it (I’ve tried to implement my own real-time media streaming server, but I’m still not very happy with its stability).
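
    For instance, a quick sketch of streaming a video through mpv (play and its resource argument follow the common API of the Platypush media plugins; the URL is just a placeholder):

    from platypush import run
    
    # mpv + yt-dlp will resolve and stream the video on the fly
    run(
        "media.mpv.play",
        resource="https://www.youtube.com/watch?v=<video-id>",
    )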

    Leveraging the support for multiple backends to migrate your data around

    I’ve always been baffled by the fact that there isn’t a standard format to export playlists/subscriptions across different backend implementations (even among alternative backends, such as Piped and Invidious).

    As someone who has migrated through different YouTube backends and apps depending on the state of restrictions implemented by Google, I’ve often had to write my own scripts to convert CSV/JSON exports from one platform or app to a format understood by the new solution.

    Since the Platypush youtube plugin exposes the same API regardless of the backend, it is possible to configure multiple backends, and write a small script that fetches all playlists and subscriptions from one and imports them into another:

    from platypush import run
    
    # Fetch all the playlists from e.g. the Piped backend
    piped_playlists = run("youtube.get_playlists", backend="piped")
    
    # Map each playlist ID to the list of its video IDs - a list rather than
    # a set, so that the original playlist order is preserved
    piped_playlists_with_videos = {
      pl["id"]: [
        item["id"]
        for item in run(
          "youtube.get_playlist",
          id=pl["id"],
          backend="piped"
        )
      ]
      for pl in piped_playlists
    }
    
    # Create the playlists on Invidious and populate them
    for pl in piped_playlists:
      invidious_playlist = run(
        "youtube.create_playlist",
        name=pl["name"],
        backend="invidious"
      )
    
      run(
        "youtube.add_to_playlist",
        playlist_id=invidious_playlist["id"],
        item_ids=piped_playlists_with_videos.get(pl["id"], []),
        backend="invidious"
      )

    Note that the simple script above doesn’t handle merging existing playlists and items, but it can easily be adapted - if there’s enough interest I may write a small blog article with a more complete example.

    Other release features

    The full changelog of the new release is here. Besides the youtube integration changes, this release includes the following features:

    • Many stability/performance improvements for the music.mopidy integration - especially in handling connection recoveries.

    • Support for ungrouped lights in the light.hue plugin.

    • Added a new Application tab to the UI, which allows you to monitor all events and requests handled by the service.

    • Adapted the ssl layer to Python 3.12 (which has removed the long-deprecated ssl.wrap_socket()).

    • Migrated the kafka integration to kafka-python-ng, as the original package is currently broken and basically unmaintained.

  5. @Nelfan I use #NewPipe on Android, but unfortunately it doesn’t come with a web version.

    After self-hosting #Piped for a while I’ve recently switched to #Invidious (Piped isn’t seeing much development, and it’s much more likely to get blocked by YouTube), and I must say that, hosted on a residential address and with IPv6 rotation, it does its job quite well.

    For everything else (streaming on TV, Chromecast etc.) #Platypush with the YouTube plugin and MPV/VLC does a very good job, as long as yt-dlp is up to date (of course, being Platypush’s main developer, I’m a bit biased here).

    I really hope that yt-dlp keeps working, and I’d rather direct my efforts towards keeping it alive, because a properly functioning yt-dlp (and not only for YouTube) means that a lot of downstream projects will keep functioning.

  6. #Platypush + #WebPush loading

    I’ve taken quite a deep dive these days into the WebPush implementation and the details of the #VAPID specification for authenticated push notifications through a #PWA.

    I’m now testing things in the Platypush web app - aiming to support custom notification providers too, so dispatching notifications to your mobile devices through your own ntfy or NextPush server can also be supported.

    The next release of Platypush may finally include native Web notifications through the PWA layer.

    You should then be able to get notifications for all the media playing on your devices, and control them just like you would with a Spotify/YouTube media notification, without having to use intermediary layers such as Tasker, Termux, Pushbullet or ntfy. Or get mobile notifications for the interactions with your custom voice assistants. Or create your own push notifications (e.g. from your event hooks, custom procedures or crons) and dispatch them securely to your mobile devices through your own ntfy server. All (hopefully) without the need for a native Android app - the power of Web push!

    I only wish that the tools to implement WebPush/VAPID in #Python applications were as mature as those available in the JS ecosystem. py-vapid seems reasonably well designed, but it’s still a bit of an early project, and it’s not even available on any major package manager (which is a big no-no for core Platypush features). And it only takes care of signing VAPID claims, not of packing and delivering WebPush requests end-to-end. I’ve eventually resorted to rolling my own implementation with ecdsa, plus jose to take care of the JWT signing boilerplate. I may write a little blog article if it ends up working.
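
    For reference, the VAPID signing part boils down to something like this sketch (a minimal illustration with ecdsa and jose; all the claim values are placeholders, and the real implementation also has to handle payload encryption and delivery):

    import time
    
    import ecdsa
    from jose import jwt
    
    # Generate (or load) the server's VAPID key pair - P-256, as required by the spec
    signing_key = ecdsa.SigningKey.generate(curve=ecdsa.NIST256p)
    
    claims = {
        # Origin of the push service the request will be sent to
        "aud": "https://push.example.com",
        # Expiry - at most 24 hours in the future
        "exp": int(time.time()) + 12 * 3600,
        # Contact address for the push service operator
        "sub": "mailto:admin@example.com",
    }
    
    # VAPID mandates ES256 signatures over the JWT claims
    token = jwt.encode(claims, signing_key.to_pem().decode(), algorithm="ES256")
    
    # The token then goes into the push request, e.g.:
    #   Authorization: vapid t=<token>, k=<base64url-encoded public key>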

    https://git.platypush.tech/platypush/platypush/issues/417

  7. Testing #Platypush with #Valkey and #Redict, now that #Redis has decided to adopt a weirdly restrictive license like the SSPL.

    At first impression, it looks like Valkey is more ambitious and willing to implement many new features and optimize Redis’ data model, while Redict seems to stick to “let’s do what Redis already does best and become great at it”, without time series, open telemetry and a lot of new bells and whistles.

    I also hope that these projects will soon make it upstream into the major package managers. As of now most package managers still provide Redis, which isn’t fully FOSS anymore, and none of its recent forks.

    If you are working on a project that relies on Redis, what options are you currently considering after Redis' SSPL migration?

  8. @cory the idea sounds similar to what I did a while ago with #Platypush + #mopidy (and optionally Tidal). My implementation also uses the tracks scrobbled over a certain period and Last.fm’s API to automatically generate a “discover weekly” playlist.

    I haven’t toyed with Plex in a while, but why did you have to run everything on Firebase/Supabase? If you have your Plex server running locally, isn’t it more convenient to go for a fully local solution?

  9. #Google is shutting down one more service that was actually useful - the #Fit API.

    A few years ago I wrote an article that showed how to create a custom #Grafana dashboard that, among the other things, could leverage #Platypush and its google.fit integration to collect and display health data from any device that supported Google Fit (phones, watches, scales, custom health monitors etc.).

    Since Google Fit never even bothered to implement a proper Web view outside of their mobile app, it was a great way to collect all of your fit data under the same umbrella.

    All these efforts, and thousands of apps and hardware devices that built their health/fit capabilities around the Fit API, are now likely to be washed away like tears in the rain - and all, again, because Google is stuck in an existential crisis and a masochistic spiral of layoffs, cuts and mass divesting.

    Unfortunately the deprecation of the Fit API will leave a huge hole in the market.

    I surely didn’t use the Fit API because I like Google. I used it because it’s the backend supported by basically anything out there that tracks sleep or workouts, or measures steps, weight, calories or pulse.

    I wish that open products like #wger were a bit more mature and supported by more hardware makers and app developers. Or that #Nextcloud Health had a proper API that one could tap into. But that’s not the case. Google Fit has been for a while the lingua franca of this niche.

    Divesting from Google and whatever they build and sell is imperative at this point.

    Google by now fully deserves its reputation as a reverse king Midas that turns everything it touches into shit.

    And we need to make sure that from now on the infrastructure to collect and aggregate fit data is based on open protocols that all the vendors in the field are supposed to adopt.

    We can’t rely on a mediocre company led by a mediocre and shortsighted management class that keeps pushing whole fields in our industry back by a decade and keeps turning existing devices into silicon garbage with a snap of their fingers, rounds of layoffs and corporate bullshit.

    Google is likely to find itself, in less than a decade, in the same position where Philips is now.

    A sleepy giant that has wasted all the chances it had of nurturing and following up on all the brilliant ideas of their engineers.

    A place that sucks the lifeblood out of talented engineers and leaves behind empty shells with no professional purpose.

    A business that was in the right place at the right time, and had many chances of remaining competitive, innovative and fair, but failed them all because of a shortsighted management class that sees everything as a cost to cut on the altar of the shareholders.

    Eventually, companies like these become walking zombies with no purpose, struggling to put together even the most basic products, kept alive solely by the reminiscence of their past prestige, and just waiting for the last employee to cash in their pension and stocks package and turn off the lights on their way out.

    https://arstechnica.com/gadgets/2024/05/google-fit-apis-get-shut-down-in-2025-might-break-fitness-devices/

  10. The upcoming version of #Platypush will include a revised assistant.picovoice integration as the recommended #voice assistant plugin - assistant.google is based on a library deprecated long ago, and it’s becoming harder and harder to make it work on newer systems.

    I’m overall very impressed by the state of the #Picovoice products - much better than the last time I checked them out a few years ago. It’s now possible to build a full-featured voice assistant using their hotword+STT+TTS engines, and the flexibility of their intent detection is impressive too.

    Platypush will now let you set up a custom ChatGPT-powered voice assistant with something as simple as:

    from platypush import hook, run
    from platypush.message.event.assistant import (
        SpeechRecognizedEvent,
        ResponseEndEvent,
    )
    
    @hook(SpeechRecognizedEvent)
    def on_speech_detected(event, **_):
        response = run(
            "openai.get_response", prompt=event.phrase
        )
    
        event.assistant.render_response(response)
    
    @hook(ResponseEndEvent)
    def on_response_end(event, **_):
        # If the AI response is a question, automatically start
        # a follow-up interaction
        if event.response_text and event.response_text.endswith("?"):
            event.assistant.start_conversation()

    The adventurous among you can already start playing with the new integration by checking out the latest master branch.

    In the meantime, I’m wrapping up the documentation and preparing a new blog post about it, together with the new release, so stay tuned!

    https://git.platypush.tech/platypush/platypush/src/commit/72bc6971222af8cef51046e89e83106fac0e3dc0/platypush/plugins/assistant/picovoice/__init__.py#L14

  11. Update on the state of #voice integrations in #Platypush: my initial despair (Snowboy and Mycroft are dead, Mozilla DeepSpeech is abandoned, OpenAI’s Whisper is very resource-hungry, and the Google Assistant library is deprecated and still alive only because I’m keeping its bindings alive in Platypush) turned into excitement after I played a bit more with #Picovoice.

    RE: https://manganiello.blog/api/posts/yp1zetjnrw

  12. Btw, I’m wondering how this license may impact use cases where component A, released under the #SSPL, is not used directly by software B, but B uses C (or forks C), which in turn uses A - i.e. whether the license is transitive.

    An example: in #Platypush I use #Redis quite heavily - as an in-memory cache, as a pub/sub framework for inter-process communication, and as a memory queue.
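
    Concretely, the usage patterns look roughly like this redis-py sketch (key, channel and queue names are just illustrative):

    import redis
    
    r = redis.Redis()
    
    # Pub/sub for inter-process communication
    r.publish("platypush/bus", '{"type": "event"}')
    
    # In-memory cache with expiry
    r.set("some-cache-key", "value", ex=60)
    
    # Memory queue: push on one side, blocking-pop on the other
    r.rpush("some-queue", "message")
    msg = r.blpop("some-queue", timeout=1)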

    Platypush is already FOSS, but it’s released under the relatively permissive MIT license. (I’ve pondered a lot over the pros/cons of MIT vs. *GPL when licensing a product like Platypush; in the future I may also consider switching to AGPL, but for now MIT is a good trade-off.)

    This means that people are free to copy the code of a Platypush plugin into their projects. Or use its plugins as libraries for their own integrations. Or extend it with their own plugins. Or make a fork and expose it as a service on their cloud. And the MIT license doesn’t require anybody to redistribute the full source.

    But Platypush also uses Redis, which is now under SSPL.

    What should company X, which has made a closed fork of Platypush with a bunch of proprietary integrations, and maybe distributes it as a service, do now? They are basically exposing a closed service that uses Redis under the hood, which violates the SSPL. But the service itself is a fork of an open service released under MIT, so there’s no violation from that point of view.

    In other words, does SSPL override other more permissive licenses every time a product is exposed as a service?

  13. It's good that other people are also bringing up the elephant in the room: why do you need to pay money for one more electronic gadget that listens to you 24/7, when voice assistants aren't supposed to be rocket science in 2023 anymore? news.ycombinator.com/item?id=3

    I wrote two articles on how to build custom #VoiceAssistants using just a Raspberry Pi and a microphone, one in 2019 blog.platypush.tech/article/Bu and one in 2020 blog.platypush.tech/article/Bu.

    It's definitely doable, and I still have my own custom assistants in the house. However, I had to get by with a #Snowboy model for hotword detection (and Snowboy is now basically abandoned), Mozilla's #DeepSpeech model for speech-to-text (and that's quite heavy), and #Mycroft's mimic3 text-to-speech model (and Mycroft is now basically bankrupt). Writing the integration itself is relatively easy - I used #Platypush, but it can definitely be done with Home Assistant and OpenHAB too.

    Compared to 3-4 years ago, I think we're now in a state where content is no longer the issue (just plug into an LLM, and all of your text requests will get an answer), nor are integrations a problem (just write a Platypush event hook on speech detected, as in the sketch below, and you can connect it to everything, no need for "Works with Google/Alexa" labels). Text-to-speech synthesis has also become cheap and ubiquitous.
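
    Something along these lines (a minimal sketch - the phrase filter follows the usual Platypush hook pattern, and light.hue.on is just an example action to adapt to your setup):

    from platypush import hook, run
    from platypush.message.event.assistant import SpeechRecognizedEvent
    
    # Fires when the assistant recognizes this exact phrase
    @hook(SpeechRecognizedEvent, phrase="turn on the lights")
    def on_lights_command(event, **_):
        run("light.hue.on")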

    But the hotword detection and speech-to-text models are still IMHO the bottleneck. Hotword detection is a field where you need a very small and lightweight model that detects one specific word or phrase very reliably. Snowboy was an amazing FOSS project - it also came with the cool idea of "crowd-funded models": in order to download a model for a certain hotword, you were first supposed to provide three audio tracks of yourself saying that word, which in turn improved the model. But it's now discontinued, because it cost the volunteers too much to run the infrastructure.

    And Mozilla DeepSpeech is a relatively good choice for general-purpose speech-to-text, but it's heavy (it takes 100% of the CPU when it runs on a Raspberry Pi) and it's mostly optimized for English - even support for other Western languages is patchy. OpenAI's recent Whisper model seems like a solid alternative, but it's plagued by the same 100% CPU issue - and I no longer trust anything that comes from OpenAI, no matter how noble some of their efforts may look.

    If there are other open-source alternatives that solve these problems, I'd be very happy to learn about them. Once these blockers are removed, there should be really no reason for anyone to feed their audio streams to Google or Amazon.

    In the meantime, I'm planning to spend some time playing with a self-hosted LLM to see if I can replace the Google Assistant library on the last Raspberry Pi that runs it in my home.
