home.social

#fulltextsearch — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #fulltextsearch, aggregated by home.social.

  1. Oh look, another "groundbreaking" #blog post about #DuckDB from a self-proclaimed data wizard. 🙄 Apparently, the limitations of basic text queries are just too much for our hero, who bravely delves into the wild world of Full-Text Search. 🌟 Spoiler alert: it's as thrilling as watching paint dry on a data frame. 🥱
    peterdohertys.website/blog-pos #DataWizard #FullTextSearch #DataFrame #HackerNews #ngated

  6. @geneapleau I too have had some great success with the full text search. Particularly with #wills, finding those executors and beneficiaries you might never have otherwise located. #familysearch #fulltextsearch #genealogy @genealogy

  7. New feature.

    I've added #fulltextsearch to beyng.com using #PageFind. I'm excited because Google/Bing/Yandex only index 10% of the ~40,000 pages: philosophy texts don't contain the keywords advertisers pay for. Now it's possible to search them all.

    Features I'm hoping for in future releases of PageFind are open-results-in-new-tab and include og:image in results.

    It takes half a day to FTP upload the ~180 megabytes of index files, so I won't be updating indexes often.

    beyng.com/hb/pagefind.html

  8. @Ben Pate 🤘🏻 Allow me to take a look at this from a Hubzilla/(streams)/Forte point of view.

    The Sin of Overwhelming Complexity: Instance Selection Paralysis


    The only way to really combat this effectively is by hiding the whole concept of servers/instances at first, railroading everyone to a server and only letting them know about decentralisation and servers/instances after the fact.

    In theory, this could be doable with Hubzilla, (streams) and Forte, and even better than with Mastodon with its themed servers. It wouldn't make sense to offer Hubzilla, (streams) or Forte servers for certain topics or target audiences, seeing as the whole thing would become moot the very moment when you make your first clone on another server. Simply build a kind of "automatic on-boarder" that sends everyone to the geographically closest open-registration server.

    In practice, that'd be a bad idea, but for a different reason than on Mastodon. And that's how these servers tend to be very different. Not in topic. Not in target audiences. Not in rules. But in features. Hubzilla is modular, (streams) is modular, Forte is modular, and each admin decides differently on which "apps" to activate. Then you want to join Hubzilla for one cool feature, but the on-boarder railroads you to a server where that very feature isn't even activated.

    Sure, the on-boarder could include the option to select certain features that you absolutely must have in your new home and then pick a server that has them. But that'd be extra hassle and extra confusing.

    Besides, where'd you put that on-boarder? On the official Hubzilla website? Haha, no can do. The official Hubzilla website is a webpage on a Hubzilla channel itself. It's all just dumb old static HTML with a CSS. If it's even HTML and not Markdown or BBcode, that is. You couldn't add scripts to it if you tried.

    Oh, and (streams) and Forte don't even have official websites. And (streams) will never have one, seeing as it's officially and intentionally nameless, brandless and totally not even a project. Their "websites" are readme files in their code repositories on Codeberg.

    The Sin of Inconsistent Navigation: Timeline Turmoil


    The streams on Hubzilla, (streams) and Forte are quite a bit different from Mastodon timelines.

    First of all, what you usually don't have on public servers is the counterpart to Mastodon's local timeline and Mastodon's federated timeline. On all three, this would be only one stream, the "public stream" or "pubstream". It can be switched by the admin to either what'd be local or what'd be federated. However, public servers usually have it off entirely. Unavailable even to local users. That's because the admins don't want to be held liable for what's happening on the pubstream.

    Technically speaking, you only have one stream on a public server, and that's your channel stream. It's much more efficient than a Mastodon timeline because it always shows entire conversations by default instead of detached single-message piecemeal, and because it has a counter for unread messages which even lists these unread messages for you to directly go to the corresponding conversation. But that's another story.

    However, your channel stream can be viewed on your channel page, conversation by conversation, or it can be viewed on the stream page as an actual stream with all conversations shown in a feed/timeline-like fashion, one upon another, and with its own set of built-in filters such as "only my own messages" or "only conversations started by members of one particular privacy group/access list" or "only conversations from one particular group actor". It's actually much more convenient than any Mastodon timeline, but for those who want a Twitter clone for dumb-dumbs, it can be very overwhelming.

    Yes, Hubzilla, (streams) and Forte are much more complex in handling than, say, snac2. But they're also much more complex in features than snac2. That power is their USP. And that power must be harnessed somehow.

    The Sin of Remote Interaction Purgatory: Federation Gymnastics


    Sure, Hubzilla, (streams) and Forte have some of the best built-in search systems in the whole Fediverse. They can pull almost everything onto your channel stream just by searching for it. And if it has replies, chances are they pull these in as well.

    But still, they're geared towards desktop users. They still require copy-paste. Phone users don't copy paste. Most of them don't even know the very concept of copy-paste. For most of those who do, copy-paste is much too fumbly if the input device available to them is a 6" touch screen.

    You can't blame them, though. This is next to impossible to do any differently. I mean, you won't see a button magically appear with which you can pull in just that one post or comment you want to pull in.

    Rather, the issue is that they can only reel in almost everything. Sometimes the search returns nothing, like a void. Sometimes the search runs indefinitely without any kind of result. This may be because someone has blocked your channel, because someone has blocked your entire server, because the server someone is on has blocked you or your entire server, because Hubzilla/(streams)/Forte doesn't understand the URI pasted into the search field or whatever.

    So this is made worse by Hubzilla, (streams) and Forte not knowing what they can search for, what they can't and why not.

    Connecting with someone whom you encounter on your channel stream is fairly easy. Connections can be initiated with only two clicks. Either you click their long name, and you're taken to a pretty much distraction-less local "intermediate page" with a striking green button that's labelled "+ Connect". Or if you don't want to leave the channel page, you hover your mouse cursor over their profile picture, click on the little white arrow that appears, and you get a small menu that offers you the "Connect" option as well. Granted, even some veterans don't know the latter trick because it isn't immediately advertised on the channel page.

    Also, sure, you don't simply follow them right off the bat with nothing else to do like on Mastodon. You're taken to your Connections page, and you have to configure the connection (you don't have to do that on Mastodon because you can't configure connections on Mastodon).

    Following accounts/channels from the directory is a bit easier. The green "+ Connect" button is there right away (unless you're already connected). However, Hubzilla's directory only lists channels based on the Nomad protocol, i.e. Hubzilla and (streams) channels, because ActivityPub is only implemented in an optional, off-by-default-for-new-channels add-on whereas it's in the core and on by default on (streams) and the only available protocol on Forte.

    Importing contents or following actors when seeing them locally on other servers without copy-pasting and searching can be done. It requires OpenWebAuth magic single sign-on, however, and it requires it to be implemented on all servers of all Fediverse server applications from Mastodon to WordPress to Ghost to Flipboard. Hubzilla, (streams) and Forte are the only Fediverse server applications with full (client-side and server-side) OpenWebAuth implementations. But that's of little use if the rest of the Fediverse doesn't have server-side implementations, and Mastodon has even silently rejected a mere client-side implementation already developed to a pull request two years ago.

    The Sin of DM Disasters Waiting to Happen


    I think this is less of an issue on Hubzilla, (streams) and Forte because they handle DMs differently from Mastodon (which "the Fediverse" actually refers to in the article).

    On all three, DMs are integrated into their extensive, fine-grained permissions system in which everything is only public if it's really public. The difference between a post and a DM is not just a switch.

    If I want to DM you, I can either tag you @!{[email protected]} rather than @[url=https://mastodon.social/@benpate]Ben Pate 🤘🏻[/url]. Then you're a) the only one to whom the message is sent (it literally doesn't even go out to any other server than mastodon.social plus my clone on hub.hubzilla.de as can be seen in the delivery report) and b) the only one who is granted permission to view the message.

    Or I can use the padlock icon and select you from the opening list as the sole recipient. The very moment that I select certain recipients, the post I'm composing quits being public, and the padlock icon switches from open to closed. This isn't a one-click or two-click toggle. You don't do that casually. It's basically configuration. It requires so many mouse clicks that you do it consciously and intentionally. If you want to post in private, you have to really want to post in private.

    Better yet: You can default to posting only to a certain limited target audience. In fact, by default on a brand-new channel, you only post to the members of one privacy group/access list (which is a Mastodon list on coke and 'roids). You have to manually reconfigure your new channel if you want to post to the general public by default.

    If you preview your post, you can see whether it's a direct message to one or multiple single connections (envelope icon next to your long name), a limited-permissions message to one or multiple privacy groups/access lists/group actors (closed padlock icon) or actually public (no icon).

    Even better yet: Posts to group actors generally aren't public. Posts to at least Friendica groups, Hubzilla forums, (streams) groups and Forte groups are never public. They do not go out to your followers as well unless they're connected to the same group. And this is independent from whether a group is public or private. You can't accidentally post to a group actor in public, and if you do, you don't post to that group actor at all, at least not in a way that makes the group actor forward your post to its other connections.

    Granted, what does not happen is your background switching from your background colour or background image (which can be user-configured) to red #800000 or a yellow-and-black chevron pattern when you change visibility and permissions to something that isn't public.

    The Sin of Ghost Conversations and Phantom Follower Counts


    And again, when @Tim Chambers says, "the Fediverse", he almost exclusively means Mastodon. He writes as if the entire Fediverse handled conversations as terribly as Mastodon, as if the entire Fediverse was as blissfully unaware of enclosed conversations as Mastodon. Which is not the case.

    Hubzilla, (streams) and Forte, as well as their ancestor Friendica, handle conversations in ways that exceed Mastodon users' imaginations and wildest dreams by magnitudes. Unlike Mastodon, they know threaded conversations, and they see them as enclosed objects where only the start post counts as a post, and everything else counts as a comment.

    This means that once you've received a post on your stream, you will also receive all comments on that post, regardless of whether or not you follow the commenters, regardless of whether or not they mention you. That's because all four reel in the comments not from the commenters, but from the original poster who is perceived as the owner of the thread. Only blocks or channel-wide filters can prevent comments from coming in.

    Beyond that, (streams) was the first to introduce Conversation Containers. Forte inherited them from (streams), and when they were defined in FEP-171b, Hubzilla implemented them, too.

    Here on Hubzilla, I can see all comments in this thread because my channel has fetched them directly from @Johannes Ernst. And I can actually see them right away because that's the default view here on Hubzilla, rather than Mastodon's piecemeal.

    Even if you import a post manually using the search feature (and you better import the actual start post), AFAIK existing comments will eventually be backfilled. Comments that come in after importing will definitely end up on your stream as part of the thread.

    So this is not a shortcoming of the Fediverse. The Fediverse has been able to do better for 15 years. It's a shortcoming of Mastodon.

    The only "issue" here may be that it sometimes takes some time for a comment to show up for some reason. But unless there are blocks or filters in play, it eventually will.

    The Sin of Invisible Discovery: The Content Mirage


    I'm not going to pick on the audacious implication that "Eugen and team" invented the Fediverse.

    But Tim writes like literally everyone wants "the Fediverse" (read, actually Mastodon) to be literally Twitter without Musk.

    Also:
    • Friendica has had full-blown full-text search since its inception as early as 2010. Five and a half years longer than Mastodon has even existed.
    • Hubzilla has had full-blown full-text search since its inception as early as 2011 when it was forked from Free-Friendika. It has inherited full-text search from Friendica.
    • (streams) and Forte have had full-blown full-text search since their respective inception in 2021 and 2024, both having inherited it themselves.

    Oh, and none of them has an explicit opt-in switch to soothe panicking Twitter converts because panicking Twitter converts have never been the primary target audience of any of them.

    Instead, on Hubzilla, whether someone can find your content depends on whether they've got permission to view it in the first place ("Can view my channel stream and posts"). If it's public, they have it. Full stop. Public is public is public. Stop whining. You've made it public, now deal with everything being able to see it.

    (streams) and Forte behave the same. In addition, they have an extra permission: "Grant search access to your channel stream and posts". This controls who may search your channel stream using your own local search feature while visiting your channel locally. Something that isn't even possible on Mastodon.

    As for not having any content on my channel stream before I connect to anyone: I, for one, do not want some algorithm to force content upon me that I'm not interested in. Full. Frigging. Stop. I want to have full and exclusive control over what I see and what I don't.

    The Sin of User Discovery Hell


    Can it really be that Mastodon's directory is so much worse than Friendica's, Hubzilla's, (streams)' and Forte's directories? I guess it is because it really only lists local accounts on that one particular server. A side-effect of Mastodon being a microblogging service and Twitter clone. And not a full-blown, fully-featured social network and Facebook alternative. No, seriously, it isn't that.

    Friendica is. It was designed as such. It was designed to take Facebook's place, and not by aping and cloning Facebook, but by being better than Facebook.

    The directory on each node is decentralised. It lists all actors known to that node. What's outright unimaginable from a Mastodon point of view: It takes the keywords in the profiles into account. Better even: It ranks suggestions by the number of matching keywords.
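
    The ranking idea described above (suggestions ordered by the number of matching profile keywords) can be sketched roughly as follows. This is only an illustration, not Friendica's actual code: the function name `rank_by_shared_keywords`, the in-memory `directory` dict and the case-insensitive comparison are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def rank_by_shared_keywords(
    my_keywords: Set[str],
    directory: Dict[str, Set[str]],
) -> List[Tuple[str, int]]:
    """Return (actor, match_count) pairs, best match first.

    Hypothetical helper: actors that share no keyword with the
    searching profile are left out of the suggestions entirely.
    """
    mine = {k.lower() for k in my_keywords}
    scored = []
    for actor, keywords in directory.items():
        matches = len(mine & {k.lower() for k in keywords})
        if matches:
            scored.append((actor, matches))
    # Highest match count first; ties are ordered by actor name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up directory entries, purely for demonstration.
    directory = {
        "alice@node.example": {"Linux", "Philosophy", "Genealogy"},
        "bob@node.example": {"linux"},
        "carol@node.example": {"cooking"},
    }
    for actor, count in rank_by_shared_keywords({"linux", "genealogy"}, directory):
        print(actor, count)
```

    A real directory would presumably run this as a database query rather than over a dict, but the ordering principle is the same.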

    Want something centralised instead? Try the Friendica Directory. Looking for people? Looking for news accounts? Looking for groups? There are specialised tabs for that. Friendica can tell them apart, and so can the Friendica Directory.

    Caveat: The Friendica Directory only lists Friendica accounts. Friendica's built-in directory should list everything it knows. I haven't used Friendica in many years, but I guess this even includes diaspora* accounts because why not?

    Hubzilla has indirectly inherited its directory from Friendica. This is the directory on Netzgemeinde, the biggest Hubzilla hub.

    Again, it lists local as well as federated channels. You can choose whether to see only local channels ("This Website Only") or federated channels as well. You can choose whether channels flagged NSFW shall be listed or not ("Safe Mode"). You can choose to only have group actors listed that let themselves be listed ("Public Forums Only"). You have a cloud of keywords from the keyword lists in the profiles that you can filter by (Mastodon doesn't even have keyword lists in profiles). You have full-text search for names and keywords. There's even a Facebook-style suggestion mode that proposes connections to you with a ranking based on your keywords and their keywords as well as the number of common connections, and that still has the same filters.

    Caveat this time: Hubzilla's directory only supports the one sole protocol built into Hubzilla's core. And that's Zot6. This means that Hubzilla's directory only lists Hubzilla and (streams) channels because Hubzilla and (streams) are the only Fediverse server applications that support Zot6.

    (streams) and Forte have inherited their directories again. And they probably have the most powerful decentralised directories in the entire Fediverse. I'd give you a link, but (streams) directories generally aren't public; only local channels can access them.

    These directories are similar to the ones on Hubzilla. You see local and federated actors, and you can choose to only see local actors ("This Website Only"). You can choose to only see group actors ("Groups Only"). You can choose to not see channels flagged NSFW ("Safe Mode"). What's new: Inactive actors can be kept out, too ("Recently Updated").

    Now it comes: (streams) has ActivityPub built into its core, and it's on by default on new channels. Forte is entirely based on ActivityPub.

    This means that their directories can list anything from anywhere that uses ActivityPub. "Groups Only" gives you Guppe groups, Lemmy communities, /kbin and Mbin magazines, PieFed communities, Mobilizon groups, Flipboard magazines, Friendica groups, Hubzilla forums, (streams) groups, Forte groups etc., all on one list.

    (streams) has a slight edge over Forte here because it also lists Hubzilla and (streams) channels that have ActivityPub off such as the Streams Users Tea Garden where ActivityPub was turned off with the very intention to keep Mastodon out.

    If there was a gigantic Forte server, as big as mastodon.social, and its directory was accessible to the public, that directory would be the best directory in the Fediverse for anything really. If it was on (streams), it would list more, but it would confuse some users of e.g. Mastodon who'd try to follow Hubzilla or (streams) channels that have ActivityPub off. Forte simply doesn't list these because it can't find them.

    A global directory of everything sounds like a good idea, but it's next to impossible to implement.

    Either the directory would go look for actors itself. In order to do that, it would have to know within a split-second not only whenever a new actor is created somewhere so it can index that actor right away, but also whenever a new server is spun up so that the admin actor can be indexed, and that server can be watched. How is it supposed to know all that?

    Well, or the directory, a single, monolithic, centralised website, would have to be hard-coded into all Fediverse server software. That way, each server could immediately report newly created actors to the central directory upon their creation.

    For starters, this would make the whole Fediverse depend on one single centralised website under the control of, if worst comes to worst, one person.

    Besides, this would be a privacy nightmare. Let's suppose I create a new (streams) channel that's supposed to be private. Its existence and all its properties would be sent to the central directory before I can set it to private and restrict its permissions. This wouldn't be so bad on Hubzilla because I'd make the channel private before I turn on PubCrawl and make the channel accessible to the directory in the first place because the directory would only understand ActivityPub.

    Of course, the directory would mostly be built against Mastodon. It would not understand the permissions systems implemented on Hubzilla, (streams) and Forte, and it might happily siphon off the profiles of channels where access to the profile is restricted and make them publicly accessible. On the other hand, this is likely to mean that the directory couldn't read most of Hubzilla's, (streams)' and Forte's profile text fields anyway because Mastodon doesn't have them.

    But such a centralised directory wouldn't make connecting to other users that much easier and more convenient. You'd still have to copy and paste URLs or IDs into your local search and search for them (unless you're on Friendica, Hubzilla, (streams) or Forte where you can connect to URLs directly). At the very least, you should be able to go to the centralised directory and follow anyone just by clicking or tapping them. That, however, would require OpenWebAuth support on both your home server and that directory.

    Ideally, that directory would be firmly built into all instances of all Fediverse software from snac2 to Mastodon to Hubzilla, even replacing any existing directory to confuse people less. But that would make the Fediverse even more dependent on one central website and its owner, something which should be avoided at all cost.

    Lastly, nothing can ever be built into all instances of all Fediverse software. Remember that there's software with living instances that's barely being developed such as Plume. There's even software with living instances that's been officially pronounced dead such as Calckey, Firefish or /kbin. How are Firefish servers supposed to implement such a feature if nobody maintains Firefish anymore, and even the code repository was deleted?

    CC: @Risotto Bias

    #Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #Fediverse #Friendica #Hubzilla #Streams #(streams) #Forte #OpenWebAuth #SingleSignOn #NomadicIdentity #Search #FullTextSearch #Directory #Permissions #Privacy #Conversations #ThreadedConversations #FEP_171b #ConversationContainers
    1. advanced search operators prototype. status: not quite ready for prime time.
      • has a bunch of goofy operators nobody but me will ever use, such as is:article
      • still missing some classics like lang:, domain:, before:, and after:, and some oddballs like is:bot (would require extra join) and sort: (would break ID-based paging)
      • needs docs, although i know where Past Vyr basically already wrote them: https://github.com/VyrCossont/mastodon/pull/8 😇
    2. indexed full text search prototype. status: heretical.
      • only works on PostgreSQL: SQLite's full-text search is much fussier and requires using a "virtual table" and frankly i can't be bothered, at least tonight
      • direct port of https://github.com/VyrCossont/mastodon/pull/3 and has the same limitations: HTML isn't stripped, and media alt text and poll options aren't indexed
      • fixing that would start by adding a tsvector column that concatenates (with record separators? as an array?) the contents of filterableFields for a status, updates it every time the status or its attachments are edited, and GIN-indexes that column
      • ignores the whole issue of matching posts to language tags and language tags to PG text search configurations by assuming that everything is English
      • still massively faster than unindexed ILIKE that vanilla GTS uses
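
    To illustrate why an indexed search can beat an unindexed ILIKE-style scan, here is a toy sketch in plain Python. It is not the patch's code and not how PostgreSQL implements tsvector or GIN; it only contrasts "look at every post" with "look up each word in a prebuilt index". All names (`scan_search`, `build_index`, `index_search`) are made up for the example, and unlike a PG text search configuration it does no stemming.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def scan_search(posts: Dict[int, str], term: str) -> Set[int]:
    """Unindexed search: every post is inspected on every query."""
    needle = term.lower()
    return {pid for pid, text in posts.items() if needle in text.lower()}


def build_index(posts: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build an inverted index (token -> post IDs) once, up front."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for pid, text in posts.items():
        for token in tokenize(text):
            index[token].add(pid)
    return index


def index_search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return posts containing every word of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    posts = {
        1: "Full-text search in PostgreSQL",
        2: "Search is hard",
        3: "Vintage Mac photos",
    }
    index = build_index(posts)
    print(scan_search(posts, "search"))
    print(index_search(index, "full text search"))
```

    Note the behavioural difference: the scan matches substrings anywhere, while the index only matches whole tokens. The cost of the index is that it has to be updated whenever a post is created or edited.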

    edit: fixed a backwards flag in has:media and related operators

    #GoToSocial #GTS #FullText #FullTextSearch

  9. ok, here you go, updated GTS search patches for 0.18.0rc1. notice how they're on my repo? these are completely unofficial. do not bug anyone but me about them.

    1. improved hashtag search. status: upstreamable, mostly.
      • doesn't require # prefix to search hashtags
      • searches for matches anywhere in a hashtag: Mac now matches VintageMac as well as MacOS
      • includes hashtags when not specifically searching for accounts or statuses, like most Mastodon-compatibles
      • doesn't change existing tag sorting. popularity and/or recency might be more useful
    2. offset paging for searches. status: not upstreamable yet.
      • more compatible: many clients can't do ID paging
      • allows paging hashtag search results: Mastodon API has no concept of IDs for hashtags, so ID paging can't work for those anyway
      • possible performance issues: see comments on why main doesn't have it already. personally, i haven't noticed and i run this instance on a tiny VPS
    3. remove search restrictions. status: heretical.
      • searches any post on your instance (except other accounts' private/direct posts, and accounts that have you blocked)
      • includes public, unlisted, your own private and DM posts, and private and DM posts that are replies to you
      • expanded search is default: revert to standard GTS behavior by adding scope:classic or in:library operator to search query
      • definite performance issues: this means searching more posts! GTS does not use either PG full-text indexes/operators or SQLite full-text virtual tables, and this patch doesn't change that.
      • doesn't include alt text of media attachments, or polls, because main doesn't
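
    The difference between the two paging styles mentioned in patch 2 can be sketched like this. It is a simplified illustration, not GoToSocial code: the helper names are hypothetical, and plain integers stand in for real status IDs. The drift shown in the demo is a general property of offset paging, not something claimed in the patch notes.

```python
from typing import List, Optional


def page_by_offset(ids: List[int], offset: int, limit: int) -> List[int]:
    """Offset paging: skip `offset` results, return the next `limit`.

    Works for anything that can be listed in order, even results
    that have no ID of their own.
    """
    return ids[offset:offset + limit]


def page_by_max_id(ids: List[int], max_id: Optional[int], limit: int) -> List[int]:
    """ID paging: return up to `limit` results strictly older than `max_id`.

    `ids` must be sorted newest first (descending).
    """
    if max_id is not None:
        ids = [i for i in ids if i < max_id]
    return ids[:limit]


if __name__ == "__main__":
    ids = [50, 40, 30, 20, 10]
    first_page = page_by_offset(ids, 0, 2)
    print("page 1:", first_page)

    # A new post arrives before the client asks for page 2.
    ids = [60] + ids
    print("page 2 by offset:", page_by_offset(ids, 2, 2))
    print("page 2 by max_id:", page_by_max_id(ids, first_page[-1], 2))
```

    In the demo, the offset-based second page repeats a result from the first page because the list shifted underneath it, while the ID-based page continues where the first one ended.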

    i may add more patches to this list in the medium future as i add more functionality to my own instance, for example, date range operators (before:date, after:date), post property operators (has:image, has:poll, has:cw, is:sensitive, visibility:public), threading operators (to:[email protected], is:reply, -is:reply), sort operators (sort:oldest, sort:newest, sort:favs) and maybe PG full-text indexing if i have a really good day (i really don't wanna figure out SQLite's weird shit! someone else do it!)
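
    As a rough sketch of how operators like the ones listed above could be separated from ordinary search terms, assuming a simple `name:value` syntax with an optional leading `-` for negation. This is not the patch's parser; `ParsedQuery`, `parse_query` and the `KNOWN_OPERATORS` set are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Operator names taken from the post; treating exactly this set as
# "known" is an assumption made for the sketch.
KNOWN_OPERATORS = {
    "before", "after", "has", "is", "visibility", "to",
    "sort", "scope", "in", "lang", "domain",
}


@dataclass
class ParsedQuery:
    terms: List[str] = field(default_factory=list)
    # Each operator is stored as (name, value, negated).
    operators: List[Tuple[str, str, bool]] = field(default_factory=list)


def parse_query(query: str) -> ParsedQuery:
    """Split a search string into plain terms and name:value operators."""
    parsed = ParsedQuery()
    for token in query.split():
        negated = token.startswith("-")
        body = token[1:] if negated else token
        name, sep, value = body.partition(":")
        if sep and value and name.lower() in KNOWN_OPERATORS:
            parsed.operators.append((name.lower(), value, negated))
        else:
            # Unknown prefixes (such as URLs) stay ordinary search terms.
            parsed.terms.append(token)
    return parsed


if __name__ == "__main__":
    parsed = parse_query("duckdb has:poll -is:reply before:2024-01-01")
    print(parsed.terms)
    print(parsed.operators)
```

    Checking names against a known set keeps things like pasted URLs from being mistaken for operators.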

    randos don't debate me about Fedi search. my clients can't set per-post interaction controls yet so i'll just block you.

    #GoToSocial #GTS #FullText #FullTextSearch

  10. What Is He Seeking in a Faraway Land? How to Find the Meaning of Life with PostgreSQL

    This article grew out of a couple of lectures I gave to students as part of a course on machine learning. Why PostgreSQL? Why vectors? Over the past two years, language models have become incredibly popular, and with that came a multitude of tools accessible even to a beginning engineer who wants to get acquainted with the world of text analysis. The availability of these technologies opens up limitless possibilities for applying them in all kinds of fields: from knowledge management systems to "copilots" that help analyse patients' medical histories more thoroughly, or information kiosks that let you put together the perfect basket of goods for a picnic. This work can hardly boast completeness or depth; however, I hope it provides those "good" entry points that will let you, as you dive into the details, discover many new interesting and useful topics for research and engineering projects. Let's uncover the hidden meanings

    habr.com/ru/articles/855712/

    #postgresql #postgres #pgvector #vectorization #fulltextsearch #fulltext_search #hnsw #python #java #Knowledge_Management_Systems

  11. @Daniel de Kay The biggest cultural difference based on server software has to be between Mastodon on the one hand and the Mike Macgirvin creations Friendica, Hubzilla and (streams) on the other hand.

    Not only do they have vastly different user bases, but they developed independently from one another. When the Mastodon culture developed, those who shaped it didn't even know the other side, so they couldn't adopt any of its culture.

    Said other side's culture dates back to 2010 when Friendica was launched as Mistpark, and since that was almost six years before Mastodon, it couldn't be inspired by Mastodon's culture either.

    Add to that that these respective cultures are greatly shaped by technical features and limitations or the lack thereof.

    Mastodon's culture is largely built around its 500-character limit which is ample for your typical phone-wielding Mastodon user. Friendica, Hubzilla and (streams) don't have any defined character limit whatsoever, and their target audience is largely on desktop or laptop computers, often running Linux, with large screens and full-size hardware keyboards.

    So it's the most normal thing in the world for them to write in one post as much as they want while Mastodon users debate whether threads are good, or you should always limit yourself to 500 characters or less.

    Also, alt-text. Mastodon has many disabled users, including blind or visually-impaired users. And it has a dedicated alt-text field for each image. On top of that, it offers 1,500 characters for each alt-text which, in connection with the 500-character limit for toots, has people write detailed image descriptions and explanations and put them into the alt-text. That's often information that doesn't even belong in alt-text, but there's no room for it elsewhere.

    Friendica, Hubzilla and (streams) have unlimited room, so putting stuff into alt-text because the post text is too limited seems ridiculous. But they don't have a vocal disabled community, so there's little interest in accessibility. And none of them has a dedicated alt-text field. Alt-text is supported, but it has to be manually grafted into the image-embedding code in the post. And there's no official documentation for that, I think not even for Friendica which is the only one out of the three with actually useful end-user documentation.

    It's similar with content warnings. On Mastodon, they're put into the repurposed summary field, and next to nobody knows that it's a repurposed summary field rather than invented for content warnings from scratch. So since Mastodon has a content warning field, writer-side content warnings are huge, but also cause for drama.

    Mastodon 4.0 has introduced filters that can create reader-side content warnings, but hardly anyone uses them, even fewer people support them with keywords or hashtags, many don't even know this feature exists, and it's generally ignored because it's un-Mastodon.

    The Friendica/Hubzilla/(streams) complex doesn't have a content warning field. Hubzilla and (streams) have a summary field labelled as such. Friendica doesn't even have that; it uses a pair of BBcode tags for that.

    And within their own ecosystem, they don't even need it. They've got the "NSFW app" instead, an over-one-decade-old, optional, simple-as-anything substring filter that automatically hides entire posts with all media and everything behind content warnings if it finds one of the entered keywords or hashtags.
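
    The reader-side filtering described above can be sketched roughly as follows. This is only an illustration, not the actual NSFW app code: the `Post` class, the function names and the case-insensitive matching are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical minimal post model, not the real Friendica/Hubzilla schema.
    body: str
    hashtags: tuple[str, ...] = ()


def matched_keywords(post: Post, keywords: list[str]) -> list[str]:
    """Return the keywords that occur in the post as plain substrings.

    Assumption: matching is case-insensitive and hashtags are searched
    together with the body text, each prefixed with '#'.
    """
    haystack = " ".join([post.body, *("#" + tag for tag in post.hashtags)]).lower()
    return [kw for kw in keywords if kw.strip() and kw.lower() in haystack]


def render(post: Post, keywords: list[str]) -> str:
    """Collapse the whole post behind a generated warning if any keyword matches."""
    hits = matched_keywords(post, keywords)
    if hits:
        return f"[hidden, matches: {', '.join(hits)}] (click to show)"
    return post.body
```

    With the keyword list `["spoiler"]`, a post whose text contains "Spoilers" would be collapsed for that reader only, while everyone else sees whatever their own keyword list dictates.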

    So they can't understand Mastodon's commotion about content warnings, and Mastodon users can't understand why they don't add Mastodon-style content warnings.

    And then there are all the things that were or are being debated on Mastodon and whether or not it should introduce them. Especially the second-wave Twitter refugees are often staunchly against them.

    Full-text search and quote-tweets are being actively used on Twitter to track down and harass members of minorities who have fled to Mastodon. Of course, they don't want Mastodon to introduce either. That is, in the case of full-text search, Mastodon has found a solution, but one that doesn't really federate to the rest of the Fediverse.

    Quotes and text formatting are seen as bad, too. Many don't know quotes because Twitter doesn't have them, and so they think quotes could be used as tools of harassment. And both are seen as making Mastodon feel less like Mastodon and more complicated.

    Friendica and Hubzilla had all this before there was Mastodon, and (streams) inherited it from its long line of ancestors. Their users have gotten so used to having all this that they don't understand what the problems should be, also because they're so detached from Mastodon's culture. So they keep on using these features unashamedly, even around Mastodon users.

    Some differences are rather simple. Take mentions, for example. Friendica has always used long names for mentions, as does its offspring. Mastodon users may find that freaky. Meanwhile, Friendica, Hubzilla or (streams) users may find Mastodon's mentions cryptic because they use the short name. But even they matter.

    #Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #FediverseCulture #Fediverse #Mastodon #Friendica #Hubzilla #Streams #(streams) #CharacterLimit #CharacterLimits #500Characters #CW #CWs #ContentWarning #ContentWarnings #ContentWarningMeta #AltText #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #Quotes #QuotePost #QuotePosts #QuoteBoost #QuoteBoost #QuoteTweet #QuoteTweets #QuoteToot #QuoteToots #QuotedShares #TextFormatting #FullTextSearch
  12. @unchartedworlds Thank you! No reply from my admin yet. I found some other posts describing #ElasticSearch as a #ResourceHog, others saying that it's fine after initial indexing. I'm often looking for posts older than Tootfinder's 3mths - even my own! as scrolling back is so slow.

    People may be using hashtags less because they assume we're now searchable, but I wonder what % of users have access to #FullTextSearch - @FediTips is that info available?

  13. Hi @buercher. Now that #Mastodon 4.2 includes the option to "Include Public Posts In Search results" right in the user config, are you planning to use the *indexable* flag for #TootFinder, instead of relying on tags in the profile description? #FullTextSearch #Fulltext #Search

  14. Now that we have proper search permissions in 4.2, is anyone working on a full fediverse search engine like @r000t did with stealthward.xyz? I don't mean just searching the posts that your server knows about, I mean a separate project to search posts from all known good servers. #FullTextSearch #Mastodon #Fulltext #Search

  15. Does the new Mastodon 4.2 search allow searching only my favorites like before? ⭐

    I have an old post I remember adding to favorites which I'd like to find but unfortunately I'm not sure how to do so.

    Appreciate any advice! :AAAAAA:

    EDIT: Looks like this does it! :blobcat:
    Add "in:library" to search your favorites!

    #mastodon #searching #search #mastodonsearch #favorites #fulltextsearch #favourites #favouritesearch #favoritesearch #searchfavorites #searchfavourites

  16. @We Distribute First of all: I wouldn't make it only about #Mastodon. I wouldn't make it mainly about Mastodon either. #MastodonIsNotTheFediverse.

    I wouldn't even make it look like it's only or mainly about Mastodon. I wouldn't give the impression that the #Fediverse is "Mastodon and some other stuff that was bolted onto Mastodon that nobody really gives a damn about, but someone told me I'd better mention that but, like, idfc..."

    Also, I'd split it up, at least into various sub-topics.

    First of all, I'd present Fediverse projects, one by one, in chronological order instead of in order of size, importance or popularity. I'd start with #StatusNet (2008). Or if I were to only include projects that at least understand #ActivityPub, I'd start with #Friendica (2010). I might skip projects that have been discontinued (Red Matrix, Osada, Zap...) or abandoned (Anfora, reel2bits...).

    Also, I'd ask devs, creators who aren't necessarily the devs (#Diaspora*, Friendica and Hubzilla aren't maintained by their respective creators, for example) and experienced users. That's better than trying to only do my own research.

    That is, I might include a mini-series with important dead projects, for example to show how #Hubzilla and #Streams came to exist. Okay, this example could be covered with one or two episodes.

    Depending on the target audience, there could be a tech sub-topic which could cover protocols or other technology.

    Another sub-topic would cover culture. Again, not with a focus solely on one specific project. For example, #ContentWarnings and the necessity therefor aren't only a Mastodon thing, especially since other projects cover them in different ways, unbeknownst to almost everyone on ActivityPub-based projects.

    #FullTextSearch is a hot topic only on Mastodon which didn't even have it until a few days ago. On most other projects, it's absolutely normal to have it. #Quotes are similar: On Mastodon, they're still being debated. Almost everywhere else, they've been available forever, and even quoting Mastodon toots has been possible forever.

    I'd also look at #AltText and #ImageDescriptions from a not-only-Mastodon point of view to break with the general perception that an #ImageDescription always goes into alt-text, full stop, because there's no room anywhere else.

    A few episodes would be Mastodon-centric. One would be about the various stages of #TwitterMigration which pretty much exclusively led to Mastodon. Also, I might do an episode on how Mastodon newbies discover the Fediverse around #MastodonSocial by and by. And I'd certainly do one about how Mastodon users perceive and react to the Fediverse outside of Mastodon and what comes in from there. Particularly those who either want the Fediverse to be Mastodon and only Mastodon or want everyone else out there to only do what can be done on Mastodon, i.e. forgo #TextFormatting and limit their posts to #500Characters, regardless of how many they could actually post.

    The #Threadiverse in general and the #RedditMigration definitely deserve their own episode.

    Another episode, later into the podcast, could cover cultural differences caused by how the various communities came into the Fediverse. There are the millions who have come over from #Twitter since late October 2022, and who have adapted parts of the old pre-Musk Twitter culture to Mastodon with little regard for anything else because they weren't even aware of there being anything else. There are the fresh arrivals who can't stop acting like they're still on #X, many of whom are pining for a #Bluesky invitation because it promises to be even closer to "literally Twitter without Musk". There is the "old guard" from Friendica and Hubzilla. There is the Threadiverse which basically continues to live the #Reddit culture in decentralised, non-corporate places now and tries to put up with hardly working moderation.

    Last but not least, maybe a look at media coverage could be worth some episodes.
  17. @marqle People can't find instances with over #500Characters because they don't know that #Firefish
    • exists
    • is fully federated with #Mastodon
    • gives you 3,000 characters by default or even more
    • has had #FullTextSearch, #Quotes and #TextFormatting since its inception when it was still #CalcKey
    • is easier to move to from Mastodon than another Mastodon instance

    And that's because everyone and their dog keeps talking about the #Fediverse as being only Mastodon.

    Or maybe it was hard enough already to learn Mastodon's UI coming from Twitter, so they aren't ready to learn yet another UI.

    Or they can't find an app named "Firefish" in their app store.
  18. CW: long (almost 1,700 characters, mostly structured as a list), Mastodon vs non-Mastodon meta
    I'm wondering...

    What if someone on #Mastodon started a campaign for defederating everything in the #Fediverse that isn't Mastodon?

    Reasons:
    • users of these projects keep shoving them into the faces of Mastodon users
    • they think their projects are better than Mastodon
    • they keep bragging about what their projects can do that Mastodon can't
    • they spam Mastodon timelines with #LongPosts with over #500Characters
    • they pester Mastodon users with their #RichText #TextFormatting
    • they even use text formatting that Mastodon doesn't support, that must be intentional
    • their mentions look weird because they're different from Mastodon mentions
    • or they don't mention anyone at all when replying
    • they can #FullTextSearch Mastodon already now
    • they can make #QuoteToots of Mastodon toots, and nothing can stop them
    • they rarely add alt-text
    • they don't add content warnings because they don't have instance rules that demand so
    • or they don't add content warnings because they claim that they'd allegedly got a better way of handling them, and the content warning field was actually for something else
    • they never mark their images sensitive
    • not a single one of their instances has signed the Mastodon Covenant
    • you can't reach the admins of their instances
    • or if you can, and you report a user for some of the stuff above, the admin claims the user hadn't done anything wrong
    • it confuses and disturbs newbies when they find out that the Fediverse is not only Mastodon
    • it makes the Fediverse too complicated if not everything is Mastodon


    If you actually find this a good idea, keep in mind that defederating an entire decentralised project is a game of Whack-a-Mole.
  19. Folks should take a look at the #fullTextsearch function that Eugen is proposing for #Mastodon and how closely it resembles the #search patch of the woman who he kicked off the Mastodon repo GitHub & Discord a couple of months ago. Also, her patch was more configurable than the thing he's bringing to main that has zero attribution of her work. #fediverse #mastodev #MastoDevs

  20. update on the Mastodon extended search patch:

    Feditext work is now taking up all of my Fedi time, and i no longer have the bandwidth to support Mastodon 4.2.x or implement other changes people have requested, so i'm going to have to drop the project.

    since there still seems to be a lot of interest in being able to find things on Mastodon, i hope someone else can pick it up, and i'm happy to answer questions to get people pointed in the right direction.

    #MastoAdmin #MastoDev #FullText #Search #FullTextSearch

  21. I've been pondering the posts by @kissane and @siderea about 'not finding your people' #OnHere and am wondering if sentiments toward fediverse-wide #FullTextSearch have shifted at all. I know this has been implemented several times, and then been graciously shut down by developers who listened to community feedback.
    #Mastodon #Fulltext #Search

  22. @Shini92 @elk
    Would be neat if it resized the search bar to fill up more of the screen when you start typing to give more space for results.

    When using search on Universeodon it also returns post content which gets a bit crowded in the small box, especially on mobile. I'll attach a screenshot.

    The instance uses this custom search patch enabling searching of posts contents: github.com/VyrCossont/mastodon

    Keep up the great work! :fedi:

    #MastodonSearch #Search #Searching #FullTextSearch #elk #elkapp #mastodon #theme #customtheme #mastodontheme #mastodonthemes #themes

  23. @feditips
    This looks like a great option for now but I can't help but think that allowing everything in posts to be searchable as an option would be a better approach. That would do away with hashtags altogether.

    Some Mastodon instances, like universeodon.com, have implemented a custom full-text search feature from github.com/VyrCossont/mastodon. I've been using it lately and it makes finding things that much easier.

    I'm hoping to see full-text search adopted soon as an instance option that is implemented by default, with opt-in or opt-out depending on the instance settings.

    For anyone curious this is being discussed in this GitHub thread: github.com/mastodon/mastodon/i
    #searching #mastodon #search #fulltextsearch #features #suggestions #mastodonsearch #mastodonsearching #hashtags #fediverse #feditips

  24. here's the final iteration of my Mastodon advanced search patch: github.com/VyrCossont/mastodon

    this enables full-text search for posts you haven't interacted with, as well as full-text search for accounts, and includes several advanced filtering operators and parser fixes.

    #FediAdmin #FediDev #MastoAdmin #MastoDev #FullText #Search #FullTextSearch #ElasticSearch

  25. I'm (Paolo Melchiorre), CTO of 20tab, :python: developer and co-organizer of 🇮🇹

    :django: is my favorite web framework, I'm a contributor, member and coach 👩‍💻

    You can see one of my talks at , or conferences 🗣️

    For 20 years I've been using 🐧 and promoting 🥑

    On my blog I also write about 👇
    paulox.net/

  26. Hi all, I’m Paolo Melchiorre, a 🐍 developer who contributes to the 🦄 project and gives talks at tech 🗣️ .

    I’ve been a 🐧 user since 2000 and I use and promote 👨‍💻 .

    I have a degree in Computer Science and currently I'm a 🏡 worker based in Italy.

    I wrote on my personal blog about , , , and technical (, , , , , , , )

    paulox.net/