#datamining — Public Fediverse posts on home.social

deitel @[email protected] · 2026-05-03 · 14:07 UTC

Join me Tuesday for my next Python Data Science & AI Full Throttle! https://deitel.com/PYDSFT

O'Reilly Media Pearson #deitel #python #machinelearning #deeplearning #NLP #datamining #ApacheSpark #BigData #IoT #GenAI

#deitel #python #machinelearning #deeplearning #nlp #datamining

RaymondPierreL3 @[email protected] · 2026-04-27 · 05:50 UTC

@danmac
If my understanding of big data is correct and not about #AISlop #LLM hoovering, then the data centres we need are ‘boutique’ ones for curated datasets dealing with very narrow applications in health, academic and commercial research, medicine, biology, etc… Those we can accommodate and ‘federate’ when needed. The data mining tools, while compute-intensive, don’t need them either. So, yeah… piss off and no, we’re not going to miss out on anything, especially when the #bubble bursts… soonish.

#DataCentres #BigData #DataMining #PatternMatching

#aislop #llm #bubble #datacentres #bigdata #datamining

Killer Rabbit 90 @[email protected] · 2026-04-24 · 19:38 UTC

Palantir Is Helping Trump’s IRS Conduct “Massive-Scale” Data Mining https://theintercept.com/2026/04/24/palantir-irs-contract-data/ #uspoli #TrumpRegime #Palantir #DataMining #PrivacyRights #RegulateTech #TechRegulation #CivilRights

#uspoli #trumpregime #palantir #datamining #privacyrights #regulatetech

Nu Modular @[email protected] · 2026-04-22 · 20:01 UTC

@deFractal #AbuseOfPower #DataMining #Capitalism #TechBros #JustPlainGreed

#abuseofpower #datamining #capitalism #techbros #justplaingreed

Nu Modular @[email protected] · 2026-04-17 · 19:37 UTC

@tobru #Linux #DataMining #AbuseOfPower

#linux #datamining #abuseofpower

DoomsdaysCW @[email protected] · 2026-03-25 · 21:02 UTC

@dalias I blocked and reported. It seems to me that mining folks' posts would be a violation of some server terms (I believe it is a violation of Kolektiva's terms). Many folks have their posts set to auto-delete, so this definitely seems like Seldo's product would be recording posts that would otherwise disappear. I also noticed that some folks from Kolektiva are following that account.

#LLMs #Fediverse #Datamining #AutoDelete #Safety #Security

#llms #fediverse #datamining #autodelete #safety #security

BBF des DIPF @[email protected] · 2026-03-17 · 10:36 UTC

📣 The March talk of our #DHELab takes place next week:

📅 Fri, 27 March 2026 | 12:00-13:00 | Online

With Max Zeterberg (@subugoe) & Lasse Clausen (@unigoettingen), it looks at the digital edition (#DigitaleEdition) of the educator (#Pädagogen) Klaus Mollenhauer, how it came about, and how it can be used:
➡️ https://bbf.dipf.de/de/aktuell/termine/dhelab-vortrag-2026-03

#histed #textmining #datamining #DigitalHistory #DH #OpenData #FDM #history #histodons #OpenAccess @dipf_aktuell

#dhelab #digitaleedition #padagogen #histed #textmining #datamining

Suisse @[email protected] · 2026-03-04 · 03:45 UTC

https://www.europesays.com/ch-fr/31143/ Ken Griffey Jr.’s Winning Run: hidden SNES cheat codes have been discovered after 30 years, letting retro gamers unlock 4 secret teams #baseball #caché #CheatCodes #DataMining #équipesD'expansion #équipesSecrètes #InformationsSurDesOrdinateursPortatifs #KenGriffeyJr #N64 #Nintendo #nouvelles #rapport #Rare #RétroGaming #revues #Science #ScienceAndTechnology #Sciences #SciencesEtTechnologies #SNES #Suisse #Technologies #Technology #test

#test #technology #technologies #suisse #snes #sciencesettechnologies

DoomsdaysCW @[email protected] · 2026-02-20 · 16:37 UTC

#AIBots may lead to the end of the internet as we know it

In recent weeks, #OpenDemocracy’s website has been repeatedly brought down by an army of bots. We’re not the only ones

Matthew Linares
20 February 2026

Excerpt: "Slater explained that 'the traffic often arrives through anonymous residential IPs', referring to residential proxy networks that route internet traffic through intermediary servers using IP addresses assigned by internet service providers to real homeowners. This, he said, makes it 'hard to distinguish ‘normal users’ from automated collection'. [That's not right and needs to be changed!!!]

" 'We're being forced into permanent defence mode. #ResidentialProxyNetworks let #AIScrapers hide in plain sight, rotate identities, and extract data at scale. That shifts real costs onto projects that exist to serve people, not feed training pipelines.'"

#aibots #opendemocracy #residentialproxynetworks #aiscrapers #aisucks #ai

BBF des DIPF @[email protected] · 2026-02-16 · 08:56 UTC

@dipf_aktuell

📣 The March talk of our #DHELab on
📅 27 March 2026 | 12:00-13:00 | Online
with Max Zeterberg (@subugoe) & Lasse Clausen (@unigoettingen) looks at the digital edition (#DigitaleEdition) of the educator (#Pädagogen) Klaus Mollenhauer, how it came about, and how it can be used:
➡️ bbf.dipf.de/de/aktuell/t...

#histed #textmining #datamining #DigitalHistory #DH #history #histodons @dipf_aktuell

#dhelab #digitaleedition #padagogen #histed #textmining #datamining

Pluralistic: Daily links from Cory Doctorow – No trackers, no ads. Black type, white background. Privacy policy: we don't collect or retain any data at all ever period. [Unofficial] @[email protected] · 2026-02-12 · 08:42 UTC

Pluralistic: Doctors' union may yet save the NHS from Palantir (12 Feb 2026)

https://web.brid.gy/r/https://pluralistic.net/2026/02/12/palantir-is-ice/

#uncategorized #collaborators #datamining #ehrs #fuckice #godwinexceptionalism

BlueCyberSerpent @[email protected] · 2026-02-11 · 03:05 UTC

Bluesky’s Solution To Moderating Is Moderating Without Moderating via Social Proximity

I have noticed a lot of people are confused about why some posts don’t show up on threads, though they are not labeled by the moderation layer. Bluesky has begun using what it calls social neighborhoods (or network proximity) as a ranking signal for replies in threads. Replies from people who are closer to you in the social graph, accounts you follow, interact with, or share mutual connections with, are prioritized and shown more prominently. Replies from accounts that are farther away in that network are down-ranked. They are pushed far down the thread or placed behind “hidden replies.”

Each person gets their own unique view of a thread based on their social graph. It creates the impression that replies from distant users simply don’t exist. This is true even though they’re still technically public and viewable if you expand the thread or adjust filters. Bluesky is explicitly using features of subgraphs to moderate without moderating. Their reasoning is that if you can’t see each other, you can’t harass each other. Ergo, there is nothing to moderate.

Bluesky mentions that here:

https://bsky.social/about/blog/10-31-2025-building-healthier-social-media-update

As a digression, I’m not going to lie: I really enjoyed working on software built on the AT protocol, but their fucking users are so goddamn weird. It’s sort of like enjoying building houses, but hating every single person who moves into them. But, you don’t have to deal with them because you’re just the contractor. That is how I feel about Bluesky. I hate the people. I really like the protocol and infrastructure.

I sort of am a sadist who does enjoy drama, so I do get schadenfreude from people with social media addictions and parasocial fixations who reply to random people on Bluesky, because they don’t realize their replies are disconnected from the author’s thread unless that person is within their network. They aren’t part of the conversation they think they are. They’re algorithmically isolated from everyone else. Their replies aren’t viewable from the author’s thread because of how Bluesky handles social neighborhoods.

Bluesky’s idea of social neighborhoods is about grouping users into overlapping clusters based on real interaction patterns rather than just the follow graph. Unlike Twitter, it does not treat the network as one big public square. Instead, it models networks of “social neighborhoods” made up of people you follow, people who follow you, people you frequently interact with, and people who are closely connected to those groups. They’re soft, probabilistic groupings rather than strict labels.

Not everyone sees the same replies. Bluesky is being a bit vague with “hidden.” Hidden means your reply is still anchored to the thread and can be expanded. But there is another way Bluesky can handle a reply: it uses social neighborhoods to judge contextual relevance. Replies from people inside or near your social neighborhood are more likely to be shown inline with a thread, expanded by default, or served in feeds. Replies from outside your neighborhood are still public and still indexed, but they’re treated as lower-context contributions.
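To make the idea concrete, here is a toy sketch of proximity-based reply ordering. It is not Bluesky's actual algorithm, which the post describes only in general terms. It assumes an undirected graph of follows and interactions, uses plain hop count as the "distance", and collapses anything beyond a made-up `max_hops` cutoff. The names `hop_distances`, `arrange_replies`, and `max_hops` are all hypothetical.

```python
from collections import deque


def hop_distances(edges, viewer):
    """Breadth-first search: number of hops from the viewer to each reachable account."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {viewer: 0}
    queue = deque([viewer])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def arrange_replies(reply_authors, edges, viewer, max_hops=2):
    """Split reply authors into (inline, collapsed), nearest accounts first.

    max_hops is an illustrative cutoff, not a documented Bluesky parameter.
    """
    dist = hop_distances(edges, viewer)
    far = float("inf")  # accounts with no path to the viewer
    ranked = sorted(reply_authors, key=lambda author: dist.get(author, far))
    inline = [a for a in ranked if dist.get(a, far) <= max_hops]
    collapsed = [a for a in ranked if dist.get(a, far) > max_hops]
    return inline, collapsed


# Hypothetical accounts: viewer - ana - ben - cy form a chain; zed is unconnected.
edges = [("viewer", "ana"), ("ana", "ben"), ("ben", "cy")]
inline, collapsed = arrange_replies(["cy", "zed", "ben", "ana"], edges, "viewer")
print(inline)     # replies shown inline for this viewer
print(collapsed)  # replies pushed behind "hidden replies"
```

Because the distances are computed from the viewer, two people looking at the same thread get different inline and collapsed sets, which is the per-viewer effect described above. A real system would presumably use soft, weighted signals rather than a hard hop cutoff.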

Basically, if you reply to a thread, you will see it anchored to the conversation, and everyone will see it in search results, as a hashtag, or from your profile, but it will not be accessible via the thread of the person you were replying to. It is like shadow-banning people from threads unless they are strongly networked.

Because people have not been working with the AT Protocol like I have, they assume they are shadow-banned across the entire Bluesky app view. No—everyone is automatically shadow-banned from everyone else unless they are within the same social neighborhood. In other words, you are not part of the conversation you think you are joining because you are not part of their social group.

Your replies will appear in profiles, hashtag feeds, or search results without being visually anchored to the full thread. Discovery impressions are neighborhood-agnostic: they serve content because it matches a query, tag, or activity stream. Once the reply is shown, the app then decides whether it’s worth pulling in the rest of the conversation for you. If the original author and most participants fall outside your neighborhood, Bluesky often chooses not to expand that context automatically.

Bluesky really is trying to avoid having to moderate, so this is their solution. Instead of banning or issuing takedown labels to DIDs, the system lets replies exist everywhere, but not in that particular instance of the thread.

I find this ironic because a large reason why many people are staying on Bluesky and not moving to the fediverse—thank God, because I do not want them there—is discoverability, virality, and engagement.

In case anyone is asking how I know so much about how these algorithms work: I was a consultant on a lot of these types of algorithms, so I certainly hope I’d know how they work, lol. No, you get no more details about the work I’ve done. I have no hand in the algorithm Bluesky is using, but I have proposed and implemented that type of algorithm before.

I have an interest in noetics and the noosphere. A large amount of my ontological work is an extension of my attempts to model domains that have no spatial or temporal coordinates. The question is how you generalize a metric space that has no physical, spatial properties. I went to school to try to formalize those ideas. Turns out they’re rather useful for digital social networks, too. The ontological analog to spatial distance, when you have no space, is a graph of similarities.

This can be modeled by representing each item as a node in a weighted graph, where edges are weighted by dissimilarity rather than similarity. Highly similar items are connected by low-weight edges, while less similar items are connected by higher-weight edges. Distances in the graph, computed using standard shortest-path algorithms, then correspond to degrees of similarity. Closely related items are separated by short path lengths, while increasingly dissimilar items require longer paths through the graph. It turns out that attempts to generalize metric spaces for noetic domains—to model noetic/psychic spaces—are actually pretty useful for social media algorithms, lol.
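The graph model described above can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph, non-negative integer dissimilarity weights, and Dijkstra's algorithm as the "standard shortest-path algorithm". The node names, the weights, and the helpers `build_graph` and `shortest_distances` are made up for the example.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency map from (a, b, dissimilarity) triples."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm; edge weights are dissimilarities and must be >= 0."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Low weight = very similar, high weight = very dissimilar (illustrative values).
edges = [("ana", "ben", 1), ("ben", "cy", 2), ("ana", "cy", 8), ("cy", "dee", 5)]
graph = build_graph(edges)
distances = shortest_distances(graph, "ana")
print(distances)
```

Note that the direct ana-cy edge has weight 8, but the path through ben costs 1 + 2 = 3, so the graph distance treats ana and cy as closer than their direct comparison suggests. That transitive effect is what separates a path-based distance from a table of pairwise similarities.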

#4chan #8chan #abtesting #activitypub #addiction #addictions