home.social

#aiscraping — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #aiscraping, aggregated by home.social.

  1. @serigala_tropis Ah. AI scraping is also likely on Flipboard.

    So I go to the trouble of avoiding AI scraping on my sites, Flipboard requires full RSS feeds, and suddenly you have a writing honeypot for AI scrapers.

    Sneaky.

    #flipboard #writing #aiscraping

  2. May have to put my RSS feeds into Flipboard.

    Edit: no. Full RSS feeds are a requirement. It's an AI scraper honeypot.

    I will consider the admin overhead.

    I would rather be writing.

    techcrunch.com/2026/04/02/flip

    #flipboard #socialmedia #writing #aiscraping

  3. I heard back on this the other day. The #SJM aka #SJMN aka #mercurynews turned off their feeds due to #aiscraping, as they offered full articles in the feed.

    Ugh. Still a -4 on productivity.

  4. 🚨 Publishers Strike Back: EU Demands “Pay Up” & UK Says “Let Us Opt Out” of AI Search! 🤖💸

    The “wild west” of AI scraping just hit a massive roadblock. In a double-whammy update from Europe, lawmakers are finally drawing a line in the sand. If you own a website, create content, or work in SEO, the game is changing fast.

    Here is the breakdown of the two massive stories shaking up the tech world this week.

    nbloglinks.com/publishers-stri

    #AI #AIScraping #publishers #AIContent #AIcontrol #UK #EU #technews #SEO

  5. No surprise here about #aiscraping. The question is whether it's efficient and produces enough value to justify the cost.

    Yes, I laughed at the #typo in the title. Spellcheck alone should have caught it. 🤣

    #ai

    wired.com/story/ai-bots-are-no

  6. There is an ugly truth in this. I block AI scraping on my sites, but I am not blind to the fact that AI scraping can still happen.

    It's not an industry that pays more than lip service to social responsibility.

    They don't even need the data. That is how misguided it is.

    #ai #aiscraping

    axios.com/2026/02/02/iab-ai-ac

  7. Although the bland "A.I."-generated voice is detestable in the extreme, the overall concept is mildly amusing, if unoriginal. Plus, there's a special treat for #DoctorWho fans in trying to fathom where the contents of the Laser Tracking Room were illicitly scraped from...

    youtube.com/watch?v=sZkB11pO9R8

    #cats #Caturday #AIart #TARDIS #copyright #AIscraping

  8. Scraping for AI training may or may not be legal. But the effort crawlers put into evading detection and blocking is a smoking gun: an admission that this scraping is not fair.

    arstechnica.com/tech-policy/20

    #AIscraping #scrapers #ai

  9. Today, Meta's list of sites they've targeted for training their AI was leaked. We're on their list.

    I do everything possible to block AI bots. I use Cloudflare AI bot protection. I block what I can. I don't know if they actually get to read anything, but they want to read us.

    dropsitenews.com/p/meta-facebo

    #Meta #AIscraping

  10. I objected to the updated terms and conditions of #vinted, since they talk about data analysis via #Ki #AIscraping, i.e. they want to process user data with it. Since the account had already been deactivated anyway, I also requested deletion. Now I get the reply that vinted has a legitimate interest and, according to Article 17 of the EU General Data Protection Regulation (GDPR), may continue to analyze pretty much all of my data. I could, they say, take legal action. (1/2)

  11. #KI is rampaging across the web 🤖🪓 – #Admins are fighting back 🦸

    My @campact column from May is as topical as ever today!

    > Heartfelt thanks to all the admins who fight tirelessly to protect us users and the planet from the greed of AI. I hope this text contributes to a better understanding of this topic.

    👉 blog.campact.de/2025/05/ki-ran

    #SysAdmins #SystemadminAppreciationDay #FediAdmins #AI #KIScraping
    #AIScraping #TDM #AdminLeiden #MastoAdmin #DataPoisoning #aitxt #GPT #GreenIT

  16. 🔍 / #software / #automation / #scraping

    You can build some pretty insane applications using just #LLMs, even if you don't really know what you're doing. But what separates a good AI app from a great AI app is one thing, and that's data.

    🐱🔗 laravista.altervista.org/CatLi

    #catlink #SoftwareAutomation #SoftwareAutomationScraping #Python #BrightData #AIScraping #AI

  19. News Summary: Cloudflare Launches Pay Per Crawl for AI Scraping; Amazon Hits One Million Robots

    You’ve heard, of course, of pay-per-view. And we are used to streaming revenue on a pay-per basis from the likes of Audible and Spotify. This week has seen the launch (admittedly at the moment in beta) of possibly the most transformative source of pay-per revenue…
    selfpublishingadvice.org/cloud

    #AIscraping #Amazonrobots #Cloudflare #generativeAI #PayPerCrawl
    @indieauthors

  20. A website appears to be scraping hashtags, creating AI articles from them, and then replying to the OG post.

    It stole one of my posts (oldfriends.live/@paul/11477009) for its AI-created article, then spammed me from [email protected]

    It's doing it with #HashTagGames tags and other trending hashtags.

    Edit: I've made the links non-clickable, as the site now appears to serve malware: www.trend247daily.com/articles

    #MastoAdmin

    Article created from scraped post: www.trend247daily.com/article/mastering-the-art-of-the-productive-day-wake-up-look-busy-go-to-bed

    See the thread above, unless the AI content spammer deletes its reply and breaks the thread.

    I don't know where it is getting its content: from its Mastodon account ([email protected]), RSS, or the API. If it has an application, I would hope [email protected] and [email protected] would block it from scraping the API.

    #Spam #Fediblock #AIScraping

  21. The web-scraping is aggressive not just to hoard training data, but also to keep other AI bots from doing the same.

    They're not satisfied with stealing all your content, they also want exclusivity by any means necessary.

    nature.com/articles/d41586-025

    #ai #aiscraping #aichatbots

  22. Anyone who enjoys the many great sources of information on the internet should know:

    #KI 🤖 is rampaging across the web – #Admins are fighting back so that we humans can browse undisturbed.

    Read how admins describe their utterly frustrating yet invisible defensive work against AI – on the @campact blog:
    👉 blog.campact.de/2025/05/ki-ran

    🙏 @flberger

    ❗Don't forget: July 25 is #SysAdminDay

    #FediAdmins #KIScraping #AI #AIScraping #TDM #AdminLeiden #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep

  27. 🌐 LLM crawlers continue to DDoS SourceHut | sr_ht status

    「 SourceHut continues to face disruptions due to aggressive LLM crawlers. We are continuously working to deploy mitigations. We have deployed a number of mitigations which are keeping the problem contained for now. However, some of our mitigations may impact end-users 」

    status.sr.ht/issues/2025-03-17

    #sourcehut #ddos #aiscraping

  28. Hi #Admins 👋,

    Can you give me quotes that explain your fight against #AIScraping? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on (efforts, goals, resources...).

    I intend to publish your quotes in a text on @campact 's blog¹ (DE, German NGO).

    The quotes should make your work 🙏 visible in a generally understandable way.

    ¹ blog.campact.de/author/friedem

    #TDM #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep #Kudurru #Nightshade #Glaze #FediAdmins

  33. "It’s pretty crazy that not only a) these bots shamelessly harvest all your data without asking for permission and b) they do it in such a brute-force manner.
    My coworker and security expert António pointed me to #DarkVisitors, and I’ll probably be installing their #WordPressPlugin on all my sites. For what it’s worth."
    @john_fisherman on #AIscraping
    fred-rocha.medium.com/ai-crawl

  34. 「 OpenAI and Anthropic have stated publicly that they respect robots.txt and blocks to their specific web crawlers, GPTBot and ClaudeBot.

    However, according to TollBit's findings, such blocks are not being respected, as claimed. AI companies, including OpenAI and Anthropic, are simply choosing to "bypass" robots.txt in order to retrieve or scrape all of the content from a given website or page 」

    businessinsider.com/openai-ant

    #OpenAI #AIScraping #AITheft #AI #Cybersecurity #Infosec #Security

  35. 🥸 Perplexity AI Is Lying about Their User Agent • @robb

    「 I checked a few sites and this is just Google Chrome running on Windows 10. So they're using headless browsers to scrape content, ignoring robots.txt, and not sending their user agent string. I can't even block their IP ranges because it appears these headless browsers are not on their IP ranges 」

    rknight.me/blog/perplexity-ai-

    #PerplexityAI #AI #AIScraping #AITheft

  36. Yeah, you're really gonna see which companies are just gonna allow the AI to scrape all their stuff now. I'm a copyleft/creative commons kinda guy. But if you have art that you don't want stolen, the answer is simple.

    MAKE YOUR OWN WEBSITE and put your art there (edit: and use that Glaze type of stuff on your art that wrecks AI, just to be sure)! Neocities is SO easy to set up! Or get your own domain and hosting via Porkbun or GoDaddy (non-WordPress): anything at all other than proprietary/walled stuff!

    #tumblr #WordPress #IndieWeb #WeirdWeb #PersonalWebsites #art #AI #LLM #OpenAI #AIScraping #Midjourney #Automattic #neocities #copyleft #CreativeCommons
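The OpenAI/Anthropic post above describes site owners blocking the GPTBot and ClaudeBot crawlers through robots.txt, and the Perplexity post describes a crawler that sends a generic Chrome user agent instead of its own name. The sketch below uses Python's standard-library `urllib.robotparser` to show how such per-crawler rules are evaluated, and why they only work if the crawler identifies itself honestly and chooses to comply. The robots.txt content, URL, user-agent strings, and the `check_agents` helper are hypothetical illustrations, not taken from any of the linked articles.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of the two crawlers named in the post
# (GPTBot, ClaudeBot) while allowing every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def check_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Hypothetical helper: report, per user-agent string, whether robots.txt
    permits fetching url.

    The result only says what a *compliant* crawler should do; nothing here
    enforces anything on the server side.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical URL and user agents, for illustration only.
    url = "https://example.com/blog/post"
    agents = [
        "GPTBot",     # crawler looking up its rules by its own product token
        "ClaudeBot",
        # A generic desktop-browser string, like one a crawler would send
        # if it did not identify itself by name.
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0",
    ]
    for agent, allowed in check_agents(ROBOTS_TXT, url, agents).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Run as a script, this reports GPTBot and ClaudeBot as blocked and the generic browser string as allowed, because that string matches neither named group and falls through to the catch-all `*` rule. robots.txt is advisory: a crawler that ignores it, or hides behind a browser user agent, is not stopped by it, which is the gap both posts complain about.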