#aiscraping — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #aiscraping, aggregated by home.social.
-
The Internet Archive just hit one trillion archived web pages—while major news sites block it over AI scraping fears. The irony? We’re losing history to protect it. https://arstechnica.com/tech-policy/2025/11/the-internet-archive-survived-major-copyright-losses-whats-next/ #DigitalPreservation #InternetArchive #AIScraping #NyxIsAVirus
-
@serigala_tropis ahh. AI scraping is also likely on flipboard.
So I go through the trouble of avoiding ai scraping on my sites, flipboard requires full rss feeds, and suddenly you have a writing honeypot for ai scrapers.
Sneaky.
-
May have to put my rss feeds into flipboard.
Edit: no. Full rss feeds are a requirement. It's an ai scraper honeypot.
I will consider the admin overhead.
I would rather be writing.
-
I heard back on this the other day. The #SJM aka #SJMN aka #mercurynews turned off their feeds due to #aiscraping, as they offered full articles in the feed.
Ugh. Still a -4 on productivity.
-
🚨 Publishers Strike Back: EU Demands “Pay Up” & UK Says “Let Us Opt Out” of AI Search! 🤖💸
The “wild west” of AI scraping just hit a massive roadblock. In a double-whammy update from Europe, lawmakers are finally drawing a line in the sand. If you own a website, create content, or work in SEO, the game is changing fast.
Here is the breakdown of the two massive stories shaking up the tech world this week.
#AI #AIScraping #publishers #AIContent #AIcontrol #UK #EU #technews #SEO
-
Today in "The end of the open internet": the Internet Archive @internetarchive is offline
-
https://winbuzzer.com/2026/02/15/publishers-block-internet-archive-ai-scraping-fears-xcxwbn/
Publishers Block Internet Archive Over AI Scraping Fears
#AI #WaybackMachine #InternetArchive #Google #Reddit #OpenAI #BigTech #TheNewYorkTimes #NewsPublishers #AIScraping #OpenWeb #CommonCrawl #PerplexityAI #Media
-
No surprise here about #aiscraping. The question is whether it's efficient and produces value that redeems the cost.
Yes, I laughed at the #typo in the title. Spellcheck alone should have caught it. 🤣
https://www.wired.com/story/ai-bots-are-now-a-signifigant-source-of-web-traffic/
-
There is an ugly truth in this. I block ai scraping on my sites, but I am not blind to the fact that ai scraping can still happen.
It's not an industry that pays more than lip service to social responsibility.
They don't even need the data. That is how misguided it is.
https://www.axios.com/2026/02/02/iab-ai-accountability-publishers-act
-
Although the bland "A.I."-generated voice is detestable in the extreme, the overall concept is mildly amusing if unoriginal. Plus, there's a special treat for #DoctorWho fans in trying to fathom where the contents of the Laser Tracking Room were illicitly scraped from...
-
(⌐■-■) Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
-
Scraping for AI training may or may not be legal. But the effort crawlers put into evading detection and blocking is a smoking gun, an admission this scraping is not fair.
-
Today, Meta's list of sites they've targeted for training their AI was leaked. We're on their list.
I do everything possible to block AI bots. I use Cloudflare AI bot protection. I block what I can. I don't know if they actually get to read anything, but they want to read us.
https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower
-
I objected to the updated terms and conditions of #vinted because they mention data analysis via #Ki #AIscraping, meaning they want to process user data with it. Since the account was already deactivated anyway, I also requested deletion. Now I get the reply that vinted has a legitimate interest and, under Article 17 of the EU General Data Protection Regulation (GDPR), may continue to analyze pretty much all of my data. I am told I could always take legal action. (1/2)
-
#KI (AI) is running riot online 🤖🪓, and #Admins are pushing back 🦸
My @campact column from May is right on topic today!
> Heartfelt thanks to all the admins who fight tirelessly to protect us users and the planet from the greed of AI. I hope this text contributes to a better understanding of the topic.
👉 https://blog.campact.de/2025/05/ki-randaliert-im-netz-admins-halten-dagegen/
#SysAdmins #SystemadminAppreciationDay #FediAdmins #AI #KIScraping
#AIScraping #TDM #AdminLeiden #MastoAdmin #DataPoisoning #aitxt #GPT #GreenIT
-
🔍 / #software / #automation / #scraping
You can build some pretty insane applications using just #LLMs, even if you don't really know what you're doing. But what separates a good AI app from a great AI app is one thing, and that's data.
🐱🔗 https://laravista.altervista.org/CatLink/links/321
#catlink #SoftwareAutomation #SoftwareAutomationScraping #Python #BrightData #AIScraping #AI
-
News Summary: Cloudflare Launches Pay Per Crawl for AI Scraping; Amazon Hits One Million Robots
You’ve heard, of course, of pay-per-view. And we are used to streaming revenue on a pay-per basis from the likes of Audible and Spotify. This week has seen the launch (admittedly at the moment in beta) of possibly the most transformative source of pay-per revenue…
https://selfpublishingadvice.org/cloudflare/
#AIscraping #Amazonrobots #Cloudflare #generativeAI #PayPerCrawl
@indieauthors
-
A website appears to be scraping hashtags and creating AI articles, and then replying to the OG post
It stole one of my posts (https://oldfriends.live/@paul/114770093020700675) for its AI created article then spammed me from [email protected]
It's doing it with #HashTagGames tags and other trending hashtags.
Edit: making links dead as it appears to serve malware now: www.trend247daily.com/articles
Article created from scraped post: www.trend247daily.com/article/mastering-the-art-of-the-productive-day-wake-up-look-busy-go-to-bed
See this thread above, unless the AI content spammer deletes its reply and breaks the thread.
I don't know where it is getting its content: from its Mastodon account ( [email protected] ), RSS, or the API. If it has an application, I would hope [email protected] and [email protected] would shut it down from scraping the API.
-
The web-scraping is aggressive not just to hoard training data, but also to keep other AI bots from doing the same.
They're not satisfied with stealing all your content, they also want exclusivity by any means necessary.
-
Anyone who enjoys the many great information resources on the internet should know:
#KI 🤖 (AI) is running riot online, and #Admins are pushing back so that we humans can browse undisturbed.
Read how admins describe their utterly frustrating but invisible defensive work against AI, on the blog of @campact:
👉 https://blog.campact.de/2025/05/ki-randaliert-im-netz-admins-halten-dagegen/
❗Don't forget: July 25 is #SysAdminDay
#FediAdmins #KIScraping #AI #AIScraping #TDM #AdminLeiden #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep
-
AI Crawlers Overwhelm Open-Source Projects, Forcing Developers to Block Entire Countries
#AI #Web #Robotstxt #AIScraping #OpenSource #Cybersecurity #DataScraping #Scraping #WebScraping
-
🌐 LLM crawlers continue to DDoS SourceHut | sr_ht status
「 SourceHut continues to face disruptions due to aggressive LLM crawlers. We are continuously working to deploy mitigations. We have deployed a number of mitigations which are keeping the problem contained for now. However, some of our mitigations may impact end-users 」
-
Hi #Admins 👋,
Can you give me quotes that explain your fight against #AIScraping? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on. (efforts, goals, resources...)
I intend to publish your quotes in a text on @campact 's blog¹ (DE, German NGO).
The quotes should make your work🙏 visible in a generally understandable way
¹ https://blog.campact.de/author/friedemann/
#TDM #MastoAdmin #DataPoisoning #aitxt #GPT #TDMRep #Kudurru #Nightshade #Glaze #FediAdmins
-
"It’s pretty crazy that not only a) these bots shamelessly harvest all your data without asking for permission and b) they do it in such a brute-force manner.
My coworker and security expert António pointed me to #DarkVisitors, and I’ll probably be installing their #WordPressPlugin on all my sites. For what it’s worth."
@john_fisherman on #AIscraping
https://fred-rocha.medium.com/ai-crawler-bots-on-the-hunt-caf5a59ff478
-
»Online #publishers face a #dilemma: Allow #AIscraping from #Google or lose #searchvisibility: Blocking the company’s #AIoverviews also blocks its #webcrawler.« https://www.engadget.com/ai/online-publishers-face-a-dilemma-allow-ai-scraping-from-google-or-lose-search-visibility-202246891.html?eicker.news #tech #media
-
「 OpenAI and Anthropic have stated publicly that they respect robots.txt and blocks to their specific web crawlers, GPTBot and ClaudeBot.
However, according to TollBit's findings, such blocks are not being respected, as claimed. AI companies, including OpenAI and Anthropic, are simply choosing to "bypass" robots.txt in order to retrieve or scrape all of the content from a given website or page 」
https://www.businessinsider.com/openai-anthropic-ai-ignore-rule-scraping-web-contect-robotstxt
#OpenAI #AIScraping #AITheft #AI #Cybersecurity #Infosec #Security
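For readers unfamiliar with how a robots.txt block on GPTBot and ClaudeBot is meant to work, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the helper name is_allowed, and the example.com URL are assumptions made up for illustration. The sketch only shows what a crawler that honours robots.txt would conclude; robots.txt is advisory and does nothing against a crawler that chooses to ignore it, which is the behaviour the post describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the two crawlers named in the post
# and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

The check is entirely on the crawler's side: the server publishes the file, and the crawler decides whether to consult it.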
-
🥸 Perplexity AI Is Lying about Their User Agent • @robb
「 I checked a few sites and this is just Google Chrome running on Windows 10. So they're using headless browsers to scrape content, ignoring robots.txt, and not sending their user agent string. I can't even block their IP ranges because it appears these headless browsers are not on their IP ranges 」
https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/
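To illustrate why an undeclared user agent defeats the usual server-side block, here is a minimal sketch of a user-agent filter. The token list, the function name blocked_by_user_agent, and both sample user-agent strings are assumptions for illustration, not values taken from the linked post.

```python
# Hypothetical list of crawler tokens a site owner might try to block.
BLOCKED_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_by_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)


# Assumed example of a crawler that identifies itself honestly.
DECLARED = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
# Assumed example of a headless browser presenting as Chrome on Windows 10.
GENERIC_CHROME = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(blocked_by_user_agent(DECLARED))
    print(blocked_by_user_agent(GENERIC_CHROME))
```

A filter like this only catches crawlers that announce themselves. A request that presents as an ordinary desktop browser passes straight through, which is why the post's author falls back on IP ranges and finds those unhelpful too.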
-
Yeah, you're really gonna see which companies are just gonna allow the AI to scrape all their stuff now. I'm a copyleft/creative commons kinda guy. But if you have art that you don't want stolen, the answer is simple.
MAKE YOUR OWN WEBSITE and put your art there (edit: and use that Glaze type of stuff on your art that wrecks AI, just to be sure)! Neocities is SO easy to set up! Or your own domain and hosting via porkbun, GoDaddy (non-WordPress) - anything at all other than proprietary/walled stuff!
#tumblr #WordPress #IndieWeb #WeirdWeb #PersonalWebsites #art #AI #LLM #OpenAI #AIScraping #Midjourney #Automattic #neocities #copyleft #CreativeCommons