#ai-crawlers — Public Fediverse posts on home.social

🌈 Barbapulpe 😇 ᴹᵃˢᵗᵒᵈᵒⁿ @[email protected] · 2026-05-09 · 15:41 UTC

It's pretty funny: the AI bots calm down at the weekend. We go from >90% on weekdays to "only" 70% at the weekend...
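The weekday/weekend comparison can be sketched as follows, assuming daily bot-share percentages have already been exported from the dashboard. The input shape and the `weekday_weekend_share` helper are hypothetical, and the figures are illustrative, not the instance's real data:

```python
from datetime import date
from statistics import mean


def weekday_weekend_share(samples: dict[date, float]) -> tuple[float, float]:
    """Return the average bot share (in %) for weekdays and for weekends.

    `samples` maps each calendar day to the percentage of that day's traffic
    classified as bots (a hypothetical export format). Raises
    statistics.StatisticsError if either group has no samples.
    """
    # date.weekday() is 0 for Monday ... 6 for Sunday, so 5 and 6 are the weekend.
    weekdays = [pct for day, pct in samples.items() if day.weekday() < 5]
    weekends = [pct for day, pct in samples.items() if day.weekday() >= 5]
    return mean(weekdays), mean(weekends)


# Illustrative numbers only, chosen to echo the ">90% vs 70%" remark.
samples = {
    date(2026, 5, 4): 92.0,   # Monday
    date(2026, 5, 5): 94.0,   # Tuesday
    date(2026, 5, 9): 70.0,   # Saturday
    date(2026, 5, 10): 72.0,  # Sunday
}
print(weekday_weekend_share(samples))  # (93.0, 71.0)
```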

#gayfr #IA #AI #AIBots #AICrawlers

🌈 Barbapulpe 😇 ᴹᵃˢᵗᵒᵈᵒⁿ @[email protected] · 2026-05-08 · 16:37 UTC

🇬🇧 If you like numbers, our dashboard lets you view all our sites’ statistics in real time.

https://status.gayfr.social
https://status.gayfr.online

I’ve added new metrics related to Anubis, which protects us against AI bots. At the bottom you’ll see the number of bots blocked on sight (red), the number of visits that must pass the “I’m not a robot” check (orange), the percentage that succeed (meaning they aren’t bots, in yellow), and finally the number of accepted human visits (green).

What’s interesting is that the percentage of bots varies throughout the week but remains very high (between 65% and 96% depending on the time). That’s huge! They come in squadrons, just like trouble…

That justifies the efforts made to protect against them.
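One way to turn those counters into a single "percentage of bots" is sketched below. This is a hypothetical model, not how Anubis or the dashboard actually computes its figures: it assumes every visit is either blocked outright or challenged, and that a challenged visit which does not pass is a bot. `ChallengeStats` and its fields are illustrative names:

```python
from dataclasses import dataclass


@dataclass
class ChallengeStats:
    """Hypothetical counters loosely mirroring the dashboard colours."""
    blocked: int     # red: rejected on sight, never shown a challenge
    challenged: int  # orange: sent to the "I'm not a robot" page
    passed: int      # challenged visits that solved it (green, under this model)

    def pass_rate(self) -> float:
        """Share of challenged visits that succeed (the yellow figure)."""
        return self.passed / self.challenged if self.challenged else 0.0

    def bot_share(self) -> float:
        """Percentage of all visits counted as bots.

        Assumption: every visit is either blocked or challenged, and a
        challenged visit that does not pass is a bot.
        """
        total = self.blocked + self.challenged
        if total == 0:
            return 0.0
        bots = self.blocked + (self.challenged - self.passed)
        return 100 * bots / total


stats = ChallengeStats(blocked=800, challenged=200, passed=100)
print(stats.pass_rate(), stats.bot_share())  # 0.5 90.0
```

Under this model the bot share is driven mostly by the red counter, so a dashboard where blocked-on-sight dominates will report figures in the 90% range even if half of the challenged visitors are human.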

#gayfr #Anubis #IA #AI #ArtificialIntelligence #Bots #AIBots #AICrawlers

🌈 Barbapulpe 😇 ᴹᵃˢᵗᵒᵈᵒⁿ @[email protected] · 2026-05-08 · 16:35 UTC

🇫🇷 If you like numbers, our dashboard lets you see all our sites’ statistics in real time.

https://status.gayfr.social
https://status.gayfr.online

I’ve added the new metrics related to Anubis, which protects us against AI bots. At the bottom you’ll see the number of bots blocked on sight (red), the number of visits that must pass the "you are not a robot" page (orange), the percentage that succeed (and so are not bots, in yellow), and finally the number of accepted human visits (green).

What’s interesting is that the percentage of bots varies over the week but stays very high (between 65% and 96% depending on the time). That’s enormous! They fly in squadrons, just like trouble...

That justifies the effort we put into protecting ourselves from them.

#gayfr #Anubis #IA #AI #IntelligenceArtificielle #ArtificialIntelligence #Bots #AIBots #AICrawlers

PPC Land @[email protected] · 2026-04-09 · 03:01 UTC

FYI: Blocking AI crawlers doesn't stop citations - new data shows why: New BuzzStream data from 4 million AI citations shows blocking AI crawlers rarely stops ChatGPT or Gemini from citing publisher content - here is why. https://ppc.land/blocking-ai-crawlers-doesnt-stop-citations-new-data-shows-why/ #AICrawlers #AIResearch #ContentCitations #DigitalMarketing #ChatGPT

🌈 Barbapulpe 😇 ᴹᵃˢᵗᵒᵈᵒⁿ @[email protected] · 2026-04-08 · 21:45 UTC

Our weekly traffic statistics for our two main servers.

Does everything look normal to you for French-speaking servers?

Spot the AI...

#IA #AI #AICrawlers #AIBots

PPC Land @[email protected] · 2026-04-07 · 02:59 UTC

ICYMI: Blocking AI crawlers doesn't stop citations - new data shows why: New BuzzStream data from 4 million AI citations shows blocking AI crawlers rarely stops ChatGPT or Gemini from citing publisher content - here is why. https://ppc.land/blocking-ai-crawlers-doesnt-stop-citations-new-data-shows-why/ #AICrawlers #DataPrivacy #ChatGPT #AIResearch #TechNews

PPC Land @[email protected] · 2026-04-06 · 02:57 UTC

Blocking AI crawlers doesn't stop citations - new data shows why: New BuzzStream data from 4 million AI citations shows blocking AI crawlers rarely stops ChatGPT or Gemini from citing publisher content - here is why. https://ppc.land/blocking-ai-crawlers-doesnt-stop-citations-new-data-shows-why/ #AICrawlers #AIResearch #ChatGPT #Gemini #DigitalMarketing

Clemens @[email protected] · 2026-03-19 · 13:22 UTC

It's the time of the month where #AIcrawlers with outdated user agents hit our Trac instance again…

Winbuzzer @[email protected] · 2026-02-09 · 18:19 UTC

https://winbuzzer.com/2026/02/09/cloudflare-google-search-monopoly-ai-data-advantage-xcxwbn/

Cloudflare: Google Abuses Search Monopoly for 4.8x AI Data Advantage

#AI #Google #Cloudflare #BigTech #Search #AITrainingData #AICrawlers #AITraining #Content #Publishers #SearchResults #SearchEngines

JCM @[email protected] · 2025-12-18 · 07:27 UTC

Given how aggressively AI crawlers (I presume) are hitting my servers (and of course ignoring the robots.txt), I need to either install some crawler-blocking solution or switch from my denylist for IP ranges to an allowlist…

Does anyone know of crawler-blocking software that is compatible with Traefik and ideally uses very few resources?
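The denylist-versus-allowlist choice described above comes down to which default applies to addresses you have never seen. A minimal sketch of that decision using Python's standard `ipaddress` module; the ranges are RFC 5737 documentation prefixes standing in for real crawler networks, and `allowed` is a hypothetical helper, not a Traefik feature:

```python
import ipaddress

# Placeholder ranges taken from the RFC 5737 documentation prefixes; swap in
# the networks you actually see in your logs (a /16 is just "a.b.0.0/16").
DENYLIST = [ipaddress.ip_network("203.0.113.0/24")]
ALLOWLIST = [ipaddress.ip_network("198.51.100.0/24")]


def allowed(addr: str, mode: str = "deny") -> bool:
    """Decide whether a client address may pass.

    mode="deny":  default allow; only addresses inside DENYLIST are refused.
    mode="allow": default deny; only addresses inside ALLOWLIST get through.
    """
    ip = ipaddress.ip_address(addr)
    if mode == "deny":
        return not any(ip in net for net in DENYLIST)
    if mode == "allow":
        return any(ip in net for net in ALLOWLIST)
    raise ValueError(f"unknown mode: {mode!r}")


print(allowed("192.0.2.10"))                # True: not denylisted
print(allowed("203.0.113.7"))               # False: inside the denylisted /24
print(allowed("192.0.2.10", mode="allow"))  # False: not on the allowlist
```

The allowlist mode shuts out unknown crawlers by default, at the cost of also turning away legitimate visitors whose ranges are not listed, which is why it tends to be the last resort.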

#self-hosting #AI-crawlers #self-hosting-headaches #I-literary-had-to-deny-an-entire-/16-subnet-yesterday-to-prevent-my-servers-from-crash-looping

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @[email protected] · 2025-09-24 · 13:29 UTC

Cloudflare launches Content Signals Policy to fight AI crawlers and scrapers

https://web.brid.gy/r/https://nerds.xyz/2025/09/cloudflare-content-signals-policy-ai-crawlers/

#artificialintelligence #aicrawlers #aitraining #cloudflare #contentsignalspolicy #datascraping

DrWeb @[email protected] · 2025-09-23 · 23:05 UTC

Helping protect journalists and local news from AI crawlers with Project Galileo – Cloudflare.com

2025-09-23

By Patrick Day and Jocelyn Woolbright

We are excited to announce that Project Galileo will now include access to Cloudflare’s Bot Management and AI Crawl Control services. Participants in the program, which include roughly 750 journalists, independent news organizations, and other non-profits supporting news-gathering around the world, will now have the ability to protect their websites from AI crawlers—for free.

Project Galileo is Cloudflare’s free program to help protect important civic voices online. Launched in 2014, it now includes more than 3,000 organizations in 125 countries, and it has served as the foundation for other free Cloudflare programs that help protect democratic elections, public schools, public health clinics, and other critical infrastructure.

Although we think all Project Galileo participants will benefit from these additional free services, we believe they are essential for news organizations.

News organizations, particularly local news, are facing significant challenges in transitioning to the AI-driven web. As people increasingly turn to AI models for information, less of their web traffic is making it to the actual website where that information originated. Industries, like news organizations, that rely on user traffic to generate revenue are increasingly at risk.

Allowing news organizations to monitor and control how AI crawlers are interacting with their websites will help them better protect their content and make more informed decisions about engaging with AI companies. Ultimately, our goal is to provide the tools news organizations need to negotiate fair compensation for their work.

Editor’s Note: Read the rest of the story in the original article: Helping protect journalists and local news from AI crawlers with Project Galileo

#2025 #AICrawlers #America #Cloudflare #CloudflareCom #Education #Health #History #Internet #Journalism #Journalists #Libraries #LibraryOfCongress #Opinion #Reading #Science #Technology #UnitedStates #WebTraffic #Writing

Dr Pen @[email protected] · 2025-09-14 · 19:21 UTC

The metadata for RSL

#rsl #rslstandard #ai #aicrawlers #AICrawlControl

RSL Specification | RSL: Really Simple Licensing
https://rslstandard.org/rsl

Dr Pen @[email protected] · 2025-09-14 · 19:10 UTC

Very interesting: Really Simple Licensing.

#RSLProtocol #ai #aicrawlers #AICrawlControl

https://arstechnica.com/tech-policy/2025/09/pay-per-output-ai-firms-blindsided-by-beefed-up-robots-txt-instructions/

3dcandy @[email protected] · 2025-07-01 · 10:54 UTC

3dclive uses Cloudflare to block AI crawlers stealing YOUR content https://3dcandy.live/2025/07/3dclive-uses-cloudflare-to-block-ai-crawlers-stealing-your-content/ #3dcandy #3dclive #ai #aicrawlers #boost #cloudflare #content #stealing

3dcandy @[email protected] · 2025-07-01 · 10:44 UTC

Cloudflare blocking AI crawlers by default
#boost #aicrawlers #ai #cloudflare

https://www.relayeasy.com/cloudflare-blocking-ai-crawlers/

Winbuzzer @[email protected] · 2025-05-31 · 11:11 UTC

Google Analyst Warns: AI Bots Risk Internet Gridlock By Server Overload

#AI #AICrawlers #InternetCongestion #WebPerformance #Google #AIethics #CyberSecurity #FutureOfWeb #AISafety #DataPrivacy

https://winbuzzer.com/2025/05/31/google-analyst-warns-ai-bots-risk-internet-gridlock-by-server-overload-xcxwbn/

Catherine Babault @[email protected] · 2025-05-02 · 18:26 UTC

In Squarespace, you can opt to block AI crawlers in Settings. However, it doesn't work, since ChatGPT still shows up in my Analytics. Does anyone know whether I could add this rule under Website > Utilities > Website Tools > Code Injection without causing any issues:
User-agent: ChatGPT-User
Disallow: /
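For reference, the rule above is ordinary robots.txt syntax (usually written without a space before the colon). robots.txt is only honoured when it is served at the site's top-level /robots.txt path, and only by crawlers that choose to follow it. A quick way to see how a standards-following parser reads a set of rules is Python's `urllib.robotparser`. The rules below are hypothetical; the `GPTBot` group is added for illustration, since OpenAI documents `GPTBot` as its training crawler and `ChatGPT-User` as the agent for user-initiated visits:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Crawlers only read this from the site's
# /robots.txt path; the same lines pasted into page HTML are ignored.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() takes the crawler's product token and a URL to test.
for agent in ("ChatGPT-User", "GPTBot", "SomeOtherBot"):
    verdict = "allowed" if parser.can_fetch(agent, "https://example.com/blog/") else "blocked"
    print(f"{agent}: {verdict}")
```

This reports `ChatGPT-User` and `GPTBot` as blocked and `SomeOtherBot` as allowed, because no group (and no `*` group) applies to it.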

#SquareSpace #AICrawlers #ChatGPT

CrowdSec @[email protected] · 2025-05-02 · 08:17 UTC

AI Crawlers stealing your content? Time to fight back! 💪

LLMs and AI bots are scraping the web, stealing your data, hogging bandwidth, and even crashing servers under aggressive loads.

Don’t let them freeload! The CrowdSec AI Crawlers Blocklist stops unwanted harvesting before it hurts your site’s performance or privacy.

Regain control over your digital assets: https://crowdsec.net/blog/protect-against-ai-crawlers

#AIcrawlers #blocklists #threatintelligence #cybersecurity #infosec #AIbots #dataprotection

Clemens @[email protected] · 2025-04-24 · 21:29 UTC

So, according to the request statistics, since this morning's rotation of the access log file for the #MacPorts Trac, there were:

20.8k requests from IE 3
20.9k requests from IE 4
21.3k requests from IE 5
43 requests from IE 6 and
23 requests from IE 7

These requests came from these Windows versions (roughly 4k per version): CE, 95, 98 (9.5k), NT 4, 2000, XP, NT 5.01(?!), Server 2003, Vista, 7, and 8.0.
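Tallies like these can be produced from the user-agent field of an access log. A minimal sketch, assuming the user-agent strings have already been extracted from the log (log parsing is format-specific and left out); `count_ie_versions` is a hypothetical helper and the sample strings are illustrative:

```python
import re
from collections import Counter

# Internet Explorer versions before 11 announce themselves as "MSIE <version>".
MSIE = re.compile(r"MSIE (\d+)\.")


def count_ie_versions(user_agents):
    """Tally requests per claimed Internet Explorer major version."""
    counts = Counter()
    for ua in user_agents:
        match = MSIE.search(ua)
        if match:
            counts[f"IE {match.group(1)}"] += 1
    return counts


sample = [
    "Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)",
    "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
]
print(count_ie_versions(sample))
```

Thousands of requests per decades-old browser version, spread evenly across operating systems, is typically a sign of a scraper cycling through a list of fake user agents, which is the joke the post is making.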

I'm sure none of those are AI crawler bots.

#aicrawler #aicrawlers

CrowdSec @[email protected] · 2025-04-17 · 15:29 UTC

🤖 Calling all FOSS communities!

Worried about AI crawlers scraping your content or overwhelming your servers? We’ve got your back. 💪

To support open source communities, we’re offering free access to our Platinum AI Crawlers Blocklist. 🎉

🔗 Learn how to get started: https://www.crowdsec.net/blog/protecting-foss-with-free-ai-crawlers-blocklist

#FOSS #opensource #community #AICrawlers

readbeanicecream @[email protected] · 2025-04-02 · 12:57 UTC

Wikipedia is struggling with voracious AI bot crawlers

https://www.engadget.com/ai/wikipedia-is-struggling-with-voracious-ai-bot-crawlers-121546854.html

#technology #tech #ai #artificialintelligence #generativeAI #web #internet #wikipedia #wikimedia #aicrawlers #bots

PUPUWEB Blog @[email protected] · 2025-03-27 · 04:03 UTC

Developers report aggressive AI crawlers overwhelming open-source infrastructure, with LibreNews citing up to 97% of traffic from AI bots on some projects. #AI #OpenSource #TechNews #AIcrawlers #Bots #LibreNews #DeveloperCommunity #Infrastructure #TechIndustry

Hacker News @[email protected] · 2025-03-25 · 22:54 UTC

Devs say AI crawlers dominate traffic, forcing blocks on entire countries

https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

#HackerNews #AItraffic #AIcrawlers #DevsNews #InternetRegulation #CountryBlocks

Jerry on Mastodon @[email protected] · 2025-03-24 · 16:12 UTC

Over the past 24 hours, #Facebook has been by far the most determined #AI crawler trying to scrape data from this server. They never succeed: #Cloudflare always blocks them as one of the unwanted AI bots.

What is interesting, though, is its determination to read one particular user invite. I wonder how it picks the other posts it wants to read.

#aicrawlers

Dr Pen @[email protected] · 2025-01-27 · 20:33 UTC

Protecting your blog from the dead-eyed #AI crawlers. You can experiment with specific robots.txt rules, and I also run a script in .htaccess. I think there are also metadata properties you can declare. None of this stops your pages from being crawled, but it may afford some legal protection (see the recent German LAION case). I'll be writing a short blog post on this soon.
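A server-side check like the .htaccess script mentioned above refuses requests outright instead of merely asking crawlers to stay away, but it only catches crawlers that announce themselves honestly. A minimal sketch of the matching logic in Python; the author's actual script is not shown, `is_blocked` is a hypothetical helper, and the token list is illustrative rather than complete (these are user-agent tokens of well-known crawlers from OpenAI, Common Crawl, Anthropic, and ByteDance):

```python
# Illustrative list of user-agent tokens used by well-known AI crawlers.
# Real lists change often; treat these as examples, not a complete set.
BLOCKED_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match, the same idea as an .htaccess
    RewriteCond on %{HTTP_USER_AGENT} with the [NC] (no-case) flag."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


for ua in (
    "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
):
    print(is_blocked(ua), ua)
```

Crawlers that spoof a browser user agent pass straight through a check like this, which is why it is usually combined with IP-based blocking or challenge pages.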

#robotstxt #aicrawlers #htaccess
