home.social

#ai-crawlers — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #ai-crawlers, aggregated by home.social.

  1. It's rather funny: the AI bots calm down at the weekend, and we go from >90% during the week to "only" 70% at the end of the week...

    #gayfr #IA #AI #AIBots #AICrawlers

  2. 🇬🇧 If you like numbers, our dashboard lets you view all our sites’ statistics in real time.

    status.gayfr.social
    status.gayfr.online

    I’ve added new metrics related to Anubis, which protects us against AI bots. So at the bottom, you’ll see the number of bots blocked on sight (red), the number of visits that need to verify the “I’m not a robot” page (orange), the percentage that succeed (meaning they aren’t bots, in yellow), and finally the number of accepted human visits (green).

    What’s interesting is that the percentage of bots varies throughout the week but remains very high (between 65% and 96% depending on the time). That’s huge! They attack in squadrons, just like troubles do…

    That justifies the efforts made to protect against them.

    #gayfr #Anubis #IA #AI #ArtificialIntelligence #Bots #AIBots #AICrawlers

  3. 🇫🇷 If you like numbers, our dashboard lets you see all of our sites' statistics in real time.

    status.gayfr.social
    status.gayfr.online

    I have added the new indicators related to Anubis, which protects us against AI bots. At the bottom you will see the number of bots blocked on sight (red), the number of visits that have to pass the "you are not a robot" page (orange), the percentage that succeed (so are not bots, in yellow), and finally the number of accepted human visits (green).

    What is interesting is that the % of bots varies over the week but stays very high (between 65% and 96% depending on the moment). That's huge! They attack in squadrons, like troubles do...

    That justifies the efforts made to protect ourselves from them.

    #gayfr #Anubis #IA #AI #IntelligenceArtificielle #ArtificialIntelligence #Bots #AIBots #AICrawlers

  4. FYI: Blocking AI crawlers doesn't stop citations - new data shows why: New BuzzStream data from 4 million AI citations shows blocking AI crawlers rarely stops ChatGPT or Gemini from citing publisher content - here is why. ppc.land/blocking-ai-crawlers- #AICrawlers #AIResearch #ContentCitations #DigitalMarketing #ChatGPT

  5. Our weekly traffic statistics for our two main servers.

    Does everything look normal to you for French-speaking servers?

    Look for the AI...

    #IA #AI #AICrawlers #AIBots

  6. ICYMI: Blocking AI crawlers doesn't stop citations - new data shows why: New BuzzStream data from 4 million AI citations shows blocking AI crawlers rarely stops ChatGPT or Gemini from citing publisher content - here is why. ppc.land/blocking-ai-crawlers- #AICrawlers #DataPrivacy #ChatGPT #AIResearch #TechNews

  7. Blocking AI crawlers doesn't stop citations - new data shows why: New BuzzStream data from 4 million AI citations shows blocking AI crawlers rarely stops ChatGPT or Gemini from citing publisher content - here is why. ppc.land/blocking-ai-crawlers- #AICrawlers #AIResearch #ChatGPT #Gemini #DigitalMarketing

  8. It's the time of the month where #AIcrawlers with outdated user agents hit our Trac instance again…

  9. Given how aggressively AI crawlers (I presume) are hitting my servers (and of course ignoring the robots.txt), I need to either install some crawler-blocking solution or switch from my denylist for IP ranges to an allowlist…

    Does anyone know of software that blocks crawlers, is compatible with traefik, and ideally uses very few resources?


    #self-hosting #AI-crawlers #self-hosting-headaches #I-literary-had-to-deny-an-entire-/16-subnet-yesterday-to-prevent-my-servers-from-crash-looping

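
The denylist versus allowlist trade-off mentioned in post 9 can be illustrated with a minimal Python sketch. It is not tied to traefik or to any particular blocking tool. The network ranges and the helper names (`DENYLIST`, `ALLOWLIST`, `allowed_by_denylist`, `allowed_by_allowlist`) are assumptions chosen for illustration only; they are not real crawler ranges.

```python
import ipaddress

# Hypothetical ranges for illustration only (not real crawler networks).
DENYLIST = [ipaddress.ip_network("198.51.0.0/16")]
ALLOWLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def in_any(addr, networks):
    """Return True if addr falls inside at least one of the networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


def allowed_by_denylist(addr):
    # Denylist policy: everything is allowed unless explicitly listed.
    return not in_any(addr, DENYLIST)


def allowed_by_allowlist(addr):
    # Allowlist policy: everything is refused unless explicitly listed.
    return in_any(addr, ALLOWLIST)


if __name__ == "__main__":
    for addr in ("198.51.100.7", "192.0.2.10", "100.64.0.1"):
        print(addr, allowed_by_denylist(addr), allowed_by_allowlist(addr))
```

An address that appears on neither list is accepted under the denylist policy and refused under the allowlist policy, which is why an allowlist needs less maintenance against new crawler ranges but risks locking out legitimate visitors. A single /16 entry covers 65,536 addresses.
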
  10. Helping protect journalists and local news from AI crawlers with Project Galileo – Cloudflare.com

    2025-09-23, 5 min read

    By Patrick Day and Jocelyn Woolbright

    We are excited to announce that Project Galileo will now include access to Cloudflare’s Bot Management and AI Crawl Control services. Participants in the program, which include roughly 750 journalists, independent news organizations, and other non-profits supporting news-gathering around the world, will now have the ability to protect their websites from AI crawlers—for free. 

    Project Galileo is Cloudflare’s free program to help protect important civic voices online. Launched in 2014, it now includes more than 3,000 organizations in 125 countries, and it has served as the foundation for other free Cloudflare programs that help protect democratic elections, public schools, public health clinics, and other critical infrastructure.  

    Although we think all Project Galileo participants will benefit from these additional free services, we believe they are essential for news organizations. 

    News organizations, particularly local news, are facing significant challenges in transitioning to the AI-driven web. As people increasingly turn to AI models for information, less of their web traffic is making it to the actual website where that information originated. Industries that rely on user traffic to generate revenue, like news organizations, are increasingly at risk.

    Allowing news organizations to monitor and control how AI crawlers are interacting with their websites will help them better protect their content and make more informed decisions about engaging with AI companies. Ultimately, our goal is to provide the tools news organizations need to negotiate fair compensation for their work.

    Editor’s Note: Read the rest of the story, at the below link.

    Continue/Read Original Article Here: Helping protect journalists and local news from AI crawlers with Project Galileo

    #2025 #AICrawlers #America #Cloudflare #CloudflareCom #Education #Health #History #Internet #Journalism #Journalists #Libraries #LibraryOfCongress #Opinion #Reading #Science #Technology #UnitedStates #WebTraffic #Writing

  11. In Squarespace, you can opt to block AI crawlers in Settings. However, it doesn't seem to work, since ChatGPT still appears in my Analytics. Does anyone know if I could add this rule in Website > Utilities > Website Tools > Code Injection without creating any issues:
    User-agent: ChatGPT-User
    Disallow: /

    #SquareSpace #AICrawlers #ChatGPT
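
As a rough way to check what the rule in post 11 expresses, the sketch below feeds it to Python's standard `urllib.robotparser`. The helper name `can_fetch`, the example URL, and the extra agent tokens `GPTBot` and `Mozilla/5.0` are assumptions used for illustration. The sketch only tests how the rule is parsed. It says nothing about where Squarespace places injected code, and crawlers consult such rules only when they are served from the site's `/robots.txt`, on a voluntary basis.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post, in robots.txt syntax.
RULES = """\
User-agent: ChatGPT-User
Disallow: /
"""

# Same rule with the original spacing before the colon.
RULES_ORIGINAL_SPACING = """\
User-agent : ChatGPT-User
Disallow: /
"""


def can_fetch(agent, url, rules=RULES):
    """Parse robots.txt text and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"  # hypothetical URL
    for agent in ("ChatGPT-User", "GPTBot", "Mozilla/5.0"):
        print(agent, can_fetch(agent, url))
```

With Python's parser, the rule refuses only the `ChatGPT-User` token and leaves other agents unaffected, so each crawler to be excluded needs its own `User-agent` group (or a `*` group).
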

  12. AI Crawlers stealing your content? Time to fight back! 💪

    LLMs and AI bots are scraping the web, stealing your data, hogging bandwidth, and even crashing servers under aggressive loads.

    Don’t let them freeload! The CrowdSec AI Crawlers Blocklist stops unwanted harvesting before it hurts your site’s performance or privacy.

    Regain control over your digital assets: crowdsec.net/blog/protect-agai

    #AIcrawlers #blocklists #threatintelligence #cybersecurity #infosec #AIbots #dataprotection

  13. So according to the request statistics, since the last rotation of the access log file for the #MacPorts trac this morning, there were:

    20.8k requests from IE 3
    20.9k requests from IE 4
    21.3k requests from IE 5
    43 requests from IE 6 and
    23 requests from IE 7

    These requests came from these Windows versions (roughly 4k per version): CE, 95, 98 (9.5k), NT 4, 2000, XP, NT 5.01(?!), Server 2003, Vista, 7, and 8.0.

    I'm sure none of those are AI crawler bots.

    #aicrawler #aicrawlers
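
A per-version tally like the one in post 13 can be approximated with a few lines of Python. This is a sketch under assumptions: the author's actual statistics tool is not named, the function name `count_ie_versions` is hypothetical, and the sample user-agent strings are made up for illustration rather than taken from the MacPorts logs.

```python
import re
from collections import Counter

# Matches the major version in tokens such as "MSIE 5.01".
MSIE = re.compile(r"MSIE (\d+)\.")


def count_ie_versions(user_agents):
    """Count requests per claimed Internet Explorer major version."""
    counts = Counter()
    for ua in user_agents:
        match = MSIE.search(ua)
        if match:
            counts[f"IE {match.group(1)}"] += 1
    return counts


# Made-up user-agent strings, one per request.
SAMPLE = [
    "Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)",
    "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)",
    "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/128.0",
]

if __name__ == "__main__":
    for version, hits in sorted(count_ie_versions(SAMPLE).items()):
        print(version, hits)
```

A nearly uniform spread across long-obsolete browser versions, as reported in the post, is the kind of pattern such a tally makes visible.
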

  14. 🤖 Calling all FOSS communities!

    Worried about AI crawlers scraping your content or overwhelming your servers? We’ve got your back. 💪

    To support open source communities, we’re offering free access to our Platinum AI Crawlers Blocklist. 🎉

    🔗 Learn how to get started: crowdsec.net/blog/protecting-f

    #FOSS #opensource #community #AICrawlers

  15. Developers report aggressive AI crawlers overwhelming open-source infrastructure, with LibreNews citing up to 97% of traffic from AI bots on some projects. #AI #OpenSource #TechNews #AIcrawlers #Bots #LibreNews #DeveloperCommunity #Infrastructure #TechIndustry

  16. Over the past 24 hours, #Facebook has been the most determined #AI crawler to scrape data from this server, by far. They never succeed. #Cloudflare always blocks them for being one of the unwanted AI bots.

    What is interesting though is its determination to read one particular user invite. I wonder how it picks the other posts it wants to read.

    #aicrawlers

  17. Protecting your blog from the dead-eyed #AI crawlers. You can experiment with specific robots.txt rules, and I also run a script in htaccess. I think there are metadata properties you can declare. None of this stops your pages being crawled, but it may afford some legal protection. (See the recent German LAION case.) I'm doing a short blog post on this soon.

    #robotstxt #aicrawlers #htaccess