#crawler — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #crawler, aggregated by home.social.
-
Welcome to the future, where AI agents hunt down alleged online copyright infringement
As readers of this blog have doubtless noticed, the latest hot tech – and investment – area involves “agentic AI”, where AI systems are allowed to operate autonomously on allocated tasks. There’s no doubt there are some exciting possibilities here, as well as some troubling issues concerning lack of control. It’s a rapidly-evolving area of research and experimentation, which makes […]
#agenticAi #agents #ai #ceaseAndDesist #crawler #digitalWatermarks #infringement #licensing #llms #patents #pricing #takedowns #universalMusicGroup https://walledculture.org/welcome-to-the-future-where-ai-agents-hunt-down-alleged-online-copyright-infringement/ -
📬 Understanding Google rankings: what’s behind the search results
#Empfehlungen #Internet #Absprungrate #Crawler #GoogleRanking #Keywords #SEO #Suchmaschinen #URLStrukturen https://sc.tarnkappe.info/d8af41 -
Who do you think you are?
47.128.32.0 - - [18/Mar/2026:00:48:01 +0100] "GET /robots.txt HTTP/1.1" 403 239 "-" "-" 1650 4269
Good on you that #CrowdSec won't immediately block on a missing user-agent, but my httpd-ACL does.
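For reference, a minimal sketch of such an httpd ACL, assuming Apache 2.4 with mod_authz_core (the poster's actual rule may differ):

    # Deny any request that arrives without a User-Agent header.
    <If "-z %{HTTP_USER_AGENT}">
        Require all denied
    </If>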
#DarkVisitors #AI #Crawler #GenAI #SocialPermissionToBurnEnergy
-
:ablobcatheartsqueeze: I have been running iocaine on my server for a week now. During this time, 7,076,701 requests have passed through iocaine, 3,312,318 of which were identified as AI crawlers/bots. 3,741,577 requests came from crawlers/bots that got stuck in iocaine's deadly maze, consuming an infinite amount of poisoned garbage. Furthermore, 972 crawlers/bots posing as major browsers were detected and routed into the maze.
All of this is managed by iocaine with just ~80 MB of memory and ~0.1% direct CPU usage. Now that’s what I call efficient! Well done, @algernon.
Let's fight back against AI crawlers and bots. Thanks to projects like iocaine, this is entirely possible, not just theory :blobcat_thisisfine:
#iocaine #ai #llm #FckAI #FckLLMs #selfhosting #crawler #bots
-
I have just installed iocaine 3.2.0 by @algernon and have already started successfully serving poisoned garbage to the AI agents. I love it! I especially like how simple the setup was, and how easy it was to expand my existing Caddyfile. My monthly donation is set up too. What a great project!
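For readers curious what that kind of Caddyfile extension can look like, here is a hypothetical sketch (the matcher name, ports, and bot list are assumptions, not the poster's config or iocaine's documented setup):

    example.com {
        # Suspected AI crawlers get proxied to iocaine's local listener (port assumed).
        @aibots header_regexp User-Agent (?i)(GPTBot|CCBot|ClaudeBot|Bytespider)
        reverse_proxy @aibots localhost:42069
        # All other traffic reaches the real backend.
        reverse_proxy localhost:8080
    }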
-
Robots.txt Generator - Retro Terminal Edition - More than 200 bots in the free version. Pure HTML, JavaScript, and a bit of CSS. No third parties, no framework, no CDN, no cookies, no tracking, no ads, no Big Tech clutter, no AI, very privacy-friendly. Simple and effective in retro style. Coming online soon.
#teufelswerk #HTML #javascript #app #entwicklung #code #retro #css #robotstxt #generator #stopbots #bots #crawler #scraper #keineKI #cookieless #datenschutz
-
Crawler Preps for Entry into VAB for Artemis II Rollout Ops 🌑🧑‍🚀
#Artemis #ArtemisII #crawler #rollout
⏩ 7 new pictures from NASA (Image Library) https://commons.wikimedia.org/wiki/Special:ListFiles?limit=7&user=OptimusPrimeBot&ilshowall=1&offset=20260115010334
-
Toward an ever more fragile #web https://siecledigital.fr/2025/12/31/etude-cloudflare-2025-un-web-plus-vaste-plus-automatise-et-plus-fragile
#Bots alone are said to account for nearly 30% of global web traffic, with peaks capable of generating volumes comparable to DDoS attacks
#Googlebot is the dominant #crawler, with 4.5% of HTML requests
In 2025, the #smartphone has taken hold with around 43% of users worldwide, versus 57% for desktop computers. #Android largely dominates mobile traffic globally, while #iOS retains a strong position -
my #dwarffortress experience comes after #moria #roguelike #minecraft #rpg #dungeon #crawler #dnd #ascii #angband #shatteredpixel ... so far, after 8 hours, I sort of feel aimless and like maybe I made a bad purchase. Is there an "arcade" version of Dwarf Fortress to get started with? How does one engage with this #lore...?
-
#RSL 1.0 instead of robots.txt: a new standard for internet content | heise online https://www.heise.de/news/RSL-1-0-Standard-soll-Verwendung-von-Inhalten-regeln-11111422.html #searchengines #searchengine #ArtificialIntelligence #crawler #ReallySimpleLicensing #robotsTXT
-
When I want to check whether, in the date #Formatstring, the lowercase s stands for seconds and the uppercase S for milliseconds, I ask whichever #GPTbot is at hand (the one that doesn't say #Quota exceeded because I haven't identified myself with an #Api key). Why? The answer may well be in #Wikipedia, but it takes time to figure out in which article, list article, or sub-article. Wikipedia's #Suche (search) does use #ElasticSearch, but to actually reap the benefits of that powerful engine, hundreds of thousands of people would have had to tag Wikipedia's articles with keywords (#wikidata). Besides, something as practical as format strings may have been deemed unencyclopedic (#unenzyklopädisch) and removed.
On #Stackexchange I have to confirm several times that I am human, then I find a question that was closed unanswered as a duplicate (#Duplikat). Then two outdated ones that are wrong by now, then some whose link to the solution no longer works.
On #archive_org, archive.is, and #AnnasArchive I have to know the #URL of the article I'm looking for before I can search at all.
A #Suchmaschine (search engine) doesn't search. A search engine reads out the sitemap.xml files that website operators publish for the search engines' #crawler. So I find five-year-old articles on websites that haven't been maintained in five years, and articles at most a year old that don't answer my question but are listed in the #sitemap. The 100 websites that hold the right answer in an article two to four years old I never find, because those articles are no longer in the sitemap.
The GPT bots have scraped Wikipedia, Stack Exchange, archive.org, Anna's Archive, and every other website, ignoring #robots.txt and sitemaps along the way. I get the right answer, and faster than with any of the options above.
Or I search #Grokipedia. Grokipedia consists of one million static pages on #Cloudflare's #CDN, scraped from Wikipedia. Its search is a GPT bot, and 57 times better than Wikipedia's search.
@malteengeler @awinkler @evawolfangel @bkastl @Raymond @wikipedia
-
Living in the World of Bots
Not all clicks are equal
More on this at https://t3n.de/news/studie-anstieg-bot-traffic-seo-1710540/
a-fsa.de/d/3KH
Link to this page: https://www.a-fsa.de/de/articles/9305-20251008-leben-in-der-welt-der-bots.html
Link on the Tor network: http://a6pdp5vmmw4zm5tifrc3qo2pyz7mvnk4zzimpesnckvzinubzmioddad.onion/de/articles/9305-20251008-leben-in-der-welt-der-bots.html
Tags: #Bots #Crawler #Spider #Klicks #AI #KI #OpenAI #Meta #Web #Internet #Publisher #Inhalte #Verdienst #Zensur #Transparenz #Informationsfreiheit #Meinungsmonopol #Meinungsfreiheit #Pressefreiheit #Internetsperren #Verhaltensänderung -
→ Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
“We observed that #Perplexity [an AI-powered answer engine] uses not only their declared #user_agent, but also a generic browser intended to #impersonate Google Chrome on macOS when their declared crawler was blocked.”
“This activity was observed across tens of thousands of domains and millions of requests per day.”
-
@dgouttegattat - I believe you found evidence of a #GenerativeAI #crawler leveraging #residentialproxy infrastructure, like the one offered through #brightdata or competitors.
It even advertises itself for #AI crawling use cases: https://brightdata.com/
-
#IETF discusses measures against the onslaught of AI #Crawler traffic | heise online https://www.heise.de/news/Technische-Massnahmen-gegen-den-Ansturm-der-KI-Crawler-10497930.html #Webcrawler #ArtificialIntelligence
-
Blocking bots in the era of rampant LLM crawlers...
After “moving the wiki back onto a machine at home”, it became much easier to see the load on it (since it is currently the only site running there). (The original post shows monitorix weekly and monthly graphs here.) Since the move I have constantly seen crawler traffic sweeping the site. At first I didn't pay it much attention, but it kept getting worse (almost every bot speeds up as long as you can keep up), so I looked into options for blocking bots in Caddy. I adopted two approaches: one IP-based, the other User-Agent-based. The IP-based part uses caddy-defender, blocking all the common bot ranges (including cloud and VPS ranges): defender block { ranges aws azurepubliccloud…
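The excerpt cuts off before the User-Agent-based half; that part is typically just a matcher plus a response in the Caddyfile, along these lines (the bot list is an example, not the author's):

    # Reject requests whose User-Agent matches known LLM crawlers.
    @llmbots header_regexp User-Agent (?i)(GPTBot|CCBot|Bytespider|ClaudeBot)
    respond @llmbots 403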
#blocker #bot #caddy #crawler #defender #llm #php #web #wiki
-
Improved ways to operate a rude crawler
https://www.marginalia.nu/log/a_115_rude_crawler/
#HackerNews #Improved #Crawler #Rude #Crawler #Techniques #Web #Scraping #Automation
-
Meta's AI bot cannot be blocked by JavaScript detection. That is because Meta's AI bot runs a real web browser, just like a user. The script side of things runs on their servers - not your typical crawler.
#WebCrawler #Crawler #AI #ArtificialIntelligence #Meta -
#Development #Reports
Redirecting 404s to homepage? · Google’s Martin Splitt warns against it https://ilo.im/162p5g_____
#Business #Google #SearchEngine #Crawler #Bot #SEO #TechnicalSEO #Redirects #WebDev #Backend -
It looks like LLM-producing companies that are massively #crawling the #web require website owners to take action to opt out. While I am not intrinsically against #generativeai and the acquisition of #opendata, reading about hundreds of dollars of rising #cloud costs for hobby projects is quite concerning. How is it acceptable that hypergiants skyrocket the costs of tightly budgeted projects through massive spikes in egress traffic and increased processing requirements? Projects that run on a shoestring budget and are operated by volunteers who dedicate hundreds of hours without any reward other than believing in their mission?
I am mostly concerned about opt-out being the default. Are the owners of those projects really required to take action? Seriously? As an #operator, it would be my responsibility to methodically work my way through the crawling documentation of hundreds of #LLM #web #crawlers? I am the one responsible for configuring a unique crawling specification in my robots.txt, because hypergiants make it immensely hard to have a generic #opt-out configuration that targets LLM projects specifically?
I refuse to accept that this is our new norm: a norm in which hypergiants not only methodically exploit the work of thousands of individuals for their own benefit without returning a penny, but in which the resource owner is also required to stop these crawlers from skyrocketing their own operational costs.
We need a new #opt-in. Often, public and open projects are keen to share their data. They just don't like the idea of carrying the unpredictable, multitudinous financial burden of these crawlers taking that data without notice. Even #CommonCrawl has fail-safe mechanisms to reduce the burden on website owners. Why are LLM crawlers above the guidelines of good #Internet citizenship?
To counter the most common argument already: yes, you can deny-by-default in your robots.txt, but that locks out every non-mainstream crawler, too.
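To make that trade-off concrete, this is what deny-by-default looks like in robots.txt (the re-allowed crawler is an arbitrary example):

    # Block every well-behaved crawler by default.
    User-agent: *
    Disallow: /

    # Re-allow individually trusted crawlers; anything not listed stays blocked.
    User-agent: Googlebot
    Disallow: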
Some concerning #news articles on the topic:
-
Here are some details on the .htaccess we use for minimizing the impact of the "AI" crawlers: https://wiki.openhumans.org/wiki/PersonalScienceWiki:Spam#.htaccess_User-Agent_blocks /cc @jascha
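The linked page has the full list; rules of that shape generally look like this sketch (bot names here are illustrative, not the wiki's actual set):

    RewriteEngine On
    # Return 403 Forbidden when the User-Agent matches a known AI crawler.
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Bytespider|ClaudeBot) [NC]
    RewriteRule .* - [F,L]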
-
#Business #Reports
Google warns of soft 404 errors · They can impact a website’s crawlability and indexing https://ilo.im/15zcu2_____
#Google #SearchEngine #SEO #Crawler #HTTP #StatusCode #Error404 #Development #WebDev #Backend -
Given all the recent updates to the #CROWler #gpt, I have decided to rename it to "The CROWler Support", as it can now provide support on everything, not just ruleset creation and debugging. The link has changed, so here is the new link for everyone. Enjoy, and happy content discovery development!
-
Released Tris - v1.3.1, added a killer feature: a "Clear" button to make clearing input easier on the mobile UI 🙂
Release notes: https://github.com/vmandic/tris-web-crawler/releases/tag/v1.3.1
I focused mostly on CSS animating a spider emoji to walk a web emoji! https://tris.fly.dev/
#indiedev #crawler #tris #triswebcrawler #seo #seotools #webcrawler #scraper #nodejs #release #dev #indieapp
-
I am so happy with the first web application I have developed on my own 🎉: Tris, a simple and free web crawler 🕸️ 🕷️ !
You can try it for free online: https://tris.fly.dev, limited to 3 parallel crawls and 100 links at a path depth of 3.
The next thing I will add is a text input to set a target domain hhh, now I am making it hard! 🙈
#node #nodejs #web #webcrawler #crawler #seo #datatools #webscraper #scraping #seotools #seotool #tris #triswebcrawler #webapp #indie #indiedev
-
Robots.txt, OpenAI’s GPTBot, Common Crawl’s CCBot: How to block AI crawlers from gathering text and images from your website: https://katharinabrunner.de/2023/08/robots-txt-openais-gptbot-common-crawls-ccbot-how-to-block-ai-crawlers-from-gathering-text-and-images-from-your-website/
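The core of the approach for the two bots named in the title is a pair of robots.txt groups (UA tokens as published by OpenAI and Common Crawl):

    # Opt out of OpenAI's GPTBot and Common Crawl's CCBot site-wide.
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /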
#ai #openAI #crawler #commoncrawl #ccbot #GPTBot #robotstxt #wordpress
-
offsec.tools - A vast collection of security tools
#CyberSecurity #osint #pentest #scanner #cve #vulnerabilities #burpsuite #endpoints #passwords #cloud #secrets #fuzzing #dns #ips #framework #network #directories #crawler #screenshots #git #cms #allinone #proxy #probing