#scrapingbots — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #scrapingbots, aggregated by home.social.
-
Our PeerTube instance is being bombarded by bots that scrape it and send video transcoding requests. Fortunately, I'm putting the finishing touches on a script that kicks their ass when it detects several things that give them away... Luckily the server is holding up pretty well... more to follow... #peertube #bots #scraping #scrapingbots #ddos
-
Quo Vadis, Crawlers? Progress and what’s next on safeguarding our infrastructure https://diff.wikimedia.org/2026/03/26/quo-vadis-crawlers-progress-and-whats-next-on-safeguarding-our-infrastructure/ #AI, #AIDataCrawlers, #Crawlers, #Infrastructure, #Knowledge, #KnowledgeAsAService, #Scraping, #ScrapingBots, #WebScraping, #WikimediaFoundation, #WikimediaProjects
-
Well, this is a step in the right direction:
https://www.theverge.com/news/841222/rsl-licensing-ai-spec-launch
-
Wikimedia Infrastructure is being mass-scraped for AI Usage — the content is free, the infrastructure is not. https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ #AI, #Crawlers, #Infrastructure, #KnowledgeAsAService, #KnowledgeContent, #Operations, #Scraping, #ScrapingBots, #Traffic, #WikimediaFoundation, #WikimediaProjects
(original repost on lobsters: https://lobste.rs/s/autpsf/how_crawlers_impact_operations)
-
If #Cloudflare is to be believed, #Lemmy instances have a built-in AI scraping bot operating beneath the covers. Do you think the developers have snuck it in?
Looking through my logs, these requests have all been blocked by Cloudflare because they are identified as "AI Bots". There are many more requests by Lemmy instances blocked in the logs. This is just a sample. Other Lemmy requests from these servers get through. Only a few are blocked as AI Bots.
Cloudflare says they use AI to determine if a request is a legitimate request or an AI bot trying to scrape.
207.204.58.144
AS19045 DIRECTCOM
United States
User agent: Lemmy/0.19.5; +https://lemmy.cryonex.net
23.127.223.238
AS7018 ATT-INTERNET4
United States
User agent: Lemmy/0.19.3; +https://lemux.minnix.dev
2a01:cb19:f85:ec00:82fa:5bff:fe51:ed4a
AS3215 France Telecom - Orange
France
User agent: Lemmy/0.19.5; +https://lemmy.sidh.bzh
50.247.53.42
AS7922 COMCAST-7922
United States
User agent: Lemmy/0.19.5; +https://toast.ooo
69.42.19.234
AS11404 AS-WAVE-1
United States
User agent: Lemmy/0.19.5; +https://lemmy.schlunker.com
155.138.226.183
AS20473 AS-CHOOPA
United States
User agent: Lemmy/0.19.5; +https://lemmy.mbl.social
#MastoAdmin #AIBots #Scrapers #Scraping #ScrapingBots #privacy
-
😤 #Scraperbots are automating data theft, extracting your website's content without permission! 🌐
💣 Learn about the impact of scraper bots and how to prevent them: https://bit.ly/3RiXgya
#contentscraping #bots #webscrapers #webcrawlers #scraping #waf #botmanagement #waap #scrapingbots #apptrana #indusface