#scraping — Public Fediverse posts

Frontend Dogma @[email protected] · 2026-05-24 · 12:40 UTC

Beyond robots.txt: Implementing ai.txt and llms.txt for Purpose-Based Scraping Control, by (not on Mastodon or Bluesky):

https://cookie-script.com/guides/beyond-robots-txt-implementing-ai-txt-and-llms-txt-for-purpose-based-scraping-control?ref=frontenddogma.com

#comparisons #llmstxt #crawling #scraping #ai
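
The linked guide is about going beyond robots.txt. As a baseline for comparison, here is a minimal sketch of how a well-behaved crawler checks robots.txt rules per user agent with Python's standard `urllib.robotparser`. The robots.txt content, the example URLs and the `make_parser`/`may_fetch` helpers are hypothetical, and `GPTBot` is used only as an example AI crawler name. ai.txt and llms.txt are not parsed here, because this post does not describe their formats.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler entirely and keeps
# every other agent out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (no network request is made)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so we can feed a local string.
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "ExampleBot"):
        for url in ("https://example.com/guide", "https://example.com/private/x"):
            print(agent, url, may_fetch(rp, agent, url))
```

robots.txt is purely advisory: a check like this only constrains crawlers that choose to run it.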

Some Bits: Nelson's Linkblog @[email protected] · 2026-05-21 · 13:29 UTC

Scrapers vs Wikis: The person who runs a number of custom wiki websites writes about abuse from scrapers
https://weirdgloop.org/blog/clankers
#via:lobsters #robotstxt #scraping #scaling #wiki #web #ai #+

#via #robotstxt #scraping #scaling #wiki #web

Inautilo @[email protected] · 2026-05-21 · 07:23 UTC

#Development #Demos
WebMCP Demo · How AI agents interact with web pages today and tomorrow https://ilo.im/16d3mq

_____
#Comparisons #AI #AiAgents #Content #Website #AgenticWeb #WebMCP #Scraping #WebDev #Frontend

#development #demos #comparisons #ai #aiagents #content

d'aïeux et d'ailleurs @[email protected] · 2026-05-21 · 05:46 UTC

@RainbowFrog I use the webscraper extension for this kind of thing: https://webscraper.io/ (in Firefox or Chrome)
#webscraping #scraping
@belett @Ash_Crow

#webscraping #scraping

Alex / catileptic @[email protected] · 2026-05-19 · 15:16 UTC

Meta seems to be doing some truly hostile anti-scraping webdev.

All HTML class names are random IDs, and I would not be surprised if they also change frequently (I will find out soon enough). Example: https://privacycenter.instagram.com/policy

Is there some sort of "fingerprinting" technique for DOM elements in a situation like this? Let's say I have the entire page (downloaded with wget) and I want to answer the question "Has the content of this page changed?"

#css #html #research #webdev #scraping
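
One common way to answer "has the content changed?" when class names are randomized is to ignore markup attributes entirely and hash only the normalized visible text. Below is a minimal sketch using Python's standard `html.parser` and `hashlib`; the `ContentFingerprinter` class and `fingerprint` function are hypothetical names. It assumes the text you care about is present in the HTML that wget downloaded; if a page renders its content with JavaScript, you would first need to capture the rendered DOM with a headless browser.

```python
import hashlib
from html.parser import HTMLParser


class ContentFingerprinter(HTMLParser):
    """Collect visible text, ignoring all attributes (so random class
    names have no effect) and skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # attrs (class, id, data-*) are deliberately ignored.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace so pure reflow is not a "change".
            text = " ".join(data.split())
            if text:
                self.parts.append(text)


def fingerprint(html: str) -> str:
    """Return a SHA-256 hex digest of the page's normalized visible text."""
    parser = ContentFingerprinter()
    parser.feed(html)
    parser.close()
    normalized = "\n".join(parser.parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    old = '<div class="x1a2b"><p class="q9z">Privacy Policy</p></div>'
    new = '<div class="k7m3n"><p class="r2d2">Privacy Policy</p></div>'
    print(fingerprint(old) == fingerprint(new))
```

Storing the digest from each download and comparing it with the previous one then gives a yes/no answer. The trade-off is that changes which only affect structure or attributes (not text) are invisible to this fingerprint.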

N-gated Hacker News @[email protected] · 2026-05-14 · 21:58 UTC

🎉 Look, another web scraper! 🎉 Because we *definitely* needed one more tool to fetch JSON from #Wikipedia faster than a cheetah on Red Bull. 🐆💨 No doubt, this will revolutionize the already groundbreaking field of #scraping celebrity birthdates. 🙄✨
https://scrapewithruno.com/ #webscraping #JSONtools #celebritybirthdates #technology #HackerNews #ngated

#wikipedia #scraping #webscraping #jsontools #celebritybirthdates #technology

🪑Jeffrey Sabarese ♫ @[email protected] · 2026-05-13 · 18:28 UTC

NO SOUP FOR YOU!

Playwright
+ Ollama
==TRANSLITERATE==
BEAUTIFUL DATA

Build a self-auditing data pipeline that keeps my MariaDB in perfect sync.

Full workflow: https://dufospy.com/artificial-intelligence/data-mining-web-scraping-playwright-ollama

#beautifulsoup #playwright #data #scraping
