#googlebot — Public Fediverse posts on home.social

AliveDevil @[email protected] · 2026-05-22 · 18:50 UTC

Is it now okay to consider GoogleBot hostile and just 403 them away?

#WebAdmin #FreeWeb #GoogleBot #Google

#webadmin #freeweb #googlebot #google

Grumpy Old Techie 🕊️ @[email protected] · 2026-05-21 · 09:04 UTC

After Google’s announcement that they will start showing AI results rather than links in search results, a lot of people showed interest in blocking them from scanning their sites.
If you want to do this, block their IP ranges rather than relying on robots.txt. I recommend blocking all Google Cloud IP ranges as well. I only see malicious bot traffic from there.
Here is a good resource.

https://www.searchengineworld.com/googles-full-list-of-googlebot-ip-addresses-spider-ip-addresses

#google #googlebot #googleai #hosting #AI #seo #googlesearch

#google #googlebot #googleai #hosting #ai #seo
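
A minimal sketch of the IP-range approach described in the post above, assuming you have already collected the CIDR ranges you want to refuse. The ranges below are documentation-only placeholders (RFC 5737 / RFC 3849), not real Google ranges, and the names `BLOCKED_RANGES`, `is_blocked`, and `status_for` are hypothetical.

```python
import ipaddress

# Placeholder ranges for illustration only (reserved documentation networks).
# Replace them with the ranges you actually intend to block.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def parse_ranges(cidrs):
    """Turn CIDR strings into network objects once, at startup."""
    return [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]


def is_blocked(ip, networks):
    """Return True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    # Compare only against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in networks)


def status_for(ip, networks):
    """HTTP status a server might answer with: 403 when blocked, 200 otherwise."""
    return 403 if is_blocked(ip, networks) else 200


networks = parse_ranges(BLOCKED_RANGES)
print(status_for("192.0.2.55", networks))
print(status_for("203.0.113.9", networks))
```

In practice this check usually lives in the firewall or web server configuration rather than in application code; the sketch only shows the matching logic.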

Habr @[email protected] · 2026-04-25 · 14:42 UTC

Five non-obvious things I learned launching a movie social network: from a robots.txt trap to the 24-dimensional mathematics of taste

For the past six months I have been working on VibeMuvik, a movie social network with reviews, debates, and synchronized film watching. It is one of those things that seem "not that hard" until you start digging. This article is about unexpected discoveries. It is not about "how I chose my stack" (boring), and it is not a "WebRTC tutorial" (there are plenty without mine). These are five situations where I stumbled, found something interesting, and thought, "this is worth sharing; others will find it useful." Let's go.

https://habr.com/ru/articles/1027876/

#robotstxt #SEO #WebRTC #Nextjs #IndexNow #sitemap #Googlebot #Cinema_DNA #синхронный_просмотр #рекомендательные_системы

#рекомендательные_системы #синхронный_просмотр #cinema_dna #googlebot #sitemap #indexnow

PPC Land @[email protected] · 2026-04-03 · 20:40 UTC

FYI: Google rewrites Googlebot's rulebook: 2MB limits, IP moves, and what crawlers really are: Google today published two blog posts revealing Googlebot's true architecture as a shared SaaS platform, a 2MB fetch limit, and a new IP ranges directory path. https://ppc.land/google-rewrites-googlebots-rulebook-2mb-limits-ip-moves-and-what-crawlers-really-are/ #Googlebot #SEO #WebCrawlers #DigitalMarketing #SaaS

#googlebot #seo #webcrawlers #digitalmarketing #saas

PPC Land @[email protected] · 2026-04-01 · 20:39 UTC

ICYMI: Google rewrites Googlebot's rulebook: 2MB limits, IP moves, and what crawlers really are: Google today published two blog posts revealing Googlebot's true architecture as a shared SaaS platform, a 2MB fetch limit, and a new IP ranges directory path. https://ppc.land/google-rewrites-googlebots-rulebook-2mb-limits-ip-moves-and-what-crawlers-really-are/ #Google #Googlebot #SEO #Crawlers #WebMaster

#google #googlebot #seo #crawlers #webmaster

Thomas Steiner :chrome: @[email protected] · 2026-04-01 · 13:32 UTC

Inside Googlebot: demystifying crawling, fetching, and the bytes we process: https://developers.google.com/search/blog/2026/03/crawler-blog-post. A great post that clarifies how #Googlebot does its business for people in the #SEO community.

#googlebot #seo

PPC Land @[email protected] · 2026-03-31 · 20:38 UTC

Google rewrites Googlebot's rulebook: 2MB limits, IP moves, and what crawlers really are: Google today published two blog posts revealing Googlebot's true architecture as a shared SaaS platform, a 2MB fetch limit, and a new IP ranges directory path. https://ppc.land/google-rewrites-googlebots-rulebook-2mb-limits-ip-moves-and-what-crawlers-really-are/ #Google #Googlebot #SEO #WebCrawlers #DigitalMarketing

#google #googlebot #seo #webcrawlers #digitalmarketing

PPC Land @[email protected] · 2026-03-15 · 21:36 UTC

FYI: Googlebot is not a program - Google engineers finally explain what it really is: Google engineers reveal Googlebot is a misnomer for a central SaaS crawling platform serving dozens of products, with a 15 MB default file size limit and geo-crawling constraints. https://ppc.land/googlebot-is-not-a-program-google-engineers-finally-explain-what-it-really-is/ #Googlebot #SEO #WebCrawling #DigitalMarketing #SaaS

#googlebot #seo #webcrawling #digitalmarketing #saas

PPC Land @[email protected] · 2026-03-13 · 21:35 UTC

ICYMI: Googlebot is not a program - Google engineers finally explain what it really is: Google engineers reveal Googlebot is a misnomer for a central SaaS crawling platform serving dozens of products, with a 15 MB default file size limit and geo-crawling constraints. https://ppc.land/googlebot-is-not-a-program-google-engineers-finally-explain-what-it-really-is/ #Googlebot #SEO #Crawling #SaaS #DigitalMarketing

#googlebot #seo #crawling #saas #digitalmarketing

PPC Land @[email protected] · 2026-03-12 · 21:34 UTC

Googlebot is not a program - Google engineers finally explain what it really is: Google engineers reveal Googlebot is a misnomer for a central SaaS crawling platform serving dozens of products, with a 15 MB default file size limit and geo-crawling constraints. https://ppc.land/googlebot-is-not-a-program-google-engineers-finally-explain-what-it-really-is/ #Googlebot #SEO #WebCrawling #SaaS #DigitalMarketing

#googlebot #seo #webcrawling #saas #digitalmarketing

PPC Land @[email protected] · 2026-03-11 · 06:26 UTC

FYI: Google's secret crawl logic, finally explained in one page: Google published a new web crawling overview on March 3, 2026, detailing how Googlebot discovers, renders, and manages site access across 30+ years of web indexing. https://ppc.land/googles-secret-crawl-logic-finally-explained-in-one-page/ #Google #SEO #WebCrawling #Googlebot #DigitalMarketing

#google #seo #webcrawling #googlebot #digitalmarketing

PPC Land @[email protected] · 2026-03-09 · 06:25 UTC

ICYMI: Google's secret crawl logic, finally explained in one page: Google published a new web crawling overview on March 3, 2026, detailing how Googlebot discovers, renders, and manages site access across 30+ years of web indexing. https://ppc.land/googles-secret-crawl-logic-finally-explained-in-one-page/ #Google #SEO #WebCrawling #Googlebot #DigitalMarketing

#google #seo #webcrawling #googlebot #digitalmarketing

PPC Land @[email protected] · 2026-03-08 · 06:24 UTC

Google's secret crawl logic, finally explained in one page: Google published a new web crawling overview on March 3, 2026, detailing how Googlebot discovers, renders, and manages site access across 30+ years of web indexing. https://ppc.land/googles-secret-crawl-logic-finally-explained-in-one-page/ #Google #SEO #WebCrawling #Googlebot #DigitalMarketing

#google #seo #webcrawling #googlebot #digitalmarketing

Inautilo @[email protected] · 2026-02-10 · 20:31 UTC

#Development #Reports
Google lists Googlebot file limits · Do Google’s crawling limits affect your website? https://ilo.im/16adna

_____
#Business #Google #SearchEngine #Crawlers #Googlebot #Files #HTML #PDF #WebDev #Frontend

#development #reports #business #google #searchengine #crawlers

PPC Land @[email protected] · 2026-02-07 · 21:03 UTC

Testing tool simulates Google's 2MB HTML limit as SEO professionals assess crawling impact: Dave Smart added a 2MB truncation feature to the Tame the Bots fetch tool on February 6, enabling technical SEO professionals to simulate Googlebot's reduced file size limits. https://ppc.land/testing-tool-simulates-googles-2mb-html-limit-as-seo-professionals-assess-crawling-impact/ #SEO #GoogleBot #HTML #Crawling #DigitalMarketing

#seo #googlebot #html #crawling #digitalmarketing
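
A rough sketch of what simulating a 2MB fetch limit could look like, for checking how much of a page would survive truncation. This is not how the Tame the Bots tool is implemented. It assumes "2MB" means 2 × 1024 × 1024 bytes of the decoded response body, which the post does not specify, and `truncate_like_crawler`, `truncation_report`, and `marker_survives` are hypothetical names.

```python
# Assumption: the limit is 2 MiB of decoded body bytes; adjust if it differs.
LIMIT_BYTES = 2 * 1024 * 1024


def truncate_like_crawler(body: bytes, limit: int = LIMIT_BYTES) -> bytes:
    """Keep only the first `limit` bytes, as a size-limited fetcher would."""
    return body[:limit]


def truncation_report(body: bytes, limit: int = LIMIT_BYTES) -> dict:
    """Summarize how many bytes are kept and how many are cut off."""
    kept = truncate_like_crawler(body, limit)
    return {
        "total_bytes": len(body),
        "kept_bytes": len(kept),
        "cut_bytes": len(body) - len(kept),
        "truncated": len(body) > limit,
    }


def marker_survives(body: bytes, marker: bytes, limit: int = LIMIT_BYTES) -> bool:
    """Check whether a given byte string is still present after truncation."""
    return marker in truncate_like_crawler(body, limit)


# Synthetic page: padding pushes the closing tag past the limit.
page = b"<html>" + b"a" * LIMIT_BYTES + b"</html>"
print(truncation_report(page))
print(marker_survives(page, b"</html>"))
```

Running `marker_survives` against important markup, such as structured data near the end of the document, gives a quick signal of whether it would be cut off under the assumed limit.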

Bagolina @[email protected] · 2025-12-31 · 16:05 UTC

Toward an ever more fragile #web https://siecledigital.fr/2025/12/31/etude-cloudflare-2025-un-web-plus-vaste-plus-automatise-et-plus-fragile
#bots alone reportedly account for nearly 30% of global web traffic, with peaks capable of generating volumes comparable to DDoS attacks
#Googlebot is the dominant #crawler, with 4.5% of HTML requests
In 2025, the #smartphone accounts for about 43% of users worldwide, versus 57% for computers. #Android largely dominates mobile traffic worldwide, while #iOS retains a strong position

#web #bots #googlebot #crawler #smartphone #android

Jack Yan (甄爵恩) @[email protected] · 2025-12-10 · 00:01 UTC

Why would #Googlebot go through #Hetzner?

#googlebot #hetzner

ppom @[email protected] · 2025-10-29 · 23:58 UTC

So, someone in the issue made me realize that some bots impersonate the user agents of big actors, such as Googlebot. I checked my webserver logs and found a lot of them actually!

I liked the challenge, so I just wrote an article about how to do this in less than 40 SLOC 🏆
https://reaction.ppom.me/filters/useragent-impersonators.html

#reactionrust #bots #badbots #google #googlebot
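
One common way to spot user-agent impersonators is a reverse DNS lookup followed by a forward confirmation. The sketch below illustrates that idea and is not necessarily the approach taken in the linked article. The trusted suffixes are an assumption based on Google's published verification guidance, and the function names are hypothetical. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket

# Assumption: hostnames of genuine Googlebot addresses end in these suffixes.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for an address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host):
    """Return the set of addresses a hostname resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def claims_googlebot(user_agent):
    """True if the User-Agent header presents itself as Googlebot."""
    return "googlebot" in user_agent.lower()


def verify_google_ip(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve the address, check the suffix, then confirm forward."""
    host = reverse(ip)
    if not host or not host.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
        return False
    # Plain string comparison; IPv6 text forms may need normalizing first.
    return ip in forward(host)


def is_impersonator(ip, user_agent, reverse=reverse_lookup, forward=forward_lookup):
    """True when a client claims to be Googlebot but fails DNS verification."""
    return claims_googlebot(user_agent) and not verify_google_ip(ip, reverse, forward)
```

DNS lookups are slow compared with request handling, so a real deployment would likely cache results or run the check offline against log files.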

Djoerd Hiemstra 🍉 @[email protected] · 2025-10-18 · 08:35 UTC

@jackyan I suspect they created #GoogleOther to break the crawling / robots.txt / netiquette rules without getting too many repercussions on #GoogleBot.

#googleother #googlebot

Jack Yan (甄爵恩) @[email protected] · 2025-09-27 · 20:58 UTC

Is #Googlebot hacking? #Google

#googlebot #google

JdeBP @[email protected] · 2025-09-13 · 22:43 UTC

@cks

Early results are not promising. I've had a handful of HEAD requests in the past day. Only 2 appear legitimate, in that they hit genuine page URLs. The others were attempts to exploit WordPress vulnerabilities.

#HTTP #httpd #GoogleBot #djbwares #WordPress

#http #httpd #googlebot #djbwares #wordpress

JdeBP @[email protected] · 2025-09-12 · 11:03 UTC

@cks

It makes me think that there's one well-behaved 'bot drowned in a sea of ill-behaved ones.

I'm just instrumenting #djbwares httpd to log GET and HEAD differently. I wonder what I'll see.

#HTTP #httpd #GoogleBot

#djbwares #http #httpd #googlebot

JdeBP @[email protected] · 2025-09-11 · 06:43 UTC

@cks

Is it doing straightforward GETs? Or is it doing HEAD? Or using If-Modified-Since?

#HTTP #GoogleBot #httpd

#http #googlebot #httpd

Habr @[email protected] · 2024-08-20 · 09:32 UTC

[Translation] How Google handles JavaScript when indexing web pages

Understanding how search engines crawl, render, and index web pages is crucial for optimizing sites for search. As search engines such as Google change how they work, keeping track of what works and what doesn't becomes harder, especially with client-side JS.

https://habr.com/ru/companies/timeweb/articles/836866/

#timeweb_статьи_перевод #javascript #seo #googlebot #поисковая_оптимизация #индексация #indexing #crawling

#crawling #indexing #индексация #поисковая_оптимизация #googlebot #seo

Inautilo @[email protected] · 2024-03-04 · 02:05 UTC

#Business #Explainers
How Google Search crawls pages · Google’s web crawling process demystified in videos https://ilo.im/15y4qr

_____
#Google #Googlebot #SearchEngine #SEO #TechnicalSEO #Website #Content

#business #explainers #google #googlebot #searchengine #seo