#fail-over — Public Fediverse posts on home.social

Paolo Valsecchi @nolabnoparty · 2026-05-12 · 13:16 UTC

[ Blog ] Veeam High Availability Cluster: #failover and automation - pt.2

Once the Veeam High Availability #Cluster has been created, there are two ways to perform the failover: manual and automated through Veeam ONE.

Manual failover does not require additional components to be installed, whereas automatic failover requires Veeam ONE to be installed and configured within your http://rviv.ly/yhouMN #veeamone

2meterdba | Reitse Eskens @[email protected] · 2026-05-12 · 07:57 UTC

Blog alert!

This time on checking your options when a cloud service goes down. Are you aware of what happens in your cloud?

#Redundancy
#BusinessContinuity
#FailOver
#HADR

http://sqlreitse.com/2026/05/12/remember-your-redundancy/

Habr @[email protected] · 2026-05-06 · 08:32 UTC

Building a data bus for microservices on ZeroMQ: failover, delivery guarantees, and E2E encryption

An asynchronous client-server library for messaging between microservices, built on ZeroMQ. It implements guaranteed message delivery (At-Least-Once) with a persistent file-backed queue for connection drops, automatic failover of the forwarding server (clients can take over the server role on the fly), and two layers of protection: channel encryption (CurveZMQ) and end-to-end message encryption (HMAC). A lightweight alternative to brokers such as RabbitMQ that needs no separate server.
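
To make the at-least-once idea concrete, here is a minimal sketch of a file-backed outbox. It is not the library's API: the `FileOutbox` class, the JSON-lines file layout, and the `send` callback (which stands in for the ZeroMQ transport and returns True once the peer has acknowledged) are all assumptions made for illustration.

```python
import json
import os
from typing import Callable


class FileOutbox:
    """Hypothetical JSON-lines outbox: a message stays on disk until acknowledged."""

    def __init__(self, path: str) -> None:
        self.path = path
        if not os.path.exists(path):
            open(path, "w", encoding="utf-8").close()

    def _load(self) -> list[dict]:
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def _store(self, entries: list[dict]) -> None:
        # Write a temp file and rename it, so a crash never leaves a half-written queue.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            for entry in entries:
                fh.write(json.dumps(entry) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)

    def enqueue(self, msg_id: str, payload: str) -> None:
        entries = self._load()
        entries.append({"id": msg_id, "payload": payload})
        self._store(entries)

    def pending(self) -> list[dict]:
        return self._load()

    def flush(self, send: Callable[[str, str], bool]) -> int:
        """Try to deliver every pending message; keep those that were not acknowledged."""
        remaining, delivered = [], 0
        for entry in self._load():
            try:
                acked = send(entry["id"], entry["payload"])
            except OSError:
                acked = False  # link is down: keep the message for the next attempt
            if acked:
                delivered += 1
            else:
                remaining.append(entry)
        self._store(remaining)
        return delivered
```

Because the queue is rewritten only after the send attempts, a crash between an acknowledged send and the rewrite causes a resend on restart. That is what "at least once" implies: the receiver is expected to deduplicate, for example by message id.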

https://habr.com/ru/articles/1030020/

#python #zeromq #zmq #failover #atleastonce #endtoend_шифрование #микросервисы #распределенные_системы #hmac #криптография

IT Horror Stories Podcast @[email protected] · 2026-04-28 · 10:50 UTC

Turns out, failover success is subjective. Apparently, being ‘active’ just means you get tested harder. Ever wondered how ‘best intentions’ can invent new incidents? Let’s talk IT wisdom in the replies.

Find out more in Episode 12 : The Failover That Failed Successfully

https://youtube.com/shorts/L3s3K2E4-1I

Listen here : https://ithorrorstories.eu/#ep12

All other things : https://links.ithorrorstories.eu/

#podcasts #failover #drtest #disaster #technology #operationalreadyness #tech

IT Horror Stories Podcast @[email protected] · 2026-04-16 · 09:33 UTC

Ever run a failover test that worked perfectly… and still felt like everything was falling apart?

In Episode 12, we take you into a disaster recovery test during a busy release weekend — where the tech held up, but communication didn’t.

Subcontractors weren’t aligned, assumptions didn’t match reality, and suddenly a ‘simple test’ turned into a full coordination puzzle.

No production impact — but plenty of lessons.

Because resilience isn’t just about systems… it’s about people, timing, and actually talking to each other.

Listen now to IT Horror Stories with Jack Smith
You can find us on Spotify, Apple Music, YouTube, Deezer and of course at ITHorrorStories.eu

You are one of us.

https://youtube.com/shorts/k_SyFbQ71TU

#podcast #technology #failover #failure #techlife

Isadora @[email protected] · 2026-04-06 · 22:05 UTC

For those who run #ProsodyIM as an #xmpp server, I did something simple but effective in my failover architecture:

2 Prosody instances in two different regions in a datacenter
lsyncd syncing all data from the primary to the standby instance
an entrypoint script supervising Prosody execution
a lock file controlling whether the entrypoint script can start Prosody
a daemon checking whether the floating IP is attached to the host and switching the lock file and the lsyncd execution and configuration between primary/standby modes
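
The lock-file part of this setup could look roughly like the sketch below. It is only an illustration under stated assumptions, not the author's script: the function names are hypothetical, the convention that the lock file exists on the standby (and blocks Prosody) is a guess, and the floating-IP check is passed in as a callable rather than implemented. Switching lsyncd between modes is left out.

```python
import os
from typing import Callable


def reconcile(has_floating_ip: Callable[[], bool], lock_path: str) -> str:
    """Make the lock file match floating-IP ownership and return the resulting role.

    Assumed convention: the lock file EXISTS on the standby and blocks Prosody.
    """
    if has_floating_ip():
        # Primary: remove the lock so the entrypoint script may start Prosody.
        try:
            os.remove(lock_path)
        except FileNotFoundError:
            pass
        return "primary"
    # Standby: create the lock so the entrypoint script keeps Prosody down.
    with open(lock_path, "a", encoding="utf-8"):
        pass
    return "standby"


def may_start_prosody(lock_path: str) -> bool:
    """What the entrypoint script would check before launching Prosody."""
    return not os.path.exists(lock_path)
```

A daemon would call `reconcile` in a loop with a short sleep. Both branches are idempotent, so repeated calls in the same role change nothing.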

Perfect solution? Of course not.
Effective solution? Hell yeah.

#xmpp #failover #container #vrrp #prosodyim

IT Horror Stories Podcast @[email protected] · 2026-04-06 · 18:11 UTC

During a busy release weekend, a planned failover exposed not technical flaws, but something more familiar: misaligned teams, unclear responsibilities, and communication that didn’t quite arrive when it should have. Production stayed safe — but confidence took a hit.

This episode dives into how a “simple test” turned into a coordination challenge, and why resilience is just as much about people and processes as it is about systems.

Find all links to listen on our website : https://ithorrorstories.eu/#ep12

#podcast #datarecovery #failover #test #technology

You can find our podcast on :

Spotify : https://open.spotify.com/show/7LqbtykS0IQctSCucvQVHW
Apple Music : https://podcasts.apple.com/us/podcast/it-horror-stories-with-jack-smith/id1812612272
YouTube : https://music.youtube.com/playlist?list=PL9A9yzpnkOdVQvmFjgTsZRrE-zDCuIVcX
Deezer : https://link.deezer.com/s/30dyH3RoKvN8N24zgsbhj

Habr @[email protected] · 2026-04-02 · 14:32 UTC

[Translation] Mastering replication slots in Postgres: how to prevent WAL bloat and other problems in production

Logical replication in Postgres rarely breaks production suddenly; more often it builds up a problem slowly and methodically while a replication slot retains more and more WAL, the consumer falls behind, and free disk space starts to melt away. This article covers exactly that risk area: how replication slots work, why the basic settings alone are not enough, and which practices actually help keep WAL, publications, heartbeats, failover, and monitoring under control. The material is especially useful for those working with CDC, Debezium, and production Postgres instances, where the cost of a mistake is measured not in theory but in system stability. PostgreSQL deep dive
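
As a rough illustration of what "a slot retaining WAL" means in numbers, the sketch below computes the distance between the current WAL position and a slot's `restart_lsn`. It is not taken from the article. It assumes the two values have already been read as text, for example from `pg_current_wal_lsn()` and the `pg_replication_slots` view; the function names, the slot names, and the byte limit are hypothetical.

```python
from typing import Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL the server must keep because the slot has not advanced past restart_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slots_over_limit(
    current_lsn: str,
    slots: dict[str, Optional[str]],
    limit_bytes: int,
) -> list[str]:
    """Return the names of slots retaining more than limit_bytes of WAL."""
    offenders = []
    for name, restart_lsn in slots.items():
        if restart_lsn is None:
            continue  # a slot that has never been used reserves no WAL yet
        if retained_wal_bytes(current_lsn, restart_lsn) > limit_bytes:
            offenders.append(name)
    return offenders
```

A monitoring job could run this on a schedule and alert on any returned slot name, well before the disk fills up.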

https://habr.com/ru/companies/otus/articles/1018444/

#PostgreSQL #replication_slots #логическая_репликация #WAL #CDC #Debezium #pgoutput #failover #мониторинг_Postgres

Habr @[email protected] · 2026-04-01 · 08:12 UTC

Monitoring SQL Server Always On in Zabbix

If you run Always On Availability Groups, you have probably been in this situation: everything is green in SSMS, the dashboard shows "Synchronized", and users are calling to complain about slowness. You look at the secondary and see a redo_queue_size of 600 MB; the replica is half an hour behind. Not a single alert. This happened to us on a production cluster running 1C: the secondary silently dropped into SYNCHRONIZING, and we only found out during a planned switchover. An hour and a half of redo queue. It became clear that the built-in SSMS dashboard is not monitoring. What follows is how we closed the gap with Zabbix in one evening.
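
The kind of check that would have caught this can be sketched as follows. This is not the article's Zabbix template: it assumes the values were already read from `sys.dm_hadr_database_replica_states` (`synchronization_state_desc`, `redo_queue_size` in KB, `redo_rate` in KB/s), and the `ReplicaState` class, the function names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    database: str
    sync_state: str      # synchronization_state_desc
    redo_queue_kb: int   # redo_queue_size, in KB
    redo_rate_kb_s: int  # redo_rate, in KB/s


def estimated_catchup_seconds(state: ReplicaState) -> float:
    """Rough time the secondary needs to apply its pending redo queue."""
    if state.redo_queue_kb == 0:
        return 0.0
    if state.redo_rate_kb_s <= 0:
        return float("inf")  # queue is not draining at all
    return state.redo_queue_kb / state.redo_rate_kb_s


def alerts(
    state: ReplicaState,
    expected_state: str = "SYNCHRONIZED",  # assumes a synchronous-commit replica
    max_queue_kb: int = 100 * 1024,        # hypothetical threshold: 100 MB
    max_catchup_s: float = 300.0,          # hypothetical threshold: 5 minutes
) -> list[str]:
    """Return human-readable problems for one database replica."""
    problems = []
    if state.sync_state != expected_state:
        problems.append(f"{state.database}: state is {state.sync_state}")
    if state.redo_queue_kb > max_queue_kb:
        problems.append(f"{state.database}: redo queue {state.redo_queue_kb} KB")
    catchup = estimated_catchup_seconds(state)
    if catchup > max_catchup_s:
        problems.append(f"{state.database}: catch-up estimate {catchup:.0f} s")
    return problems
```

For an asynchronous-commit replica, SYNCHRONIZING is the normal state, so `expected_state` would need to be set accordingly and the queue and catch-up checks would carry the alerting.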

https://habr.com/ru/companies/cloud4y/articles/1017578/

#SQL_Server #Always_On #Zabbix #мониторинг #DMV #WSFC #кворум #failover #DBA

Hacker News @[email protected] · 2026-03-16 · 09:04 UTC

Starlink Mini as a Failover

https://www.jackpearce.co.uk/posts/starlink-failover/

#HackerNews #Starlink #Mini #Failover #SpaceTech #Connectivity #InternetSolutions

Paul - Antifa. LGBTQ+ safe. @[email protected] · 2026-02-04 · 07:51 UTC

And suddenly the NAS had switched itself off. It's up and running again but this is not good. Glad I have Nastig set up and ready to take over if the NAS is dying.

#recovery #NAS #failover

Prague PostgreSQL Dev Day @[email protected] · 2026-01-14 · 13:09 UTC

#throwback What really happens inside a PostgreSQL cluster after failover? 🔄 David Pech dives deep into failover, switchover, split-brain scenarios, and recovery strategies—manually breaking down how tools like Patroni work.

▶️ Watch the video now! https://lnkd.in/dQiiBvXX

#PostgreSQL #PGDay #PPDD #Failover #HighAvailability

#throwback #postgresql #pgday #ppdd #failover #highavailability

Reddit Tech VN Bot @[email protected] · 2026-01-08 · 14:19 UTC

Introducing pg-status, a lightweight microservice that checks the status of PostgreSQL hosts, identifies the master and the in-sync replicas, and supports failover and load balancing. It is easy to deploy as a sidecar, written in C, high-performance (1500 RPS), and exposes a simple HTTP API. It can measure replication lag in time or in bytes and integrates with libpq or a proxy. It suits systems that need high reliability and a fast response to incidents. #PostgreSQL #Database #DevOps #Microservice #C #HighAvailability #pgstatus #sidecar #replication #failover #trạng_thái_C

#postgresql #database #devops #microservice #c #highavailability

JavaScriptBuzz @[email protected] · 2026-01-06 · 10:30 UTC

JS vs PHP Scraper Failover: Outsmart IP Bans

Cache, retry, and switch sources before sales tank.

#php #javascript #scraping #cache #failover #pricing #viralcoding #codecomparison #growthhacks #reliability

https://www.youtube.com/watch?v=wJoX5yxh5U8

#php #javascript #scraping #cache #failover #pricing

Habr @[email protected] · 2025-12-23 · 11:02 UTC

I finally understood how openness can get in the way, plus an outage report

Last Monday we had yet another extremely idiotic outage. We are the idiots here, for the record, and I am about to give the details. Four servers out of the whole data center were affected, along with all of our public communications, because the owners of the virtual machines showed up under every post and left comments everywhere. There was a parallel story: under the article about what had gone wrong over the year, someone wrote, roughly, "why does everything of yours keep breaking? I host with a regional provider, and it hasn't had a single problem in 7 years." Well. The difference is that we talk about all of this. That provider has surely gone down, stalled, and lost its network about 10 times already, but skillfully swept the screw-ups under the rug. That means no blogs on Habr, no public channels with comments (such as a Telegram channel), no explanations beyond hypocritical replies from the support team, and so on. And then, suddenly, you get perceived as more stable and reliable. Probably. As for me, I will keep telling what happened to us. Welcome to another RCA, where the main thing in the search for the root cause was not to end up at ourselves. But we did!

https://habr.com/ru/companies/ruvds/articles/979616/

#ruvds_статьи #цод #авария #rca #ибп #резервное_питание #дизельгенераторные_установки #клиентский_сервис #failover

#failover #клиентский_сервис #дизельгенераторные_установки #резервное_питание #ибп #rca
