#postmortem — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #postmortem, aggregated by home.social.
-
When two Hetzner servers died at the same time
On May 12, 2026, two of my Arch Linux + LUKS servers at Hetzner became unreachable at the same moment. Both had been running for 4+ months without issue. Both had received the same
pacman -Syyuthe day before, but had stayed on the old kernel until the morning the websites stopped responding. I rebooted — SSH never came back.nmap -Pn -p 22showedfilteredfrom anywhere. No ping. No banner. The Hetzner Robot panel insisted the hardware was fine.Several hours went into hypotheses that turned out to be wrong:
- The
encryptsshinitcpio hook referencing a/usr/lib/initcpio/udev/11-dm-initramfs.rulesfile that no longer exists. Real bug, no boot impact — the initramfs rebuilds anyway. PermitRootLogin noinsshd_config. Real misconfiguration, fixed it, didn’t help. A refusing sshd showsclosed, notfiltered.- Predictable interface-naming drift after the systemd 260 upgrade. Patched the
.networkconfig to match by MAC. Useful hardening; not the cause. - Stale GRUB stage1 +
core.imgin the MBR. Arch never re-runsgrub-installafter agrubpackage upgrade. Refreshed it. Still filtered. - Kernel 7.0.5 regression. Downgraded to 6.18.3, the kernel that had run for 4 months. Still filtered. So the kernel itself wasn’t it either.
The clue was in the persistent journal: a single recorded boot from December 31 to May 12 10:13 UTC, and absolutely nothing after. Every reboot since the upgrade was failing before
systemd-journaldcould flush to disk — so the failure had to be in the initramfs, before the root filesystem was even mounted.What it almost certainly was
Hetzner Dedicated servers configure the initramfs network with
ip=dhcpon the kernel command line. That depends on Hetzner’s DHCP server replying to whatever request format the current kernel sends. Somewhere between kernel 6.18 / iproute2 6.18 and kernel 7.0 / iproute2 7.0, the request format changed enough that Hetzner’s DHCP stopped responding. Effects:- Old kernel at runtime kept the interface already configured (Phase A — 32 hours of healthy operation after the package upgrade).
- New kernel cold-boots, hits DHCP, never gets an IP, dropbear cannot listen, port 22 stays
filtered.
Hetzner’s own documentation has been quietly moving away from
ip=dhcptoward static IPv4 in the kernel command line. The fix is exactly that:GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=A.B.C.D::GATEWAY:255.255.255.255:hostname:eth0:none"One line in
/etc/default/grub,grub-mkconfig, reboot. No more dependency on Hetzner’s DHCP responding to whatever your current kernel sends.Why it matters for anyone running this stack
If you run Arch on Hetzner Dedicated with full-disk encryption and remote unlock via dropbear, the
ip=dhcpshipped byinstallimageis a latent bug. It can keep working for years and then break overnight, on every machine you have, after a routinepacman -Syyu. The static-IP version is what Hetzner now recommends and removes the entire dependency.Tooling
While debugging, I turned the whole rescue / chroot / diagnose / fix workflow into a Python CLI (
hal) — includinghal fix static-ip, which derives the static cmdline directly from your existingsystemd-networkd.networkfile:→ github.com/kevinveenbirkenbach/hetzner-arch-luks
Single command, idempotent, reversible (the original
#ArchLinux #bootFailure #debugging #DevOps #DHCP #Dropbear #fullDiskEncryption #GRUB #Hetzner #initramfs #kernelUpgrade #Linux #LUKS #mkinitcpio #pacman #postmortem #PythonCLI #serverOutage #sysadmin #systemdNetworkd/etc/default/grubis backed up to.hal-backup). If you’re on this stack, switch to static IP before the next kernel upgrade catches you. - The
-
When two Hetzner servers died at the same time
On May 12, 2026, two of my Arch Linux + LUKS servers at Hetzner became unreachable at the same moment. Both had been running for 4+ months without issue. Both had received the same
pacman -Syyuthe day before, but had stayed on the old kernel until the morning the websites stopped responding. I rebooted — SSH never came back.nmap -Pn -p 22showedfilteredfrom anywhere. No ping. No banner. The Hetzner Robot panel insisted the hardware was fine.Several hours went into hypotheses that turned out to be wrong:
- The
encryptsshinitcpio hook referencing a/usr/lib/initcpio/udev/11-dm-initramfs.rulesfile that no longer exists. Real bug, no boot impact — the initramfs rebuilds anyway. PermitRootLogin noinsshd_config. Real misconfiguration, fixed it, didn’t help. A refusing sshd showsclosed, notfiltered.- Predictable interface-naming drift after the systemd 260 upgrade. Patched the
.networkconfig to match by MAC. Useful hardening; not the cause. - Stale GRUB stage1 +
core.imgin the MBR. Arch never re-runsgrub-installafter agrubpackage upgrade. Refreshed it. Still filtered. - Kernel 7.0.5 regression. Downgraded to 6.18.3, the kernel that had run for 4 months. Still filtered. So the kernel itself wasn’t it either.
The clue was in the persistent journal: a single recorded boot from December 31 to May 12 10:13 UTC, and absolutely nothing after. Every reboot since the upgrade was failing before
systemd-journaldcould flush to disk — so the failure had to be in the initramfs, before the root filesystem was even mounted.What it almost certainly was
Hetzner Dedicated servers configure the initramfs network with
ip=dhcpon the kernel command line. That depends on Hetzner’s DHCP server replying to whatever request format the current kernel sends. Somewhere between kernel 6.18 / iproute2 6.18 and kernel 7.0 / iproute2 7.0, the request format changed enough that Hetzner’s DHCP stopped responding. Effects:- Old kernel at runtime kept the interface already configured (Phase A — 32 hours of healthy operation after the package upgrade).
- New kernel cold-boots, hits DHCP, never gets an IP, dropbear cannot listen, port 22 stays
filtered.
Hetzner’s own documentation has been quietly moving away from
ip=dhcptoward static IPv4 in the kernel command line. The fix is exactly that:GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=A.B.C.D::GATEWAY:255.255.255.255:hostname:eth0:none"One line in
/etc/default/grub,grub-mkconfig, reboot. No more dependency on Hetzner’s DHCP responding to whatever your current kernel sends.Why it matters for anyone running this stack
If you run Arch on Hetzner Dedicated with full-disk encryption and remote unlock via dropbear, the
ip=dhcpshipped byinstallimageis a latent bug. It can keep working for years and then break overnight, on every machine you have, after a routinepacman -Syyu. The static-IP version is what Hetzner now recommends and removes the entire dependency.Tooling
While debugging, I turned the whole rescue / chroot / diagnose / fix workflow into a Python CLI (
hal) — includinghal fix static-ip, which derives the static cmdline directly from your existingsystemd-networkd.networkfile:→ github.com/kevinveenbirkenbach/hetzner-arch-luks
Single command, idempotent, reversible (the original
#ArchLinux #bootFailure #debugging #DevOps #DHCP #Dropbear #fullDiskEncryption #GRUB #Hetzner #initramfs #kernelUpgrade #Linux #LUKS #mkinitcpio #pacman #postmortem #PythonCLI #serverOutage #sysadmin #systemdNetworkd/etc/default/grubis backed up to.hal-backup). If you’re on this stack, switch to static IP before the next kernel upgrade catches you. - The
-
When two Hetzner servers died at the same time
On May 12, 2026, two of my Arch Linux + LUKS servers at Hetzner became unreachable at the same moment. Both had been running for 4+ months without issue. Both had received the same
pacman -Syyuthe day before, but had stayed on the old kernel until the morning the websites stopped responding. I rebooted — SSH never came back.nmap -Pn -p 22showedfilteredfrom anywhere. No ping. No banner. The Hetzner Robot panel insisted the hardware was fine.Several hours went into hypotheses that turned out to be wrong:
- The
encryptsshinitcpio hook referencing a/usr/lib/initcpio/udev/11-dm-initramfs.rulesfile that no longer exists. Real bug, no boot impact — the initramfs rebuilds anyway. PermitRootLogin noinsshd_config. Real misconfiguration, fixed it, didn’t help. A refusing sshd showsclosed, notfiltered.- Predictable interface-naming drift after the systemd 260 upgrade. Patched the
.networkconfig to match by MAC. Useful hardening; not the cause. - Stale GRUB stage1 +
core.imgin the MBR. Arch never re-runsgrub-installafter agrubpackage upgrade. Refreshed it. Still filtered. - Kernel 7.0.5 regression. Downgraded to 6.18.3, the kernel that had run for 4 months. Still filtered. So the kernel itself wasn’t it either.
The clue was in the persistent journal: a single recorded boot from December 31 to May 12 10:13 UTC, and absolutely nothing after. Every reboot since the upgrade was failing before
systemd-journaldcould flush to disk — so the failure had to be in the initramfs, before the root filesystem was even mounted.What it almost certainly was
Hetzner Dedicated servers configure the initramfs network with
ip=dhcpon the kernel command line. That depends on Hetzner’s DHCP server replying to whatever request format the current kernel sends. Somewhere between kernel 6.18 / iproute2 6.18 and kernel 7.0 / iproute2 7.0, the request format changed enough that Hetzner’s DHCP stopped responding. Effects:- Old kernel at runtime kept the interface already configured (Phase A — 32 hours of healthy operation after the package upgrade).
- New kernel cold-boots, hits DHCP, never gets an IP, dropbear cannot listen, port 22 stays
filtered.
Hetzner’s own documentation has been quietly moving away from
ip=dhcptoward static IPv4 in the kernel command line. The fix is exactly that:GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=A.B.C.D::GATEWAY:255.255.255.255:hostname:eth0:none"One line in
/etc/default/grub,grub-mkconfig, reboot. No more dependency on Hetzner’s DHCP responding to whatever your current kernel sends.Why it matters for anyone running this stack
If you run Arch on Hetzner Dedicated with full-disk encryption and remote unlock via dropbear, the
ip=dhcpshipped byinstallimageis a latent bug. It can keep working for years and then break overnight, on every machine you have, after a routinepacman -Syyu. The static-IP version is what Hetzner now recommends and removes the entire dependency.Tooling
While debugging, I turned the whole rescue / chroot / diagnose / fix workflow into a Python CLI (
hal) — includinghal fix static-ip, which derives the static cmdline directly from your existingsystemd-networkd.networkfile:→ github.com/kevinveenbirkenbach/hetzner-arch-luks
Single command, idempotent, reversible (the original
#ArchLinux #bootFailure #debugging #DevOps #DHCP #Dropbear #fullDiskEncryption #GRUB #Hetzner #initramfs #kernelUpgrade #Linux #LUKS #mkinitcpio #pacman #postmortem #PythonCLI #serverOutage #sysadmin #systemdNetworkd/etc/default/grubis backed up to.hal-backup). If you’re on this stack, switch to static IP before the next kernel upgrade catches you. - The
-
When two Hetzner servers died at the same time
On May 12, 2026, two of my Arch Linux + LUKS servers at Hetzner became unreachable at the same moment. Both had been running for 4+ months without issue. Both had received the same
pacman -Syyuthe day before, but had stayed on the old kernel until the morning the websites stopped responding. I rebooted — SSH never came back.nmap -Pn -p 22showedfilteredfrom anywhere. No ping. No banner. The Hetzner Robot panel insisted the hardware was fine.Several hours went into hypotheses that turned out to be wrong:
- The
encryptsshinitcpio hook referencing a/usr/lib/initcpio/udev/11-dm-initramfs.rulesfile that no longer exists. Real bug, no boot impact — the initramfs rebuilds anyway. PermitRootLogin noinsshd_config. Real misconfiguration, fixed it, didn’t help. A refusing sshd showsclosed, notfiltered.- Predictable interface-naming drift after the systemd 260 upgrade. Patched the
.networkconfig to match by MAC. Useful hardening; not the cause. - Stale GRUB stage1 +
core.imgin the MBR. Arch never re-runsgrub-installafter agrubpackage upgrade. Refreshed it. Still filtered. - Kernel 7.0.5 regression. Downgraded to 6.18.3, the kernel that had run for 4 months. Still filtered. So the kernel itself wasn’t it either.
The clue was in the persistent journal: a single recorded boot from December 31 to May 12 10:13 UTC, and absolutely nothing after. Every reboot since the upgrade was failing before
systemd-journaldcould flush to disk — so the failure had to be in the initramfs, before the root filesystem was even mounted.What it almost certainly was
Hetzner Dedicated servers configure the initramfs network with
ip=dhcpon the kernel command line. That depends on Hetzner’s DHCP server replying to whatever request format the current kernel sends. Somewhere between kernel 6.18 / iproute2 6.18 and kernel 7.0 / iproute2 7.0, the request format changed enough that Hetzner’s DHCP stopped responding. Effects:- Old kernel at runtime kept the interface already configured (Phase A — 32 hours of healthy operation after the package upgrade).
- New kernel cold-boots, hits DHCP, never gets an IP, dropbear cannot listen, port 22 stays
filtered.
Hetzner’s own documentation has been quietly moving away from
ip=dhcptoward static IPv4 in the kernel command line. The fix is exactly that:GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=A.B.C.D::GATEWAY:255.255.255.255:hostname:eth0:none"One line in
/etc/default/grub,grub-mkconfig, reboot. No more dependency on Hetzner’s DHCP responding to whatever your current kernel sends.Why it matters for anyone running this stack
If you run Arch on Hetzner Dedicated with full-disk encryption and remote unlock via dropbear, the
ip=dhcpshipped byinstallimageis a latent bug. It can keep working for years and then break overnight, on every machine you have, after a routinepacman -Syyu. The static-IP version is what Hetzner now recommends and removes the entire dependency.Tooling
While debugging, I turned the whole rescue / chroot / diagnose / fix workflow into a Python CLI (
hal) — includinghal fix static-ip, which derives the static cmdline directly from your existingsystemd-networkd.networkfile:→ github.com/kevinveenbirkenbach/hetzner-arch-luks
Single command, idempotent, reversible (the original
#ArchLinux #bootFailure #debugging #DevOps #DHCP #Dropbear #fullDiskEncryption #GRUB #Hetzner #initramfs #kernelUpgrade #Linux #LUKS #mkinitcpio #pacman #postmortem #PythonCLI #serverOutage #sysadmin #systemdNetworkd/etc/default/grubis backed up to.hal-backup). If you’re on this stack, switch to static IP before the next kernel upgrade catches you. - The
-
When two Hetzner servers died at the same time
On May 12, 2026, two of my Arch Linux + LUKS servers at Hetzner became unreachable at the same moment. Both had been running for 4+ months without issue. Both had received the same
pacman -Syyuthe day before, but had stayed on the old kernel until the morning the websites stopped responding. I rebooted — SSH never came back.nmap -Pn -p 22showedfilteredfrom anywhere. No ping. No banner. The Hetzner Robot panel insisted the hardware was fine.Several hours went into hypotheses that turned out to be wrong:
- The
encryptsshinitcpio hook referencing a/usr/lib/initcpio/udev/11-dm-initramfs.rulesfile that no longer exists. Real bug, no boot impact — the initramfs rebuilds anyway. PermitRootLogin noinsshd_config. Real misconfiguration, fixed it, didn’t help. A refusing sshd showsclosed, notfiltered.- Predictable interface-naming drift after the systemd 260 upgrade. Patched the
.networkconfig to match by MAC. Useful hardening; not the cause. - Stale GRUB stage1 +
core.imgin the MBR. Arch never re-runsgrub-installafter agrubpackage upgrade. Refreshed it. Still filtered. - Kernel 7.0.5 regression. Downgraded to 6.18.3, the kernel that had run for 4 months. Still filtered. So the kernel itself wasn’t it either.
The clue was in the persistent journal: a single recorded boot from December 31 to May 12 10:13 UTC, and absolutely nothing after. Every reboot since the upgrade was failing before
systemd-journaldcould flush to disk — so the failure had to be in the initramfs, before the root filesystem was even mounted.What it almost certainly was
Hetzner Dedicated servers configure the initramfs network with
ip=dhcpon the kernel command line. That depends on Hetzner’s DHCP server replying to whatever request format the current kernel sends. Somewhere between kernel 6.18 / iproute2 6.18 and kernel 7.0 / iproute2 7.0, the request format changed enough that Hetzner’s DHCP stopped responding. Effects:- Old kernel at runtime kept the interface already configured (Phase A — 32 hours of healthy operation after the package upgrade).
- New kernel cold-boots, hits DHCP, never gets an IP, dropbear cannot listen, port 22 stays
filtered.
Hetzner’s own documentation has been quietly moving away from
ip=dhcptoward static IPv4 in the kernel command line. The fix is exactly that:GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=A.B.C.D::GATEWAY:255.255.255.255:hostname:eth0:none"One line in
/etc/default/grub,grub-mkconfig, reboot. No more dependency on Hetzner’s DHCP responding to whatever your current kernel sends.Why it matters for anyone running this stack
If you run Arch on Hetzner Dedicated with full-disk encryption and remote unlock via dropbear, the
ip=dhcpshipped byinstallimageis a latent bug. It can keep working for years and then break overnight, on every machine you have, after a routinepacman -Syyu. The static-IP version is what Hetzner now recommends and removes the entire dependency.Tooling
While debugging, I turned the whole rescue / chroot / diagnose / fix workflow into a Python CLI (
hal) — includinghal fix static-ip, which derives the static cmdline directly from your existingsystemd-networkd.networkfile:→ github.com/kevinveenbirkenbach/hetzner-arch-luks
Single command, idempotent, reversible (the original
#ArchLinux #bootFailure #debugging #DevOps #DHCP #Dropbear #fullDiskEncryption #GRUB #Hetzner #initramfs #kernelUpgrade #Linux #LUKS #mkinitcpio #pacman #postmortem #PythonCLI #serverOutage #sysadmin #systemdNetworkd/etc/default/grubis backed up to.hal-backup). If you’re on this stack, switch to static IP before the next kernel upgrade catches you. - The
-
Post Mortem: axios NPM supply chain compromise
https://github.com/axios/axios/issues/10636
#HackerNews #PostMortem #axios #NPM #supplyChain #Compromise #CyberSecurity #OpenSource
-
Post Mortem: axios NPM supply chain compromise
https://github.com/axios/axios/issues/10636
#HackerNews #PostMortem #axios #NPM #supplyChain #Compromise #CyberSecurity #OpenSource
-
Post Mortem: axios NPM supply chain compromise
https://github.com/axios/axios/issues/10636
#HackerNews #PostMortem #axios #NPM #supplyChain #Compromise #CyberSecurity #OpenSource
-
Post Mortem: axios NPM supply chain compromise
https://github.com/axios/axios/issues/10636
#HackerNews #PostMortem #axios #NPM #supplyChain #Compromise #CyberSecurity #OpenSource
-
Post Mortem: axios NPM supply chain compromise
https://github.com/axios/axios/issues/10636
#HackerNews #PostMortem #axios #NPM #supplyChain #Compromise #CyberSecurity #OpenSource
-
A quotation from George Washington
Though in reviewing the incidents of my Administration I am unconscious of intentional error, I am nevertheless too sensible of my defects not to think it probable that I may have committed many errors. Whatever they may be, I fervently beseech the Almighty to avert or mitigate the evils to which they may tend. I shall also carry with me the hope that my country will never cease to view them with indulgence, and that, after forty-five years of my life dedicated to its service with an upright zeal, the faults of incompetent abilities will be consigned to oblivion, as myself must soon be to the mansions of rest.
George Washington (1732–1799) American military leader, Founding Father, US President (1789–1797)
Letter (1796-09-17), "Farewell Address" [with J. Madison, A. Hamilton]More about this quote: wist.info/washington-george/81…
#quote #quotes #quotation #qotd #georgewashington #selfcriticism #selfassessment #retirement #administration #blame #error #fault #forgiveness #honestmistake #humility #mistake #oblivion #postmortem #president #review #selfdeprecation
-
Awesome! 👌🏽
“How Complex Systems Fail” [1998], Richard Cook (https://how.complexsystems.fail/).
On HN: https://news.ycombinator.com/item?id=25550685
On Lobsters: https://lobste.rs/s/smzguc/how_complex_systems_fail
#Complexity #Failure #Reliability #SRE #RCA #PostMortem #RootCauseAnalysis #Fragility
-
Как анализировать инциденты. История об ошибках
Стоимость минуты простоя в iGaming может приносить миллионы упущенной прибыли и более тяжелые репутационные потери. Когда real‑time ставки замирают, а букмекерские терминалы уходят в ступор — это не просто баг. Это экзамен на зрелость команды и процессов. Что мы делаем после — определяет, повторится ли он снова.