home.social

#openebs — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #openebs, aggregated by home.social.

  1. I've been a little rough and irresponsible with my #baremetal #Kubernetes cluster, especially when it comes to randomly rebooting nodes. Today I fixed that.

    I'm running a bunch of somewhat delicate workloads, including database clusters with CSIs like #Longhorn and #OpenEBS. Checking that everything is in working order has been a demanding task, and often something I've skipped before rebooting or upgrading nodes - occasionally with horrific results.

    Last night I finally took the time and wrote a pretty thorough script that checks that everything is working and healthy, before politely cordoning off a node, draining it and applying upgrades.

    I felt so confident today that I tested it by running this new safe upgrade script for all the nodes in the cluster - and it worked! All nodes are now fully upgraded and running kernel 6.12.73 on Debian 13.

    This also fixes the outstanding issue caused by #Hetzner no longer supporting obtaining IP addresses through DHCP.

    #Linux #MSTDNDK #K8s
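
    For readers curious what such a pre-upgrade check might look like, here is a minimal sketch (not the poster's actual script, and the health checks and thresholds are illustrative assumptions): it shells out to kubectl, refuses to proceed unless every node is Ready and no pod is Pending or Failed, and only then cordons and drains the target node.

        #!/usr/bin/env python3
        """Sketch of a "check health, then cordon and drain" flow.
        Illustrative assumption only - not the script from the post."""
        import json
        import subprocess
        import sys

        def kubectl_json(*args):
            # Run kubectl and parse its JSON output.
            out = subprocess.run(["kubectl", *args, "-o", "json"],
                                 check=True, capture_output=True, text=True)
            return json.loads(out.stdout)

        def cluster_healthy():
            # Every node must report Ready=True.
            for node in kubectl_json("get", "nodes")["items"]:
                conds = {c["type"]: c["status"] for c in node["status"]["conditions"]}
                if conds.get("Ready") != "True":
                    return False
            # No pod may be Pending or Failed anywhere in the cluster.
            for pod in kubectl_json("get", "pods", "--all-namespaces")["items"]:
                if pod["status"]["phase"] in ("Pending", "Failed"):
                    return False
            return True

        def drain(node_name):
            # Politely cordon the node, then evict its workloads.
            subprocess.run(["kubectl", "cordon", node_name], check=True)
            subprocess.run(["kubectl", "drain", node_name,
                            "--ignore-daemonsets", "--delete-emptydir-data",
                            "--timeout=10m"], check=True)

        if __name__ == "__main__":
            node = sys.argv[1]
            if not cluster_healthy():
                sys.exit("Cluster is not healthy - refusing to touch " + node)
            drain(node)
            # Apply OS upgrades and reboot here, then `kubectl uncordon` the node.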

  2. Just when I thought I was almost out of yaks (and finally installing #Airflow!), #terragrunt got confused and started demanding to create resources that already exist, which broke #openEBS, which broke... sigh ...

    Another 2 days of work later, #argocd is installed in my neurons and my cluster, and most of my config is refactored "enough". I swear we'll actually get to do some #datascience someday folks...

    Big data on a tiny budget is hard!

    #dataengineering #sre

  3. Having recently experienced a rather horrible #Kubernetes crash, I'm looking for #backup solutions. We're good with PostgreSQL since we're using #CNPG with remote transaction logs to an offsite #S3 bucket. I need something for volumes and maybe Kubernetes resources. #Longhorn offers S3 backups for its own volumes, but for other #CSI drivers like local #OpenEBS, maybe #Velero? Thoughts?

    velero.io/
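
    For context, Velero's usual answer for volumes without CSI snapshot support is file-system backup of pod volumes to the configured object store. A minimal sketch of driving that from Python follows; the namespaces are hypothetical and the flags are from recent Velero releases, so double-check them against your version.

        #!/usr/bin/env python3
        """Sketch: trigger a Velero backup of selected namespaces, copying pod
        volumes to the configured backup location via file-system backup.
        Namespace names are illustrative assumptions."""
        import subprocess
        from datetime import datetime, timezone

        def velero_backup(namespaces):
            name = "manual-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
            subprocess.run([
                "velero", "backup", "create", name,
                "--include-namespaces", ",".join(namespaces),
                # Copy volume data with Velero's file-system backup, which also
                # covers local PVs that have no CSI snapshot support.
                "--default-volumes-to-fs-backup",
                "--wait",
            ], check=True)

        if __name__ == "__main__":
            velero_backup(["mastodon", "minio"])  # hypothetical namespaces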

  4. I've been doing things I shouldn't with #Kubernetes. We're using a replicated #MinIO cluster as the storage backend on #mstdndk, which requires a boatload of storage, especially if you forget to specify any kind of retention. So far, the quick workaround for a full disk was just to expand the filesystem.

    Since we're replicating across nodes, we're using #OpenEBS #LVM for local storage. Poor partitioning means we're running out of storage on the volume group, but even worse - PVC sizes were increased before checking if we had space for them. Kubernetes is now stuck in a most unfortunate situation - it can't grow the local filesystem, as the volume group is full, and you're not allowed to decrease the size request.

    What then? Cue github.com/etcd-io/auger - a tool that allows you to edit #K8s resources directly in #etcd. Obviously you should never do this, but with steady hands and clinical precision, you can get yourself out of a pickle like mine. Size was reverted and PVCs were unstuck.
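
    For the curious, the inspection half of that trick looks roughly like the sketch below. The etcd key layout, namespace and PVC name are illustrative assumptions, and the etcdctl TLS/endpoint flags a real cluster needs are omitted; the write-back step is deliberately left as a comment.

        #!/usr/bin/env python3
        """Sketch: inspect a stuck PVC object directly in etcd with auger."""
        import subprocess

        NAMESPACE = "mstdndk"      # hypothetical namespace
        PVC_NAME = "minio-data-0"  # hypothetical PVC
        KEY = f"/registry/persistentvolumeclaims/{NAMESPACE}/{PVC_NAME}"

        # 1. Fetch the raw protobuf value the API server stored in etcd.
        raw = subprocess.run(
            ["etcdctl", "get", KEY, "--print-value-only"],
            check=True, capture_output=True).stdout

        # 2. Decode it to readable YAML with auger; the field reverted in the
        #    post is spec.resources.requests.storage.
        decoded = subprocess.run(
            ["auger", "decode"], input=raw, check=True, capture_output=True).stdout
        print(decoded.decode())

        # 3. Writing a modified object back (auger encode piped into etcdctl put)
        #    bypasses the API server entirely - read auger's README and back up
        #    etcd before attempting anything like that.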

  5. There, now the services are on #MetalLB IPs, and the #ReverseProxy forwards rukii.net to those. Works perfectly, #ZeroDowntime.

    Now everything else is #HighAvailability, except the persistent volumes. For #OpenEBS I'll need a third cluster node, which is in the mail... And of course the reverse proxy and the internet connections aren't redundant yet; in principle I could set up another internet connection, e.g. over 4G, but for now the fiber and the proxy are reliable enough.