home.social

#rke2 β€” Public Fediverse posts

Live and recent posts from across the Fediverse tagged #rke2, aggregated by home.social.

  1. As Hetzner is deprecating DNS configuration via the DNS console, I migrated my domains to the new Cloud API. The last piece of the puzzle was to create new tokens and move from the old cert-manager-webhook-hetzner (by vadimkim) to the official chart maintained by Hetzner.

    Migrated my 7 Kubernetes clusters (k3s, RKE2, OpenShift) without major hiccups; I only had to do some cleanup of old ACME challenge entries left over after the migration (as cert-manager could not remove them without the new webhook and API token).

    The only things left are the machines without k3s, which use lego.

    #homelab #hetzner #certmanager #dns #hellyeah #kubernetes #k3s #rke2

  2. #k3s or #k3d? Is there even a difference - and also, what are they useful for? Is it really only good for a quick 'throw-away' #kubernetes cluster for testing, or something? Coming from something like #rke2 (which I know is prolly not a good comparison, but still curious how they could be useful to me)

  3. Updated #Orked, my collection of scripts to help set up a production-ready #RKE2 #Kubernetes cluster in your #homelab. This update brings general improvements to the scripts, improved documentation, #HAProxy support for load balancing multiple Master nodes, and upgrades to all components including RKE2, #Longhorn, #Nginx Ingress, #Cert-manager, #MetalLB, #Rancher, etc., to their latest versions.

    I still hope someday to support more Kubernetes distributions like #k3s, but haven't gotten around to it. I've also been planning to support more #Linux distros as the base too, instead of only #RockyLinux/#RHEL, but that'll have to wait as well for now. Regardless, I am quite happy with how mature and stable these scripts have turned out to be. If you'd like to set up a cluster of your own, maybe check it out!

    πŸ”— https://github.com/irfanhakim-as/orked

    πŸ”— https://github.com/irfanhakim-as/orked/pull/41

  4. Hmm my services are running fine as far as I can tell, but my #Rancher/#RKE2 #Kubernetes cluster is acting up - possibly #etcd related?

    The biggest tell is the control plane/API server not being the most responsive, and some essential pods failing/restarting, including #cert-manager, cloud-controller-manager, csi-smb-controller, kube-apiserver, kube-scheduler, rke2-snapshot-controller, csi-provisioner + -resizer, -snapshotter, yadda yadda.

    Not sure what could be causing it just yet.

  5. CW: Systems Administration, K8s, Storage.

    At #work today I had an unexpectedly pleasant experience. With #kubernetes of all things.

    I'm playing around with a sandpit #RKE2 cluster that I'm using to test how things are going to work when we need to deploy an actual working environment, and the matter of persistent storage came up.

    A bit of poking around and I discovered #Longhorn. The requirements were trivial - all the required software was already present, just one service needed to be started. And so I installed it via #helm with just a handful of lines in the config.

    And it just works. A fully distributed, clustered, read-write-many capable storage subsystem for Kubernetes, and it took me less than the time it is taking to write this to get it up and running.

    I fired up a deployment that required multiple RWM PVs and ... it all just worked. I could even go into a management panel and see how the shards were distributed, and how busy everything was.

    For a back-end, all it needed was a filesystem on each node of the cluster, and even that could be managed with #LVM so it could be expanded at need.

    Compared to the old in-tree VMWare CSI operator, this is a dream come true.

    #Linux #SysadminLife
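
    The "handful of lines" install described above can be sketched roughly like this (a minimal sketch following the Helm install path in Longhorn's docs; chart values and versions will vary per cluster):

```shell
# Add the Longhorn chart repo and install into its own namespace
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace

# Watch the components come up
kubectl -n longhorn-system get pods -w
```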

  6. Moved the last #PostgreSQL cluster from Crunchy Postgres to #CloudNativePG.πŸ’ͺ

    This was the final step in the long overdue migration from #RKE to #RKE2.

    What a ride, but it went pretty smoothly to be honest!πŸ₯³

    #kubernetes #homelab #diy

  10. Spent 4h debugging my #RKE2 #k8s test-cluster only to find out there was an IP address conflict with the #QEMU gateway on the networkπŸ™„πŸ™ˆ
    Why is there even a gateway on an isolated network?
    And yes, it is 2 AM in the morning!😴

  11. We know #TalosLinux is 🀏 but is it really the smallest?

    We ran the tests. We’ve got the data. Check it out if you like numbers.

    Watch β†’ youtu.be/atPvnJMGdfs
    Read β†’ siderolabs.com/blog/which-kube

    #Kubeadm #K3s #K0s #Kairos #RKE2 #Kubernetes #K8s

  12. My #homelab wiki is getting really complicated to organise and write for haha, but it's definitely getting more interesting topics like more #RaspberryPi stuffs, #Docker, and some cool stuffs like #OpenMediaVault and #HomeAssistant. I'm taking my sweet time to update them 'properly' and hope it'll all link/piece together sensibly in the end.

    This is partially thanks to me embracing the fact that I just don't (yet) have the resources for a standalone 'mega' homelab (#Proxmox & #Kubernetes based) server cluster that I could simply throw everything at, hence supplementing that setup with tinier SBC-based servers. Gives me a bit of peace of mind too that things are now more 'spread out'.

    The most interesting bit will probably be when I manage to explore replicating a mini version of my #RKE2 Kubernetes cluster on a single (or at most, two) Raspberry Pi node - maybe based on #k3s, assuming that's better. I'm just not there yet cos I'm kinda reluctant: I'm not sure using something like #k8s on RPi makes much sense, since I'm expecting a lot of resources will be wasted that way, when hosting on Docker alone (i.e. on #Portainer) should be leaner.

    πŸ”— Anyway, if y'all wanna keep an eye on it: https://github.com/irfanhakim-as/homelab-wiki

  13. I have finally caved in and dove into the rabbit hole of #Linux Containers (#LXC) on #Proxmox during my exploration of how to split a GPU across multiple servers and... I totally understand now, seeing people's Proxmox setups that are made up exclusively of LXCs rather than VMs lol - it's just so pleasant to set up and use, and superficially at least, very efficient.

    I now have a #Jellyfin and #ErsatzTV setup running on LXCs with working iGPU passthrough of my server's #AMD Ryzen 5600G APU. My #Intel #ArcA380 GPU has also arrived, but I'm prolly gonna hold off on adding that until I decide which node I should add it to and schedule the shutdown, etc. In the future, I might even consider exploring (re)building a #Kubernetes #RKE2 cluster on LXC nodes instead of VMs - and whether that's viable or perhaps better.

    Anyway, I've updated my #Homelab Wiki with guides pertaining to LXCs, including creating one, passing through a GPU to multiple unprivileged LXCs, and adding an #SMB share for the entire cluster and mounting them, also, on unprivileged LXC containers.

    πŸ”— https://github.com/irfanhakim-as/homelab-wiki/blob/master/topics/proxmox.md#linux-containers-lxc

  14. Package updates for #rke2 including fixes for the #nginx #ingress #vulnerability are on their way to @opensuse #Tumbleweed. This means rke2 as well as the flavors for Kubernetes 1.31, 1.30 and 1.29.

  15. @andreasdotorg @redknight
    I think at that level it's conceptually easy, you "just" need (wo-)manpower to set up and maintain everything yourself. Assuming you want to set up a new cloud provider from scratch and build one/two/three new DCs in different regions in Europe:
    - buy standard "off-the-shelf" server hardware
    - at this level you can use US networking equipment (firewalls, routers, switches)
    - and then use/self-host all the open-source software you want

    E.g.:
    - use your favourite #Linux distro (#debian, #ubuntu, #fedora, or whatever)
    - set up Netbox or a similar tool (and maybe phpIPAM) + a #PostgreSQL server
    - there's probably no way around #OpenStack either way, with #MariaDB and some other open source tools in the background
    - you can set up #Prometheus, #Grafana, #OpenSearch for observability

    And on top of that offer services as you see fit:
    - automate setup/maintenance of #Kubernetes clusters (I heard #RKE2 is a fairly self-contained #K8s distribution)
    - automate setup/maintenance of DB servers
    - provide a way to run "serverless" apps
    - set up #nextcloud or so

  16. #FediHire #GetFediHired πŸ₯³

    I'm a #Programmer/#SoftwareEngineer. I'm most fluent in #Python, have some basics in #Java and #C++, and I'm also taking up new languages like #Javascript and others in my eternal journey of getting better and minimising the impostor syndrome that befalls pretty much all programmers (I feel). I'm also very experienced in #CloudNative/#DevOps technologies, and have been the one devising solutions and maintaining infrastructure in a fast-paced startup environment in my previous employment.

    I'm passionate about what I do, and those that know me here or IRL would know that I'm always yapping about the things I'm learning or working on - I love discussing them, and I love helping people out - esp those in the same boat as me.

    This passion has led me to write and maintain tons of #FOSS projects like Mango: a content distribution framework based on #Django for #Mastodon and #Bluesky that powers various bots of mine like @[email protected] and @[email protected], Charts: a #Helm chart repository for an easy and reproducible deployment strategy for all my projects and everything else I self-host on my #homelab, and Orked: O-tomated #RKE2 Distribution, a collection of comprehensively documented scripts I wrote to enable everyone to self-host a production-grade #Kubernetes cluster for absolutely free in their homes.

    I'm based in Malaysia, but I'm open to just about any on-site, hybrid, or remote job opportunities anywhere. In the meantime though, I'm actively looking for a job in countries like #Japan and #Singapore, in a bid for a desperate lifestyle change. I've linked below my Portfolio (which you, too, could self-host!), for those who'd wish to connect/learn more of me. Thank you ❀️

    πŸ”— https://l.irfanhak.im/resume

  17. #Rancher/#RKE2 #Kubernetes cluster question - I don't need Rancher, but in the past with my RKE2 clusters, I normally deploy Rancher on a single VM using #Docker just for the sake of having some sort of UI for my cluster(s) if need be - with this setup, I'm relying on importing the downstream (RKE 2) cluster(s) onto said Rancher deployment. That worked well.

    This time round though, I tried deploying Rancher on the cluster itself, instead of an external VM, using #Helm. Rancher's pretty beefy and heavy to deploy even with a single replica, and from my limited testing I found that it's easier to deploy when your cluster's pretty new and doesn't have much running just yet.

    What I'm curious about tho are these errors - my cluster's fine, and I'm not seeing anything wrong with it, but ever since deploying it a few days ago, I'm constantly seeing these Liveness/Readiness probe failed errors on all 3 of my Master nodes (periodically most of the time, not all at once) - the same error also seems to include etcd failed: reason withheld. What does it mean, and how do I "address" it?
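
    One hedged way to dig behind "reason withheld" (the API server hides probe details from unauthorized callers, but its verbose health endpoints show the per-check status; the etcd pod name and cert paths below are assumptions based on standard RKE2 layouts):

```shell
# Per-check readiness/liveness detail from the API server (incl. the etcd check)
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/livez?verbose'

# Assumed: check etcd health directly via the bundled etcd pod on a server node
kubectl -n kube-system exec etcd-<master-node> -- etcdctl endpoint health \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
```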

  18. OK, finally got the node back up again. I decided to just get a brand new #BeQuiet SFX PSU, to replace the old #SpeedCruiser Flex PSU and rebuild the node with otherwise the same hardware, into a spare #Silverstone SG13 case I have and it all works.

    Also discovered that the reason HA wasn't working and none of the VMs I'd set replication for carried over to the healthy node was that I had forgotten to actually create a HA group on Proxmox and set the VMs to HA so... did that.

    Now I'm freaking wrestling with my #RKE2 #Kubernetes cluster to get it back up and running again, cos atm the cluster is littered with pods that are either Pending, Unknown or in a crash loop... which is always fucking fun. Also the cluster itself is kind of slow to respond (on #k9s)... which is concerning, but I think it prolly has to do with how its networking is set up.

    I'm still completely clueless, honestly, on what the "ideal" networking setup is for both Proxmox and a Kubernetes cluster hosted on Proxmox. I'm still stuck with the defaults for now, that is, using the onboard NIC on each Proxmox node for every single thing. The only customisation I've done was setting a bandwidth limit (on Proxmox) only for migration. #Homelab folks, please feel free to send some suggestions my way, that is as dummy-friendly as possible :))

    RE:
    https://sakurajima.social/notes/a2e9fm36yg

  19. I'm wondering right now what to do - I'm outside rn and so fucking eager (and extremely worried) to inspect and see just what caused it, when I'm home. I did notice the power plug (attached to the wall/extension cord) wasn't seated tight.. could that have been the cause?

    It's a #B550 ITX board with a 500W Flex PSU - it doesn't have any GPU, just an #AMD Ryzen 5 5600G APU. Besides that, just 2 sticks of DDR4 RAM (64GB) and 2x 1TB NVMe SSDs. It all depends on what I'll find later when I inspect it tho, but I'm wondering if I should continue having the node in my #homelab with a replaced PSU (found a 600W 80 Plus Platinum rated Flex PSU by FSP), or ditch it completely and replace it with an mATX build instead - #AsRock B550M Pro4 mATX board, same APU (or prolly replace it with my spare Ryzen 7 3700X), and a brand new #CoolerMaster or #Corsair ATX PSU (80 Plus Gold rated prolly, cos ATX PSUs are somehow more expensive than Flex ones?).

    The latter route is def more expensive, but idk if running a #Proxmox node with an #RKE2 #Kubernetes cluster 24/7 in a mini ITX setup is the most brilliant idea... though that node was running perfectly fine for a pretty long time now (~1 year or so, probably).

  24. Lol, tried draining a #Kubernetes #RKE2 node with #Longhorn - first, I enabled the allow-node-drain-with-last-healthy-replica setting. That still didn't help with the PDB issue, so I manually deleted the PDBs (which helped in a different test, on a different cluster) - that resulted in the pod being evicted, but... it would just hang forever with no output whatsoever.

    I tried restarting the drain process, but that did nothing; it'll just be stuck evicting something again with no output. When I checked available volumes, I'd see several (on the draining node) appearing and disappearing, and changing status from detached to attaching, etc. I'm so freaking done with this whole shenanigan at this point that I just decided to freaking shutdown the whole goddamn node lol.

    Maybe the "safe" way to stop a cluster from this point onwards is to just nuke the damn thing ​:blobfoxcat:​
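
    For anyone hitting the same wall, the PDBs in question are usually the per-node instance-manager ones Longhorn creates to protect the last healthy replica. A hedged sketch for finding and removing them (names vary per cluster):

```shell
# List Longhorn's PodDisruptionBudgets (the instance-manager ones block drains)
kubectl get pdb -n longhorn-system

# Delete the instance-manager PDB for the draining node by name
kubectl delete pdb -n longhorn-system <instance-manager-pdb-name>
```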

  25. Does anyone know what I'm missing with removing a node from a #Kubernetes cluster (set up with #RKE2 and #Longhorn)?

    I've cordoned the node, drained the node, "killed" the node with the rke2-killall.sh script, uncordoned the node (I wouldn't have in a node-removal case, but my script does that generally and I thought it doesn't matter), and then uninstalled RKE2 on the (removed) node with the rke2-uninstall.sh script.

    If I check the nodes' status on said cluster now, the worker nodes I've "removed" are still there, just in a NotReady state. What's the proper, missing piece here?

    NAME             STATUS     ROLES                       AGE   VERSION
    orked-master-1   Ready      control-plane,etcd,master   46h   v1.25.15+rke2r2
    orked-worker-1   Ready      worker                      46h   v1.25.15+rke2r2
    orked-worker-2   NotReady   worker                      23h   v1.25.15+rke2r2
    orked-worker-3   NotReady   worker                      23h   v1.25.15+rke2r2

    Is it as simple as just deleting the node with kubectl now?
    kubectl delete node <node>
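
    For reference, rke2-uninstall.sh only wipes the node itself; the cluster keeps the stale Node object until it's deleted. A hedged sketch of the usual full sequence (node names taken from the listing above):

```shell
# Drain first (while the node is still alive), then remove it from cluster state
kubectl drain orked-worker-2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node orked-worker-2

# On the node itself, to stop and wipe RKE2:
# /usr/local/bin/rke2-killall.sh && /usr/local/bin/rke2-uninstall.sh
```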

  26. How do you update #Longhorn's Node Drain Policy on a #Kubernetes/#RKE2 cluster? I think you could do it on the UI, but in this test cluster I'm experimenting with, I did not install #Rancher or "attach" this cluster to one so I don't have access to the UI.

    I'm trying to update said policy to allow-if-replica-is-stopped, and see if that would solve the errors I'm getting draining nodes in my cluster: Cannot evict pod as it would violate the pod's disruption budget.

    Update: nvm, got it: https://longhorn.io/docs/1.7.2/advanced-resources/deploy/customizing-default-settings/#using-kubectl

    Didn't solve my error though.
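
    For reference, the kubectl route boils down to editing the Longhorn setting resource directly (a hedged sketch; the resource and field names are taken from Longhorn's settings CRD and may differ across versions):

```shell
# Inspect the current drain policy
kubectl -n longhorn-system get settings.longhorn.io node-drain-policy

# Switch it to allow-if-replica-is-stopped
kubectl -n longhorn-system patch settings.longhorn.io node-drain-policy \
  --type merge -p '{"value": "allow-if-replica-is-stopped"}'
```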

  27. How does one properly and safely shut down an #RKE2 #Kubernetes cluster (and monitor + ensure that is truly the case)? I'm surprised it doesn't seem to be discussed in the official RKE2 docs at all, and the very few discussions I've seen of it on #GitHub kinda return various answers I'm not quite confident/certain of.

    I'm trying to fully and properly tackle this so I can incorporate it in my RKE2 management tool, so even fools like me can be "proper" about it rather than... idk... shutting the VMs down and hoping for the best each time ​:blobfoxcat:​

    πŸ”— https://github.com/irfanhakim-as/orked/issues/25
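
    A hedged sketch of the sequence that tends to come up in those GitHub discussions (not from official docs; flags and ordering are assumptions):

```shell
# 1) Drain the workers so workloads stop cleanly (PDBs permitting)
for n in $(kubectl get nodes -o name -l '!node-role.kubernetes.io/control-plane'); do
  kubectl drain "$n" --ignore-daemonsets --delete-emptydir-data
done

# 2) On each worker node:            systemctl stop rke2-agent
# 3) On each server node, in turn:   systemctl stop rke2-server
# (rke2-killall.sh also exists for a harder stop of all node processes)
```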

  28. I've successfully migrated my #ESXi #homelab server over to #Proxmox after surprisingly a little bit of (unexpected) trouble - haven't really even moved all of my old services or #Kubernetes cluster back into it, but I'd say the most challenging part I was expecting which is #TrueNAS has not only been migrated, but also upgraded from TrueNAS Core 12 to TrueNAS Scale 24.10 (HUGE jump, I know).

    Now then. I'm thinking what's the best way to move forward with this, now that I have 2 separate nodes running Proxmox. There are multiple things to consider. I suppose I could cluster 'em up, so I can manage both of them under one roof, but from what I can tell, clustering on Proxmox works the same way as it would with Kubernetes clusters like #RKE2 or #K3s, whereby you'd want at least 3 nodes, if not just 1. I can build another server, I have the hardware parts for it, but I don't think I'd want to take up more space than I already do and have 3 PCs running 24/7.

    I'm also thinking of possibly joining my 2 RKE2 clusters (1 on each node) into 1... but I'm not sure how I'd go about it having only 2 physical nodes. Atm, each cluster has 1 Master node and 3 Worker nodes (VMs ofc). Having only 2 physical nodes, I'm not sure how I'd spread the number of master/worker nodes across the 2. Maintaining only 1 (joined) cluster would be helpful though, since it'd solve my current issue of not being able to use one of them to publish services online using #Ingress "effectively", since I can only port forward the standard HTTP/S ports to a single endpoint (which means the secondary cluster will use a non-standard port instead, i.e. 8443).

    This turned out pretty long - but yea... any ideas what'd be the "best" way of moving forward if I only plan to retain 2 Proxmox nodes - Proxmox wise, and perhaps even Kubernetes wise?

  29. For some reason, I feel like the #RKE2 cluster on my #Proxmox node is more fragile than the cluster on my #ESXi node. Like, on the latter, I can simply shutdown and boot the nodes however I want, and everything seems to just get into a working state on its own. On the former, for some reason, things seem to boot into a non-running state with various statuses like Unknown, CrashLoopBackOff, etc. - some get solved by me deleting/restarting the pods, some though will require me to run the killall script and reboot the entire node. Pretty weird, when both clusters were deployed/configured the exact same way and run the exact same version.

  30. Twice now my secondary #RKE2 cluster running on my #Proxmox node is giving me a bunch of useless errors, mostly seems to do with #Longhorn, that are preventing my services from running 😴

    Maybe it has to do with one of my SSDs, which shows that it passed the #SMART test and that it's "healthy" on Proxmox, yet shows a Media and Data Integrity Errors value of 609, which I assume is definitely concerning.

  31. One of my #RKE2 #Kubernetes worker nodes seems to be having a networking issue of some sort I’ve not seen before in any of my clusters. I can ssh to it, it can access the internet, but yea a bunch of pods, all from that node, are either stuck at terminating for wtv reason while some of them are in a crash loop. The other worker nodes work fine though and they should all be identical to each other.

  32. Huh... one of my #RKE2 #Kubernetes nodes running on my second cluster on #Proxmox seems to be completely borked and unbootable for whatever reason.

  33. Honestly, now that I understand how #kubernetes works, the differences between #k3s, #rke2 etc.... Life is pretty sweet. But this also means I will end up getting more machines at hetzner to build proper HA #clusters for my stuff. It is nice to have everything on servers you actually have control over (the only thing I will not touch for now is hosting my own email, as that is a fucking hassle due to bigtech and getting spamlisted etc)

  34. I've just merged a huge PR to my #Orked (O-tomated RKE Distribution - GREAT NAME I KNOW) that makes it easier than ever for anyone to set up a production-ready #RKE2 #Kubernetes cluster in their #homelab.

    With this collection of scripts, all you need to do is just provision the nodes required, including a login/management node, and run the scripts right from the login node to configure all of the other nodes to make up the cluster. This setup includes:

    - Configuring the Login node with any required or essential dependencies (such as #Helm, #Docker, #k9s, #kubens, #kubectx, etc.)

    - Setting up passwordless #SSH access from the Login node to the rest of the Kubernetes nodes

    - Updating the hosts file for strictly necessary name resolution on the Login node and between the Kubernetes nodes

    - Necessary, best-practice configurations for all of the Kubernetes nodes, including networking configuration, disabling unnecessary services, disabling swap, loading required modules, etc.

    - Installation and configuration of RKE2 on all the Kubernetes nodes and joining them together as a cluster

    - Installation and configuration of #Longhorn storage, including formatting/configuring their virtual disks on the Worker nodes

    - Deployment and configuration of #MetalLB as the cluster's load balancer

    - Deployment and configuration of #Ingress #NGINX as the ingress controller and reverse proxy for the cluster - this helps manage external access to the services in the cluster

    - Setup and configuration of #cert-manager to obtain and renew #LetsEncrypt certs automatically - supports both #DNS and HTTP validation with #Cloudflare

    - Installation and configuration of #csi-driver-smb, which adds support for integrating your external SMB storage to the Kubernetes cluster

    Besides these, there are also some other helper scripts to make certain related tasks easy, such as a script to set a unique static IP address and hostname, and another to toggle #SELinux enforcement on or off - should you need to turn it off (temporarily).
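
    The passwordless-SSH step above can be sketched like this (a minimal sketch with hostnames matching the cluster elsewhere in this thread; Orked's own scripts handle this for you):

```shell
# On the login node: generate a key once, then copy it to every cluster node
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
for host in orked-master-1 orked-worker-1 orked-worker-2 orked-worker-3; do
  ssh-copy-id -i ~/.ssh/id_ed25519.pub "$host"
done
```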

    If you already have an existing RKE2 cluster, there's a step-by-step guide on how you could use it to easily configure and join additional nodes to your cluster if you're planning on expanding.

    Orked currently expects and supports #RockyLinux 8+ (it should also support any other #RHEL distros such as #AlmaLinux), but I am planning to improve the project over time by adding more #Linux distros, #IPv6 support, and possibly even #K3s for a more lightweight #RaspberryPi cluster, for example.

    I've used this exact setup to deploy and manage vital services for hundreds of unique clients/organisations, and I've become obsessed with sharing it with everyone and making it easier to get started. If this is something that interests you, feel free to check it out!

    If you're wondering what to deploy on a Kubernetes cluster - feel free to also check out my #mika helm chart repo πŸ₯³

    πŸ”— https://github.com/irfanhakim-as/orked

    πŸ”— https://github.com/irfanhakim-as/charts

  35. I've just merged a huge PR to my #Orked (O-tomated RKE Distribution - GREAT NAME I KNOW) that makes it easier than ever for anyone to set up a production-ready #RKE2 #Kubernetes cluster in their #homelab.

    With this collection of scripts, all you need to do is just provision the nodes required, including a login/management node, and run the scripts right from the login node to configure all of the other nodes to make up the cluster. This setup includes:

    - Configuring the Login node with any required or essential dependencies (such as
    #Helm, #Docker, #k9s, #kubens, #kubectx, etc.)

    - Setup passwordless
    #SSH access from the Login node to the rest of the Kubernetes nodes

    - Update the
    hosts file for strictly necessary name resolution on the Login node and between the Kubernetes nodes

    - Necessary, best practice configurations for all of the Kubernetes nodes including networking configuration, disabling unnecessary services, disabling swap, loading required modules, etc.

    - Installation and configuration of RKE2 on all the Kubernetes nodes and joining them together as a cluster

    - Installation and configuration of
    #Longhorn storage, including formatting/configuring their virtual disks on the Worker nodes

    - Deployment and configuration of
    #MetalLB as the cluster's load-balancer

    - Deployment and configuration of
    #Ingress #NGINX as the ingress controller and reverse proxy for the cluster - this helps manage external access to the services in the cluster

    - Setup and configuration of
    #cert-manager to obtain and renew #LetsEncrypt certs automatically - supports both #DNS and HTTP validation with #Cloudflare

    - Installation and configuration of
    #csi-driver-smb which adds support for integrating your external SMB storage to the Kubernetes cluster

    Besides these, there are also some other
    helper scripts to make certain related tasks easy such as a script to set a unique static IP address and hostname, and another to toggle #SELinux enforcement to on or off - should you need to turn it off (temporarily).

    If you already have an existing RKE2 cluster, there's a step-by-step guide on how you could use it to easily configure and join additional nodes to your cluster if you're planning on expanding.

    Orked currently expects and supports #RockyLinux 8+ (and should also support other #RHEL-based distros such as #AlmaLinux), but I'm planning to improve the project over time by adding more #Linux distros, #IPv6 support, and possibly even #K3s - for a more lightweight #RaspberryPi cluster, for example.

    I've used this exact setup to deploy and manage vital services for hundreds of unique clients/organisations, so I've become obsessed with sharing it with everyone and making it easier to get started. If this is something that interests you, feel free to check it out!

    If you're wondering what to deploy on a Kubernetes cluster - feel free to also check out my #mika helm chart repo πŸ₯³

    πŸ”— https://github.com/irfanhakim-as/orked

    πŸ”— https://github.com/irfanhakim-as/charts

  39. I've got a #MetalLB + #NGINX #Ingress controller setup on my #RKE2 #Kubernetes cluster. How it should/would work is for me to:

    1. Register a domain name on my DNS server, i.e. #Cloudflare, pointing towards my public IP (router)

    2. Set up port forwarding on my router for HTTP (WAN port: 80, virtual host port: 80, LAN host IP: my load balancer IP) and HTTPS (WAN port: 443, virtual host port: 443, LAN host IP: my load balancer IP)

    3. Deploy an ingress for my service

    ^ this setup works. But right now, I'm in a situation where I want to avoid using WAN ports 80 and 443 - and possibly switch them to 8080 and 8443. What should I do to make this work? As it is, if I just do that, going to the address/domain specified in my service's ingress returns a 404 NGINX error.

    Why I want to do this is that the 80/443 port forwarding rule is already used for my Cluster 1's load balancer IP. I'm now setting things up for my second cluster, Cluster 2, and I can't forward the same 80-443 pair to Cluster 2's load balancer IP.
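    For context, the ingress deployed in step 3 is a standard NGINX-class resource along these lines (a minimal sketch - the name, domain, and issuer are hypothetical):

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-service              # hypothetical
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt   # assumes a cert-manager ClusterIssuer
    spec:
      ingressClassName: nginx
      rules:
        - host: app.example.com     # the domain registered in step 1
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-service
                    port:
                      number: 80
      tls:
        - hosts:
            - app.example.com
          secretName: my-service-tls
    ```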

    Please help ;(

  40. Asking this again to #Kubernetes/#networking experts cos I still haven't been able to figure this out. I had 1 #RKE2 cluster previously, with #MetalLB and #NGINX Ingress configured. In this "Cluster 1" setup, I have 2 port forwarding rules on my router that look something like this:

    name: cluster1-http
    wan start/end port: 80
    lan host address: 192.168.0.88
    virtual host port: 80

    name: cluster1-https
    wan start/end port: 443
    lan host address: 192.168.0.88
    virtual host port: 443

    The MetalLB IPv4 address range (IPAddressPool) I had set up for Cluster 1 was 192.168.0.88-192.168.0.89.
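    In MetalLB's CRD-based configuration, that range corresponds to something like the following (pool names are hypothetical; assumes the usual L2 mode):

    ```yaml
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: cluster1-pool           # hypothetical name
      namespace: metallb-system
    spec:
      addresses:
        - 192.168.0.88-192.168.0.89
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: cluster1-l2
      namespace: metallb-system
    spec:
      ipAddressPools:
        - cluster1-pool
    ```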

    Now I've deployed a second cluster (i.e. Cluster 2) in the same network, with the exact same set of configurations, aside from a different IP range of 192.168.0.86-192.168.0.87 for the load balancer. I thought all I needed now was to set up a similar pair of port forwarding rules, with the LAN Host Address updated according to its IP range (i.e. 192.168.0.86) - but that's not possible, since my router gave an error complaining that the WAN Start/End ports were conflicting.

    I updated the WAN ports to 8080 and 8443 respectively (just for the sake of it, really, cos idk what else to do for now), but when I tested deploying a service with an Ingress: while it successfully received a cert from #LetsEncrypt/#Cloudflare, actually going to the domain gives me an NGINX 404 error. What should I do?

  41. Took me a long while, but I've finally published my creatively named #Helm chart, #Flex (christ, save us devs) - a collection of curated services that aims to provide a complete home media server solution in one neat Helm package.

    With this single package, you can deploy to your #Kubernetes (#RKE2) cluster:

    πŸš€ #Plex: The media streaming frontend.
    πŸš€ #Jackett: The indexer proxy sitting between the torrent indexers and the other services.
    πŸš€ #FlareSolverr (opt): Jackett's helper for circumstances that require it.
    πŸš€ #qBittorrent (opt): #Torrent client.
    πŸš€ #Overseerr: Media discovery/management tool/interface that connects to Plex, Radarr, and Sonarr underneath.
    πŸš€ #Radarr: Media management tool for movies.
    πŸš€ #Sonarr: Media management tool for TV shows/#anime.

    This chart supports "local" storage such as #Longhorn, as well as #SMB shares - which I would recommend for the media side of things. The chart also supports, and recommends, the use of #Ingress to get these services online.
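    Toggling the optional components presumably happens through the chart's values; a hypothetical override might look like this (key names are illustrative only - check the chart's values.yaml for the real ones):

    ```yaml
    # illustrative values override - actual keys live in the chart's values.yaml
    plex:
      enabled: true
    overseerr:
      enabled: true
    qbittorrent:
      enabled: true             # optional component
    flaresolverr:
      enabled: false            # optional component
    storage:
      media:
        type: smb               # SMB share recommended for media files
    ingress:
      enabled: true
      host: media.example.com   # hypothetical domain
    ```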

    It's very new so I'm sure there are a bunch of things missing that I could add/improve upon, but the current iteration has been tested pretty thoroughly on my own cluster, and I've done my best to ensure that the documentation (README) and values file are helpful (or try to be) at least.

    πŸ”— https://github.com/irfanhakim-as/charts/tree/master/mika/flex
