Automatización de Upgrades en RKE2
Objetivo
Implementar upgrades automatizados y controlados de un clúster RKE2 usando System Upgrade Controller, con planes separados para server nodes y agent nodes, drain configurado y políticas de concurrencia seguras.
Prerequisitos
- Clúster RKE2 multi-nodo activo
kubectlcon acceso admin- Workloads con al menos 2 réplicas para tolerar el drain sin downtime
Contexto
El System Upgrade Controller (SUC) de Rancher administra upgrades de forma declarativa
usando recursos CRD de tipo Plan. El orden de actualización importa:
los server nodes siempre deben actualizarse antes que los agent nodes. Si se
invierten, el kubelet de los agents puede quedar en una versión superior a la del
API server y presentar comportamientos inesperados.
Agenda
- Instalación del System Upgrade Controller
- Estructura del recurso
Plan - Plan para server nodes con
concurrency: 1 - Plan para agent nodes con dependencia del plan de servers
- Monitorear el progreso
- Pausar upgrades en ventanas de mantenimiento bloqueadas
- Diferencia entre
versionychannel
Laboratorio
1. Instalar System Upgrade Controller
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml
kubectl get pods -n system-upgrade
2. Plan para server nodes
# plan-server.yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
name: rke2-server
namespace: system-upgrade
spec:
version: v1.31.6+rke2r1
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: In
values: ["true"]
concurrency: 1
drain:
force: false
deleteEmptydirData: true
skipWaitForDeleteTimeout: 60
serviceAccountName: system-upgrade
upgrade:
image: rancher/rke2-upgrade
3. Plan para agent nodes
# plan-agent.yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
name: rke2-agent
namespace: system-upgrade
spec:
version: v1.31.6+rke2r1
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: NotIn
values: ["true"]
concurrency: 2
prepare:
image: rancher/rke2-upgrade
args: ["prepare", "rke2-server"] # espera a que el plan de servers termine
drain:
force: false
deleteEmptydirData: true
skipWaitForDeleteTimeout: 60
serviceAccountName: system-upgrade
upgrade:
image: rancher/rke2-upgrade
4. Aplicar y monitorear
kubectl apply -f plan-server.yaml
kubectl apply -f plan-agent.yaml
# Monitorear progreso
kubectl get plans -n system-upgrade
kubectl get jobs -n system-upgrade
kubectl describe plan rke2-server -n system-upgrade
# Ver logs del controller
kubectl logs -n system-upgrade \
$(kubectl get pods -n system-upgrade -o name | head -1) -f
5. Pausar upgrades
# Agregar label para excluir un nodo del upgrade
kubectl label node worker-01 plan.upgrade.cattle.io/rke2-agent=pause