
Automating Upgrades in RKE2

Objective

Implement automated, controlled upgrades of an RKE2 cluster using System Upgrade Controller, with separate plans for server nodes and agent nodes, drain configured, and safe concurrency policies.

Prerequisites

  • Active multi-node RKE2 cluster
  • kubectl with admin access
  • Workloads with at least 2 replicas so they tolerate the drain without downtime

Context

Rancher's System Upgrade Controller (SUC) manages upgrades declaratively through Plan custom resources (a CRD). Upgrade order matters: server nodes must always be upgraded before agent nodes. If the order is reversed, the kubelet on the agents can end up at a higher version than the API server and behave unexpectedly.

Agenda

  1. Installing the System Upgrade Controller
  2. Structure of the Plan resource
  3. Plan for server nodes with concurrency: 1
  4. Plan for agent nodes with a dependency on the server plan
  5. Monitoring progress
  6. Pausing upgrades during blocked maintenance windows
  7. Difference between version and channel

Lab

1. Install System Upgrade Controller

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml

kubectl get pods -n system-upgrade

2. Plan for server nodes

# plan-server.yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: rke2-server
  namespace: system-upgrade
spec:
  version: v1.31.6+rke2r1
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: In
      values: ["true"]
  concurrency: 1
  drain:
    force: false
    deleteEmptydirData: true
    skipWaitForDeleteTimeout: 60
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/rke2-upgrade

3. Plan for agent nodes

# plan-agent.yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: rke2-agent
  namespace: system-upgrade
spec:
  version: v1.31.6+rke2r1
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: NotIn
      values: ["true"]
  concurrency: 2
  prepare:
    image: rancher/rke2-upgrade
    args: ["prepare", "rke2-server"]   # waits for the server plan to finish
  drain:
    force: false
    deleteEmptydirData: true
    skipWaitForDeleteTimeout: 60
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/rke2-upgrade

4. Apply and monitor

kubectl apply -f plan-server.yaml
kubectl apply -f plan-agent.yaml

# Monitor progress
kubectl get plans -n system-upgrade
kubectl get jobs -n system-upgrade
kubectl describe plan rke2-server -n system-upgrade

# View the controller logs (target the Deployment: during an upgrade the
# first pod listed in the namespace may be an upgrade job pod)
kubectl logs -n system-upgrade deployment/system-upgrade-controller -f
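
Added example (not part of the original lab or of the SUC project): a shell check for the ordering rule from the Context section, flagging agent nodes whose kubelet minor version is ahead of the oldest server node. The function name check_skew is hypothetical, and the node names and version listings in the sample functions are constructed.

```shell
#!/bin/sh
# Added example. Reads "<node> <control-plane label> <kubeletVersion>" lines and
# fails when an agent kubelet is a newer minor release than the oldest server.
# Live input (labels containing dots need the backslash escapes):
#   kubectl get nodes --no-headers -o custom-columns='N:.metadata.name,CP:.metadata.labels.node-role\.kubernetes\.io/control-plane,V:.status.nodeInfo.kubeletVersion' | check_skew
check_skew() {
  awk '
    # "v1.31.6+rke2r1" -> 1031 (major * 1000 + minor); patch and suffix ignored
    function minor(v,   p) { sub(/^v/, "", v); split(v, p, "."); return p[1] * 1000 + p[2] }
    NF < 3 { next }                      # skip blank or incomplete lines
    $2 == "true" {                       # server (control-plane) node
      m = minor($3)
      if (!have || m < oldest) { oldest = m; have = 1 }
      next
    }
    { agent[$1] = minor($3); ver[$1] = $3 }   # every other node is an agent
    END {
      if (!have) { print "no server nodes in input"; exit 2 }
      for (n in agent)
        if (agent[n] > oldest) { print "SKEW: " n " kubelet " ver[n] " is ahead of a server"; bad = 1 }
      exit bad + 0
    }'
}

# Constructed sample: server plan in progress, agents untouched (valid state).
sample_ok() {
  cat <<'EOF'
cp-01 true v1.31.6+rke2r1
cp-02 true v1.30.9+rke2r1
worker-01 <none> v1.30.9+rke2r1
EOF
}

# Constructed sample: an agent was upgraded while a server is still on 1.30.
sample_inverted() {
  cat <<'EOF'
cp-01 true v1.30.9+rke2r1
worker-01 <none> v1.30.9+rke2r1
worker-02 <none> v1.31.6+rke2r1
EOF
}

sample_inverted | check_skew || echo "agents were upgraded before servers"
```

The function exits 0 when the ordering holds, 1 when an agent is ahead, and 2 when the input contains no server nodes.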

5. Pause upgrades

# Add a label to exclude a node from the upgrade
# (SUC skips nodes whose plan label has the value "disabled")
kubectl label node worker-01 plan.upgrade.cattle.io/rke2-agent=disabled

References