Isolating Log Components on Kubernetes Infra Nodes

This guide explains how to isolate logging-related infrastructure components on dedicated Kubernetes infra nodes using labels, taints, and node selectors.
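
Infra nodes are identified by a label (the nodeSelector target used throughout this guide) and protected by a taint. The exact values come from your own planning; as a minimal sketch, assuming the label node-role.kubernetes.io/infra="" and a NoSchedule taint on the same key (<infra-node-name> and the taint value are placeholders):

# Label the node so workloads can select it via nodeSelector
kubectl label node <infra-node-name> node-role.kubernetes.io/infra=""

# Taint the node so only workloads with a matching toleration are scheduled onto it
kubectl taint node <infra-node-name> node-role.kubernetes.io/infra=true:NoSchedule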

Objectives

  • Isolate resources: Prevent contention with business workloads.
  • Enforce stability: Reduce evictions and scheduling conflicts.
  • Simplify management: Centralize infra components with consistent scheduling rules.

Prerequisites

  1. kubectl is configured against the target cluster.
  2. Infra components are not bound to nodes via local-PV nodeAffinity, or you have accounted for those nodes (see below).
  3. Infra nodes have been planned, labeled, and tainted in advance (you can verify this as shown below).
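
To confirm the planned infra nodes are ready, list them by label and check their taints. The label and taint key below are assumed to match the nodeSelector used later in this guide; adjust them to your environment:

kubectl get nodes -l node-role.kubernetes.io/infra -o wide
kubectl describe node <infra-node-name> | grep Taints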

Check Local PVs for nodeAffinity

If your components use local storage (for example, TopoLVM or local PVs), check whether the PVs have spec.nodeAffinity. If they do, either:

  1. Add all nodes referenced by pv.spec.nodeAffinity to the infra node group, or
  2. Redeploy components using a storage class without node affinity (for example Ceph/RBD).

Example (Elasticsearch):

# 1) Get ES PVCs
kubectl get pvc -n cpaas-system | grep elastic

# 2) Inspect one PV
kubectl get pv elasticsearch-log-node-pv-192.168.135.243 -o yaml

If the PV shows:

spec:
  local:
    path: /cpaas/data/elasticsearch/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.135.243

Then Elasticsearch data is pinned to node 192.168.135.243. Ensure that node is part of the infra node group, or migrate storage.
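
To scan every PV at once instead of inspecting them one by one, a custom-columns query like the one below can help. It assumes the single-term, single-expression nodeAffinity shape shown above (typical for local PVs); PVs without nodeAffinity print <none>:

kubectl get pv -o custom-columns='NAME:.metadata.name,PINNED-TO:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values'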

Add Kafka/ZooKeeper nodes into infra nodes

For historical reasons, the Kafka and ZooKeeper nodes must also be labeled and tainted as infra nodes:

kubectl get nodes -l kafka=true
kubectl get nodes -l zk=true
# Add the listed nodes to the infra node group (label and taint them as shown above)

Move Logging Components to Infra Nodes

ACP logging components tolerate infra taints by default. Use nodeSelector to pin workloads onto infra nodes.
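
To double-check that a component actually carries the infra toleration before pinning it, you can inspect its Pod template (StatefulSet name and namespace as in the examples below):

kubectl get statefulset cpaas-elasticsearch -n cpaas-system -o yaml | grep -A 6 'tolerations:'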

Elasticsearch

# Data nodes
kubectl patch statefulset cpaas-elasticsearch -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# Master nodes (if present)
kubectl patch statefulset cpaas-elasticsearch-master -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# Verify
kubectl get pods -n cpaas-system -o wide | grep cpaas-elasticsearch
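
Because the patch changes the Pod template, the StatefulSet performs a rolling update (assuming the default RollingUpdate strategy). You can wait for it to finish with:

kubectl rollout status statefulset/cpaas-elasticsearch -n cpaas-system
# The same command works for the other StatefulSets and Deployments patched below;
# replace the kind and name accordingly.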

Kafka

kubectl patch statefulset cpaas-kafka -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep cpaas-kafka

ZooKeeper

kubectl patch statefulset cpaas-zookeeper -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep cpaas-zookeeper

ClickHouse

kubectl patch chi cpaas-clickhouse -n cpaas-system --type='json' -p='[
  {"op":"add","path":"/spec/templates/podTemplates/0/spec/nodeSelector/node-role.kubernetes.io~1infra","value":""},
  {"op":"add","path":"/spec/templates/podTemplates/1/spec/nodeSelector/node-role.kubernetes.io~1infra","value":""}
]'
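
Note: a JSON-Patch "add" of a single key fails if the target podTemplate does not define a nodeSelector map yet. In that case, create the whole map instead. This variant assumes the same two podTemplate indexes and overwrites any nodeSelector entries that already exist:

kubectl patch chi cpaas-clickhouse -n cpaas-system --type='json' -p='[
  {"op":"add","path":"/spec/templates/podTemplates/0/spec/nodeSelector","value":{"node-role.kubernetes.io/infra":""}},
  {"op":"add","path":"/spec/templates/podTemplates/1/spec/nodeSelector","value":{"node-role.kubernetes.io/infra":""}}
]'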

kubectl get pods -n cpaas-system -o wide | grep clickhouse

lanaya

kubectl patch deployment lanaya -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep lanaya

razor

# If deployed as Deployment (Elasticsearch backend)
kubectl patch deployment razor -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# If deployed as StatefulSet (ClickHouse backend)
kubectl patch statefulset razor -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep razor

Any other logging component

# Deployment
kubectl patch deployment <deployment-name> -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# StatefulSet
kubectl patch statefulset <statefulset-name> -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep <deployment-name>
kubectl get pods -n cpaas-system -o wide | grep <statefulset-name>

Evict non-infra workloads already on infra nodes

If non-infra Pods are still running on infra nodes, trigger a reschedule by updating those workloads (for example, change a Pod template annotation to force a rolling restart), or add node selectors/affinity rules so they avoid the infra nodes, as sketched below.
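
For the node selector/affinity approach, a required node affinity that excludes infra nodes can be patched onto the workload. The Deployment and namespace names below are placeholders, and a JSON merge patch replaces the whole affinity block, so fold in any affinity rules the workload already defines:

kubectl patch deployment <business-deployment> -n <business-namespace> \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/infra","operator":"DoesNotExist"}]}]}}}}}}}'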

Troubleshooting

Common issues and fixes:

  • Pods stuck in Pending. Diagnosis: kubectl describe pod <pod> and check the Events section. Solution: add tolerations or adjust node selectors.
  • Taint/toleration mismatch. Diagnosis: kubectl describe node <node> | grep Taints. Solution: add matching tolerations to the workloads.
  • Resource starvation. Diagnosis: kubectl top nodes -l node-role.kubernetes.io/infra. Solution: scale out the infra nodes or tune resource requests.

Example error:

Events:
  Warning  FailedScheduling  2m  default-scheduler  0/3 nodes are available:
  3 node(s) had untolerated taint {node-role.kubernetes.io/infra: true}

Fix: add matching tolerations to the workload.
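
For example, to add a matching toleration to a hypothetical Deployment stuck in Pending (the name, namespace, and NoSchedule effect are placeholders/assumptions; a JSON merge patch replaces the whole tolerations list, so include any tolerations the workload already has):

kubectl patch deployment <deployment-name> -n <namespace> \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/infra","operator":"Exists","effect":"NoSchedule"}]}}}}'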