Isolating Log Components on Kubernetes Infra Nodes

This guide explains how to isolate logging-related infrastructure components on dedicated Kubernetes infra nodes using labels, taints, and node selectors.
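
Infra nodes are identified by a label (the nodeSelector target used throughout this guide) and protected by a taint. The exact values come from your own planning; as a minimal sketch, assuming the label node-role.kubernetes.io/infra="" and a NoSchedule taint on the same key (<infra-node-name> and the taint value are placeholders):

# Label the node so workloads can select it via nodeSelector
kubectl label node <infra-node-name> node-role.kubernetes.io/infra=""

# Taint the node so only workloads with a matching toleration are scheduled onto it
kubectl taint node <infra-node-name> node-role.kubernetes.io/infra=true:NoSchedule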

Objectives

  • Isolate resources: Prevent contention with business workloads.
  • Enforce stability: Reduce evictions and scheduling conflicts.
  • Simplify management: Centralize infra components with consistent scheduling rules.

Prerequisites

  1. kubectl is configured against the target cluster.
  2. Infra components are not bound to nodes via local-PV nodeAffinity, or you have accounted for those nodes (see below).
  3. Infra nodes have been planned, labeled, and tainted in advance (you can verify this as shown below).
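
To confirm the planned infra nodes are ready, list them by label and check their taints. The label and taint key below are assumed to match the nodeSelector used later in this guide; adjust them to your environment:

kubectl get nodes -l node-role.kubernetes.io/infra -o wide
kubectl describe node <infra-node-name> | grep Taints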

Check Local PVs for nodeAffinity

If your components use local storage (for example, TopoLVM or local PVs), check whether the PVs have spec.nodeAffinity. If they do, either:

  1. Add all nodes referenced by pv.spec.nodeAffinity to the infra node group, or
  2. Redeploy components using a storage class without node affinity (for example Ceph/RBD).

Example (Elasticsearch):

# 1) Get ES PVCs
kubectl get pvc -n cpaas-system | grep elastic

# 2) Inspect one PV
kubectl get pv elasticsearch-log-node-pv-192.168.135.243 -o yaml

If the PV shows:

spec:
  local:
    path: /cpaas/data/elasticsearch/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.135.243

Then Elasticsearch data is pinned to node 192.168.135.243. Ensure that node is part of the infra node group, or migrate storage.
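
To scan every PV at once instead of inspecting them one by one, a custom-columns query like the one below can help. It assumes the single-term, single-expression nodeAffinity shape shown above (typical for local PVs); PVs without nodeAffinity print <none>:

kubectl get pv -o custom-columns='NAME:.metadata.name,PINNED-TO:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values'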

Add Kafka/ZooKeeper nodes into infra nodes

For historical reasons, the Kafka and ZooKeeper nodes must also be labeled and tainted as infra nodes:

kubectl get nodes -l kafka=true
kubectl get nodes -l zk=true
# Add the listed nodes to the infra node group (label and taint them as shown above)

Move Logging Components to Infra Nodes

ACP logging components tolerate infra taints by default. Use nodeSelector to pin workloads onto infra nodes.
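
To double-check that a component actually carries the infra toleration before pinning it, you can inspect its Pod template (StatefulSet name and namespace as in the examples below):

kubectl get statefulset cpaas-elasticsearch -n cpaas-system -o yaml | grep -A 6 'tolerations:'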

Elasticsearch

# Data nodes
kubectl patch statefulset cpaas-elasticsearch -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# Master nodes (if present)
kubectl patch statefulset cpaas-elasticsearch-master -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# Verify
kubectl get pods -n cpaas-system -o wide | grep cpaas-elasticsearch
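
Because the patch changes the Pod template, the StatefulSet performs a rolling update (assuming the default RollingUpdate strategy). You can wait for it to finish with:

kubectl rollout status statefulset/cpaas-elasticsearch -n cpaas-system
# The same command works for the other StatefulSets and Deployments patched below;
# replace the kind and name accordingly.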

Kafka

kubectl patch statefulset cpaas-kafka -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep cpaas-kafka

ZooKeeper

kubectl patch statefulset cpaas-zookeeper -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep cpaas-zookeeper

ClickHouse

kubectl patch chi cpaas-clickhouse -n cpaas-system --type='json' -p='[
  {"op":"add","path":"/spec/templates/podTemplates/0/spec/nodeSelector/node-role.kubernetes.io~1infra","value":""},
  {"op":"add","path":"/spec/templates/podTemplates/1/spec/nodeSelector/node-role.kubernetes.io~1infra","value":""}
]'
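
Note: a JSON-Patch "add" of a single key fails if the target podTemplate does not define a nodeSelector map yet. In that case, create the whole map instead. This variant assumes the same two podTemplate indexes and overwrites any nodeSelector entries that already exist:

kubectl patch chi cpaas-clickhouse -n cpaas-system --type='json' -p='[
  {"op":"add","path":"/spec/templates/podTemplates/0/spec/nodeSelector","value":{"node-role.kubernetes.io/infra":""}},
  {"op":"add","path":"/spec/templates/podTemplates/1/spec/nodeSelector","value":{"node-role.kubernetes.io/infra":""}}
]'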

kubectl get pods -n cpaas-system -o wide | grep clickhouse

lanaya

kubectl patch deployment lanaya -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep lanaya

razor

# If deployed as Deployment (Elasticsearch backend)
kubectl patch deployment razor -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# If deployed as StatefulSet (ClickHouse backend)
kubectl patch statefulset razor -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep razor

Any other logging component

# Deployment
kubectl patch deployment <deployment-name> -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# StatefulSet
kubectl patch statefulset <statefulset-name> -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

kubectl get pods -n cpaas-system -o wide | grep <deployment-name>
kubectl get pods -n cpaas-system -o wide | grep <statefulset-name>

Evict non-infra workloads already on infra nodes

If non-infra Pods are still running on infra nodes, trigger a reschedule by updating those workloads (for example, change a Pod template annotation to force a rolling restart), or add node selectors/affinity rules so they avoid the infra nodes, as sketched below.
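
For the node selector/affinity approach, a required node affinity that excludes infra nodes can be patched onto the workload. The Deployment and namespace names below are placeholders, and a JSON merge patch replaces the whole affinity block, so fold in any affinity rules the workload already defines:

kubectl patch deployment <business-deployment> -n <business-namespace> \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/infra","operator":"DoesNotExist"}]}]}}}}}}}'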

Troubleshooting

Common issues and fixes:

  • Pods stuck in Pending. Diagnosis: kubectl describe pod <pod> and check the Events section. Solution: add tolerations or adjust node selectors.
  • Taint/toleration mismatch. Diagnosis: kubectl describe node <node> | grep Taints. Solution: add matching tolerations to the workloads.
  • Resource starvation. Diagnosis: kubectl top nodes -l node-role.kubernetes.io/infra. Solution: scale out the infra nodes or tune resource requests.

Example error:

Events:
  Warning  FailedScheduling  2m  default-scheduler  0/3 nodes are available:
  3 node(s) had untolerated taint {node-role.kubernetes.io/infra: true}

Fix: add matching tolerations to the workload.
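
For example, to add a matching toleration to a hypothetical Deployment stuck in Pending (the name, namespace, and NoSchedule effect are placeholders/assumptions; a JSON merge patch replaces the whole tolerations list, so include any tolerations the workload already has):

kubectl patch deployment <deployment-name> -n <namespace> \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/infra","operator":"Exists","effect":"NoSchedule"}]}}}}'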