DaemonSets

When you create a Deployment, you tell Kubernetes "I want 3 replicas" and Kubernetes picks which nodes to put them on. But some things should not run on just 3 nodes — they need to run on every single node, always. That is what DaemonSets are for.

The Problem DaemonSets Solve

Think about log collection. Every node in your cluster is running Pods, and every Pod produces logs. If your log collector only runs on 3 out of 10 nodes, you are missing logs from 7 nodes entirely. You need one log collector sitting on every node, watching everything happening there.

A Deployment cannot guarantee this; the scheduler just spreads replicas wherever there is capacity, with no notion of per-node coverage. A DaemonSet guarantees exactly one Pod on every node (or every eligible node, if you restrict which nodes it targets).

Normal Deployment ("I want 3 replicas"):

  Node 1  [ Pod ]
  Node 2  [ Pod ]
  Node 3  [ Pod ]
  Node 4  (empty — Deployment doesn't care)
  Node 5  (empty — Deployment doesn't care)

DaemonSet ("one Pod on EVERY node, always"):

  Node 1  [ Pod ]
  Node 2  [ Pod ]
  Node 3  [ Pod ]
  Node 4  [ Pod ]   <- automatically added
  Node 5  [ Pod ]   <- automatically added

And it stays in sync automatically:

Cluster has 10 nodes  ->  DaemonSet runs 10 Pods
New node joins        ->  DaemonSet adds 1 Pod to it (11 total)
A node is removed     ->  That Pod is cleaned up (10 total)

You never have to manually adjust anything. The DaemonSet tracks the cluster and self-manages.
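You can watch this from the outside with kubectl. A sketch of what the status looks like on a 10-node cluster (the DaemonSet name and the counts are illustrative):

```shell
# DESIRED is derived from the node count automatically; you never set a replica number
kubectl get daemonset log-collector -n kube-system
# NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
# log-collector   10        10        10      10           10          <none>          4d
```

If a node joins, DESIRED ticks up to 11 on its own; there is no scaling step for DaemonSets, because there is no replica count to set.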

What DaemonSets Are Used For

DaemonSets are almost always used for infrastructure-level tasks: things that need visibility into a specific node, not just a home somewhere in the cluster.

Typical use cases, and why each one needs a Pod on every node:

- Log collection (Fluentd, Filebeat): logs are written to each node's disk, so the collector must be local
- Node monitoring (Prometheus Node Exporter): CPU, memory, and disk metrics are node-specific
- Networking (CNI plugins like Calico and Flannel): each node needs its own network agent to route traffic
- Storage (CSI node drivers): storage plugins must run on the node where volumes are mounted
- Security (Falco, intrusion detection agents): threat detection needs to watch every node's syscalls

A good rule of thumb: if the task needs to see what is happening on a specific machine, it belongs in a DaemonSet.
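One refinement to that rule: "every node" really means every node the DaemonSet is eligible for. If the agent only belongs on a subset of machines, you can narrow the guarantee with a nodeSelector in the Pod template (a sketch; the label name is made up for illustration):

```yaml
# Goes inside the DaemonSet's spec.template.spec.
# The "one Pod per node" rule now applies only to nodes
# carrying the (hypothetical) label monitoring=true.
nodeSelector:
  monitoring: "true"
```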

How It Works Internally

When you create a DaemonSet, the DaemonSet controller in the control plane watches the list of nodes. For every node that exists and does not already have a Pod from this DaemonSet, it creates one. The controller keeps checking: if a node is added, it acts; if a node is removed, it cleans up.

DaemonSet Controller loop (runs constantly):

  Get list of all nodes
  For each node:
    Is there already a DaemonSet Pod here? -> do nothing
    No Pod here?                           -> create one

  Node joins cluster  -> Pod created automatically
  Node leaves cluster -> Pod cleaned up automatically

This is different from a Deployment, which just says "give me N Pods somewhere" and does not think in terms of nodes at all.
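The loop above can be sketched as a toy Python simulation. None of this is the real controller code; the function name, the in-memory dict standing in for the cluster state, and the Pod names are all illustrative:

```python
def reconcile(nodes, pods):
    """One pass of a DaemonSet-style control loop.

    nodes: set of node names currently in the cluster
    pods:  dict mapping node name -> this DaemonSet's Pod on that node
    """
    # Create a Pod on every node that is missing one.
    for node in nodes:
        if node not in pods:
            pods[node] = f"log-collector-on-{node}"
    # Clean up Pods whose node no longer exists.
    for node in list(pods):
        if node not in nodes:
            del pods[node]
    return pods

nodes = {"node-1", "node-2", "node-3"}
pods = reconcile(nodes, {})      # 3 nodes -> 3 Pods
nodes.add("node-4")              # a node joins the cluster
pods = reconcile(nodes, pods)    # -> 4 Pods, one added automatically
nodes.remove("node-2")           # a node is removed
pods = reconcile(nodes, pods)    # -> 3 Pods, node-2's Pod cleaned up
```

The real controller reacts to node events rather than polling, but the invariant it maintains is exactly this one: the set of Pods always mirrors the set of nodes.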

Real Example — Log Collector with Fluentd

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: kube-system       # infrastructure tools usually go here
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16
        volumeMounts:
        - name: varlog
          mountPath: /var/log          # read logs from the node's /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers   # assumes a Docker runtime; containerd clusters keep logs under /var/log/pods
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log               # mount the actual node's filesystem
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

The key here is hostPath: the DaemonSet Pod mounts a directory straight from the node's own filesystem, which is how it reads the log files that other Pods on that node wrote to disk. A Deployment Pod could mount hostPath too, but it would only ever see the logs of whichever node it happened to land on, with no guarantee of covering the rest.
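After applying the manifest, there should be exactly one Pod per node. A sketch of how to check (the filename, Pod names, and node names are illustrative):

```shell
kubectl apply -f log-collector.yaml
kubectl get pods -n kube-system -l app=log-collector -o wide
# NAME                  READY   STATUS    ...   NODE
# log-collector-8x2kp   1/1     Running   ...   node-1
# log-collector-tq9vm   1/1     Running   ...   node-2
# log-collector-w4jln   1/1     Running   ...   node-3
```

The NODE column is the quick sanity check: every node name should appear exactly once.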