Most Kubernetes users have encountered OOMKilled at least once. When it occurs, we tend to recalibrate the pod’s QoS or move the pod to a different node, assuming the node has a memory problem. In this Kubernetes Tip, we will dig into an interesting aspect of OOMKilled that helps us configure Pod QoS better.
I originally thought OOMKilled was something specific to Kubernetes, but realized it is actually tied to a Linux kernel mechanism called the OOM Killer. The OOM Killer watches for memory exhaustion on the node. If it detects such exhaustion, it chooses the best process(es) to kill. The best processes are chosen with the following in mind.
Kill the fewest processes possible, to minimize the damage to the stability of the system and to its important processes.
Killing those processes should free the maximum amount of memory for the node.
To facilitate this, the kernel maintains an oom_score for each process. The higher the oom_score, the greater the chance that the OOM Killer kills that process. The kernel also lets user space adjust a process’s oom_score through the oom_score_adj value.
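Both values are exposed as plain files under /proc/<pid>/ on Linux, so they are easy to inspect. The snippet below is a minimal sketch of reading them; the function name read_oom_values and its proc_root parameter are my own illustration, not part of any Kubernetes or kernel tooling.

```python
from pathlib import Path


def read_oom_values(pid="self", proc_root="/proc"):
    """Return oom_score and oom_score_adj for a process as integers.

    pid defaults to "self", i.e. the process running this code.
    proc_root is a hypothetical parameter that allows pointing the
    function at a different procfs mount (or a test directory).
    """
    base = Path(proc_root) / str(pid)
    return {
        name: int((base / name).read_text().strip())
        for name in ("oom_score", "oom_score_adj")
    }
```

On a Linux machine, calling read_oom_values() with no arguments reports the values for the Python process itself, and read_oom_values(1) reports them for PID 1.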
Kubernetes takes advantage of oom_score_adj by configuring a different value for each Quality of Service (QoS) class. The configured values are as below.
Source: Kubernetes Documentation.
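To make the values concrete, here is a rough sketch of how oom_score_adj could be computed per QoS class. It assumes the figures from the Kubernetes node-pressure eviction documentation as I understand them: -997 for Guaranteed (some older versions list -998, as far as I recall), 1000 for BestEffort, and min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999) for Burstable. The function name is hypothetical, and the actual kubelet code may differ in detail.

```python
def oom_score_adj_for(qos_class, memory_request_bytes=0, node_capacity_bytes=1):
    """Approximate the oom_score_adj Kubernetes assigns to a container.

    Assumed values (from the Kubernetes documentation, not verified
    against any particular kubelet version):
      Guaranteed -> -997, BestEffort -> 1000,
      Burstable  -> min(max(2, 1000 - 1000 * request / capacity), 999)
    """
    if qos_class == "Guaranteed":
        return -997
    if qos_class == "BestEffort":
        return 1000
    if qos_class == "Burstable":
        # Integer arithmetic: the larger the memory request relative
        # to node capacity, the lower (safer) the adjustment.
        raw = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
        return min(max(2, raw), 999)
    raise ValueError(f"unknown QoS class: {qos_class!r}")
```

For example, a Burstable container requesting 4 GiB on a 16 GiB node gets 1000 - 250 = 750 under this formula, which sits between the Guaranteed and BestEffort values.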
It can be inferred that, when memory is exhausted on the node, BestEffort Pods get killed first while Guaranteed Pods get killed last.
Figure-1 shows an example where two pods are running: one is configured with QoS Guaranteed, while the other is configured with QoS BestEffort.
Figure-1: Two Pods Having Different QoS Configured.
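As a reminder of where the QoS class comes from, it is derived from the CPU and memory requests and limits of the pod’s containers. The sketch below is a simplified classifier under a few assumptions of mine: each container is passed as a plain "resources" dict, quantities are compared as raw strings (so "1" and "1000m" would wrongly count as different), and init containers are ignored. The function name qos_class is hypothetical.

```python
def qos_class(containers):
    """Classify a pod as Guaranteed, Burstable, or BestEffort.

    containers: list of dicts shaped like a container's "resources"
    field, e.g. {"requests": {"memory": "64Mi"}, "limits": {...}}.
    """
    any_set = False
    all_guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for name in ("cpu", "memory"):
            req = requests.get(name)
            lim = limits.get(name)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit on every resource, and any
            # explicit request must equal it (an unset request
            # defaults to the limit).
            if lim is None or (req is not None and req != lim):
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

A pod whose only container sets limits of cpu "500m" and memory "128Mi" is classified as Guaranteed here, while a pod with no requests or limits at all comes out as BestEffort.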
Let’s look at oom_score and oom_score_adj from inside each pod. Figure-2 captures the steps. The BestEffort pod has a high oom_score, while the Guaranteed pod has a very low one. So, when memory exhaustion occurs, the OOM Killer kills the BestEffort pod first.
Figure-2: Both Pods Have Different Scores Depending On QoS Configured.
Note that a pod killed this way is not evicted; it is restarted by the node’s kubelet if the restart policy is set to Always.