https://drive.google.com/file/d/16PZAh-epESLHXVZ7VUU0rywuNOes5S-l/view?usp=sharing
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
meta
name: nginx-deployment-hpa
spec:
behavior:
scaleDown:
stabilizationWindowSeconds: 30 # โ Wait 30s before scaling down
policies:
- type: Pods
value: 1 # โ Remove max 1 Pod every 30s
periodSeconds: 30
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx-deployment
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 10 # โ Scale when CPU > 10% of request
๐ Key Insight:
The
behaviorsection lets you control the speed and stability of scaling operations.
scaleDown Configuration| Field | Meaning | Your Value | Effect |
|---|---|---|---|
stabilizationWindowSeconds: 30 |
Wait before scaling down | 30s |
Prevents flapping during brief load drops |
policies[0].type: Pods |
Scale by absolute Pod count | Pods |
Remove 1 Pod per interval |
value: 1 |
Max Pods to remove | 1 |
Slow, controlled scale-down |
periodSeconds: 30 |
Interval between scale actions | 30s |
One Pod removed every 30 seconds |
๐ฏ Scale-Down Flow:
- CPU drops below 10% โ HPA waits 30s (
stabilizationWindow)- Then removes 1 Pod
- Waits 30s โ removes another (if still underutilized)
- Until
minReplicas: 3is reached
scaleUp?๐ก Your YAML only defines scaleDown โ scaleUp uses default behavior:
- No stabilization window (scales immediately)
- Unlimited rate (can jump from 3 โ 10 instantly)
โ To control scale-up, add:
behavior: scaleUp: stabilizationWindowSeconds: 0 policies: - type: Percent value: 100 # โ Double replicas every 60s periodSeconds: 60 scaleDown: ...
# Save as deployment-hpa-with-policies.yaml
kubectl apply -f deployment-hpa-with-policies.yaml
# Verify
kubectl get hpa nginx-deployment-hpa
kubectl get pods -l app=nginx
# Start aggressive load (CPU > 10% of 80m = 8m)
kubectl run loader --image=busybox --restart=Never -it -- sh
# Inside shell:
while true; do wget -q -O- <http://nginx-deployment-svc>; done
๐ Watch scale-up (should be fast โ no policies defined):
kubectl get hpa -w # Replicas jump from 3 โ 10 quickly