6.10: HPA with Scaling Policies (Behavior Control)

https://drive.google.com/file/d/16PZAh-epESLHXVZ7VUU0rywuNOes5S-l/view?usp=sharing

🔍 YAML Breakdown

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
meta
  name: nginx-deployment-hpa
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30    # ← Wait 30s before scaling down
      policies:
      - type: Pods
        value: 1                        # ← Remove max 1 Pod every 30s
        periodSeconds: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10          # ← Scale when CPU > 10% of request

🔑 Key Insight:

The behavior section lets you control the speed and stability of scaling operations.

📌 Scaling Policies Explained

1. `scaleDown` Configuration

Field	Meaning	Your Value	Effect
`stabilizationWindowSeconds: 30`	Wait before scaling down	`30s`	Prevents flapping during brief load drops
`policies[0].type: Pods`	Scale by absolute Pod count	`Pods`	Remove 1 Pod per interval
`value: 1`	Max Pods to remove	`1`	Slow, controlled scale-down
`periodSeconds: 30`	Interval between scale actions	`30s`	One Pod removed every 30 seconds

🎯 Scale-Down Flow:

CPU drops below 10% → HPA waits 30s (stabilizationWindow)

Then removes 1 Pod

Waits 30s → removes another (if still underutilized)

Until minReplicas: 3 is reached

2. Missing `scaleUp`?

💡 Your YAML only defines scaleDown → scaleUp uses default behavior:

No stabilization window (scales immediately)

Unlimited rate (can jump from 3 → 10 instantly)

✅ To control scale-up, add:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100    # ← Double replicas every 60s
      periodSeconds: 60
  scaleDown:
    ...

🧪 k3s Lab: Observe Scaling Policies in Action

🔧 Step 1: Deploy the Full Stack

# Save as deployment-hpa-with-policies.yaml
kubectl apply -f deployment-hpa-with-policies.yaml

# Verify
kubectl get hpa nginx-deployment-hpa
kubectl get pods -l app=nginx

🔧 Step 2: Generate Load (Trigger Scale-Up)

# Start aggressive load (CPU > 10% of 80m = 8m)
kubectl run loader --image=busybox --restart=Never -it -- sh

# Inside shell:
while true; do wget -q -O- <http://nginx-deployment-svc>; done

🔍 Watch scale-up (should be fast — no policies defined):
kubectl get hpa -w
# Replicas jump from 3 → 10 quickly

🔍 YAML Breakdown

📌 Scaling Policies Explained

1. scaleDown Configuration

2. Missing scaleUp?

🧪 k3s Lab: Observe Scaling Policies in Action

🔧 Step 1: Deploy the Full Stack

🔧 Step 2: Generate Load (Trigger Scale-Up)

🔧 Step 3: Stop Load (Trigger Scale-Down)

1. `scaleDown` Configuration

2. Missing `scaleUp`?