The gateway runs the following background processes while the application is running. Some run on a fixed schedule; others activate only in particular pod or queue states.
Idle Timer Checker: Runs on a fixed schedule, every N seconds. Compares the current timestamp against the timestamp of the last recorded request activity. If the difference exceeds the configured idle threshold (default N minutes) and the pod is in the READY state, it triggers pod shutdown via the GPU Lifecycle Manager.
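The idle decision can be sketched as a pure function, which keeps it easy to test apart from the scheduler. This is a minimal illustration, not the gateway's actual code: the `PodState` enum and the name `should_shut_down` are hypothetical, and the threshold is passed in because the source leaves its default unspecified.

```python
from enum import Enum


class PodState(Enum):
    # Hypothetical enum mirroring the pod states named in this document.
    STOPPED = "STOPPED"
    STARTING = "STARTING"
    WARMING = "WARMING"
    READY = "READY"


def should_shut_down(now: float, last_activity: float,
                     idle_threshold_s: float, state: PodState) -> bool:
    """Return True only when the pod is READY and idle time exceeds the threshold."""
    idle_for = now - last_activity
    return state is PodState.READY and idle_for > idle_threshold_s
```

A scheduler would call this on each tick and, when it returns True, ask the GPU Lifecycle Manager to shut the pod down. The comparison is strict, so an idle time exactly equal to the threshold does not trigger shutdown.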
Pod Warmup Poller: Runs while the pod is in the STARTING or WARMING state. Pings the Ollama health endpoint on the GPU pod every five seconds. When Ollama responds successfully, transitions the pod state to READY and triggers a queue drain. If no successful response is received within three minutes, fails the warmup and transitions the pod to STOPPED.
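The polling loop might look like the following sketch. The five-second interval and three-minute timeout come from the description above; everything else is an assumption. In particular, the health check is injected as a callable (`check_health`) because the endpoint URL is not given here, and `on_ready` stands in for whatever triggers the queue drain.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until_ready(
    check_health: Callable[[], Awaitable[bool]],  # hypothetical: pings the Ollama health endpoint
    on_ready: Callable[[], None],                 # hypothetical: triggers the queue drain
    interval_s: float = 5.0,
    timeout_s: float = 180.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Poll until the health check succeeds or the timeout elapses.

    Returns the resulting pod state name: "READY" or "STOPPED".
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if await check_health():
                on_ready()
                return "READY"
        except Exception:
            # Treat any error from the health check as "not ready yet".
            pass
        await sleep(interval_s)
    return "STOPPED"
```

Passing `clock` and `sleep` as parameters is a design choice made for this sketch: it lets the timeout path be exercised in tests without waiting three real minutes.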
Queue Drain Worker: Activates when the pod state transitions to READY. Processes queued requests sequentially in arrival order, forwarding each to the appropriate service. Deactivates when the queue is empty.
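A sequential, arrival-order drain could be sketched as below. The request shape (a dict with a `"service"` key) and the `forwarders` mapping are assumptions made for illustration; the source does not say how requests are routed to services.

```python
from collections import deque
from typing import Callable, Deque, Dict


def drain_queue(queue: Deque[dict],
                forwarders: Dict[str, Callable[[dict], None]]) -> int:
    """Forward queued requests one at a time, oldest first, until the queue is empty.

    Returns the number of requests forwarded.
    """
    processed = 0
    while queue:
        request = queue.popleft()           # oldest request first (FIFO)
        forward = forwarders[request["service"]]
        forward(request)                    # forward before taking the next one
        processed += 1
    return processed
```

Because the loop exits as soon as the queue is empty, the worker deactivates naturally and is restarted by the next transition to READY.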
Request Expiry Sweeper: Runs every N seconds while the queue is non-empty. Removes requests that have exceeded the maximum wait time and sends a 503 response to each waiting connection.
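One sweep pass might be implemented as follows. The `QueuedRequest` fields and the `respond` callback are hypothetical stand-ins for however the gateway tracks waiting connections, and the maximum wait time is a parameter because no value is stated here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class QueuedRequest:
    # Hypothetical record for a request waiting in the queue.
    request_id: str
    enqueued_at: float
    respond: Callable[[int], None]  # sends an HTTP status to the waiting connection


def sweep_expired(queue: Deque[QueuedRequest], now: float, max_wait_s: float) -> int:
    """Remove requests that waited longer than max_wait_s and answer them with 503.

    Surviving requests keep their original order. Returns the number expired.
    """
    kept: Deque[QueuedRequest] = deque()
    expired = 0
    while queue:
        req = queue.popleft()
        if now - req.enqueued_at > max_wait_s:
            req.respond(503)
            expired += 1
        else:
            kept.append(req)
    queue.extend(kept)  # put survivors back in arrival order
    return expired
```

Rebuilding the queue in place keeps arrival order intact, so the sweep does not interfere with the ordering the drain worker relies on.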
Cost Session Recorder: Listens internally for pod state transition events. Records a session open on each transition to STARTING and a session close, with duration, on each transition to STOPPED. Writes to the in-memory cost log consumed by the Cost Tracker.
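The recorder's behavior can be illustrated with a small event handler. The class and field names here are assumptions, as is the choice to ignore a STOPPED event when no session is open; the source only specifies opening on STARTING and closing with a duration on STOPPED.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CostSession:
    # Hypothetical entry in the in-memory cost log.
    opened_at: float
    closed_at: Optional[float] = None

    @property
    def duration_s(self) -> Optional[float]:
        """Session length in seconds, or None while the session is still open."""
        if self.closed_at is None:
            return None
        return self.closed_at - self.opened_at


class CostSessionRecorder:
    def __init__(self) -> None:
        self.log: List[CostSession] = []          # read by the Cost Tracker
        self._open: Optional[CostSession] = None  # session currently in progress

    def on_transition(self, new_state: str, now: float) -> None:
        """Handle a pod state transition event."""
        if new_state == "STARTING" and self._open is None:
            self._open = CostSession(opened_at=now)
            self.log.append(self._open)
        elif new_state == "STOPPED" and self._open is not None:
            self._open.closed_at = now
            self._open = None
        # Other transitions (WARMING, READY) do not affect cost sessions.
```

A failed warmup also ends in STOPPED, so under this sketch it still produces a closed session covering the time the pod spent starting.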