Manages the full lifecycle of GPU pod instances across all configured providers. Responsibilities include starting a pod when demand arrives, monitoring warmup until Ollama and ComfyUI are ready, maintaining an idle timer that resets on each inbound request, and stopping the pod when the timer expires. Protects in-flight requests from being dropped during a shutdown sequence. Exposes current pod status to all other services so they can make routing decisions without querying providers directly.
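To make the timer and shutdown-protection interplay concrete, here is a minimal Java sketch of how the idle timer might be tracked. All names here (`PodState`, `GpuLifecycleManager`, the ten-minute default) are illustrative assumptions, not confirmed details of the implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

enum PodState { STOPPED, STARTING, WARMING_UP, READY, STOPPING }

class GpuLifecycleManager {
    private final AtomicReference<Instant> lastRequestAt =
            new AtomicReference<>(Instant.now());
    private final AtomicInteger inFlight = new AtomicInteger();
    private final Duration idleTimeout = Duration.ofMinutes(10); // assumed default
    private volatile PodState state = PodState.STOPPED;

    /** Called on every inbound request; resets the idle timer. */
    void touch() { lastRequestAt.set(Instant.now()); }

    /** Bracket each forwarded call so an expiring timer cannot
        stop the pod while a request is still in flight. */
    void beginRequest() { inFlight.incrementAndGet(); touch(); }
    void endRequest()   { inFlight.decrementAndGet(); touch(); }

    /** Run periodically, e.g. from a scheduled task. */
    void checkIdle() {
        boolean expired = Instant.now()
                .isAfter(lastRequestAt.get().plus(idleTimeout));
        if (state == PodState.READY && expired && inFlight.get() == 0) {
            state = PodState.STOPPING; // provider.stop() would follow here
        }
    }

    /** Other services read this instead of querying providers. */
    PodState currentState() { return state; }
}
```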
Translates provider-agnostic commands into VastAi- and RunPod-specific API calls. Both providers are implemented behind a common interface with four operations: start, stop, status, and connection-detail resolution.
```mermaid
classDiagram
    class GpuProviderPort {
        <<interface>>
        start() void
        stop() void
        getStatus() PodStatus
        getConnectionDetails() PodConnectionDetails
    }
    class RunPodAdapter
    class VastAiAdapter
    GpuProviderPort <|.. RunPodAdapter
    GpuProviderPort <|.. VastAiAdapter
```
The VastAi adapter dynamically resolves the public IP and mapped ports on each status check because VastAi does not provide stable URLs. The RunPod adapter constructs stable proxy URLs using the static pod ID format. Provider selection is driven by configuration and can be changed without touching any other service.
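Rendered as Java, the port and the configuration-driven selection might look like the sketch below. The method signatures come from the diagram above; the record shapes, the `gpu.provider` property name, and the Spring Boot `@ConditionalOnProperty` wiring are assumptions.

```java
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

record PodStatus(String phase) {}  // illustrative shapes, not confirmed
record PodConnectionDetails(String host, int ollamaPort, int comfyUiPort) {}

interface GpuProviderPort {
    void start();
    void stop();
    PodStatus getStatus();
    PodConnectionDetails getConnectionDetails();
}

@Configuration
class ProviderConfig {
    // Assumes RunPodAdapter and VastAiAdapter implement the port.
    // Changing the gpu.provider property swaps adapters without
    // touching any other service.
    @Bean
    @ConditionalOnProperty(name = "gpu.provider", havingValue = "runpod")
    GpuProviderPort runPod() { return new RunPodAdapter(); }

    @Bean
    @ConditionalOnProperty(name = "gpu.provider", havingValue = "vastai")
    GpuProviderPort vastAi() { return new VastAiAdapter(); }
}
```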
Mimics the Ollama REST API surface so that OpenWebUI and any Ollama-compatible client can point at Zoltraak Gateway without modification. When a request arrives on the Ollama-compatible endpoints, this service checks pod status via the GPU Lifecycle Manager, triggers startup if needed, queues the request during warmup, and forwards to the real Ollama instance once the pod is ready. Owns the translation between the incoming Ollama-shaped request and the actual forwarded call.
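The gate-then-forward flow for one such endpoint could take roughly the following shape. This is a sketch only: `requestQueue`, `lifecycle`, `startPod`, and `forwardToOllama` are assumed names, and the `PodState` constants match the lifecycle sketch above.

```java
@PostMapping("/api/chat")
ResponseEntity<String> chat(@RequestBody String body) {
    return switch (lifecycle.currentState()) {
        case STOPPED -> {                      // cold start: kick off the pod
            lifecycle.startPod();
            yield requestQueue.enqueue(body);  // held until READY
        }
        case STARTING, WARMING_UP -> requestQueue.enqueue(body);
        case READY -> {
            lifecycle.touch();                 // reset the idle timer
            yield forwardToOllama(body);       // pass through unchanged
        }
        case STOPPING -> ResponseEntity.status(503)
                .header("Retry-After", "30").build();
    };
}
```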
Manages the multi-step fantasy mode pipeline. Accepts an image and an optional prompt, calls the vision model via Ollama to read the image and generate a story, optionally calls ComfyUI to generate an accompanying illustration, and assembles the full response. Each step is tracked so partial results can be returned if a downstream step fails. Model selection is delegated to the Ollama Proxy and ComfyUI Client.
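A minimal sketch of the step tracking, assuming service and type names (`ollamaProxy.narrateImage`, `comfyUiClient.generate`, `FantasyResult`) that are illustrative rather than confirmed:

```java
import java.util.ArrayList;
import java.util.List;

// Each stage records its outcome so a failed illustration step
// still returns the story as a partial result.
record FantasyResult(String story, byte[] illustration, List<String> failedSteps) {}

FantasyResult run(byte[] image, String prompt) {
    List<String> failed = new ArrayList<>();
    String story = ollamaProxy.narrateImage(image, prompt); // vision step
    byte[] art = null;
    try {
        art = comfyUiClient.generate(story);                // optional step
    } catch (RuntimeException e) {
        failed.add("illustration"); // partial result: story without image
    }
    return new FantasyResult(story, art, failed);
}
```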
Manages all communication with the ComfyUI process running on the GPU pod. Constructs workflow payloads, submits generation jobs, polls for completion, retrieves output images, and handles ComfyUI-specific error responses. Workflow templates are managed via configuration.
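The submit-then-poll loop might look like this. `POST /prompt` and `GET /history/{id}` are endpoints of stock ComfyUI; the JSON helpers (`postJson`, `getJson`, `hasOutputs`, `downloadOutputs`), the timeout, and the poll interval are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

List<byte[]> generate(String workflowJson) throws Exception {
    String promptId = postJson(baseUrl + "/prompt", workflowJson); // returns prompt_id
    Instant deadline = Instant.now().plus(Duration.ofMinutes(5));
    while (Instant.now().isBefore(deadline)) {
        String history = getJson(baseUrl + "/history/" + promptId);
        if (hasOutputs(history)) {
            return downloadOutputs(history); // images fetched via /view
        }
        Thread.sleep(2_000); // poll every two seconds
    }
    throw new IllegalStateException("ComfyUI job timed out: " + promptId);
}
```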
Routes coding-focused requests to the appropriate model in Ollama. Applies a coding-specific system prompt, keeps long code inputs within the model's context window, and returns structured responses. The underlying model call is delegated to the Ollama Proxy.
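One possible shape of the prompt injection before delegation; the `Message` and `ChatRequest` types, the prompt wording, and the model name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

ChatRequest prepareCodingRequest(ChatRequest incoming) {
    // Prepend the coding-specific system prompt, then hand the
    // result to the Ollama Proxy for the actual model call.
    Message system = new Message("system",
            "You are a precise coding assistant. Prefer complete, runnable code.");
    List<Message> messages = new ArrayList<>();
    messages.add(system);
    messages.addAll(incoming.messages());
    return new ChatRequest("qwen2.5-coder", messages); // model choice is illustrative
}
```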
Holds requests that arrive while a pod is warming up. Implemented as an in-memory queue with a configurable maximum wait time. Requests exceeding the wait time are rejected with HTTP 503 and a Retry-After header. The queue drains automatically once the pod reaches READY status. Requests are not persisted across gateway restarts.
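A minimal sketch of that behaviour, assuming a `QueuedRequest` type with deadline, reject, and forward hooks (all names illustrative):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class WarmupQueue {
    private final Duration maxWait = Duration.ofSeconds(90); // configurable
    private final Queue<QueuedRequest> pending = new ConcurrentLinkedQueue<>();

    void enqueue(QueuedRequest req) {
        req.setDeadline(Instant.now().plus(maxWait));
        pending.add(req);
    }

    /** Invoked by the lifecycle manager on the transition to READY.
        Nothing is persisted: a gateway restart loses queued requests. */
    void drain() {
        QueuedRequest req;
        while ((req = pending.poll()) != null) {
            if (Instant.now().isAfter(req.deadline())) {
                req.reject(503, Map.of("Retry-After", "60")); // waited too long
            } else {
                req.forward(); // replay against the now-ready pod
            }
        }
    }
}
```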
Validates JWT tokens on all incoming requests via a Spring Security filter chain. Token issuance and user management are deferred to post-launch. A static pre-issued token is used during initial operation. The filter chain is structured so issuing endpoints and refresh logic can be added without restructuring validation.
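A plausible validate-only filter chain in Spring Security 6 style; the public health-probe path is assumed, and issuer/secret configuration is elided and presumed to live in application properties. This would sit inside a `@Configuration` class.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Bean
SecurityFilterChain security(HttpSecurity http) throws Exception {
    http.csrf(csrf -> csrf.disable())
        .authorizeHttpRequests(auth -> auth
            .requestMatchers("/actuator/health").permitAll() // assumed public probe
            .anyRequest().authenticated())
        // Validation only for now; issuing and refresh endpoints can
        // be added later without restructuring this chain.
        .oauth2ResourceServer(oauth2 -> oauth2.jwt(Customizer.withDefaults()));
    return http.build();
}
```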
Receives messages from Telegram and translates them into internal service calls using natural language intent classification. Routes classified intents to the GPU Lifecycle Manager, Cost Tracker, Health Service, or Fantasy Orchestrator as appropriate. For unrecognised intents it responds with a clarification message. It is a pure protocol adapter: all logic is delegated to the appropriate internal service after intent resolution, as the dispatch sketch below illustrates.
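An illustrative dispatch, assuming the routed services expose methods returning user-facing reply strings; the `Intent` enum values and every method name here are naming assumptions, not confirmed interfaces.

```java
enum Intent { START_POD, STOP_POD, COST_REPORT, HEALTH, FANTASY, UNKNOWN }

String handle(String telegramMessage) {
    // classifier, lifecycle, costTracker, healthService, and
    // fantasyOrchestrator are assumed injected collaborators.
    return switch (classifier.classify(telegramMessage)) {
        case START_POD   -> lifecycle.requestStart();
        case STOP_POD    -> lifecycle.requestStop();
        case COST_REPORT -> costTracker.summary();
        case HEALTH      -> healthService.report();
        case FANTASY     -> fantasyOrchestrator.begin(telegramMessage);
        case UNKNOWN     -> "I couldn't work out what you meant. Could you rephrase?";
    };
}
```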