Zoltraak Gateway is a single Spring Boot application that acts as the control plane for all self-hosted intelligence workloads. It manages GPU cloud instances across multiple providers, proxies requests to the model servers running on those pods, orchestrates multi-step creative pipelines, and exposes a unified REST API to clients including OpenWebUI, a Telegram bot, and any future consumer applications.

The system is designed around one core constraint: GPU compute is expensive and must only run when actively needed. Every design decision flows from that constraint. The gateway owns the lifecycle of GPU instances, queues requests during warmup, routes to the correct provider and model, and shuts down compute aggressively when idle.
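That lifecycle can be pictured as a small state machine: requests arriving while the pod boots are buffered, then drained once the instance is ready, and an idle check stops compute past a threshold. The sketch below is illustrative only; class names, method names, and return conventions are assumptions, not the gateway's actual code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the warmup queue and aggressive idle shutdown
// described above. Real code would also replay queued requests and talk to
// the provider API; here state transitions are the whole point.
public class PodLifecycle {
    public enum State { STOPPED, STARTING, READY }

    private State state = State.STOPPED;
    private final Queue<String> pending = new ArrayDeque<>();
    private long lastUsedMillis;

    // A request either executes immediately or is queued during warmup.
    public String submit(String request, long nowMillis) {
        lastUsedMillis = nowMillis;
        if (state == State.READY) return "executed:" + request;
        if (state == State.STOPPED) state = State.STARTING; // kick off boot
        pending.add(request);
        return "queued:" + request;
    }

    // Called when the provider reports the pod healthy; drains the queue.
    public int onReady() {
        state = State.READY;
        int drained = pending.size();
        pending.clear(); // the real system would replay each request here
        return drained;
    }

    // Periodic check: stop compute once idle past the threshold.
    public State tickIdle(long nowMillis, long idleTimeoutMillis) {
        if (state == State.READY && nowMillis - lastUsedMillis >= idleTimeoutMillis) {
            state = State.STOPPED;
        }
        return state;
    }
}
```

The key property is that callers never see a "pod is booting" error: they see a queued request that completes once `onReady` fires.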

The server follows a layered hexagonal (ports-and-adapters) architecture. Business logic lives in service classes and is fully isolated from infrastructure concerns. All external systems (GPU providers, Ollama, ComfyUI, Telegram) are reached through interfaces with concrete adapter implementations. Controllers handle HTTP concerns only and delegate immediately to services. As a result, any external system can be swapped or mocked without touching business logic.
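In code, that split looks roughly like the sketch below. The names (`GpuProviderPort`, `VastAiAdapter`, `InstanceService`) are illustrative assumptions; only the shape — service depends on an interface, adapter implements it — reflects the architecture described above.

```java
import java.util.Optional;

// Ports-and-adapters sketch: the service never imports provider-specific code.
public class HexagonalSketch {

    // Port: the only view of a GPU provider the business logic ever sees.
    public interface GpuProviderPort {
        String startInstance(String gpuType);
        void stopInstance(String instanceId);
    }

    // Adapter: one concrete provider, swappable for RunPod or a test double.
    public static class VastAiAdapter implements GpuProviderPort {
        @Override public String startInstance(String gpuType) {
            // the real adapter would call the Vast.ai REST API here
            return "vast-" + gpuType;
        }
        @Override public void stopInstance(String instanceId) { /* API call */ }
    }

    // Service: pure business logic, depends only on the port interface.
    public static class InstanceService {
        private final GpuProviderPort provider;
        public InstanceService(GpuProviderPort provider) { this.provider = provider; }

        public String ensureRunning(Optional<String> existingId, String gpuType) {
            return existingId.orElseGet(() -> provider.startInstance(gpuType));
        }
    }
}
```

Swapping providers, or unit-testing `InstanceService`, means constructing it with a different `GpuProviderPort` implementation and nothing else.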

## Zoltraak Name Origin

The name comes from *Frieren: Beyond Journey's End*, where it is the name of a spell known for breaking through barriers with precision and no wasted effort. That is the intended character of this system.

## High-Level Diagram

```mermaid
graph LR
    subgraph Clients["Clients"]
        OWU[OpenWebUI]
        TG[Telegram]
        REST[REST Client]
    end

    subgraph GW["Zoltraak Gateway"]
        CTRL[Controllers]
        SVC[Services]
        ADP[Adapters]

        CTRL --> SVC
        SVC --> ADP
    end

    subgraph External["External Systems"]
        LOLLAMA[Local Ollama<br>Intent Classification]

        subgraph Cloud["GPU Cloud Providers"]
            VAST[Vast.ai]
            RUNPOD[RunPod]
        end

        subgraph Pod["GPU Pod · On Demand"]
            OLLAMA[Ollama<br>Vision / Chat / Code]
            COMFY[ComfyUI<br>Image Generation]
            VOL[(Model Volume)]
            OLLAMA --- VOL
            COMFY --- VOL
        end
    end

    OWU -->|HTTP| CTRL
    TG -->|Webhook| CTRL
    REST -->|HTTP + JWT| CTRL

    ADP -->|Intent| LOLLAMA
    ADP -->|Lifecycle| VAST
    ADP -->|Lifecycle| RUNPOD
    ADP -->|Generate| OLLAMA
    ADP -->|Prompt| COMFY
```

## Technology Stack

### Gateway

| Component | Technology | Reason |
| --- | --- | --- |
| Language | Java 25 | LTS release, recommended for Spring Framework 7.x production use |
| Framework | Spring Boot 4.0 | First-class Java 25 support, Spring Framework 7 baseline, modular jars |
| HTTP Client | Spring WebClient | Non-blocking, supports streaming, configurable timeout and retry |
| Security | Spring Security | JWT filter chain integration, well understood |
| Build | Maven | Standard, well supported in CI environments |
| Bot | Telegram Bot API via HTTP | Free, no approval process, webhook based |
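The "configurable timeout and retry" requirement on the HTTP client matters because pod-hosted model servers can be slow to answer or briefly unreachable during warmup. The sketch below illustrates the policy with the JDK's built-in `HttpClient` so it stays self-contained; the actual gateway would express the same thing with Spring WebClient's reactive operators, and the `withRetry` helper and endpoint handling here are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;

public class RetryingHttpCall {

    // Retry a call up to maxAttempts times, backing off linearly between tries.
    public static <T> T withRetry(Supplier<T> call, int maxAttempts, Duration backoff) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                try { Thread.sleep(backoff.toMillis() * attempt); }
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
            }
        }
        throw last; // all attempts exhausted
    }

    // Example use: a GET with a per-request timeout, retried on 5xx responses.
    public static String fetch(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30)) // per-request timeout
                .GET()
                .build();
        return withRetry(() -> {
            try {
                HttpResponse<String> resp =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() >= 500) throw new RuntimeException("server error");
                return resp.body();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, 3, Duration.ofMillis(200));
    }
}
```

The equivalent WebClient pipeline would hang `timeout(...)` and `retryWhen(...)` off the response `Mono`, but the policy — bounded attempts, growing backoff, give up with the last error — is the same.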

### GPU Pod

| Component | Technology | Reason |
| --- | --- | --- |
| LLM Server | Ollama | Single binary, REST API, multi-model support, vision capable |
| Image Generation | ComfyUI | Workflow based, REST API, wide model support |
| Base Image | PyTorch (Vast / RunPod) | CUDA pre-configured, Python available for ComfyUI |
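"Workflow based" means the ComfyUI adapter submits an entire node graph as JSON rather than a single prompt string: ComfyUI's HTTP API accepts the graph under a `"prompt"` key via `POST /prompt`. The tiny sketch below only shows the envelope; the workflow content and `client_id` value are placeholders, and real code would build the graph with a JSON library instead of string concatenation.

```java
// Hypothetical helper: wrap a prepared ComfyUI workflow graph for submission.
public class ComfyPayload {
    public static String wrap(String workflowJson, String clientId) {
        // Body for POST http://<pod>:8188/prompt
        return "{\"prompt\":" + workflowJson
                + ",\"client_id\":\"" + clientId + "\"}";
    }
}
```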

### Home Server

| Component | Technology | Reason |
| --- | --- | --- |
| Hardware | HP EliteDesk 800 (i5 6th gen) | Personal server, always on, no GPU required for gateway or bot brain |
| LLM Server | Ollama (CPU) | Runs small intent classification model for the Telegram bot only |
| Bot Brain Model | phi3:mini or tinyllama | Lightweight, runs comfortably on CPU, sufficient for intent parsing |
| UI | OpenWebUI | Conversation history, image storage, model management |
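Intent parsing with the CPU model could look like the sketch below. The request body shape (`model`, `prompt`, `stream`) follows Ollama's `POST /api/generate` API; the prompt wording, `Intent` enum, and tolerant parsing rules are assumptions about how the bot brain might work, not its actual implementation.

```java
public class IntentClassifier {
    public enum Intent { CHAT, IMAGE, CODE, UNKNOWN }

    // Body for POST http://localhost:11434/api/generate; stream disabled so
    // the whole answer arrives as one JSON object.
    public static String requestBody(String userMessage) {
        String prompt = "Classify the user's intent as one word "
                + "(CHAT, IMAGE or CODE): " + userMessage;
        return "{\"model\":\"phi3:mini\",\"prompt\":\""
                + prompt.replace("\"", "\\\"") + "\",\"stream\":false}";
    }

    // Tolerant parse of the model's free-text answer: small models do not
    // always answer with exactly one word, so substring matching is safer.
    public static Intent parse(String modelOutput) {
        String s = modelOutput.toUpperCase();
        if (s.contains("IMAGE")) return Intent.IMAGE;
        if (s.contains("CODE")) return Intent.CODE;
        if (s.contains("CHAT")) return Intent.CHAT;
        return Intent.UNKNOWN;
    }
}
```

An `IMAGE` result is what would trigger the expensive path: spinning up a GPU pod and routing to ComfyUI, while `CHAT` can stay on cheap always-on hardware.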

### Infrastructure

| Component | Technology | Reason |
| --- | --- | --- |
| GPU Provider A | Vast.ai | Cheaper hourly rates, SA-located instances available |
| GPU Provider B | RunPod | Stable proxy URLs, cheaper persistent storage, simpler networking |
| Model Storage | Persistent volume (provider native) | Models survive pod stop, avoids re-download cost |