Zoltraak Gateway is a single Spring Boot application that acts as the control plane for all self-hosted intelligence workloads. It manages GPU cloud instances across multiple providers, proxies requests to the model servers running on those pods, orchestrates multi-step creative pipelines, and exposes a unified REST API to clients including OpenWebUI, a Telegram bot, and any future consumer applications.

The system is designed around one core constraint: GPU compute is expensive and must only run when actively needed. Every design decision flows from that constraint. The gateway owns the lifecycle of GPU instances, queues requests during warmup, routes to the correct provider and model, and shuts down compute aggressively when idle.
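That lifecycle can be pictured as a small state machine: requests arriving while the pod boots are buffered, then drained once the instance is ready, and an idle check stops compute past a threshold. The sketch below is illustrative only; class names, method names, and return conventions are assumptions, not the gateway's actual code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the warmup queue and aggressive idle shutdown
// described above. Real code would also replay queued requests and talk to
// the provider API; here state transitions are the whole point.
public class PodLifecycle {
    public enum State { STOPPED, STARTING, READY }

    private State state = State.STOPPED;
    private final Queue<String> pending = new ArrayDeque<>();
    private long lastUsedMillis;

    // A request either executes immediately or is queued during warmup.
    public String submit(String request, long nowMillis) {
        lastUsedMillis = nowMillis;
        if (state == State.READY) return "executed:" + request;
        if (state == State.STOPPED) state = State.STARTING; // kick off boot
        pending.add(request);
        return "queued:" + request;
    }

    // Called when the provider reports the pod healthy; drains the queue.
    public int onReady() {
        state = State.READY;
        int drained = pending.size();
        pending.clear(); // the real system would replay each request here
        return drained;
    }

    // Periodic check: stop compute once idle past the threshold.
    public State tickIdle(long nowMillis, long idleTimeoutMillis) {
        if (state == State.READY && nowMillis - lastUsedMillis >= idleTimeoutMillis) {
            state = State.STOPPED;
        }
        return state;
    }
}
```

The key property is that callers never see a "pod is booting" error: they see a queued request that completes once `onReady` fires.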

The server follows a layered hexagonal (ports-and-adapters) architecture. Business logic lives in service classes and is fully isolated from infrastructure concerns. All external systems (GPU providers, Ollama, ComfyUI, Telegram) are reached through interfaces with concrete adapter implementations. Controllers handle HTTP concerns only and delegate immediately to services. As a result, any external system can be swapped or mocked without touching business logic.
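In code, that split looks roughly like the sketch below. The names (`GpuProviderPort`, `VastAiAdapter`, `InstanceService`) are illustrative assumptions; only the shape — service depends on an interface, adapter implements it — reflects the architecture described above.

```java
import java.util.Optional;

// Ports-and-adapters sketch: the service never imports provider-specific code.
public class HexagonalSketch {

    // Port: the only view of a GPU provider the business logic ever sees.
    public interface GpuProviderPort {
        String startInstance(String gpuType);
        void stopInstance(String instanceId);
    }

    // Adapter: one concrete provider, swappable for RunPod or a test double.
    public static class VastAiAdapter implements GpuProviderPort {
        @Override public String startInstance(String gpuType) {
            // the real adapter would call the Vast.ai REST API here
            return "vast-" + gpuType;
        }
        @Override public void stopInstance(String instanceId) { /* API call */ }
    }

    // Service: pure business logic, depends only on the port interface.
    public static class InstanceService {
        private final GpuProviderPort provider;
        public InstanceService(GpuProviderPort provider) { this.provider = provider; }

        public String ensureRunning(Optional<String> existingId, String gpuType) {
            return existingId.orElseGet(() -> provider.startInstance(gpuType));
        }
    }
}
```

Swapping providers, or unit-testing `InstanceService`, means constructing it with a different `GpuProviderPort` implementation and nothing else.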

## Zoltraak Name Origin

The name comes from *Frieren: Beyond Journey's End*, where it is the name of a spell known for breaking through barriers with precision and no wasted effort. That is the intended character of this system.

## High-Level Diagram

```mermaid
graph LR
    subgraph Clients["Clients"]
        OWU[OpenWebUI]
        TG[Telegram]
        REST[REST Client]
    end

    subgraph GW["Zoltraak Gateway"]
        CTRL[Controllers]
        SVC[Services]
        ADP[Adapters]

        CTRL --> SVC
        SVC --> ADP
    end

    subgraph External["External Systems"]
        LOLLAMA[Local Ollama<br>Intent Classification]

        subgraph Cloud["GPU Cloud Providers"]
            VAST[Vast.ai]
            RUNPOD[RunPod]
        end

        subgraph Pod["GPU Pod · On Demand"]
            OLLAMA[Ollama<br>Vision / Chat / Code]
            COMFY[ComfyUI<br>Image Generation]
            VOL[(Model Volume)]
            OLLAMA --- VOL
            COMFY --- VOL
        end
    end

    OWU -->|HTTP| CTRL
    TG -->|Webhook| CTRL
    REST -->|HTTP + JWT| CTRL

    ADP -->|Intent| LOLLAMA
    ADP -->|Lifecycle| VAST
    ADP -->|Lifecycle| RUNPOD
    ADP -->|Generate| OLLAMA
    ADP -->|Prompt| COMFY
```

## Technology Stack

### Gateway

| Component | Technology | Reason |
| --- | --- | --- |
| Language | Java 25 | LTS release, recommended for Spring Framework 7.x production use |
| Framework | Spring Boot 4.0 | First-class Java 25 support, Spring Framework 7 baseline, modular jars |
| HTTP Client | Spring WebClient | Non-blocking, supports streaming, configurable timeout and retry |
| Security | Spring Security | JWT filter chain integration, well understood |
| Build | Maven | Standard, well supported in CI environments |
| Bot | Telegram Bot API via HTTP | Free, no approval process, webhook based |
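The "configurable timeout and retry" requirement on the HTTP client matters because pod-hosted model servers can be slow to answer or briefly unreachable during warmup. The sketch below illustrates the policy with the JDK's built-in `HttpClient` so it stays self-contained; the actual gateway would express the same thing with Spring WebClient's reactive operators, and the `withRetry` helper and endpoint handling here are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;

public class RetryingHttpCall {

    // Retry a call up to maxAttempts times, backing off linearly between tries.
    public static <T> T withRetry(Supplier<T> call, int maxAttempts, Duration backoff) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                try { Thread.sleep(backoff.toMillis() * attempt); }
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
            }
        }
        throw last; // all attempts exhausted
    }

    // Example use: a GET with a per-request timeout, retried on 5xx responses.
    public static String fetch(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30)) // per-request timeout
                .GET()
                .build();
        return withRetry(() -> {
            try {
                HttpResponse<String> resp =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() >= 500) throw new RuntimeException("server error");
                return resp.body();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, 3, Duration.ofMillis(200));
    }
}
```

The equivalent WebClient pipeline would hang `timeout(...)` and `retryWhen(...)` off the response `Mono`, but the policy — bounded attempts, growing backoff, give up with the last error — is the same.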

### GPU Pod

| Component | Technology | Reason |
| --- | --- | --- |
| LLM Server | Ollama | Single binary, REST API, multi-model support, vision capable |
| Image Generation | ComfyUI | Workflow based, REST API, wide model support |
| Base Image | PyTorch (Vast / RunPod) | CUDA pre-configured, Python available for ComfyUI |
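"Workflow based" means the ComfyUI adapter submits an entire node graph as JSON rather than a single prompt string: ComfyUI's HTTP API accepts the graph under a `"prompt"` key via `POST /prompt`. The tiny sketch below only shows the envelope; the workflow content and `client_id` value are placeholders, and real code would build the graph with a JSON library instead of string concatenation.

```java
// Hypothetical helper: wrap a prepared ComfyUI workflow graph for submission.
public class ComfyPayload {
    public static String wrap(String workflowJson, String clientId) {
        // Body for POST http://<pod>:8188/prompt
        return "{\"prompt\":" + workflowJson
                + ",\"client_id\":\"" + clientId + "\"}";
    }
}
```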

### Home Server

| Component | Technology | Reason |
| --- | --- | --- |
| Hardware | HP EliteDesk 800 (i5 6th gen) | Personal server, always on, no GPU required for gateway or bot brain |
| LLM Server | Ollama (CPU) | Runs small intent classification model for the Telegram bot only |
| Bot Brain Model | phi3:mini or tinyllama | Lightweight, runs comfortably on CPU, sufficient for intent parsing |
| UI | OpenWebUI | Conversation history, image storage, model management |
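Intent parsing with the CPU model could look like the sketch below. The request body shape (`model`, `prompt`, `stream`) follows Ollama's `POST /api/generate` API; the prompt wording, `Intent` enum, and tolerant parsing rules are assumptions about how the bot brain might work, not its actual implementation.

```java
public class IntentClassifier {
    public enum Intent { CHAT, IMAGE, CODE, UNKNOWN }

    // Body for POST http://localhost:11434/api/generate; stream disabled so
    // the whole answer arrives as one JSON object.
    public static String requestBody(String userMessage) {
        String prompt = "Classify the user's intent as one word "
                + "(CHAT, IMAGE or CODE): " + userMessage;
        return "{\"model\":\"phi3:mini\",\"prompt\":\""
                + prompt.replace("\"", "\\\"") + "\",\"stream\":false}";
    }

    // Tolerant parse of the model's free-text answer: small models do not
    // always answer with exactly one word, so substring matching is safer.
    public static Intent parse(String modelOutput) {
        String s = modelOutput.toUpperCase();
        if (s.contains("IMAGE")) return Intent.IMAGE;
        if (s.contains("CODE")) return Intent.CODE;
        if (s.contains("CHAT")) return Intent.CHAT;
        return Intent.UNKNOWN;
    }
}
```

An `IMAGE` result is what would trigger the expensive path: spinning up a GPU pod and routing to ComfyUI, while `CHAT` can stay on cheap always-on hardware.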

### Infrastructure

| Component | Technology | Reason |
| --- | --- | --- |
| GPU Provider A | Vast.ai | Cheaper hourly rates, SA-located instances available |
| GPU Provider B | RunPod | Stable proxy URLs, cheaper persistent storage, simpler networking |
| Model Storage | Persistent volume (provider native) | Models survive pod stop, avoids re-download cost |