Zoltraak Gateway is a single Spring Boot application that acts as the control plane for all self-hosted intelligence workloads. It manages GPU cloud instances across multiple providers, proxies requests to the model servers running on those instances (the pods), orchestrates multi-step creative pipelines, and exposes a unified REST API to clients including OpenWebUI, a Telegram bot, and any future consumer applications.
The system is designed around one core constraint: GPU compute is expensive and must only run when actively needed. Every design decision flows from that constraint. The gateway owns the lifecycle of GPU instances, queues requests during warmup, routes to the correct provider and model, and shuts down compute aggressively when idle.
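The lifecycle rule described above (start on demand, queue during warmup, stop when idle) can be sketched roughly as follows. This is a minimal illustration in plain Java, not the gateway's actual code: the class name `GpuLifecycle`, its methods, and the idle timeout parameter are all assumptions made for the example, and the provider calls are reduced to state changes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: start on demand, queue during warmup, stop when idle. */
class GpuLifecycle {
    enum State { STOPPED, WARMING, READY }

    private final Duration idleTimeout;              // assumed configuration value
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private State state = State.STOPPED;
    private Instant lastActivity = Instant.EPOCH;

    GpuLifecycle(Duration idleTimeout) {
        this.idleTimeout = idleTimeout;
    }

    /** Runs the request at once if the pod is ready; otherwise queues it and triggers startup. */
    synchronized void submit(Runnable request, Instant now) {
        lastActivity = now;
        if (state == State.READY) {
            request.run();
            return;
        }
        pending.add(request);
        if (state == State.STOPPED) {
            state = State.WARMING; // a real adapter would call the provider API here
        }
    }

    /** Called when the pod reports healthy: drains everything queued during warmup. */
    synchronized void onPodReady(Instant now) {
        state = State.READY;
        lastActivity = now;
        Runnable next;
        while ((next = pending.poll()) != null) {
            next.run();
        }
    }

    /** Called periodically: stops the pod once it has been idle longer than the timeout. */
    synchronized void tick(Instant now) {
        if (state == State.READY
                && Duration.between(lastActivity, now).compareTo(idleTimeout) > 0) {
            state = State.STOPPED; // a real adapter would stop the instance here
        }
    }

    synchronized State state() { return state; }

    synchronized int queued() { return pending.size(); }
}
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test. A production version would more likely run requests outside the lock and drive `tick` from a scheduler.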
The server follows a Layered Hexagonal Architecture. Business logic lives in service classes and is fully isolated from infrastructure concerns. All external systems (GPU providers, Ollama, ComfyUI, Telegram) are reached through interfaces with concrete adapter implementations. Controllers handle HTTP concerns only and delegate immediately to services. This means any external system can be swapped or mocked without touching business logic.
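As a rough illustration of the port and adapter split, the sketch below shows a service that depends only on an interface, with a fake adapter standing in for a real provider. All names here (`GpuProviderPort`, `InstanceService`, `FakeGpuProvider`) are hypothetical and chosen for the example; the real gateway's interfaces may look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Port: what the business logic needs from any GPU provider (hypothetical interface). */
interface GpuProviderPort {
    String startInstance(String gpuType);

    void stopInstance(String instanceId);
}

/** Service: holds the business rule and knows only the port, not Vast.ai or RunPod. */
class InstanceService {
    private final GpuProviderPort provider;
    private String activeInstanceId; // null when nothing is running

    InstanceService(GpuProviderPort provider) {
        this.provider = provider;
    }

    /** Starts an instance only if none is active, then returns its id. */
    String ensureRunning(String gpuType) {
        if (activeInstanceId == null) {
            activeInstanceId = provider.startInstance(gpuType);
        }
        return activeInstanceId;
    }

    /** Stops the active instance, if any. */
    void shutdown() {
        if (activeInstanceId != null) {
            provider.stopInstance(activeInstanceId);
            activeInstanceId = null;
        }
    }
}

/** Adapter used in tests: records calls instead of talking to a real provider. */
class FakeGpuProvider implements GpuProviderPort {
    final List<String> calls = new ArrayList<>();

    @Override
    public String startInstance(String gpuType) {
        calls.add("start:" + gpuType);
        return "fake-1";
    }

    @Override
    public void stopInstance(String instanceId) {
        calls.add("stop:" + instanceId);
    }
}
```

Because `InstanceService` never names a concrete provider, a Vast.ai or RunPod adapter could replace `FakeGpuProvider` without any change to the service.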
The name comes from Frieren: Beyond Journey's End, where Zoltraak is a spell known for breaking through barriers with precision and no wasted effort. That is the intended character of this system.
graph LR
subgraph Clients["Clients"]
OWU[OpenWebUI]
TG[Telegram]
REST[REST Client]
end
subgraph GW["Zoltraak Gateway"]
CTRL[Controllers]
SVC[Services]
ADP[Adapters]
CTRL --> SVC
SVC --> ADP
end
subgraph External["External Systems"]
LOLLAMA[Local Ollama<br>Intent Classification]
subgraph Cloud["GPU Cloud Providers"]
VAST[Vast.ai]
RUNPOD[RunPod]
end
subgraph Pod["GPU Pod · On Demand"]
OLLAMA[Ollama<br>Vision / Chat / Code]
COMFY[ComfyUI<br>Image Generation]
VOL[(Model Volume)]
OLLAMA --- VOL
COMFY --- VOL
end
end
OWU -->|HTTP| CTRL
TG -->|Webhook| CTRL
REST -->|HTTP + JWT| CTRL
ADP -->|Intent| LOLLAMA
ADP -->|Lifecycle| VAST
ADP -->|Lifecycle| RUNPOD
ADP -->|Generate| OLLAMA
ADP -->|Prompt| COMFY

Gateway application stack:

| Component | Technology | Reason |
|---|---|---|
| Language | Java 25 | LTS release, recommended for Spring Framework 7.x production use |
| Framework | Spring Boot 4.0 | First-class Java 25 support, Spring Framework 7 baseline, modular jars |
| HTTP Client | Spring WebClient | Non-blocking, supports streaming, configurable timeout and retry |
| Security | Spring Security | JWT filter chain integration, well understood |
| Build | Maven | Standard, well supported in CI environments |
| Bot | Telegram Bot API via HTTP | Free, no approval process, webhook based |

GPU pod stack (on demand):

| Component | Technology | Reason |
|---|---|---|
| LLM Server | Ollama | Single binary, REST API, multi-model support, vision capable |
| Image Generation | ComfyUI | Workflow based, REST API, wide model support |
| Base Image | PyTorch image (Vast.ai / RunPod) | CUDA preconfigured, Python available for ComfyUI |

Personal server stack (always on):

| Component | Technology | Reason |
|---|---|---|
| Hardware | HP EliteDesk 800 i5 6th gen | Personal server, always on, no GPU required for gateway or bot brain |
| LLM Server | Ollama (CPU) | Runs small intent classification model for the Telegram bot only |
| Bot Brain Model | phi3:mini or tinyllama | Lightweight, runs comfortably on CPU, sufficient for intent parsing |
| UI | OpenWebUI | Conversation history, image storage, model management |

Cloud infrastructure:

| Component | Technology | Reason |
|---|---|---|
| GPU Provider A | Vast.ai | Cheaper hourly rates, SA-located instances available |
| GPU Provider B | RunPod | Stable proxy URLs, cheaper persistent storage, simpler networking |
| Model Storage | Persistent Volume (provider native) | Models survive pod stop, avoids re-download cost |