All always-on components run on a personal home server (HP EliteDesk 800, i5 6th gen). This machine hosts the Zoltraak Gateway, OpenWebUI, and a local Ollama instance used exclusively by the Telegram bot for natural-language intent classification. The home server has no GPU; all model inference on this machine runs on the CPU with small, lightweight models that are suitable only for intent classification.
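To make the intent-classification path concrete, here is a minimal Python sketch of how the bot could query the local CPU-only Ollama instance. The Ollama `/api/generate` endpoint and default port are standard; the model name, intent labels, and prompt wording are illustrative assumptions, not the gateway's actual implementation.

```python
# Minimal sketch: classify a Telegram message via the local CPU Ollama instance.
# Model name, intent labels, and prompt format are assumptions for illustration.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama port
INTENTS = ["chat", "generate_image", "start_pod", "stop_pod", "status"]  # hypothetical labels


def classify_intent(message: str) -> str:
    """Ask a small CPU model to map a user message onto exactly one intent label."""
    prompt = (
        "Classify the user message into exactly one of these intents: "
        f"{', '.join(INTENTS)}.\n"
        f"Message: {message}\n"
        "Answer with the intent label only."
    )
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "qwen2.5:0.5b",  # assumed small, CPU-friendly model
            "prompt": prompt,
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    answer = resp.json()["response"].strip().lower()
    # Fall back to plain chat if the model returns something outside the label set.
    return answer if answer in INTENTS else "chat"


if __name__ == "__main__":
    print(classify_intent("make me a picture of a castle at sunset"))
```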
The GPU pod is a separate, ephemeral machine running on Vast.ai or RunPod. The gateway starts it on demand and stops it after an idle timeout. The pod runs Ollama and ComfyUI as background processes, and models are stored on a persistent volume that survives pod stop and restart (a lifecycle sketch follows the diagram below).
graph LR
subgraph HOME["Home Server - Always On"]
GW[Zoltraak Gateway]
OWU[OpenWebUI]
LOLLAMA[Ollama<br>CPU only<br>Bot brain]
end
subgraph POD["On-Demand GPU Pod"]
OLLAMA[Ollama<br>GPU]
COMFY[ComfyUI<br>GPU]
VOL[(Persistent Volume<br>Model Storage)]
end
GW <-->|REST| OWU
GW <-->|Intent classification| LOLLAMA
GW -->|Starts / Stops| POD
GW <-->|REST| OLLAMA
GW <-->|REST| COMFY
OLLAMA --- VOL
COMFY --- VOL
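The "Starts / Stops" edge in the diagram corresponds to the gateway's pod lifecycle logic. The sketch below shows one way this could look, assuming the Vast.ai CLI's `vastai start instance` / `vastai stop instance` commands; the instance ID, timeout value, and class names are hypothetical and not part of the actual gateway.

```python
# Minimal sketch of on-demand pod start/stop with an idle timeout.
# Instance ID, timeout, and helper names are illustrative assumptions.
import subprocess
import time

INSTANCE_ID = "1234567"       # hypothetical Vast.ai instance id
IDLE_TIMEOUT_S = 15 * 60      # stop the pod after 15 minutes without requests


class PodManager:
    def __init__(self) -> None:
        self.last_request = 0.0
        self.running = False

    def ensure_running(self) -> None:
        """Start the GPU pod before forwarding a request that needs it."""
        if not self.running:
            subprocess.run(["vastai", "start", "instance", INSTANCE_ID], check=True)
            self.running = True
        self.last_request = time.time()

    def stop_if_idle(self) -> None:
        """Called periodically (e.g. from a scheduler) to enforce the idle timeout."""
        if self.running and time.time() - self.last_request > IDLE_TIMEOUT_S:
            subprocess.run(["vastai", "stop", "instance", INSTANCE_ID], check=True)
            self.running = False
```

Because the pod is stopped rather than destroyed, the persistent volume keeps the downloaded models, so a restart only pays the boot cost, not a re-download.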