JWT validation is applied via a Spring Security filter chain on every request except requests to the health endpoint. The token's signature and expiry are validated on each request, and validation results are not cached. User identity is extracted from the token and attached to the request context for logging. Token issuance is not in scope for the initial build. The filter chain is structured so that an issuing endpoint and refresh logic can be added later without restructuring validation.
TODO
The idle timer uses a volatile timestamp updated on each inbound request. A scheduled background process compares current time against the last activity timestamp. This avoids timer cancellation and rescheduling complexity. The idle threshold is configurable and defaults to 15 minutes.
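The timestamp-and-poll approach can be sketched roughly as below. This is a minimal illustration, not the gateway's actual class: `IdleMonitor`, its method names, and the injectable clock are hypothetical, and only the volatile timestamp, the scheduled comparison, and the 15 minute default come from the description above.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch: tracks the last request time and reports idleness. */
final class IdleMonitor {
    private final LongSupplier clockMillis;   // injectable clock (assumption, eases testing)
    private final long thresholdMillis;
    private volatile long lastActivityMillis; // written by request threads, read by the poller

    IdleMonitor(Duration threshold, LongSupplier clockMillis) {
        this.thresholdMillis = threshold.toMillis();
        this.clockMillis = clockMillis;
        this.lastActivityMillis = clockMillis.getAsLong();
    }

    /** Call on every inbound request. */
    void touch() {
        lastActivityMillis = clockMillis.getAsLong();
    }

    /** True once the threshold has elapsed since the last recorded request. */
    boolean isIdle() {
        return clockMillis.getAsLong() - lastActivityMillis >= thresholdMillis;
    }

    /**
     * Polls at a fixed rate. Nothing is cancelled or rescheduled per request.
     * Note: if onIdle throws, scheduleAtFixedRate suppresses later runs.
     */
    ScheduledExecutorService startPolling(Duration interval, Runnable onIdle) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (isIdle()) {
                onIdle.run();
            }
        }, interval.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A caller might construct it as `new IdleMonitor(Duration.ofMinutes(15), System::currentTimeMillis)` and invoke `touch()` from a request filter. Because the poller keeps firing while the gateway stays idle, the `onIdle` callback should be a no-op unless the pod is currently READY.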
stateDiagram-v2
[*] --> STOPPED
STOPPED --> STARTING : request arrives
STARTING --> WARMING : provider confirms pod running
WARMING --> READY : Ollama health check passes
READY --> STOPPING : idle timer fires
STOPPING --> STOPPED : provider confirms pod stopped
WARMING --> STOPPED : warmup timeout exceeded
No request forwarding occurs in any state other than READY. All state transitions are logged with timestamp, provider, and pod ID.
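One way to enforce the transitions in the diagram above is an enum that lists the permitted successors of each state. The sketch below is illustrative only: `PodState`, `PodStateMachine`, and their methods are hypothetical names, and the real implementation would also log the timestamp, provider, and pod ID on each transition.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the pod lifecycle states and their permitted successors. */
enum PodState {
    STOPPED, STARTING, WARMING, READY, STOPPING;

    /** Successor states allowed by the state diagram. */
    Set<PodState> allowedNext() {
        switch (this) {
            case STOPPED:  return EnumSet.of(STARTING);        // request arrives
            case STARTING: return EnumSet.of(WARMING);         // provider confirms pod running
            case WARMING:  return EnumSet.of(READY, STOPPED);  // health check passes, or warmup timeout
            case READY:    return EnumSet.of(STOPPING);        // idle timer fires
            case STOPPING: return EnumSet.of(STOPPED);         // provider confirms pod stopped
            default:       throw new IllegalStateException("Unknown state " + this);
        }
    }

    /** Requests are forwarded only in READY. */
    boolean canForwardRequests() {
        return this == READY;
    }
}

final class PodStateMachine {
    private PodState state = PodState.STOPPED;

    synchronized PodState current() {
        return state;
    }

    /** Applies a transition, rejecting any edge not present in the diagram. */
    synchronized void transitionTo(PodState next) {
        if (!state.allowedNext().contains(next)) {
            throw new IllegalStateException("Illegal transition " + state + " -> " + next);
        }
        // A real implementation would log timestamp, provider, and pod ID here.
        state = next;
    }
}
```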
All inbound requests are logged with timestamp, endpoint, and user identity from the JWT. All pod state transitions are logged. All provider API calls are logged with request intent and response status. Model generation requests are logged with model name and request size, but prompt content is never logged. Intent classification inputs and outputs from the local Ollama instance are not logged. The log level is INFO by default; DEBUG logging for provider communication can be enabled via a configuration flag.
Provider credentials, GPU type preferences, idle timeout, queue max wait time, ComfyUI workflow paths, model name mappings, local Ollama host, and intent classifier mode are all externalised to application properties. No credentials exist in code. Provider selection is a runtime configuration value requiring only a config change and restart to switch.
All external endpoints are HTTPS. JWTs are validated on every request. Provider API keys are loaded from environment variables and never logged. The Telegram webhook endpoint validates requests using the Telegram secret token header. The Ollama and ComfyUI ports on the GPU pod are not exposed directly to the public internet; all access goes through the gateway. The local Ollama instance on the home server is not exposed externally; it is accessed only by the gateway, over localhost or the internal network.
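The Telegram check can be sketched as a comparison between the configured secret and the `X-Telegram-Bot-Api-Secret-Token` header that Telegram sends when a secret token is set on the webhook. The class name and the use of a constant-time comparison are assumptions for illustration; the design above only states that the header is validated.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical sketch: validates the secret token header on Telegram webhook calls. */
final class TelegramSecretValidator {
    /** Header Telegram attaches to webhook requests when a secret token is configured. */
    static final String HEADER_NAME = "X-Telegram-Bot-Api-Secret-Token";

    private final byte[] expected;

    TelegramSecretValidator(String expectedSecret) {
        // The secret would come from configuration, never from code.
        this.expected = expectedSecret.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true only when the header is present and matches the configured secret. */
    boolean isValid(String headerValue) {
        if (headerValue == null) {
            return false;
        }
        // MessageDigest.isEqual compares in constant time (an assumption of this
        // sketch, chosen to avoid leaking match length through timing).
        return MessageDigest.isEqual(expected, headerValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

A webhook handler would read the header named by `HEADER_NAME` from the request and reject the call when `isValid` returns false.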
Timeouts are applied through the Netty HttpClient that backs the WebClient connector, configured in WebClientConfig:
| Operation | Default Timeout |
|---|---|
| Ollama generation (GPU pod) | 300 seconds |
| ComfyUI generation | 120 seconds |
| Provider lifecycle API | 30 seconds |
| Ollama warmup ping (GPU pod) | 5 seconds per attempt, 3 minutes total |
| Queue max wait | 180 seconds |
| Local Ollama intent classification | 15 seconds |
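The warmup row above (5 seconds per attempt, 3 minutes total) implies a polling loop bounded by an overall deadline. A minimal sketch follows, under stated assumptions: `WarmupPoller` and its parameters are hypothetical, the 5 second per-attempt timeout is assumed to be enforced inside the supplied health check by the HTTP client, and the pause between attempts is an illustrative choice that the table does not specify.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch: polls a health check until it passes or a total budget runs out. */
final class WarmupPoller {

    /** Default sleeper; restores the interrupt flag so the loop can exit. */
    static void threadSleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * @param healthCheck one ping attempt; expected to apply its own per-attempt timeout
     * @param totalBudget overall warmup limit (3 minutes in the table above)
     * @param pause       wait between attempts (an assumption of this sketch)
     * @return true if the check passed within the budget, false on timeout or interrupt
     */
    static boolean awaitReady(BooleanSupplier healthCheck,
                              Duration totalBudget,
                              Duration pause,
                              LongSupplier clockMillis,
                              LongConsumer sleeper) {
        long deadline = clockMillis.getAsLong() + totalBudget.toMillis();
        while (!Thread.currentThread().isInterrupted()
                && clockMillis.getAsLong() < deadline) {
            if (healthCheck.getAsBoolean()) {
                return true;   // corresponds to WARMING -> READY
            }
            sleeper.accept(pause.toMillis());
        }
        return false;          // corresponds to WARMING -> STOPPED (warmup timeout exceeded)
    }
}
```

In production the clock and sleeper would typically be `System::currentTimeMillis` and `WarmupPoller::threadSleep`; injecting them keeps the loop testable without real waiting.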