Cross-Cutting Concerns

Authentication

JWT validation is applied via a Spring Security filter chain on every request except the health endpoint. The token's signature and expiry are checked on each request, and validation results are not cached. User identity is extracted from the token and attached to the request context for logging. Token issuance is out of scope for the initial build. The filter chain is structured so that an issuing endpoint and refresh logic can be added later without restructuring validation.
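The per-request checks described above can be sketched with the plain JDK. This is not the Spring Security configuration itself. It assumes HS256-signed tokens, a shared secret, and a health endpoint at /health, none of which are stated in this document; the class and method names are hypothetical. The regex-based expiry lookup is a shortcut for the sketch, and a real build would use a proper JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: signature and expiry checks for an HS256 JWT (assumed algorithm). */
final class JwtCheck {
    static final String HEALTH_PATH = "/health"; // assumed path, not given in the design

    // Shortcut for the sketch: pulls a numeric "exp" claim out of the payload JSON.
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");

    /** Every path except the health endpoint requires a token. */
    static boolean requiresToken(String path) {
        return !HEALTH_PATH.equals(path);
    }

    /** HMAC-SHA256 over the JWT signing input (header + "." + payload). */
    static byte[] hmacSha256(byte[] secret, String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs on every request: nothing is cached between calls. */
    static boolean isValid(String token, byte[] secret, long nowEpochSeconds) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;
        try {
            byte[] expected = hmacSha256(secret, parts[0] + "." + parts[1]);
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            // Constant-time comparison of the signatures.
            if (!MessageDigest.isEqual(expected, actual)) return false;
            String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
            Matcher m = EXP.matcher(payload);
            // A token without an exp claim is rejected in this sketch.
            return m.find() && Long.parseLong(m.group(1)) > nowEpochSeconds;
        } catch (IllegalArgumentException e) {
            return false; // malformed Base64 or an out-of-range exp value
        }
    }
}
```

Because the clock is passed in as a parameter, expiry behaviour can be tested without waiting for a real token to expire.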

Retry Logic

TODO

Concurrency and Idle Timer

The idle timer uses a volatile timestamp that is updated on each inbound request. A scheduled background task periodically compares the current time against this last-activity timestamp and fires when the gap reaches the idle threshold. This avoids the complexity of cancelling and rescheduling a timer on every request. The idle threshold is configurable and defaults to 15 minutes.
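A minimal sketch of this approach, assuming a single-threaded ScheduledExecutorService as the background process. The class name, method names, and the injectable clock are illustrative and not taken from the design.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical idle tracker: request threads write the timestamp, one checker thread reads it. */
final class IdleTracker {
    private final LongSupplier clockMillis;   // injectable clock, makes the logic testable
    private final long thresholdMillis;
    private volatile long lastActivityMillis; // volatile so the checker sees the latest write

    IdleTracker(Duration threshold, LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.thresholdMillis = threshold.toMillis();
        this.lastActivityMillis = clockMillis.getAsLong();
    }

    /** Called on every inbound request. */
    void touch() {
        lastActivityMillis = clockMillis.getAsLong();
    }

    boolean isIdle() {
        return clockMillis.getAsLong() - lastActivityMillis >= thresholdMillis;
    }

    /**
     * Starts the periodic check. onIdle runs on every tick while idle, so the
     * callback is expected to ignore repeats. The caller shuts the scheduler down.
     */
    ScheduledExecutorService startChecking(Duration interval, Runnable onIdle) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (isIdle()) onIdle.run();
        }, interval.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

No timer is ever cancelled here: a request only overwrites one field, and the checker simply observes the newer timestamp on its next tick.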

Pod State Machine

stateDiagram-v2
    [*] --> STOPPED
    STOPPED --> STARTING : request arrives
    STARTING --> WARMING : provider confirms pod running
    WARMING --> READY : Ollama health check passes
    READY --> STOPPING : idle timer fires
    STOPPING --> STOPPED : provider confirms pod stopped
    WARMING --> STOPPED : warmup timeout exceeded

No request forwarding occurs in any state other than READY. All state transitions are logged with timestamp, provider, and pod ID.
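The diagram can be encoded as a transition table, sketched below. The enum values and the allowed transitions come from the diagram; the class and method names are hypothetical, and the transition logging described above is left out for brevity.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum PodState { STOPPED, STARTING, WARMING, READY, STOPPING }

/** Hypothetical state holder that rejects any transition not drawn in the diagram. */
final class PodStateMachine {
    private static final Map<PodState, Set<PodState>> ALLOWED = new EnumMap<>(PodState.class);
    static {
        ALLOWED.put(PodState.STOPPED, EnumSet.of(PodState.STARTING));
        ALLOWED.put(PodState.STARTING, EnumSet.of(PodState.WARMING));
        // WARMING either passes the health check or times out.
        ALLOWED.put(PodState.WARMING, EnumSet.of(PodState.READY, PodState.STOPPED));
        ALLOWED.put(PodState.READY, EnumSet.of(PodState.STOPPING));
        ALLOWED.put(PodState.STOPPING, EnumSet.of(PodState.STOPPED));
    }

    private PodState state = PodState.STOPPED; // initial state per the diagram

    synchronized PodState state() {
        return state;
    }

    /** Requests are forwarded only in READY. */
    synchronized boolean canForward() {
        return state == PodState.READY;
    }

    synchronized void transition(PodState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException("Illegal transition " + state + " -> " + next);
        }
        state = next; // a real implementation would log timestamp, provider, and pod ID here
    }
}
```

Rejecting undeclared transitions also makes a repeated idle-timer signal harmless, since a STOPPING request in any state other than READY is refused.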

Logging

All inbound requests are logged with timestamp, endpoint, and user identity from the JWT. All pod state transitions are logged. All provider API calls are logged with request intent and response status. Model generation requests are logged with model name and request size, but prompt content is never logged. Intent classification inputs and outputs from the local Ollama instance are not logged. The log level is INFO by default, with DEBUG available for provider communication via a configuration flag.
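One way to keep prompt content out of the logs is to build the log line from metadata only, as sketched below. The class name, the field layout, and the choice of UTF-8 byte count as the "request size" are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical formatter: the prompt is measured but never written to the log line. */
final class GenerationLogLine {
    static String format(String userId, String model, String prompt) {
        // Only the size of the prompt is recorded, never its text.
        int requestBytes = prompt.getBytes(StandardCharsets.UTF_8).length;
        return "generation user=" + userId + " model=" + model + " requestBytes=" + requestBytes;
    }
}
```

Keeping the prompt out of the formatter's output, rather than relying on a log filter, means a change in log level cannot expose it.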

Configuration Management

Provider credentials, GPU type preferences, idle timeout, queue max wait time, ComfyUI workflow paths, model name mappings, local Ollama host, and intent classifier mode are all externalised to application properties. No credentials exist in code. Provider selection is a runtime configuration value requiring only a config change and restart to switch.
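As a rough illustration of externalised settings with defaults, the sketch below reads a few values from a java.util.Properties object. A Spring Boot build would more likely bind these through its own property mechanism. Every property key and the class name are hypothetical; only the default values (15 minutes idle, 180 seconds queue wait) come from this document.

```java
import java.time.Duration;
import java.util.Properties;

/** Hypothetical settings holder; all property keys below are illustrative assumptions. */
final class GatewaySettings {
    final String provider;
    final Duration idleTimeout;
    final Duration queueMaxWait;

    private GatewaySettings(String provider, Duration idleTimeout, Duration queueMaxWait) {
        this.provider = provider;
        this.idleTimeout = idleTimeout;
        this.queueMaxWait = queueMaxWait;
    }

    static GatewaySettings from(Properties props) {
        String provider = props.getProperty("gateway.provider");
        if (provider == null || provider.isBlank()) {
            // Provider selection has no default: it must be set explicitly.
            throw new IllegalArgumentException("gateway.provider must be set");
        }
        long idleMinutes = Long.parseLong(props.getProperty("gateway.idle-timeout-minutes", "15"));
        long queueSeconds = Long.parseLong(props.getProperty("gateway.queue-max-wait-seconds", "180"));
        return new GatewaySettings(provider.trim(),
                Duration.ofMinutes(idleMinutes), Duration.ofSeconds(queueSeconds));
    }
}
```

Because the settings are read once at startup, switching provider means changing the property and restarting, which matches the behaviour described above.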

Security

All external endpoints are HTTPS. JWTs are validated on every request. Provider API keys are loaded from environment variables and never logged. The Telegram webhook endpoint validates requests using the Telegram secret token header. The Ollama and ComfyUI ports on the GPU pod are not exposed directly to the public internet; all access goes through the gateway. The local Ollama instance on the home server is not exposed externally; it is accessed only by the gateway over localhost or the internal network.
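The webhook check can be sketched as a constant-time comparison between the configured secret and the X-Telegram-Bot-Api-Secret-Token header that Telegram sends when a secret token is set on the webhook. The class and method names are hypothetical, and how the header value reaches this method depends on the web framework.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical verifier for the Telegram webhook secret token header. */
final class TelegramWebhookVerifier {
    static final String SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token";

    private final byte[] expected;

    TelegramWebhookVerifier(String configuredSecret) {
        this.expected = configuredSecret.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true only if the header is present and matches the configured secret. */
    boolean isValid(String headerValue) {
        if (headerValue == null) return false;
        // MessageDigest.isEqual compares in constant time, which avoids timing leaks.
        return MessageDigest.isEqual(expected, headerValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

A request that fails this check would be rejected before any further processing.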

Timeout Strategy

Timeouts are applied via a Netty HttpClient configured on the WebClient connector in WebClientConfig. The default for each operation is listed below.

Operation	Default Timeout
Ollama generation (GPU pod)	300 seconds
ComfyUI generation	120 seconds
Provider lifecycle API	30 seconds
Ollama warmup ping (GPU pod)	5 seconds per attempt, 3 minutes total
Queue max wait	180 seconds
Local Ollama intent classification	15 seconds
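The defaults in the table can be captured as constants, and the warmup row (5 seconds per attempt within a 3 minute budget) can be sketched as a deadline loop. The enum and class names are illustrative. The loop assumes each ping blocks for up to the per-attempt timeout, so no extra pause between attempts is added; under that assumption, at most 36 attempts fit in the budget.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Default timeouts from the table; the constant names are illustrative. */
enum OperationTimeout {
    OLLAMA_GENERATION(Duration.ofSeconds(300)),
    COMFYUI_GENERATION(Duration.ofSeconds(120)),
    PROVIDER_LIFECYCLE(Duration.ofSeconds(30)),
    OLLAMA_WARMUP_ATTEMPT(Duration.ofSeconds(5)),
    OLLAMA_WARMUP_TOTAL(Duration.ofMinutes(3)),
    QUEUE_MAX_WAIT(Duration.ofSeconds(180)),
    LOCAL_INTENT_CLASSIFICATION(Duration.ofSeconds(15));

    private final Duration value;

    OperationTimeout(Duration value) {
        this.value = value;
    }

    Duration value() {
        return value;
    }
}

/** Hypothetical warmup helper: keeps pinging until success or the total budget is spent. */
final class Warmup {
    /**
     * ping is assumed to block for at most OLLAMA_WARMUP_ATTEMPT and return
     * true once the health check passes. clockMillis is injectable for testing.
     */
    static boolean awaitReady(BooleanSupplier ping, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong()
                + OperationTimeout.OLLAMA_WARMUP_TOTAL.value().toMillis();
        while (clockMillis.getAsLong() < deadline) {
            if (ping.getAsBoolean()) return true;
        }
        return false; // caller would apply the WARMING --> STOPPED transition
    }
}
```

If a ping can fail immediately (for example, connection refused), a short pause between attempts would be needed to avoid a tight loop.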