Scope

This document defines all models that live entirely inside the gateway and never cross any external boundary. They are not serialised to JSON, not sent to providers, and not returned to clients. They carry state between services and background processes within the single Spring Boot application.

Pod Lifecycle Models

classDiagram
    class PodState {
        PodStatus status
        GpuProvider provider
        LocalDateTime lastActivityAt
        LocalDateTime sessionStartedAt
        String podId
    }

    class QueuedRequest {
        String requestId
        LocalDateTime enqueuedAt
        String targetService
        Object originalRequest
        CompletableFuture response
    }

    PodState --> PodStatus
    PodState --> GpuProvider

PodState

Purpose: The single source of truth for the current GPU pod lifecycle state. Held in memory by the GPU Lifecycle Manager and read by all services that need to check pod availability before forwarding requests.
Key Constraints: status is the authoritative current state and drives all routing and background process decisions. lastActivityAt is updated atomically on every inbound request that reaches any AI service. sessionStartedAt is set when transitioning to STARTING and cleared when transitioning to STOPPED. podId is the provider-specific identifier used in API calls. provider may change only while status is STOPPED. This object is shared read-only across services; only the GPU Lifecycle Manager may write to it.
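The single-writer, many-reader rule above can be met by holding PodState as an immutable snapshot inside an AtomicReference. The following is a minimal sketch under stated assumptions, not the gateway's implementation. The method names (start, markReady, stop, recordActivity, changeProvider) are hypothetical. The GpuProvider values are placeholders. PodStatus lists only the three states this document names. The sketch also assumes that services record activity by calling the manager rather than by mutating PodState themselves.

```java
import java.time.LocalDateTime;
import java.util.concurrent.atomic.AtomicReference;

// Only the states named in this document; the real enum may have more.
enum PodStatus { STOPPED, STARTING, READY }

// Placeholder provider names; the real GpuProvider values are not defined here.
enum GpuProvider { PROVIDER_A, PROVIDER_B }

// Immutable snapshot: readers can never observe a half-applied update.
record PodState(PodStatus status, GpuProvider provider, LocalDateTime lastActivityAt,
                LocalDateTime sessionStartedAt, String podId) {}

// Hypothetical holder owned by the GPU Lifecycle Manager, the only writer.
final class GpuLifecycleManager {
    private final AtomicReference<PodState> state;

    GpuLifecycleManager(GpuProvider provider) {
        state = new AtomicReference<>(new PodState(PodStatus.STOPPED, provider, null, null, null));
    }

    // Other services only ever receive the current immutable snapshot.
    PodState current() {
        return state.get();
    }

    // Assumed entry point for every inbound request that reaches an AI service.
    void recordActivity(LocalDateTime now) {
        state.updateAndGet(s -> new PodState(s.status(), s.provider(), now, s.sessionStartedAt(), s.podId()));
    }

    // STOPPED -> STARTING: sets sessionStartedAt and records the provider's pod id.
    void start(String podId, LocalDateTime now) {
        state.updateAndGet(s -> {
            if (s.status() != PodStatus.STOPPED) {
                throw new IllegalStateException("Cannot start pod while " + s.status());
            }
            return new PodState(PodStatus.STARTING, s.provider(), s.lastActivityAt(), now, podId);
        });
    }

    // STARTING -> READY.
    void markReady() {
        state.updateAndGet(s -> {
            if (s.status() != PodStatus.STARTING) {
                throw new IllegalStateException("Cannot mark READY while " + s.status());
            }
            return new PodState(PodStatus.READY, s.provider(), s.lastActivityAt(), s.sessionStartedAt(), s.podId());
        });
    }

    // Any state -> STOPPED: clears sessionStartedAt.
    void stop() {
        state.updateAndGet(s -> new PodState(PodStatus.STOPPED, s.provider(), s.lastActivityAt(), null, s.podId()));
    }

    // The provider is locked unless the pod is STOPPED.
    void changeProvider(GpuProvider provider) {
        state.updateAndGet(s -> {
            if (s.status() != PodStatus.STOPPED) {
                throw new IllegalStateException("Provider is locked while pod is " + s.status());
            }
            return new PodState(s.status(), provider, s.lastActivityAt(), s.sessionStartedAt(), s.podId());
        });
    }
}
```

Because every write replaces the whole record through updateAndGet, a reader calling current() sees either the old snapshot or the new one, never a mix of fields from both. The update functions are side-effect free, which matters because updateAndGet may re-run them under contention.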

QueuedRequest

Purpose: Represents a client request that arrived while the pod was not yet READY. Held in the in-memory Request Queue until the pod transitions to READY, at which point it is drained and forwarded.
Key Constraints: enqueuedAt is used by the Request Expiry Sweeper to identify and reject requests that have exceeded the maximum queue wait time. targetService identifies which service should process the request when drained. originalRequest carries the deserialised inbound DTO. response is a CompletableFuture on which the original request thread blocks; completing it sends the response back to the client. requestId matches the RequestContext requestId for log tracing.
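The interplay between the request thread, the drain on READY, and the expiry sweeper can be sketched with a concurrent queue of QueuedRequest records. This is a minimal illustration, not the gateway's code. RequestQueue, enqueue, drain, and sweepExpired are hypothetical names. The forwarder function stands in for routing by targetService. Expired requests are assumed to be failed with a TimeoutException.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

record QueuedRequest(String requestId, LocalDateTime enqueuedAt, String targetService,
                     Object originalRequest, CompletableFuture<Object> response) {}

// Hypothetical in-memory Request Queue; method names are illustrative only.
final class RequestQueue {
    private final ConcurrentLinkedQueue<QueuedRequest> queue = new ConcurrentLinkedQueue<>();

    // The request thread keeps the returned future and blocks on it (e.g. with join()).
    CompletableFuture<Object> enqueue(String requestId, String targetService,
                                      Object originalRequest, LocalDateTime now) {
        CompletableFuture<Object> response = new CompletableFuture<>();
        queue.add(new QueuedRequest(requestId, now, targetService, originalRequest, response));
        return response;
    }

    // Called once the pod is READY; forwarder stands in for routing by targetService.
    void drain(Function<QueuedRequest, Object> forwarder) {
        QueuedRequest req;
        while ((req = queue.poll()) != null) {
            try {
                req.response().complete(forwarder.apply(req));
            } catch (RuntimeException e) {
                req.response().completeExceptionally(e);
            }
        }
    }

    // One Request Expiry Sweeper pass: fail anything that has waited longer than maxWait.
    int sweepExpired(Duration maxWait, LocalDateTime now) {
        int expired = 0;
        for (QueuedRequest req : queue) {
            boolean tooOld = Duration.between(req.enqueuedAt(), now).compareTo(maxWait) > 0;
            // remove() succeeds only if drain() has not already polled this request.
            if (tooOld && queue.remove(req)) {
                req.response().completeExceptionally(
                        new TimeoutException("Request " + req.requestId() + " expired in queue"));
                expired++;
            }
        }
        return expired;
    }

    int size() {
        return queue.size();
    }
}
```

Both drain and the sweeper take a request out of the queue before completing its future, so each request is claimed by exactly one of them and its blocked thread is released exactly once.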

Cost Tracking Models

classDiagram
    class CostSession {
        String sessionId
        GpuProvider provider
        String podId
        Double hourlyRateUsd
        LocalDateTime startedAt
        LocalDateTime endedAt
        Double estimatedCostUsd
    }

    class CostLog {
        List~CostSession~ sessions
        Double totalEstimatedUsd
        Double totalHours
    }

    CostLog --> CostSession
    CostSession --> GpuProvider

CostSession

Purpose: Records a single GPU pod session with enough information to calculate its estimated cost. Created when the pod transitions to STARTING and finalised when it transitions to STOPPED.
Key Constraints: sessionId is a UUID. hourlyRateUsd is copied from the provider response at session start, so the rate stays locked for the session even if configuration changes. endedAt and estimatedCostUsd are null while the session is active. When the session ends, estimatedCostUsd is calculated as hourlyRateUsd multiplied by the session duration in hours. Sessions are not persisted across gateway restarts in phase one.
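The cost rule above (estimatedCostUsd = hourlyRateUsd × duration in hours) can be illustrated with an immutable record that is created on STARTING and replaced by a finished copy on STOPPED. This is a sketch only. The start and finish factory methods are hypothetical. The GpuProvider values are placeholders. Duration is measured at millisecond precision, which is an assumption.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.UUID;

// Placeholder provider names; the real GpuProvider values are not defined here.
enum GpuProvider { PROVIDER_A, PROVIDER_B }

record CostSession(String sessionId, GpuProvider provider, String podId, Double hourlyRateUsd,
                   LocalDateTime startedAt, LocalDateTime endedAt, Double estimatedCostUsd) {

    // Created on STARTING; the rate is copied from the provider response and locked.
    static CostSession start(GpuProvider provider, String podId, double hourlyRateUsd, LocalDateTime now) {
        return new CostSession(UUID.randomUUID().toString(), provider, podId, hourlyRateUsd, now, null, null);
    }

    // Finalised on STOPPED: locked hourly rate multiplied by duration in hours.
    CostSession finish(LocalDateTime now) {
        double hours = Duration.between(startedAt, now).toMillis() / 3_600_000.0;
        return new CostSession(sessionId, provider, podId, hourlyRateUsd, startedAt, now, hourlyRateUsd * hours);
    }

    // endedAt (and estimatedCostUsd) stay null while the session is active.
    boolean active() {
        return endedAt == null;
    }
}
```

For example, a 90-minute session at 0.50 USD per hour finishes with an estimatedCostUsd of 0.75.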

CostLog

Purpose: In-memory collection of all cost sessions since the gateway started. Used by the Cost Tracker to produce summaries for health responses and bot queries.
Key Constraints: totalEstimatedUsd and totalHours are recalculated on each write to avoid scanning the full session list on every read. An active session contributes its current elapsed time to totals but is not marked as complete until the pod stops. Resets to empty on gateway restart in phase one.
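One way to reconcile "recalculate on write" with "an active session contributes its current elapsed time" is to cache the totals of completed sessions on every write, and to add the single active session's elapsed time when totals are read. The active session's contribution grows with the clock and no write happens while it runs, so it is computed at read time in O(1). This is an interpretation sketched under assumptions, not the Cost Tracker's actual code. The Session and Totals records and the start, end, and totals methods are hypothetical. The sketch also assumes that at most one session is active at a time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal session; the real CostSession carries more fields.
record Session(String sessionId, double hourlyRateUsd, LocalDateTime startedAt, LocalDateTime endedAt) {
    boolean active() {
        return endedAt == null;
    }
}

record Totals(double totalEstimatedUsd, double totalHours) {}

final class CostLog {
    private final List<Session> sessions = new ArrayList<>();
    // Cached on every write so reads never scan the session list.
    private double completedUsd;
    private double completedHours;
    private Session activeSession;

    static double hoursBetween(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to).toMillis() / 3_600_000.0;
    }

    synchronized void start(String sessionId, double hourlyRateUsd, LocalDateTime startedAt) {
        sessions.add(new Session(sessionId, hourlyRateUsd, startedAt, null));
        recalculate();
    }

    synchronized void end(String sessionId, LocalDateTime endedAt) {
        for (int i = 0; i < sessions.size(); i++) {
            Session s = sessions.get(i);
            if (s.sessionId().equals(sessionId) && s.active()) {
                sessions.set(i, new Session(s.sessionId(), s.hourlyRateUsd(), s.startedAt(), endedAt));
            }
        }
        recalculate();
    }

    // Write path: the only place that walks the full list.
    private void recalculate() {
        double usd = 0;
        double hours = 0;
        Session active = null;
        for (Session s : sessions) {
            if (s.active()) {
                active = s;
            } else {
                double h = hoursBetween(s.startedAt(), s.endedAt());
                hours += h;
                usd += h * s.hourlyRateUsd();
            }
        }
        completedUsd = usd;
        completedHours = hours;
        activeSession = active;
    }

    // Read path: cached completed totals plus the active session's elapsed time so far.
    synchronized Totals totals(LocalDateTime now) {
        if (activeSession == null) {
            return new Totals(completedUsd, completedHours);
        }
        double h = hoursBetween(activeSession.startedAt(), now);
        return new Totals(completedUsd + h * activeSession.hourlyRateUsd(), completedHours + h);
    }
}
```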

Fantasy Pipeline Models

classDiagram
    class FantasyPipelineContext {
        String requestId
        String imageBase64
        String prompt
        boolean includeImageGeneration
        String generatedStory
        String generatedImageBase64
        FantasyStage currentStage
        List~String~ warnings
    }

    FantasyPipelineContext --> FantasyStage

FantasyPipelineContext

Purpose: Carries all state for a single Fantasy Mode pipeline execution through the Fantasy Orchestrator. Accumulates results from each pipeline stage so partial results can be returned if a later stage fails.
Key Constraints: requestId matches the RequestContext requestId for log tracing. generatedStory is populated after the vision stage completes. generatedImageBase64 is populated only when includeImageGeneration is true and image generation succeeds. currentStage is updated as each stage completes. warnings accumulates non-fatal messages from any stage. The context object is not shared between requests and lives only for the duration of a single pipeline execution.
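The partial-result behaviour can be sketched as an orchestrator that writes each stage's output into the context as soon as it is available and downgrades an image-generation failure to a warning. This is an illustration under assumptions. The FantasyStage values, the FantasyOrchestrator.run signature, and the two stage functions (stand-ins for the real AI service calls) are hypothetical. The sketch also assumes that a vision-stage failure is fatal because nothing has been produced yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stage names; the real FantasyStage values are not listed in this document.
enum FantasyStage { RECEIVED, STORY_GENERATED, IMAGE_GENERATED }

// Mutable, per-request context; never shared between requests.
final class FantasyPipelineContext {
    final String requestId;
    final String imageBase64;
    final String prompt;
    final boolean includeImageGeneration;
    String generatedStory;
    String generatedImageBase64;
    FantasyStage currentStage = FantasyStage.RECEIVED;
    final List<String> warnings = new ArrayList<>();

    FantasyPipelineContext(String requestId, String imageBase64, String prompt, boolean includeImageGeneration) {
        this.requestId = requestId;
        this.imageBase64 = imageBase64;
        this.prompt = prompt;
        this.includeImageGeneration = includeImageGeneration;
    }
}

final class FantasyOrchestrator {
    // visionStage and imageStage stand in for calls to the real AI services.
    static FantasyPipelineContext run(FantasyPipelineContext ctx,
                                      Function<FantasyPipelineContext, String> visionStage,
                                      Function<FantasyPipelineContext, String> imageStage) {
        // A vision failure propagates: there is no partial result to return yet.
        ctx.generatedStory = visionStage.apply(ctx);
        ctx.currentStage = FantasyStage.STORY_GENERATED;

        if (ctx.includeImageGeneration) {
            try {
                ctx.generatedImageBase64 = imageStage.apply(ctx);
                ctx.currentStage = FantasyStage.IMAGE_GENERATED;
            } catch (RuntimeException e) {
                // Non-fatal: keep the story and record why the image is missing.
                ctx.warnings.add("Image generation failed: " + e.getMessage());
            }
        }
        return ctx;
    }
}
```

If image generation throws, the caller still receives the story, currentStage remains STORY_GENERATED, and warnings explains the missing image.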