Internal Models

Scope

This document defines all models that live entirely inside the gateway and never cross any external boundary. They are not serialised to JSON, not sent to providers, and not returned to clients. They carry state between services and background processes within the single Spring Boot application.

Pod Lifecycle Models

classDiagram
    class PodState {
        PodStatus status
        GpuProvider provider
        LocalDateTime lastActivityAt
        LocalDateTime sessionStartedAt
        String podId
    }

    class QueuedRequest {
        String requestId
        LocalDateTime enqueuedAt
        String targetService
        Object originalRequest
        CompletableFuture response
    }

    PodState --> PodStatus
    PodState --> GpuProvider

PodState


Purpose	The single source of truth for the current GPU pod lifecycle state. Held in memory by the GPU Lifecycle Manager and read by all services that need to check pod availability before forwarding requests.
Key Constraints	status is the authoritative current state and drives all routing and background process decisions. lastActivityAt is updated atomically on every inbound request that reaches any AI service. sessionStartedAt is set when transitioning to STARTING and cleared when transitioning to STOPPED. podId is the provider-specific identifier used in API calls. provider must not change while status is anything other than STOPPED. This object is shared read-only across services; only the GPU Lifecycle Manager may write to it.
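The constraints above could be enforced along the following lines. This is a minimal sketch, not the gateway's actual code: it assumes Java 16+ records, an immutable PodState snapshot swapped through an AtomicReference, and a hypothetical PodStateHolder class owned by the GPU Lifecycle Manager. The GpuProvider values are placeholders, and PodStatus lists only the three states named in this document.

```java
import java.time.LocalDateTime;
import java.util.concurrent.atomic.AtomicReference;

// Only the states named in this document; the real enum may have more.
enum PodStatus { STOPPED, STARTING, READY }

// Placeholder values; the real provider list is not given here.
enum GpuProvider { PROVIDER_A, PROVIDER_B }

// Immutable snapshot, safe to hand to any reader.
record PodState(PodStatus status, GpuProvider provider, LocalDateTime lastActivityAt,
                LocalDateTime sessionStartedAt, String podId) {}

// Hypothetical holder: readers call read(), only the lifecycle manager calls the writers.
final class PodStateHolder {
    private final AtomicReference<PodState> current =
            new AtomicReference<>(new PodState(PodStatus.STOPPED, null, null, null, null));

    PodState read() {
        return current.get();
    }

    // Atomically records activity without disturbing the other fields.
    void touch(LocalDateTime now) {
        current.updateAndGet(s ->
                new PodState(s.status(), s.provider(), now, s.sessionStartedAt(), s.podId()));
    }

    // Applies a status transition and enforces the provider and sessionStartedAt rules.
    void transition(PodStatus next, GpuProvider provider, String podId, LocalDateTime now) {
        current.updateAndGet(s -> {
            if (s.status() != PodStatus.STOPPED && provider != s.provider()) {
                throw new IllegalStateException("provider may only change while STOPPED");
            }
            LocalDateTime started = s.sessionStartedAt();
            if (next == PodStatus.STARTING) {
                started = now;   // set on entering STARTING
            } else if (next == PodStatus.STOPPED) {
                started = null;  // cleared on entering STOPPED
            }
            return new PodState(next, provider, s.lastActivityAt(), started, podId);
        });
    }
}
```

Because every write replaces the whole snapshot, a reader never observes a half-applied transition, and a rejected transition leaves the previous state in place.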

QueuedRequest


Purpose	Represents a client request that arrived while the pod was not yet READY. Held in the in-memory Request Queue until the pod transitions to READY, at which point it is drained and forwarded.
Key Constraints	enqueuedAt is used by the Request Expiry Sweeper to identify and reject requests that have exceeded the maximum queue wait time. targetService identifies which service should process the request when drained. originalRequest carries the deserialised inbound DTO. response is the CompletableFuture on which the original request thread blocks; completing it sends the response back to the client. requestId matches the RequestContext requestId for log tracing.
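The interplay between enqueuedAt, the sweeper, and the response future can be illustrated with a small sketch. It is written under assumptions: the RequestQueue class, its method names, the use of ConcurrentLinkedQueue, the CompletableFuture<Object> type parameter, and the TimeoutException used for rejection are all hypothetical choices, not details taken from the gateway.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

record QueuedRequest(String requestId, LocalDateTime enqueuedAt, String targetService,
                     Object originalRequest, CompletableFuture<Object> response) {}

// Hypothetical in-memory queue; names and structure are illustrative only.
final class RequestQueue {
    private final ConcurrentLinkedQueue<QueuedRequest> queue = new ConcurrentLinkedQueue<>();

    // Called while the pod is not READY; the caller blocks on the returned future.
    CompletableFuture<Object> enqueue(String requestId, String targetService,
                                      Object dto, LocalDateTime now) {
        CompletableFuture<Object> response = new CompletableFuture<>();
        queue.add(new QueuedRequest(requestId, now, targetService, dto, response));
        return response;
    }

    // One sweeper pass: reject every request that has waited longer than maxWait.
    int expireOlderThan(Duration maxWait, LocalDateTime now) {
        int expired = 0;
        for (QueuedRequest r : queue) {
            boolean tooOld = Duration.between(r.enqueuedAt(), now).compareTo(maxWait) > 0;
            // remove() returns true only for the caller that actually removed the entry.
            if (tooOld && queue.remove(r)) {
                r.response().completeExceptionally(
                        new TimeoutException("queue wait exceeded for " + r.requestId()));
                expired++;
            }
        }
        return expired;
    }

    // Called when the pod becomes READY: forward each request and complete its future.
    int drain(Function<QueuedRequest, Object> forwarder) {
        int drained = 0;
        QueuedRequest r;
        while ((r = queue.poll()) != null) {
            try {
                r.response().complete(forwarder.apply(r));
            } catch (RuntimeException e) {
                r.response().completeExceptionally(e);
            }
            drained++;
        }
        return drained;
    }
}
```

In this sketch the forwarder would dispatch on targetService; completing the future, normally or exceptionally, is what releases the waiting request thread.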

Cost Tracking Models

classDiagram
    class CostSession {
        String sessionId
        GpuProvider provider
        String podId
        Double hourlyRateUsd
        LocalDateTime startedAt
        LocalDateTime endedAt
        Double estimatedCostUsd
    }

    class CostLog {
        List~CostSession~ sessions
        Double totalEstimatedUsd
        Double totalHours
    }

    CostLog --> CostSession
    CostSession --> GpuProvider

CostSession


Purpose	Records a single GPU pod session with enough information to calculate its estimated cost. Created when the pod transitions to STARTING and finalised when it transitions to STOPPED.
Key Constraints	sessionId is a UUID. hourlyRateUsd is copied from the provider response at session start so the rate is locked for the session even if configuration changes. endedAt and estimatedCostUsd are null while the session is active. estimatedCostUsd is calculated as hourlyRateUsd multiplied by session duration in hours when the session ends. Sessions are not persisted across gateway restarts in phase one.
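The cost rule above is simple arithmetic, sketched here as an immutable Java record (Java 16+ assumed). The start and end factory methods, the isActive helper, and the GpuProvider placeholder values are illustrative assumptions; only the field names and the formula come from this document.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.UUID;

// Placeholder values; the real provider list is not given here.
enum GpuProvider { PROVIDER_A, PROVIDER_B }

record CostSession(String sessionId, GpuProvider provider, String podId, Double hourlyRateUsd,
                   LocalDateTime startedAt, LocalDateTime endedAt, Double estimatedCostUsd) {

    // Opens a session; the rate passed in is locked for the session's lifetime.
    static CostSession start(GpuProvider provider, String podId,
                             double hourlyRateUsd, LocalDateTime now) {
        return new CostSession(UUID.randomUUID().toString(), provider, podId,
                hourlyRateUsd, now, null, null);
    }

    // Finalises the session: estimatedCostUsd = hourlyRateUsd * duration in hours.
    CostSession end(LocalDateTime now) {
        double hours = Duration.between(startedAt, now).toMillis() / 3_600_000.0;
        return new CostSession(sessionId, provider, podId, hourlyRateUsd,
                startedAt, now, hourlyRateUsd * hours);
    }

    // endedAt and estimatedCostUsd are both null while the session is active.
    boolean isActive() {
        return endedAt == null;
    }
}
```

For example, a session at 0.50 USD per hour that runs for 90 minutes ends with an estimated cost of 0.50 × 1.5 = 0.75 USD.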

CostLog


Purpose	In-memory collection of all cost sessions since the gateway started. Used by the Cost Tracker to produce summaries for health responses and bot queries.
Key Constraints	totalEstimatedUsd and totalHours are recalculated on each write to avoid scanning the full session list on every read. An active session contributes its current elapsed time to totals but is not marked as complete until the pod stops. Resets to empty on gateway restart in phase one.
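One possible reading of these constraints is to cache the totals for completed sessions on each write and add the active session's elapsed time only at read time. The sketch below follows that reading; it is an assumption about how the two statements fit together, not a description of the Cost Tracker itself. CostSession is trimmed to the fields the totals need, and the method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Trimmed stand-in for CostSession with only the fields the totals need.
record CostSession(double hourlyRateUsd, LocalDateTime startedAt, LocalDateTime endedAt) {
    // Duration in hours; an active session (endedAt == null) is measured up to 'now'.
    double hours(LocalDateTime now) {
        LocalDateTime end = endedAt != null ? endedAt : now;
        return Duration.between(startedAt, end).toMillis() / 3_600_000.0;
    }
}

final class CostLog {
    private final List<CostSession> sessions = new ArrayList<>(); // completed sessions
    private CostSession active;       // at most one active session (assumption)
    private double completedUsd;      // cached on write
    private double completedHours;    // cached on write

    synchronized void startSession(double hourlyRateUsd, LocalDateTime now) {
        if (active != null) {
            throw new IllegalStateException("a session is already active");
        }
        active = new CostSession(hourlyRateUsd, now, null);
    }

    // Write path: finalise the active session and fold it into the cached totals.
    synchronized void endActiveSession(LocalDateTime now) {
        if (active == null) {
            return;
        }
        CostSession done = new CostSession(active.hourlyRateUsd(), active.startedAt(), now);
        sessions.add(done);
        completedHours += done.hours(now);
        completedUsd += done.hours(now) * done.hourlyRateUsd();
        active = null;
    }

    // Read path: cached totals plus the active session's elapsed time, no list scan.
    synchronized double totalHours(LocalDateTime now) {
        return completedHours + (active == null ? 0.0 : active.hours(now));
    }

    synchronized double totalEstimatedUsd(LocalDateTime now) {
        return completedUsd
                + (active == null ? 0.0 : active.hours(now) * active.hourlyRateUsd());
    }

    synchronized int completedSessionCount() {
        return sessions.size();
    }
}
```

Reads stay constant-time regardless of how many sessions have accumulated, while the active session still contributes an up-to-date figure.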

Fantasy Pipeline Models

classDiagram
    class FantasyPipelineContext {
        String requestId
        String imageBase64
        String prompt
        boolean includeImageGeneration
        String generatedStory
        String generatedImageBase64
        FantasyStage currentStage
        List~String~ warnings
    }

    FantasyPipelineContext --> FantasyStage

FantasyPipelineContext


Purpose	Carries all state for a single Fantasy Mode pipeline execution through the Fantasy Orchestrator. Accumulates results from each pipeline stage so partial results can be returned if a later stage fails.
Key Constraints	requestId matches the RequestContext requestId for log tracing. generatedStory is populated after the vision stage completes. generatedImageBase64 is populated only when includeImageGeneration is true and image generation succeeds. currentStage is updated as each stage completes. warnings accumulates non-fatal messages from any stage. The context object is not shared between requests and lives only for the duration of a single pipeline execution.
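How the context accumulates partial results can be sketched as follows. The FantasyStage values, the FantasyOrchestratorSketch class, and the use of Function arguments as stand-ins for the vision and image generation stages are all assumptions made for illustration; only the field names come from the diagram above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stage names; the real enum values are not listed in this document.
enum FantasyStage { RECEIVED, VISION_DONE, IMAGE_DONE }

// One instance per pipeline execution; never shared between requests.
final class FantasyPipelineContext {
    final String requestId;
    final String imageBase64;
    final String prompt;
    final boolean includeImageGeneration;
    String generatedStory;
    String generatedImageBase64;
    FantasyStage currentStage = FantasyStage.RECEIVED;
    final List<String> warnings = new ArrayList<>();

    FantasyPipelineContext(String requestId, String imageBase64, String prompt,
                           boolean includeImageGeneration) {
        this.requestId = requestId;
        this.imageBase64 = imageBase64;
        this.prompt = prompt;
        this.includeImageGeneration = includeImageGeneration;
    }
}

final class FantasyOrchestratorSketch {
    // Runs the stages in order; a failed image stage leaves the story intact.
    static FantasyPipelineContext run(FantasyPipelineContext ctx,
                                      Function<FantasyPipelineContext, String> visionStage,
                                      Function<FantasyPipelineContext, String> imageStage) {
        ctx.generatedStory = visionStage.apply(ctx);
        ctx.currentStage = FantasyStage.VISION_DONE;

        if (ctx.includeImageGeneration) {
            try {
                ctx.generatedImageBase64 = imageStage.apply(ctx);
                ctx.currentStage = FantasyStage.IMAGE_DONE;
            } catch (RuntimeException e) {
                // Non-fatal: record a warning and return the partial result.
                ctx.warnings.add("image generation failed: " + e.getMessage());
            }
        }
        return ctx;
    }
}
```

If the image stage throws, the caller still receives generatedStory, a null generatedImageBase64, and a warning explaining what was skipped, with currentStage showing the last stage that completed.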
