AlphaForge Data Layer – Snapshot Strategy with Chained Materialized Views

This design integrates the three-layer architecture (Landing → Staging → Data Mart) with a snapshot tracking system for reproducible backtests and fast latest reads.

Key principle: Data flows through validation layers, then automatically propagates to downstream tables via chained materialized views, eliminating manual dual INSERTs.


1) Architecture Overview

Complete Data Flow

Ingestion Workflow
  ↓ (minimal validation)
landing_bars (raw data)
  ↓ [Application-level DQ validation + transformation]
staging_bars (validated, source of truth)
  ↓ [Automatic via chained materialized views]
  ├→ bars (production table)
  ├→ bars_snapshots (audit trail for reproducibility)
	  └→ bars_current (fast latest reads for live trading)

Data flow: Write raw data to landing_bars, validate and transform in application layer, then INSERT to staging_bars. All downstream tables update automatically via materialized views.


2) Goals

  1. Reproducibility

    Every experiment/backtest can pin a specific dataset version and remain stable even if we later fill gaps or correct historical points.

  2. Live-read performance

    Live trading / monitoring can read "latest bars" quickly without doing snapshot-folding on every request.

  3. Operational simplicity


3) Data Model

3.1 staging_bars (validated source of truth)

Purpose: Clean, validated data after DQ checks. Single source of truth for all downstream tables.

Schema:

CREATE TABLE IF NOT EXISTS analytics.staging_bars (
    id             UInt64,
    instrument_id  UInt32,
    interval       String,
    timestamp      DateTime,
    timestamp_unix Int64,
    open           Float64,
    high           Float64,
    low            Float64,
    close          Float64,
    volume         Float64,
    metadata       String,
    -- Lineage column
    deleted_at     **DateTime,**
    ingested_at    **DateTime**
)
ENGINE = ReplacingMergeTree(ingested_at)
PARTITION BY toDate(timestamp)
ORDER BY (instrument_id, interval, timestamp_unix);

Key columns: