This design integrates the three-layer architecture (Landing → Staging → Data Mart) with a snapshot tracking system for reproducible backtests and fast latest reads.
Key principle: Data flows through validation layers, then automatically propagates to downstream tables via chained materialized views, eliminating manual dual INSERTs.
Ingestion Workflow
```
External source
    ↓  [minimal validation]
landing_bars (raw data)
    ↓  [application-level DQ validation + transformation]
staging_bars (validated, source of truth)
    ↓  [automatic via chained materialized views]
    ├→ bars           (production table)
    ├→ bars_snapshots (audit trail for reproducibility)
    └→ bars_current   (fast latest reads for live trading)
```
Data flow: write raw data to landing_bars, validate and transform it in the application layer, then INSERT into staging_bars. All downstream tables update automatically via materialized views.
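The fan-out from staging_bars could be wired up roughly as below. This is a minimal sketch: only staging_bars, bars, bars_snapshots, and bars_current are named in this design; the materialized-view names and the assumption that the target tables share staging_bars' columns are mine.

```sql
-- Sketch: each MV fires on INSERT into staging_bars and forwards the rows.
-- Assumes bars / bars_snapshots / bars_current share staging_bars' columns.
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.mv_staging_to_bars
TO analytics.bars
AS SELECT * FROM analytics.staging_bars;

CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.mv_staging_to_snapshots
TO analytics.bars_snapshots
AS SELECT * FROM analytics.staging_bars;

CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.mv_staging_to_current
TO analytics.bars_current
AS SELECT * FROM analytics.staging_bars;
```

With this wiring, a single INSERT into staging_bars is the only write the application performs; ClickHouse pushes the same rows to all three targets.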
Reproducibility
Every experiment/backtest can pin a specific dataset version and remain stable even if we later fill gaps or correct historical points.
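One way to pin a dataset version is to fold bars_snapshots up to a cutoff time. The bars_snapshots schema is not shown in this document, so the query below assumes it carries staging_bars' columns (including ingested_at); the pinned timestamp is illustrative.

```sql
-- Sketch: reconstruct the dataset as it looked at a pinned snapshot time.
-- A backtest that always uses the same cutoff sees identical data, even
-- after later gap-fills or corrections (which have a newer ingested_at).
SELECT
    instrument_id,
    interval,
    timestamp,
    argMax(close, ingested_at) AS close   -- latest version visible at pin time
FROM analytics.bars_snapshots
WHERE ingested_at <= toDateTime('2024-01-15 00:00:00')
GROUP BY instrument_id, interval, timestamp;
```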
Live-read performance
Live trading / monitoring can read "latest bars" quickly without doing snapshot-folding on every request.
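A latest-bar read against bars_current might look like the sketch below. It assumes bars_current keeps roughly one row per (instrument_id, interval) on a ReplacingMergeTree-style engine; the filter values are illustrative.

```sql
-- Sketch: read the latest bar for one instrument without snapshot-folding.
-- FINAL collapses any not-yet-merged duplicate versions at read time.
SELECT *
FROM analytics.bars_current FINAL
WHERE instrument_id = 42 AND interval = '1m'
ORDER BY timestamp_unix DESC
LIMIT 1;
```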
Operational simplicity
Rows are written once, to staging_bars; the chained materialized views keep bars, bars_snapshots, and bars_current in sync without manual dual INSERTs.
staging_bars (validated source of truth)
Purpose: Clean, validated data after DQ checks. Single source of truth for all downstream tables.
Schema:
CREATE TABLE IF NOT EXISTS analytics.staging_bars (
id UInt64,
instrument_id UInt32,
interval String,
timestamp DateTime,
timestamp_unix Int64,
open Float64,
high Float64,
low Float64,
close Float64,
volume Float64,
metadata String,
-- Lineage columns
deleted_at DateTime,
ingested_at DateTime
)
ENGINE = ReplacingMergeTree(ingested_at)
PARTITION BY toDate(timestamp)
ORDER BY (instrument_id, interval, timestamp_unix);
Key columns: