Sharding and replication

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1b40c7bf-841d-4c30-a9f4-91fbc8ba6242/Untitled.png

Step 1. A sharded, replicated fact table

CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER `{cluster}` (
 `Year` UInt16,
 `Quarter` UInt8,
 ...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{shard}/{database}/ontime_shard',
'{replica}')
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, `Year`, `Month`, DepDel15)

What does ON CLUSTER do?

ON CLUSTER executes a command over a set of nodes

Step 2. A distributed table to find data

CREATE TABLE IF NOT EXISTS ontime ON CLUSTER '{cluster}'
AS ontime_shard
ENGINE = Distributed(
 '{cluster}', currentDatabase(), ontime_shard, rand())

Step 3. A fully replicated dimension table

CREATE TABLE IF NOT EXISTS airports ON CLUSTER 'all-replicated' (
 AirportID String,
 Name String,
 ...
)
Engine=ReplicatedMergeTree(
 '/clickhouse/{cluster}/tables/all/{database}/airports',
'{replica}')
PARTITION BY tuple()
PRIMARY KEY AirportID
ORDER BY AirportID

Querying shards and replicas

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/22f8d7ad-7a9a-4856-875e-883c8c9298cc/Untitled.png