Background

About three months ago, @Eric Fu wrote a design doc about the watermark in our streaming system. Every watermark must be assigned over a specified time window in that design. However, many cases of watermarks are not covered in that design.

Previously, we also wrote a deprecated document [Deprecated] RFC: Watermark in RisingWave (WaterMark part I), which mixed the

Design

The Watermark message

We decide to introduce a new message type:

pub enum Message {
    Chunk(StreamChunk),
    Barrier(Barrier),
    Watermark(col_idx, Timestamp),
}

Term 1.1: Every record has multiple watermark columns with the timestamp type.

Term 1.2: If the order of a data record d in the stream is after another watermark record w, then d[watermark_col] > w.val.

Based on Term 1.2, we can get a noteworthy property:

Prop 1.1: We can postpone or remove any watermark record in a valid stream.