Introduction

What is columnar storage?

In most databases, data is arranged in a row-oriented manner: all values in a single row of a table are stored adjacently. Program memory usually looks the same, with objects of the same type stored one after another in an array. This layout has obvious advantages: each row carries complete semantic information, which makes it easy to insert or update records.

However, the main drawback of row-based storage is that adjacent storage locations hold data of different types, which makes it difficult to compress. Columnar storage, on the other hand, stores data from the same column together. Each column holds the values of a single field in the table, so compression algorithms work more effectively and storage space is saved.
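To make the difference between the two layouts concrete, here is a minimal sketch that regroups a slice of rows into one vector per field and back again. The names `Row`, `Columns`, `to_columns` and `to_rows` are hypothetical and exist only for illustration; this is not how serde_columnar is implemented, just the underlying idea.

```rust
/// A record as the program sees it: one struct per row.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u64,
    pub name: String,
}

/// The same records regrouped by field: one vector per column.
#[derive(Debug, Default, PartialEq)]
pub struct Columns {
    pub ids: Vec<u64>,
    pub names: Vec<String>,
}

/// Transpose row-oriented data into column-oriented data.
pub fn to_columns(rows: &[Row]) -> Columns {
    let mut cols = Columns::default();
    for row in rows {
        // Values of the same field (and type) end up next to each other.
        cols.ids.push(row.id);
        cols.names.push(row.name.clone());
    }
    cols
}

/// Rebuild the rows from the columns (the inverse of `to_columns`).
pub fn to_rows(cols: &Columns) -> Vec<Row> {
    cols.ids
        .iter()
        .zip(&cols.names)
        .map(|(&id, name)| Row { id, name: name.clone() })
        .collect()
}
```

Once the values of a field sit side by side, each column can be handed to whichever compression scheme suits its type and distribution.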

Why serde_columnar?

We want to keep the traditional memory layout while programming, where each record carries complete semantic information. When storing data, however, we want to take advantage of columnar storage for compression in the most convenient and ergonomic way. We would also like to change the compression strategy for each column during development without altering any encoding logic. To achieve this, we have developed serde_columnar.

serde_columnar is an ergonomic columnar storage encoding crate that allows you to seamlessly transform row-based storage into columnar storage without modifying any existing code. By using Rust's procedural macros, serde_columnar hides the conversion process behind the scenes. Furthermore, you can easily change the compression strategy for each field through macro attributes.

For example, suppose you want to store an array of records in which a is always 1 and b auto-increments from 1.

A	B
1	1
1	2
1	3

No doubt, it can be compressed, right?
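As a rough illustration of why this table compresses so well once it is stored by column, the sketch below applies run-length encoding (RLE) and delta encoding to the two columns. The helpers `rle_encode` and `delta_encode` are hypothetical, simplified stand-ins written for this example; they do not reproduce serde_columnar's actual encoders or wire format.

```rust
/// Run-length encode a column: consecutive equal values collapse
/// into `(value, run_length)` pairs.
/// (Hypothetical helper, for illustration only.)
pub fn rle_encode(values: &[i64]) -> Vec<(i64, usize)> {
    let mut runs: Vec<(i64, usize)> = Vec::new();
    for &v in values {
        if let Some(last) = runs.last_mut() {
            if last.0 == v {
                // Same value as the current run: just extend it.
                last.1 += 1;
                continue;
            }
        }
        // First value, or a value that differs from the current run.
        runs.push((v, 1));
    }
    runs
}

/// Delta encode a column: each value is replaced by its difference
/// from the previous value (the first one is taken relative to 0).
pub fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0;
    values
        .iter()
        .map(|&v| {
            let delta = v - prev;
            prev = v;
            delta
        })
        .collect()
}
```

With these helpers, column a (`[1, 1, 1]`) run-length encodes to a single pair `(1, 3)`. Column b (`[1, 2, 3]`) gains nothing from RLE alone, but its deltas are `[1, 1, 1]`, so delta encoding followed by RLE also yields the single pair `(1, 3)`.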

Out of the box, serde_columnar encodes data to a binary format, using a columnar storage layout to compress common Rust structures stored in a list or map. You only need to add a few proc-macro attributes such as #[columnar].

If you'd like to give it a try right away, you can find usage instructions and examples here.

Performance

The benchmarks below use simple test data.

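use serde_columnar::columnar;

// A single row; `vec` marks it as usable inside a vec-like columnar container.
#[columnar(vec, ser, de)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Data {
    // Consecutive ids differ by a constant, so delta + run-length encoding fits.
    #[columnar(strategy = "DeltaRle")]
    id: u64,
    // Repeated names collapse into runs.
    #[columnar(strategy = "Rle")]
    name: String,
}

// The top-level container that is serialized and deserialized.
#[columnar(ser, de)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VecStore {
    // Store the rows of `data` column by column.
    #[columnar(class = "vec")]
    pub data: Vec<Data>,
}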

Friendly Data

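// 10,000 rows with sequential ids and an identical name.
let mut _data = Vec::with_capacity(10_000);
for i in 0..10000 {
    _data.push(Data {
        id: i,
        name: String::from("name"),
    });
}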

crate	encode time (µs)	decode time (µs)	doc size (bytes)
postcard	185.7	1559.4	69875
bincode	93.1	1466.1	160012
serde_columnar	188.25	1627.3	19