https://planetscale.com/blog/btrees-and-database-indexes

B Tree

Every B-tree is made up of nodes (the rectangles) and child pointers (the lines connecting the nodes). We call the top-most node the root node, the nodes on the bottom level leaf nodes, and everything else internal nodes.

image.png

A B-tree of order K is a tree structure with the following properties:

The other key characteristic of a B-tree is ordering. Within each node the elements are kept in order. Any child to the left of a key must only contain other keys that are less than it. Children to the right must have keys that are greater than it.

B-trees are uniquely suited to work well when you have a very large quantity of data that also needs to be persisted to long-term storage (disk). This is because each node uses a fixed number of bytes. The number of bytes can be tailored to play nicely with disk blocks.

Reading and writing data on hard-drive disks (HDDs) and solid-state disks (SSDs) is done in units called blocks. These are typically byte sequences of length 4096, 8192, or 16384 (4k, 8k, 16k). A single disk will have a capacity of many millions or billions of blocks. RAM on the other hand is typically addressable on a per-byte level.

This is why B-trees work so well when we need to organize and store persistent data on disk. Each node of a B-tree can be sized to match the size of a disk block (or a multiple of this size).

<aside> 💡

Imagine you are designing a database index for a file system with a 4096-byte (4KB) disk block size. Your goal is to make each B-tree node exactly 4096 bytes so that the operating system can fetch the entire node in a single disk read. The Calculation Let's assume the following sizes for the data within a single node: • Key size: 8 bytes (e.g., a BIGINT ID) • Child Pointer size: 8 bytes (the disk address of another block) • Metadata/Header: 16 bytes (to store things like the number of keys currently in the node)

</aside>

image.png

Why This is "Uniquely Suited" for Large Data With a "fan-out" (number of children) of 255, the tree stays incredibly short even as the amount of data explodes:

image.png

If our disk block and B-tree node is 16k, and our keys, values, and child pointers are all 8 bits, this means we could store 682 key/values with 683 child pointers per node.