Databases: An Introduction to How They Work Internally

Modern systems rely on databases as the backbone for handling massive amounts of data. Large deployments can store petabytes of information and serve very large user bases that together generate an enormous volume of read and write operations.

This is why designing a scalable, high-performance database is critical: it has to provide consistency, availability, and fault tolerance at scale.

But before we dive into how databases achieve this, let’s first understand the basics of memory, storage, and data organization — the foundation of any DBMS.

1. Memory and Storage Hierarchy

Modern computer systems use a layered memory hierarchy:

CPU registers → L1/L2/L3 caches → RAM (main memory) → Persistent storage (SSD/HDD)

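Each layer trades speed for capacity: the closer a layer sits to the CPU, the faster and smaller it is, while layers further away are slower but larger and cheaper per byte.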

Because databases must keep data safe even after power loss, they use non-volatile storage (SSD or HDD) to store data permanently.

2. How Data is Stored on Disk

A Hard Disk Drive (HDD) is a mechanical storage device. Its main components are spinning magnetic platters, a spindle that rotates them, and read/write heads mounted on an actuator arm.

Reading and writing data on hard disk drives (HDDs) and solid-state drives (SSDs) is done in units called blocks. A block is a fixed-size sequence of bytes, typically 4 KiB, 8 KiB, or 16 KiB.
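To make block-based access concrete, here is a minimal sketch in Python. It assumes a 4 KiB block size, and the helper names (`blocks_for_range`, `read_block`, `make_demo_file`) are hypothetical, introduced only for illustration. It goes through the ordinary file API, so the operating system's cache sits between this code and the actual device; treat it as a model of the idea rather than raw device I/O.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes (4 KiB)


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the block numbers touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def read_block(path: str, block_no: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read one whole block from a file by seeking to its block-aligned offset."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        return f.read(block_size)


def make_demo_file(num_blocks: int = 4, block_size: int = BLOCK_SIZE) -> str:
    """Create a temporary file in which block i is filled with the byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_blocks):
            f.write(bytes([i]) * block_size)
    return path
```

Note that even a 10-byte read can touch two blocks when it straddles a block boundary, for example `blocks_for_range(4090, 10)` covers blocks 0 and 1. This is one reason databases try to align their on-disk structures with block boundaries.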

<aside> 💡

Key Insight:

Disk I/O is orders of magnitude slower than CPU or RAM access. This is why database systems aim to minimize the number of disk reads/writes for every query.

</aside>
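A back-of-the-envelope model shows why the number of block reads dominates query cost. The latency figures below are rough, order-of-magnitude assumptions rather than measurements of any particular device, and `query_time_ms` is a hypothetical helper that simply multiplies the number of block reads by a per-read latency.

```python
# Rough, order-of-magnitude access latencies in nanoseconds.
# These are illustrative assumptions, not benchmarks of a specific device.
LATENCY_NS = {
    "ram": 100,          # assumed main-memory access
    "ssd": 100_000,      # assumed SSD random read (~100 microseconds)
    "hdd": 10_000_000,   # assumed HDD seek plus read (~10 milliseconds)
}


def query_time_ms(block_reads: int, device: str) -> float:
    """Estimate query time in milliseconds as block reads times per-read latency."""
    return block_reads * LATENCY_NS[device] / 1_000_000
```

Under these assumptions, a query that touches 1,000 blocks on an HDD takes about 10 seconds, while one that touches only 4 blocks takes about 40 milliseconds. The device is the same in both cases; only the number of reads changes.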

For an excellent deep dive into storage devices and latencies, check this guide: IO devices and latency — PlanetScale

3. Organizing Data Efficiently

Since disk access is expensive, we need data structures that minimize the number of disk blocks we touch per query. This is where B-trees and B+ trees come into play — the core structures used by most databases to store indexes efficiently.
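The benefit of a wide tree can be sketched with a small calculation. The function below is a simplified model, not an exact B-tree capacity formula: it assumes every level multiplies the number of reachable entries by a fixed fan-out and that visiting one level costs one block read. The name `levels_needed` is hypothetical.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Count the levels a tree with the given fan-out needs to reach num_keys entries.

    Simplified model: each level multiplies the reachable entries by `fanout`.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels
```

For one million keys, a fan-out of 2 (a binary tree) needs 20 levels in this model, while a fan-out of 100 needs only 3. If each level is one block read, that is the difference between 20 disk reads and 3 per lookup.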

B-trees:

Every B-tree is made up of nodes and child pointers. We call the topmost node the root node, the nodes on the bottom level leaf nodes, and everything else internal nodes.

Each node uses a fixed number of bytes. That size can be chosen to match the disk block size, so that reading one node costs one block read. The number of values each node can store depends on the number of bytes the node is allocated and the number of bytes consumed by each key/value pair.
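As a rough illustration of that relationship, the sketch below estimates how many key/value pairs fit in one node. The layout is an assumption made for this example: fixed-size keys, values, and child pointers of 8 bytes each, an optional header, and one more child pointer than keys. Real databases use more elaborate page formats, and `max_keys_per_node` is a hypothetical helper.

```python
def max_keys_per_node(
    node_bytes: int,
    key_bytes: int = 8,      # assumed fixed key size
    value_bytes: int = 8,    # assumed fixed value size
    pointer_bytes: int = 8,  # assumed child-pointer size
    header_bytes: int = 0,   # assumed per-node header size
) -> int:
    """Estimate how many key/value pairs fit in a node of node_bytes.

    A node with k keys holds k + 1 child pointers, so one pointer is
    reserved up front and each further entry costs key + value + pointer.
    """
    usable = node_bytes - header_bytes - pointer_bytes
    if usable < 0:
        return 0
    return usable // (key_bytes + value_bytes + pointer_bytes)
```

With these assumed sizes, a 4 KiB node holds 170 entries and a 16 KiB node holds 682. Larger nodes or smaller keys raise the fan-out, which in turn lowers the height of the tree.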

A B-tree of order K is a tree structure with the following properties: