On-disk structures are just a layout of bytes inside a file; there’s no malloc() or free().
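Here is a minimal sketch of what "just a layout of bytes" means in practice. The format is entirely hypothetical: a 10-byte header holding a 4-byte magic value `DEMO`, a 2-byte version, and a 4-byte record count, all little-endian. `struct` pins the exact byte layout, and `io.BytesIO` stands in for a real file opened in binary mode.

```python
import io
import struct

# Hypothetical header layout. "<" = little-endian with no alignment padding;
# 4s = magic bytes, H = uint16 version, I = uint32 record count.
HEADER_FORMAT = "<4sHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 10 bytes
MAGIC = b"DEMO"

def write_header(f, version: int, record_count: int) -> None:
    # The header always lives at offset 0; there is no allocator to ask.
    f.seek(0)
    f.write(struct.pack(HEADER_FORMAT, MAGIC, version, record_count))

def read_header(f) -> tuple[int, int]:
    f.seek(0)
    raw = f.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("file too short for header")
    magic, version, count = struct.unpack(HEADER_FORMAT, raw)
    if magic != MAGIC:
        raise ValueError("not a DEMO file")
    return version, count

buf = io.BytesIO()  # stands in for an open binary file
write_header(buf, version=1, record_count=3)
print(buf.getvalue().hex())  # 44454d4f010003000000
print(read_header(buf))      # (1, 3)
```

Every field sits at a fixed, known offset. Reading the file back depends only on both sides agreeing on that layout.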

When designing a file format, keep in mind that even though the operating system and filesystem take over some of the responsibilities, implementing on-disk structures requires attention to more details and has more pitfalls than implementing in-memory ones.

<aside>

The same design principles apply elsewhere, for example to serialization formats (e.g., Protobuf, Thrift, Avro) and to communication protocols (e.g., message framing over TCP).

</aside>
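Message framing reuses the same byte-layout thinking. TCP delivers a byte stream with no message boundaries, so the protocol on top has to mark where each message ends, and a length prefix is a common way to do it. The sketch below assumes a 4-byte big-endian length prefix, an illustrative choice rather than any specific protocol's wire format. `frame` and `unframe` are hypothetical helpers.

```python
import struct

def frame(payload: bytes) -> bytes:
    # 4-byte big-endian unsigned length, followed by the payload itself.
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes) -> tuple[list[bytes], bytes]:
    """Split a byte stream into complete messages and return the leftover bytes."""
    messages = []
    pos = 0
    while len(stream) - pos >= 4:
        (length,) = struct.unpack_from(">I", stream, pos)
        if len(stream) - pos - 4 < length:
            break  # incomplete message: wait for more bytes to arrive
        messages.append(stream[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return messages, stream[pos:]

# Two complete frames plus the first 6 bytes of a third one.
data = frame(b"hello") + frame(b"") + frame(b"world")[:6]
msgs, rest = unframe(data)
print(msgs)  # [b'hello', b'']
print(rest)  # b'\x00\x00\x00\x05wo' (incomplete third frame, kept for later)
```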

Data structures on-disk vs in-memory

Managing an on-disk structure is equivalent to managing memory manually in C without malloc(): you’d have to implement allocation, tracking of used and free space, fragmentation handling, and so on yourself.
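To show what implementing allocation yourself involves, here is a minimal sketch of a page allocator over a file. It assumes fixed 4 KiB pages and keeps the free list in memory only; a real format would also have to persist the free list on disk. `PageFile` is a hypothetical class, and page numbers map to file offsets as `page_no * PAGE_SIZE`.

```python
import io

PAGE_SIZE = 4096  # assumed fixed page size

class PageFile:
    """Hypothetical fixed-size page allocator over a binary file object."""

    def __init__(self, f):
        self.f = f
        self.free_pages: list[int] = []  # page numbers available for reuse
        self.page_count = self.f.seek(0, io.SEEK_END) // PAGE_SIZE

    def allocate(self) -> int:
        # Reuse a freed page first; otherwise grow the file by one page.
        if self.free_pages:
            return self.free_pages.pop()
        page_no = self.page_count
        self.page_count += 1
        self.write(page_no, b"")  # zero-fill the new page in the file
        return page_no

    def free(self, page_no: int) -> None:
        # Nothing is returned to the OS; the page is only marked reusable.
        self.free_pages.append(page_no)

    def write(self, page_no: int, data: bytes) -> None:
        if len(data) > PAGE_SIZE:
            raise ValueError("data does not fit in one page")
        self.f.seek(page_no * PAGE_SIZE)  # the "pointer" is a file offset
        self.f.write(data.ljust(PAGE_SIZE, b"\x00"))

    def read(self, page_no: int) -> bytes:
        self.f.seek(page_no * PAGE_SIZE)
        return self.f.read(PAGE_SIZE)

buf = io.BytesIO()  # stands in for a file opened in "r+b" mode
pf = PageFile(buf)
a = pf.allocate()  # page 0, file grows to 4096 bytes
b = pf.allocate()  # page 1, file grows to 8192 bytes
pf.write(a, b"alpha")
pf.free(a)
c = pf.allocate()      # page 0 again, reused from the free list
print(a, b, c)         # 0 1 0
print(pf.read(c)[:5])  # b'alpha': a freed page keeps stale bytes until overwritten
```

The last line shows one of the extra pitfalls: unlike a fresh malloc'd block that you would initialize anyway, a reused page silently carries its old contents unless the format clears or versions it.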

| Concept | In-Memory | On-Disk |
|---|---|---|
| Access | Direct, through virtual memory | Explicit, via system calls |
| Pointers | Actual memory addresses | File offsets |
| Allocation | Handled by the allocator (malloc/new) | Tracked manually (pages, free space) |
| Structure updates | Simple reallocs or pointer swaps | Requires rewriting parts of the file |
| Fragmentation | Handled by the allocator | Managed manually |
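The "pointers become file offsets" row can be illustrated with a singly linked list stored in a file. In this hypothetical layout, each node is 16 bytes: an int64 value followed by the int64 offset of the next node, with -1 playing the role of NULL. Relinking a node means seeking to its "next" field and overwriting those 8 bytes in place, a small instance of rewriting parts of the file.

```python
import io
import struct

# Hypothetical node layout: int64 value, then int64 offset of the next node.
NODE = struct.Struct("<qq")
VALUE_SIZE = 8     # bytes occupied by the value field
NULL_OFFSET = -1   # stands in for a NULL pointer

def append_node(f, value: int) -> int:
    """Write a node at the end of the file and return its offset (its 'address')."""
    offset = f.seek(0, io.SEEK_END)
    f.write(NODE.pack(value, NULL_OFFSET))
    return offset

def link(f, from_offset: int, to_offset: int) -> None:
    """Point one node at another by overwriting only its 'next' field in place."""
    f.seek(from_offset + VALUE_SIZE)
    f.write(struct.pack("<q", to_offset))

def walk(f, head: int) -> list[int]:
    """Follow 'next' offsets from head until reaching NULL_OFFSET."""
    values = []
    offset = head
    while offset != NULL_OFFSET:
        f.seek(offset)
        value, offset = NODE.unpack(f.read(NODE.size))
        values.append(value)
    return values

buf = io.BytesIO()  # stands in for an open binary file
n1 = append_node(buf, 10)  # offset 0
n2 = append_node(buf, 20)  # offset 16
n3 = append_node(buf, 30)  # offset 32
link(buf, n1, n3)  # 10 -> 30
link(buf, n3, n2)  # 30 -> 20
print(walk(buf, n1))  # [10, 30, 20]
```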

Binary Encoding

Before organizing records into pages: