On-disk structures are just a layout of bytes inside a file; there is no malloc() or free().
When designing a file format:
Even though the operating system and filesystem take over some of the responsibilities, implementing on-disk structures requires attention to more details and has more pitfalls.
<aside> <img src="/icons/bookmark_yellow.svg" alt="/icons/bookmark_yellow.svg" width="40px" />
The same design principles apply in other places, such as serialization formats (e.g., Protobuf, Thrift, Avro) and communication protocols (e.g., TCP message framing).
</aside>
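As a small illustration of how the framing problem looks in practice, here is a minimal sketch of length-prefixed messages. The 4-byte big-endian prefix and the `frame`/`unframe` names are assumptions chosen for this example, not the scheme of any particular protocol. The same idea applies to variable-length records in a file.

```python
import struct

# Assumed convention for this sketch: a 4-byte big-endian length prefix.
LEN = struct.Struct(">I")

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so a reader knows where it ends."""
    return LEN.pack(len(payload)) + payload

def unframe(buf: bytes):
    """Split a buffer into complete messages; return (messages, leftover bytes)."""
    messages = []
    pos = 0
    while pos + LEN.size <= len(buf):
        (n,) = LEN.unpack_from(buf, pos)
        if pos + LEN.size + n > len(buf):
            break  # the last message is incomplete; wait for more bytes
        start = pos + LEN.size
        messages.append(buf[start:start + n])
        pos = start + n
    return messages, buf[pos:]
```

The leftover bytes matter: a stream or a partially written file can end in the middle of a message, and the reader has to detect that instead of misparsing it.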
Data structures on-disk vs in-memory
Managing an on-disk structure is equivalent to managing memory manually without malloc() in C: you have to implement allocation, free-space tracking, and fragmentation handling yourself.
| Concept | In-Memory | On-Disk |
|---|---|---|
| Access | Direct loads and stores via virtual memory | Explicit reads and writes via system calls |
| Pointers | Actual memory addresses | File offsets |
| Allocation | Automatic (malloc/new) | Manual tracking of pages and free space |
| Structure updates | Simple reallocs or pointer swaps | Requires rewriting parts of the file |
| Fragmentation | Handled by allocator | Managed manually |
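To make the "pointers are file offsets" row concrete, here is a minimal sketch of a linked list stored in a file. The record layout (an 8-byte value followed by an 8-byte offset of the next record), the 8-byte header, and all function names are hypothetical choices for illustration only.

```python
import os
import struct
import tempfile

# Hypothetical layout: the file starts with a header holding the offset of
# the first record; each record is an 8-byte value plus the offset of the next.
HEADER = struct.Struct("<Q")
RECORD = struct.Struct("<qQ")
NIL = 0  # offset 0 is the header, so it can double as the "null pointer"

def append_record(f, value, next_offset):
    """'Allocate' by appending at the end of the file; return the new offset."""
    f.seek(0, os.SEEK_END)
    offset = f.tell()
    f.write(RECORD.pack(value, next_offset))
    return offset

def push_front(f, value):
    """Insert a record at the head of the list by rewriting the header."""
    f.seek(0)
    (head,) = HEADER.unpack(f.read(HEADER.size))
    new_head = append_record(f, value, head)
    f.seek(0)
    f.write(HEADER.pack(new_head))

def walk(f):
    """Follow the offsets from the header and collect the values."""
    f.seek(0)
    (offset,) = HEADER.unpack(f.read(HEADER.size))
    values = []
    while offset != NIL:
        f.seek(offset)  # "dereferencing" a pointer is a seek plus a read
        value, offset = RECORD.unpack(f.read(RECORD.size))
        values.append(value)
    return values

def demo():
    with tempfile.TemporaryFile() as f:
        f.write(HEADER.pack(NIL))  # empty list
        for v in (1, 2, 3):
            push_front(f, v)
        return walk(f)
```

Calling `demo()` returns `[3, 2, 1]`. The sketch only ever appends, so it never reuses space; deleting a record would leave a hole, which is exactly the free-space tracking and fragmentation work the table says must be done manually on disk.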
Before organizing records into pages: