A topic on its own is just a single stream of messages. But what if millions of messages are coming in every second? A single stream can't keep up with that volume.
That's where partitions come in. A partition is a subsection of a topic — Kafka splits a topic into multiple partitions so data can be processed in parallel across multiple brokers.
Think of it like a highway. One lane (no partition) → traffic jam. Multiple lanes (partitions) → cars move in parallel, much faster.
Broker
├── Topic A
│   ├── Partition 0 → [msg0, msg1, msg2, msg3, msg4, msg5, msg6...]
│   └── Partition 1 → [msg0, msg1, msg2, msg3, msg4, msg5...]
└── Topic B
    └── Partition 0 → [msg0, msg1, msg2...]
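The layout in the diagram can be sketched as a small in-memory model. This is only a toy illustration, not how Kafka stores data: the `Broker` class, its methods, and the topic names are hypothetical, and each partition is modeled as a plain append-only list whose index plays the role of a message's position in that partition.

```python
class Broker:
    """Toy stand-in for a broker: maps topic name -> list of partitions."""

    def __init__(self):
        self.topics = {}

    def create_topic(self, name, num_partitions):
        # Each partition is an independent, append-only list of messages.
        self.topics[name] = [[] for _ in range(num_partitions)]

    def append(self, topic, partition, message):
        # Append to one partition and return the message's position in it.
        log = self.topics[topic][partition]
        log.append(message)
        return len(log) - 1


broker = Broker()
broker.create_topic("topic-a", num_partitions=2)
broker.create_topic("topic-b", num_partitions=1)

first = broker.append("topic-a", 0, "msg0")   # first message in partition 0
second = broker.append("topic-a", 1, "msg0")  # first message in partition 1
third = broker.append("topic-a", 0, "msg1")   # second message in partition 0
```

Positions are counted per partition, which is why both partitions in the diagram start at msg0: each one keeps its own independent sequence.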
Key things to know:
How does a producer decide which partition to send to? The producer provides a key when sending the message, and the partition is chosen as hash(key) % number_of_partitions. The same key always maps to the same partition (for example, driver-123 always goes to Partition 0):
key = "driver-123"
Partitions = 3
hash("driver-123") % 3 = 0 (illustrative value)
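This is important: if you want all events for a specific driver to stay in order, use their driver ID as the key. All of that driver's messages then land in the same partition, and order is preserved within a partition (there is no ordering guarantee across partitions). The mapping holds only while the number of partitions stays the same, because changing it changes the result of the modulo.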
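A minimal sketch of key-based partitioning, under a few assumptions: `choose_partition` is a hypothetical helper, and CRC32 is used here only as a deterministic stand-in hash. Kafka's own default partitioner uses a different hash function, so the partition numbers this code produces will not match a real cluster.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): CRC32 is stable across runs, unlike
    # Python's built-in hash() for strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

# Events from two drivers arrive interleaved.
events = [
    ("driver-123", "pickup"),
    ("driver-456", "pickup"),
    ("driver-123", "en_route"),
    ("driver-456", "dropoff"),
    ("driver-123", "dropoff"),
]

for key, event in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, event))

# Every driver-123 event sits in one partition, in the order it was sent.
driver_partition = choose_partition("driver-123", NUM_PARTITIONS)
driver_events = [e for k, e in partitions[driver_partition] if k == "driver-123"]
```

Because the partition depends only on the key and the partition count, `driver_events` keeps the send order (pickup, en_route, dropoff) no matter how other drivers' events are interleaved.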