Kafka only understands bytes. It can't store a JSON object or a string directly. So before sending, data must be converted to bytes. When reading, bytes must be converted back.
The developer hands the producer an object (for example a JSON message); the producer converts it to bytes before sending, and the consumer converts the bytes back after receiving.
Serialization = Object/JSON → Bytes (producer side, before sending)
Deserialization = Bytes → Object/JSON (consumer side, after reading)
Raw Data (JSON Object) → Serialize (Convert to Bytes) → Kafka Broker (Stores Bytes)
Kafka Broker (Bytes) → Deserialize (Convert to JSON) → Consumer (Reads JSON Object)
Simple analogy: It's like sending a file over WhatsApp. Your phone compresses/encodes it before sending (serialize), and the receiver's phone decodes it back (deserialize). The network in between only sees raw binary — it doesn't care what's inside.
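To make this concrete, here is a minimal producer sketch using the official Java Kafka client, where the serializers are declared up front as configuration. The broker address, the `orders` topic, and the JSON payload are placeholder values for illustration, not anything prescribed by Kafka itself.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Serializers turn the key and value objects into bytes before sending
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The JSON is already a String here; StringSerializer converts it to UTF-8 bytes
            String json = "{\"orderId\": 101, \"amount\": 250}";
            producer.send(new ProducerRecord<>("orders", "order-101", json));
        }
    }
}
```

On the consumer side the mirror-image properties (`key.deserializer`, `value.deserializer`) do the reverse conversion; a consumer sketch appears in the consumer-group section below.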
Why does this matter? Because you have to pick a serialization format, and each one trades off readability, size, and schema safety:
| Format | When to use |
|---|---|
| String | Simple text, log messages |
| JSON | Most common — easy to read, flexible |
| Avro | Schema-enforced, compact binary, best for production at scale |
| Protobuf | Google's format, very fast and compact |
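If you go with JSON, a common pattern is a small custom serializer that delegates to a JSON library such as Jackson. This is a sketch under that assumption; the class name `JsonSerializer` is invented here, and you would register it through the producer's `value.serializer` property just like `StringSerializer` above.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serializer;

// Minimal JSON serializer: converts any object into UTF-8 JSON bytes.
public class JsonSerializer<T> implements Serializer<T> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            // Kafka only sees the resulting byte array, never the object itself
            return data == null ? null : mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new RuntimeException("JSON serialization failed", e);
        }
    }
}
```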
Events are appended to the end of a partition — Kafka never overwrites old messages. This is why Kafka is so fast — appending to a log is one of the cheapest operations a disk can do.
A Consumer Group is a group of consumers that work together to read from a topic. Instead of one consumer reading everything, the work is split across multiple consumers in the group.
The rule: Each partition is assigned to only one consumer in a group at a time.
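A rough sketch of a consumer joining a group: every instance started with the same `group.id` shares the topic's partitions, so running this program twice splits the partitions between the two processes. The group name `order-processors`, the topic, and the broker address are assumed values for illustration.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processors"); // consumers sharing this id split the partitions
        // Deserializers turn the stored bytes back into key/value objects
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```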