📘 What Kafka Is
- Kafka is a distributed, durable, replayable log that decouples producers and consumers
- Companies use Kafka for real‑time data pipelines, event‑driven microservices, analytics, and fault‑tolerant streaming at scale.
- While traditional queues deliver each message once and then discard it, Kafka stores events in a log (an append-only file), allowing replay, multiple consumers, and per-partition ordering guarantees.
- 👉 Topics = named logs. Partitions = subsets of a topic (each with a unique ID), each storing events in order. A set of consumers can share a topic's partitions and process the logs in parallel.
- Kafka ensures fault tolerance by replicating partitions across brokers (the servers that store partitions). Each partition has one leader and several followers; if the leader fails, a follower takes over.
- Offsets: a consumer's position in a partition log. Offsets enable replay and at-least-once/exactly-once semantics.
- Producers: write messages to topics. Consumers: read messages; track offsets. Consumer Groups: multiple consumers share partitions for parallelism.
- ZooKeeper / KRaft: cluster metadata management and coordination (KRaft replaces ZooKeeper in newer Kafka releases).
- Retention policies: messages are kept for a configured time period or until a size limit is reached, then deleted.
- Compaction: removes older values for the same key, keeping only the latest.
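The topic/partition/offset model above can be sketched in plain Python (illustrative only; real Kafka partitions by a murmur2 hash of the key, and `Topic`/`Consumer` here are invented names, not a Kafka API):

```python
# Minimal sketch of a Kafka-style partitioned, replayable log.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Same key always lands in the same partition, preserving per-key order.
        # (Python's hash() stands in for Kafka's murmur2 partitioner.)
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p

class Consumer:
    """Tracks one offset per partition; resetting an offset replays events."""
    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self, partition):
        log = self.topic.partitions[partition]
        events = log[self.offsets[partition]:]
        self.offsets[partition] = len(log)  # commit the new position
        return events

    def seek(self, partition, offset):
        self.offsets[partition] = offset  # rewind to replay history

orders = Topic("orders", num_partitions=3)
for i in range(5):
    orders.produce(key="user-42", value=f"order-{i}")

c = Consumer(orders)
p = hash("user-42") % 3
first = c.poll(p)      # all 5 events, in order
again = c.poll(p)      # nothing new -> []
c.seek(p, 0)
replayed = c.poll(p)   # same 5 events again: the log is replayable
```

Because the log is never destroyed on read, a second consumer (or the same one after a `seek`) sees the full history, which is exactly what queues cannot offer.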
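Compaction can be sketched the same way (the real thing runs in the background over segment files; this just shows the end result, with invented helper names):

```python
# Sketch of log compaction: for each key, only the latest value survives.

def compact(log):
    latest = {}
    for key, value in log:   # later records overwrite earlier ones
        latest[key] = value
    return list(latest.items())

log = [("user-1", "addr-A"), ("user-2", "addr-X"), ("user-1", "addr-B")]
compacted = compact(log)
# "user-1" keeps only its newest value, "addr-B"
```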
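Leader/follower failover for a replicated partition can also be sketched (illustrative only; this is not Kafka's actual replication protocol, and `Partition`/broker names are made up):

```python
# Sketch of replication + failover for one partition.

class Partition:
    def __init__(self, replicas):
        self.replicas = {b: [] for b in replicas}  # broker -> copy of the log
        self.leader = replicas[0]                  # first replica starts as leader

    def append(self, event):
        # Writes go through the leader and are copied to followers.
        for log in self.replicas.values():
            log.append(event)

    def fail(self, broker):
        del self.replicas[broker]
        if broker == self.leader:
            # An in-sync follower is promoted; no data is lost.
            self.leader = next(iter(self.replicas))

p = Partition(replicas=["broker-1", "broker-2", "broker-3"])
p.append("order-1")
p.append("order-2")
p.fail("broker-1")                   # leader dies
survivor_log = p.replicas[p.leader]  # new leader still has every event
```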
🛒 Real‑World Behavior
- Example: e-commerce site
- Producers: checkout, order, payments, click services.
- Kafka topics: orders, payments, clicks.
- Consumers: analytics, email, fraud detection, ML.
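The fan-out above can be sketched in a few lines (pure Python, invented helper names; the point is that every consumer reads the same topic independently):

```python
# Sketch: several consumers independently read the same topic's log.
from collections import defaultdict

topics = defaultdict(list)            # topic name -> log of events

def produce(topic, event):
    topics[topic].append(event)

def consume(topic, offset=0):
    # Each consumer keeps its own offset, so reads don't interfere.
    return topics[topic][offset:]

produce("orders", {"id": 1, "total": 99})
produce("payments", {"order_id": 1, "ok": True})

fraud_view = consume("orders")   # fraud detection sees the order...
email_view = consume("orders")   # ...and so does the email service, independently
```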
🦸 Superpowers
- Backpressure handled → slow consumers don't break the system.
- Historical replay → reprocess past data.
- Exactly‑once or at‑least‑once semantics.
- Massive throughput (millions of events per second).