What Kafka is
Kafka is a distributed, append-only log. Producers write events to topics; consumers read from them at their own pace. Events are retained for a configurable period regardless of consumption.
Core concepts
- Topic — a named, ordered log of events.
- Partition — a topic is split into partitions for parallelism. Each partition is ordered; cross-partition ordering is not guaranteed.
- Consumer group — a set of consumers that cooperate to consume a topic. Each partition is assigned to exactly one member of the group at a time, so adding members scales consumption horizontally, up to the number of partitions.
- Offset — a consumer's position in a partition. Committing offsets after processing gives at-least-once semantics; committing before processing gives at-most-once. Exactly-once requires Kafka's transactions and idempotent producer on top of offset commits.
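The concepts above can be sketched with a minimal in-memory model. This is a toy with hypothetical names, not a real Kafka client; routing by key hash mirrors Kafka's default behavior of keeping all events with the same key in one partition:

```python
class Topic:
    """A named log split into a fixed number of ordered partitions."""
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so per-key ordering is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

class GroupConsumer:
    """One member of a consumer group: owns a subset of the topic's
    partitions and tracks a committed offset for each one."""
    def __init__(self, topic, owned_partitions):
        self.topic = topic
        self.offsets = {p: 0 for p in owned_partitions}

    def poll(self):
        # Read everything past the committed offset, then commit.
        records = []
        for p, committed in self.offsets.items():
            log = self.topic.partitions[p]
            records.extend(log[committed:])
            self.offsets[p] = len(log)  # commit the new position
        return records

topic = Topic("orders")
topic.produce("user-1", "created")
topic.produce("user-1", "paid")

# Two group members split the three partitions between them.
c1 = GroupConsumer(topic, [0, 1])
c2 = GroupConsumer(topic, [2])
all_records = c1.poll() + c2.poll()
```

Note that because both events share the key "user-1", they land in the same partition and are read back in produce order; a second poll returns nothing, since the offsets were committed.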
When Kafka is the right choice
- High-throughput event streaming (millions of events/sec).
- Multiple independent consumers need the same event stream.
- Event replay is required (audit, reprocessing after a bug fix).
- Decoupling producers from consumers across teams.
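Replay, in particular, falls out of retention plus offsets: because events stay in the log whether or not they were consumed, a consumer can rewind its offset and reprocess. A minimal sketch with hypothetical names (a real client exposes this as an offset seek, e.g. seeking back to the beginning of a partition):

```python
# One retained partition log and a caught-up consumer position.
log = ["evt-1", "evt-2", "evt-3"]
offset = len(log)

def consume_from(position):
    """Re-read the retained log from an earlier offset, e.g. to
    reprocess events after fixing a bug in the handler."""
    return log[position:]

# Rewind to offset 0 and reprocess everything.
replayed = consume_from(0)
```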
When Kafka is overkill
For simple job queues, low-volume webhooks, or single-consumer pipelines, SQS, RabbitMQ, or a Postgres-backed queue is simpler and cheaper to run.