What CDC is
Change data capture (CDC) records row-level changes (INSERT, UPDATE, DELETE) in a source database and streams them to downstream consumers in near real time. It is the standard approach for keeping downstream systems in sync with a database while adding minimal load to it.
Why log-based CDC beats polling
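- Polling re-queries the source table on a schedule, typically filtering on an updated_at column. It misses hard deletes (the row is simply gone) and any intermediate updates between two polls, adds query load to the source, and has latency of up to one poll interval.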
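- Log-based CDC reads the database's own change log (the Postgres write-ahead log, the MySQL binlog). It adds little load to the source, captures deletes and every intermediate change in commit order, and delivers changes in near real time. One operational caveat: on Postgres, a stalled consumer's replication slot makes the server retain WAL on disk.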
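The gap between the two approaches shows up in a toy simulation. This is a minimal sketch, not real database code: Source is a hypothetical in-memory table whose counter plays the role of an updated_at column, and its log list stands in for the WAL or binlog. A poller that filters on "updated since my last run" sees only the latest state of row 1 and never learns that row 2 was deleted, while a log reader sees all three changes in order.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    """Hypothetical in-memory table plus an append-only change log."""
    rows: dict = field(default_factory=dict)  # id -> (status, updated_at)
    log: list = field(default_factory=list)   # stand-in for the WAL / binlog
    clock: int = 0

    def write(self, op, row_id, status=None):
        self.clock += 1
        if op == "d":
            self.rows.pop(row_id)                     # hard delete: the row disappears
        else:
            self.rows[row_id] = (status, self.clock)  # insert/update bumps updated_at
        # Every committed change is appended to the log, deletes included.
        self.log.append((self.clock, op, row_id, status))

def poll(src, since):
    """Polling: select rows whose updated_at is newer than the last watermark."""
    return sorted((rid, st) for rid, (st, ts) in src.rows.items() if ts > since)

def read_log(src, offset):
    """Log-based: read every log entry after the last consumed offset."""
    return src.log[offset:]

src = Source()
src.write("c", 1, "pending")
src.write("c", 2, "pending")
watermark = src.clock   # the poller last ran here
offset = len(src.log)   # the log reader is caught up to the same point

src.write("u", 1, "paid")
src.write("u", 1, "shipped")  # overwrites "paid" before the next poll
src.write("d", 2)             # hard delete

print(poll(src, watermark))     # [(1, 'shipped')]
print(read_log(src, offset))    # [(3, 'u', 1, 'paid'), (4, 'u', 1, 'shipped'), (5, 'd', 2, None)]
```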
Debezium
Debezium is the de facto standard open-source CDC tool. It most commonly runs as a set of Kafka Connect source connectors that read the database log and emit change events to Kafka topics, by default one topic per table.
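As a rough illustration of the Kafka Connect deployment, the sketch below builds the JSON body that is POSTed to the Connect REST API's /connectors endpoint to register a Debezium Postgres connector. It assumes Debezium 2.x (where topic.prefix replaced database.server.name), a Postgres server running with wal_level=logical, and a Connect worker on localhost:8083. The hostname, credentials, database name, and connector name are hypothetical placeholders, and the register helper is left uncalled so the script runs without a cluster.

```python
import json
import urllib.request

# Assumed Kafka Connect REST endpoint (8083 is Connect's default port).
CONNECT_URL = "http://localhost:8083/connectors"

def debezium_postgres_connector(name: str) -> dict:
    """Build a Kafka Connect registration payload for Debezium's Postgres connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",                  # Postgres' built-in logical decoding plugin
            "database.hostname": "db.example.internal",  # hypothetical host
            "database.port": "5432",
            "database.user": "debezium",                 # hypothetical credentials
            "database.password": "change-me",
            "database.dbname": "shop",                   # hypothetical database
            "topic.prefix": "shop",                      # topics become <prefix>.<schema>.<table>
            "table.include.list": "public.orders",
        },
    }

def register(payload: dict) -> None:
    """POST the payload to Kafka Connect (needs a running Connect cluster)."""
    req = urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode("utf-8"))

if __name__ == "__main__":
    print(json.dumps(debezium_postgres_connector("orders-cdc"), indent=2))
    # register(debezium_postgres_connector("orders-cdc"))  # uncomment against a real cluster
```

With this config, changes to public.orders would land on a topic named shop.public.orders.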
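// Event shape (simplified Debezium change-event payload for an UPDATE)
// op: c = create (insert), u = update, d = delete, r = read (initial snapshot)
// For deletes "after" is null; on Postgres a full "before" image requires REPLICA IDENTITY FULL.
{
  "op": "u",
  "before": { "id": 1, "status": "pending" },
  "after": { "id": 1, "status": "shipped" },
  "source": { "ts_ms": 1709000000000, "table": "orders" }
}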
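A consumer typically turns these events into state by dispatching on op. The sketch below applies events to an in-memory replica dict. It assumes each message value has already been deserialized into the envelope shown above (with any converter schema wrapper stripped) and that the primary key column is named id; apply_event is a hypothetical helper, not part of Debezium. It also skips Kafka tombstones (null values), which Debezium emits after a delete by default so that log compaction can drop the key.

```python
import json
from typing import Optional

def apply_event(replica: dict, event: Optional[dict], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica keyed by primary key."""
    if event is None:                  # tombstone: nothing to apply
        return
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        replica[row[key]] = row        # upsert: snapshot reads, inserts, updates look alike
    elif op == "d":
        replica.pop(event["before"][key], None)  # "after" is null for deletes
    else:
        raise ValueError(f"unhandled op: {op!r}")

# Usage: replay a snapshot read, an update, an insert, and a delete.
events = [
    '{"op": "r", "before": null, "after": {"id": 1, "status": "pending"}}',
    '{"op": "u", "before": {"id": 1, "status": "pending"}, "after": {"id": 1, "status": "shipped"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "status": "pending"}}',
    '{"op": "d", "before": {"id": 2, "status": "pending"}, "after": null}',
]
replica = {}
for raw in events:
    apply_event(replica, json.loads(raw))
apply_event(replica, None)  # a tombstone is ignored
print(replica)  # {1: {'id': 1, 'status': 'shipped'}}
```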
Use cases
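- Replicating OLTP data into a data warehouse in near real time.
- Invalidating or refreshing caches when the underlying rows change.
- Event-driven integration: publishing an existing database's changes as an event stream for other services without modifying the application that writes them (sometimes loosely called event sourcing, although the database, not the event stream, remains the source of truth).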
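For the cache-invalidation case, a handler can derive a cache key from the table name and primary key and evict that entry on every change. In this sketch a plain dict stands in for an external cache such as Redis, and both the "table:id" key format and the id column name are assumptions. Evicting, rather than writing the "after" image into the cache, keeps the handler identical for inserts, updates, and deletes and lets the next read repopulate from the source of truth; writing the new value through is also a valid design.

```python
from typing import Optional

def cache_key(table: str, row_id) -> str:
    """Hypothetical key scheme: '<table>:<primary key>'."""
    return f"{table}:{row_id}"

def invalidate(cache: dict, event: Optional[dict]) -> None:
    """Evict the cached entry for whichever row a change event touched."""
    if event is None:  # tombstone: nothing to do
        return
    # Deletes have after = null, so fall back to the before image for the key.
    row = event["after"] or event["before"]
    cache.pop(cache_key(event["source"]["table"], row["id"]), None)

# Usage: a stale entry is evicted when its row is updated; other entries survive.
cache = {
    "orders:1": {"id": 1, "status": "pending"},
    "orders:2": {"id": 2, "status": "pending"},
}
invalidate(cache, {
    "op": "u",
    "before": {"id": 1, "status": "pending"},
    "after": {"id": 1, "status": "shipped"},
    "source": {"ts_ms": 1709000000000, "table": "orders"},
})
print(sorted(cache))  # ['orders:2']
```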