What CDC is

CDC (change data capture) captures row-level changes (INSERT/UPDATE/DELETE) from a source database and streams them to consumers in near real time. It is a widely used way to keep downstream systems in sync while adding little load to the source.

Why log-based CDC beats polling

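  • Polling queries the source table on a schedule. It misses hard deletes and any intermediate states between polls, adds query load to the source, and has latency of up to one poll interval.
  • Log-based CDC reads the database's replication log (Postgres WAL, MySQL binlog). It has low (not zero) impact on source performance, captures deletes and every intermediate change, and delivers changes in near real time.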
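
A minimal sketch of the polling blind spots, using an in-memory SQLite table. The orders table, the updated_at watermark column, and the poll_changes helper are assumptions made for illustration, not part of any particular tool.

```python
import sqlite3

def poll_changes(conn, last_seen):
    """Return rows newer than the watermark, plus the advanced watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (1, 'pending', 1), (2, 'pending', 2)")

first_poll, watermark = poll_changes(conn, 0)

# Between polls: order 1 changes twice, order 2 is hard-deleted.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 3 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 4 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

second_poll, watermark = poll_changes(conn, watermark)
print(second_poll)
```

The second poll sees only the final state of order 1. The intermediate 'paid' state and the deletion of order 2 never appear, whereas a log reader would observe all three changes as separate entries.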

Debezium

Debezium is a widely used open-source CDC platform. It most commonly runs as a set of Kafka Connect source connectors, each reading a database's log and emitting change events to Kafka topics (typically one topic per captured table).
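
As a rough sketch, registering a Debezium PostgreSQL connector means sending a JSON body to Kafka Connect's POST /connectors REST endpoint. The property names below follow the Debezium 2.x PostgreSQL connector as I understand it (older versions used database.server.name instead of topic.prefix), so check them against your version. The connector name, host, credentials, table names, and the build_connector_payload helper are all hypothetical placeholders.

```python
import json

def build_connector_payload(name, host, dbname, user, password, tables, topic_prefix):
    """Build the JSON body for Kafka Connect's POST /connectors endpoint."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",            # logical decoding plugin built into Postgres 10+
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": topic_prefix,          # topics become <prefix>.<schema>.<table>
            "table.include.list": ",".join(tables),
        },
    }

# Placeholder values; none of these come from a real deployment.
payload = build_connector_payload(
    name="orders-connector",
    host="db.example.internal",
    dbname="shop",
    user="cdc_user",
    password="change-me",
    tables=["public.orders", "public.customers"],
    topic_prefix="shop",
)
body = json.dumps(payload, indent=2)
print(body)
```

The body would then be sent with Content-Type: application/json to the Connect worker (port 8083 by default). With this configuration, changes to public.orders would be expected on the topic shop.public.orders.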

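// Event shape (simplified; real events carry more fields in "source" and a top-level "ts_ms")
{
  "op": "u",  // c=create, u=update, d=delete, r=read (snapshot)
  "before": { "id": 1, "status": "pending" },  // null for c and r
  "after":  { "id": 1, "status": "shipped" },  // null for d
  "source": { "ts_ms": 1709000000000, "table": "orders" }
}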
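
A consumer usually branches on "op" to keep its own copy of the data current. The sketch below applies events of the simplified shape above to an in-memory dict keyed by primary key. The apply_change function and the "id" key column are assumptions for illustration; a real consumer would read the events from Kafka and write to a warehouse or cache.

```python
def apply_change(replica, event, key="id"):
    """Apply one change event to a dict replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # Creates, updates, and snapshot reads are all upserts of "after".
        row = event["after"]
        replica[row[key]] = row
    elif op == "d":
        # Deletes carry the old row in "before"; "after" is null.
        row = event["before"]
        replica.pop(row[key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return replica

events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}},
    {"op": "u", "before": {"id": 1, "status": "pending"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "pending"}},
    {"op": "d", "before": {"id": 2, "status": "pending"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

Treating c, u, and r as the same upsert, and ignoring deletes of missing keys, keeps the handler idempotent. That matters because events can be redelivered after a restart, so replaying the same event must leave the replica unchanged.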

Use cases

  • Replicating OLTP data to a warehouse in near real-time.
  • Invalidating caches on data change.
  • Event-driven integration: building event streams from existing databases without changing the application's write path.