The schema evolution problem

Without schema management, a producer that renames a field silently breaks every consumer that reads it. In a streaming pipeline the breakage usually surfaces in production, often hours later, once downstream tables already contain corrupt data.
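
A minimal sketch of the failure mode, using plain JSON and no schema checks. The event shape, the field names (user_id, userId, amount) and the to_row helper are hypothetical and only serve the illustration.

```python
import json

# Hypothetical events: the producer renames "user_id" to "userId" without notice.
old_event = json.dumps({"user_id": 42, "amount": 9.99})
new_event = json.dumps({"userId": 42, "amount": 9.99})


def to_row(raw: str) -> dict:
    """Consumer code written against the old field name."""
    event = json.loads(raw)
    # dict.get() returns None for a missing key, so no exception is raised.
    return {"user_id": event.get("user_id"), "amount": event.get("amount")}


row_before = to_row(old_event)
row_after = to_row(new_event)

print(row_before)  # user_id is populated
print(row_after)   # user_id is None, and nothing failed
```

The consumer keeps running and writes a null where the ID should be, which is why the problem tends to be noticed in the downstream table rather than in the pipeline itself.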

Apache Avro

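Avro is a binary serialisation format whose schemas are defined in JSON. Data is always written and read with a schema: Avro container files embed it, while Kafka messages usually carry only a schema ID and leave the schema itself in a registry. Avro's schema resolution rules define which changes are forward or backward compatible; enforcing those rules at registration time is the job of a schema registry, not of Avro itself.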
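
As an illustration, here are two versions of a hypothetical Order record schema, plus a hand-written sketch of how a reader schema is applied to a record. The resolve function is not the Avro library. It covers only one part of Avro's resolution rules for records: fields are matched by name, missing fields take the reader's default, and extra fields are ignored. Type promotion, aliases and unions are left out.

```python
import json

# Version 1 of a hypothetical record schema, written as Avro schema JSON.
ORDER_V1 = json.loads("""
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "amount", "type": "double"}
  ]
}
""")

# Version 2 adds a field WITH a default, so old data stays readable.
ORDER_V2 = json.loads("""
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
""")


def resolve(record: dict, reader_schema: dict) -> dict:
    """Project an already-decoded record onto the reader schema (simplified)."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]          # field present in the written data
        elif "default" in field:
            out[name] = field["default"]      # fall back to the reader's default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return out  # fields unknown to the reader are dropped


old_record = {"order_id": 1, "amount": 9.99}                      # written with V1
new_record = {"order_id": 2, "amount": 5.0, "currency": "EUR"}    # written with V2

upgraded = resolve(old_record, ORDER_V2)    # new reader, old data
downgraded = resolve(new_record, ORDER_V1)  # old reader, new data
print(upgraded)
print(downgraded)
```

Because the added field has a default, both directions work: the new reader fills in "USD" for old records, and the old reader ignores the extra field.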

Confluent Schema Registry

A REST service that stores and versions schemas. Producers register a schema before sending; each message then carries the schema ID (by default in a short prefix on the message payload rather than in a Kafka header), and consumers use that ID to fetch the schema. Incompatible changes are rejected at registration, so data written with a breaking schema never reaches the topic.
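
The sketch below shows the two halves of that exchange using only the standard library. The registry URL and the subject name are assumptions for the example. The request is built but not sent, so no running registry is needed. The framing follows the default Confluent wire format: one magic byte (0), a 4-byte big-endian schema ID, then the Avro-encoded payload.

```python
import json
import struct
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumption: a locally running registry
SUBJECT = "orders-value"                # assumption: subject named "<topic>-value"
MAGIC_BYTE = 0


def build_register_request(schema: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that registers a schema under SUBJECT."""
    # The registry expects the schema as a JSON string inside a JSON envelope.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/subjects/{SUBJECT}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Producer side: prefix the payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Consumer side: split a message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


schema = {"type": "record", "name": "Order",
          "fields": [{"name": "order_id", "type": "long"}]}
request = build_register_request(schema)

message = frame(42, b"\x02")          # b"\x02" is the Avro encoding of the long 1
schema_id, payload = unframe(message)
print(request.full_url, schema_id, payload)
```

On success the registry answers the POST with a JSON body containing the assigned id. A consumer that has extracted the ID can then look the schema up with GET /schemas/ids/{id} and cache it. In practice the Confluent serializer and deserializer classes handle both steps.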

Compatibility modes

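  • BACKWARD: consumers on the new schema can read data written with the old schema. Safe for consumers to upgrade first.
  • FORWARD: consumers on the old schema can read data written with the new schema. Safe for producers to upgrade first.
  • FULL: both. Producers and consumers can upgrade in either order, which is what a zero-downtime rolling upgrade needs.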
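
The three modes can be sketched as a check on record fields. This is a deliberately simplified model, not the registry's actual algorithm: it looks only at added and removed fields and their defaults, and ignores type changes, aliases and unions. The schemas and the helper names can_read and is_compatible are hypothetical.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be read with `reader`.

    Simplified rule: every field the reader expects must either exist in
    the writer schema or carry a default in the reader schema.
    """
    writer_fields = {f["name"] for f in writer["fields"]}
    return all(
        f["name"] in writer_fields or "default" in f
        for f in reader["fields"]
    )


def is_compatible(new: dict, old: dict, mode: str) -> bool:
    backward = can_read(reader=new, writer=old)  # new schema reads old data
    forward = can_read(reader=old, writer=new)   # old schema reads new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown mode: {mode}")


V1 = {"fields": [{"name": "id", "type": "long"},
                 {"name": "amount", "type": "double"}]}

# Adds a field without a default: old data has no value for it.
ADD_REQUIRED = {"fields": V1["fields"] + [{"name": "currency", "type": "string"}]}

# Adds a field with a default: readable in both directions.
ADD_OPTIONAL = {"fields": V1["fields"] + [
    {"name": "currency", "type": "string", "default": "USD"}]}

# Drops "amount", which the old schema requires.
DROP_FIELD = {"fields": [{"name": "id", "type": "long"}]}

for name, candidate in [("add required", ADD_REQUIRED),
                        ("add optional", ADD_OPTIONAL),
                        ("drop field", DROP_FIELD)]:
    results = {m: is_compatible(candidate, V1, m)
               for m in ("BACKWARD", "FORWARD", "FULL")}
    print(name, results)
```

Under this model, adding a required field passes only FORWARD, dropping a required field passes only BACKWARD, and adding a field with a default passes FULL.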