The schema evolution problem
Without schema management, a producer that renames a field can silently break every consumer that reads it, because nothing checks the change before it ships. In a streaming pipeline this is typically discovered in production, often hours later, when downstream tables already contain corrupt data.
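A minimal, hypothetical illustration of "silently" (plain Python dicts standing in for messages; the field names `user_id`, `userId` and `amount` are invented for the example). The consumer uses `dict.get`, so the rename raises no error and simply yields `None` for the new events:

```python
# Hypothetical producer versions: v2 renames "user_id" to "userId"
# without telling anyone.
def produce_v1(uid: int) -> dict:
    return {"user_id": uid, "amount": 42}

def produce_v2(uid: int) -> dict:
    return {"userId": uid, "amount": 42}

def consume(event: dict) -> tuple:
    # The consumer still expects the old field name. dict.get() returns
    # None instead of raising, so the bad row flows downstream unnoticed.
    return (event.get("user_id"), event["amount"])

rows = [consume(produce_v1(7)), consume(produce_v2(8))]
print(rows)  # [(7, 42), (None, 42)]
```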
Apache Avro
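Avro is a compact binary serialisation format whose schemas are written in JSON. A reader always needs the writer's schema to decode the bytes: Avro data files embed it in the file header, while Kafka messages typically carry only a schema ID. Avro's schema resolution rules define which changes are backward or forward compatible; enforcing those rules when a new schema version is registered is the job of a schema registry, not of Avro itself.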
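The record part of Avro's schema resolution can be sketched in a few lines. This is a simplified model, not the Avro library: it works on an already-decoded record (real Avro first decodes the bytes with the writer's schema) and ignores aliases, type promotion, unions and nested types. The `Payment` schemas are hypothetical.

```python
import json

# Two versions of a hypothetical record schema, written in Avro's JSON syntax.
WRITER_V1 = json.loads("""
{"type": "record", "name": "Payment",
 "fields": [{"name": "user_id", "type": "long"},
            {"name": "amount", "type": "long"}]}
""")

READER_V2 = json.loads("""
{"type": "record", "name": "Payment",
 "fields": [{"name": "user_id", "type": "long"},
            {"name": "amount", "type": "long"},
            {"name": "currency", "type": "string", "default": "EUR"}]}
""")

def resolve(record: dict, writer: dict, reader: dict) -> dict:
    """Simplified Avro record resolution:
    - writer fields unknown to the reader are dropped;
    - reader fields missing from the writer take the reader's default;
    - a missing field with no default is an error."""
    writer_names = {f["name"] for f in writer["fields"]}
    out = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"reader field {name!r} missing from writer and has no default")
    return out

# Old data read with the new schema: the added field gets its default.
print(resolve({"user_id": 7, "amount": 42}, WRITER_V1, READER_V2))
# {'user_id': 7, 'amount': 42, 'currency': 'EUR'}
```

Reading in the other direction (new data, old reader schema) also works here, because the old reader simply ignores `currency`.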
Confluent Schema Registry
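A REST service that stores and versions schemas under named subjects (by default `<topic>-value` for message values). The producer's serialiser registers its schema, or looks up the existing ID, before sending; each message payload then starts with a magic byte and a 4-byte schema ID, and consumers use that ID to fetch the writer's schema from the registry. A schema that violates the subject's compatibility mode is rejected at registration, so the producer cannot publish with it and the breaking change never reaches consumers.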
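The framing Confluent's serialisers put around each Avro body (magic byte `0`, then a 4-byte big-endian schema ID) can be sketched with `struct`. The dict-based `registry` is a hypothetical stand-in for the REST service; a real consumer would call `GET /schemas/ids/{id}` and cache the result. The schema ID `17` is made up.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # 1 unsigned byte + 4-byte big-endian unsigned int

# Hypothetical in-memory stand-in for the registry: schema ID -> schema JSON.
registry = {17: '{"type": "long"}'}

def frame(schema_id: int, avro_body: bytes) -> bytes:
    """Prefix an Avro-encoded body with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_body

def unframe(message: bytes) -> tuple:
    """Split a framed message into (schema_id, avro_body)."""
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[HEADER.size:]

# Avro encodes the long 7 as the zig-zag varint 0x0e.
msg = frame(17, b"\x0e")
schema_id, body = unframe(msg)
print(msg.hex())  # 00000000110e
writer_schema = registry[schema_id]  # what the consumer would fetch by ID
```

Because the ID travels inside the payload rather than in a Kafka header, any consumer holding just the bytes can find the exact writer schema.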
Compatibility modes
- BACKWARD — new schema can read old data. Safe for consumers to upgrade first.
- FORWARD — old schema can read new data. Safe for producers to upgrade first.
- FULL — both. Producers and consumers can then upgrade in either order, so rolling upgrades need no coordinated deployment order.
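The three modes above can be sketched as a checker for flat record schemas. This is a deliberately simplified model, not the registry's algorithm: it only looks at added or removed fields and exact type equality, ignoring promotion, unions, aliases and nested records. The schemas `V1` to `V3` are hypothetical. It compares against one previous version, as Confluent's non-transitive modes do against the latest version; the `*_TRANSITIVE` variants repeat the check against every earlier version.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` resolves under `reader`."""
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        wf = writer_fields.get(field["name"])
        if wf is None:
            # Field the writer never wrote: only OK if the reader has a default.
            if "default" not in field:
                return False
        elif wf["type"] != field["type"]:
            return False  # simplification: no type promotion
    return True

def compatible(new: dict, old: dict, mode: str) -> bool:
    backward = can_read(new, old)  # new schema reads old data
    forward = can_read(old, new)   # old schema reads new data
    return {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}[mode]

BASE = [{"name": "user_id", "type": "long"}, {"name": "amount", "type": "long"}]
V1 = {"type": "record", "name": "Payment", "fields": BASE}
# V2 adds a field WITH a default; V3 adds one WITHOUT a default.
V2 = {"type": "record", "name": "Payment",
      "fields": BASE + [{"name": "currency", "type": "string", "default": "EUR"}]}
V3 = {"type": "record", "name": "Payment",
      "fields": BASE + [{"name": "region", "type": "string"}]}

for name, new in [("V2", V2), ("V3", V3)]:
    print(name, [compatible(new, V1, m) for m in ("BACKWARD", "FORWARD", "FULL")])
# V2 [True, True, True]
# V3 [False, True, False]
```

Adding a field without a default is forward-compatible only: old consumers ignore `region`, but a consumer upgraded to V3 has no value to fill in when it reads V1 data.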