Why governance fails
Governance initiatives fail when they become pure bureaucracy. Effective governance is infrastructure that makes doing the right thing the easiest thing.
Data ownership
Every dataset needs an owner — a team, not an individual. The owner is responsible for quality SLAs, documentation, and access requests. Without ownership, everything drifts.
Data catalog
A searchable inventory of datasets with descriptions, owners, lineage, and sample data. Open-source options: DataHub, Apache Atlas. Managed: Alation, Collibra. Start with DataHub — it has pre-built connectors for most warehouses and has strong lineage support.
Column-level lineage
Track which source columns contribute to which output columns. Critical for GDPR compliance (where did this PII come from?) and debugging (why did this metric change?).
Access control
Role-based access at the warehouse level (Snowflake roles, BigQuery IAM). Data masking for PII — analysts see ****@****.com instead of real emails unless they have PII role. Audit logs on all access.