The six dimensions

  • Completeness — are required fields populated?
  • Accuracy — do values reflect reality?
  • Consistency — do related records agree?
  • Timeliness — is data arriving within SLA?
  • Uniqueness — are there unexpected duplicates?
  • Validity — do values conform to expected formats and ranges?
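Each dimension maps directly to a check you can compute. A minimal sketch in plain Python — the field names, sample records, and email format are illustrative assumptions, not from any particular schema:

```python
import re

# Hypothetical sample batch: duplicate id, a null email, a malformed email,
# and a negative amount, so each check has something to find.
records = [
    {"id": 1, "email": "a@example.com", "amount": 10.0},
    {"id": 2, "email": None,            "amount": 25.0},
    {"id": 2, "email": "b@example",     "amount": -5.0},
]

def completeness(rows, field):
    """Fraction of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """True when there are no unexpected duplicates on the field."""
    values = [r[field] for r in rows]
    return len(values) == len(set(values))

def validity_email(rows, field):
    """Fraction of non-null values matching a simple email format."""
    pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    present = [r[field] for r in rows if r[field] is not None]
    return sum(bool(pattern.match(v)) for v in present) / len(present)

print(completeness(records, "email"))    # 2 of 3 populated
print(uniqueness(records, "id"))         # duplicate id, so False
print(validity_email(records, "email"))  # 1 of 2 present values valid
```

Accuracy and timeliness need external context (a source of truth, arrival timestamps against an SLA), which is why they are usually the hardest dimensions to automate.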

Validation layers

  1. Source validation — check incoming data before it enters the pipeline.
  2. Pipeline validation — run Great Expectations suites or custom assertions mid-pipeline.
  3. Warehouse validation — dbt tests on modelled data.
  4. Dashboard validation — metric monitors that alert when KPIs move unexpectedly.
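Layer 2's custom assertions are often just functions that fail fast so bad batches never reach the warehouse. A hedged sketch — the exception type, threshold, and required fields are illustrative choices, not a standard:

```python
class DataQualityError(Exception):
    """Raised mid-pipeline so the run fails fast instead of loading bad data."""

def assert_batch(rows, min_rows=1, required_fields=("id",)):
    """Custom mid-pipeline assertion: reject a batch before it moves downstream."""
    if len(rows) < min_rows:
        raise DataQualityError(f"batch too small: {len(rows)} < {min_rows}")
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                raise DataQualityError(f"row {i} missing required field {field!r}")
    return rows  # pass the batch through unchanged

good = [{"id": 1}, {"id": 2}]
assert_batch(good)  # passes silently

try:
    assert_batch([{"id": None}])
except DataQualityError as e:
    print(f"blocked: {e}")
```

The design choice here is to raise rather than log: a pipeline step that throws stops the run, whereas a logged warning tends to be ignored until a dashboard breaks.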

Great Expectations

Define expectations as code — for example expect_column_values_to_not_be_null or expect_column_values_to_be_between — and run them as checkpoint steps in your pipeline. Results are stored and browsable in a Data Docs site, which gives reviewers a shared record of what passed and what failed.
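For intuition, the contracts of the two expectations named above can be sketched in plain Python. This illustrates what they assert, not the Great Expectations API itself; the result-dict shape here is a simplified stand-in:

```python
def expect_column_values_to_not_be_null(values):
    """Every value must be non-null."""
    unexpected = [v for v in values if v is None]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}

def expect_column_values_to_be_between(values, min_value, max_value):
    """Every non-null value must fall within [min_value, max_value]."""
    unexpected = [
        v for v in values
        if v is not None and not (min_value <= v <= max_value)
    ]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}

ages = [34, 29, None, 150]
print(expect_column_values_to_not_be_null(ages))        # fails: one null
print(expect_column_values_to_be_between(ages, 0, 120)) # fails: 150 out of range
```

Note that range checks conventionally skip nulls, so the two expectations compose: one catches missing values, the other catches invalid ones.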

Alerting

Set up anomaly detection on row counts and null rates per table. A sudden 40% drop in row count is usually a pipeline failure, not a business event.
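A row-count monitor can be as simple as comparing today's count against a trailing average. A minimal sketch, assuming a 40% drop threshold and a short history window (both illustrative):

```python
def row_count_alert(history, today, drop_threshold=0.4):
    """Alert when today's row count drops more than drop_threshold
    (e.g. 0.4 = 40%) below the trailing average of recent loads."""
    baseline = sum(history) / len(history)
    drop = (baseline - today) / baseline
    return drop > drop_threshold

history = [10_000, 10_200, 9_900, 10_100]
print(row_count_alert(history, 9_800))  # ~2% dip: normal fluctuation, False
print(row_count_alert(history, 5_500))  # ~45% drop: likely pipeline failure, True
```

The same pattern works for null rates per column; in production you would typically add seasonality handling (weekday vs. weekend volumes) before trusting a fixed threshold.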