Idempotency is non-negotiable

Every task must be safe to re-run. If a task fails midway and is retried, it must not create duplicate rows or corrupt state. For database writes, use an upsert (INSERT ... ON CONFLICT) instead of a plain INSERT. For file outputs, write to a staging path, then atomically swap it into place.
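A minimal sketch of both techniques, using only the Python standard library so it runs without a database server. The table daily_totals, the file report.csv, and both helper names are hypothetical, chosen for illustration; SQLite stands in for whatever database the task really targets (its ON CONFLICT syntax needs SQLite 3.24 or newer).

```python
import os
import sqlite3
import tempfile


def upsert_daily_totals(conn, rows):
    """Insert rows, overwriting any existing row with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_totals ("
        "day TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO daily_totals (day, total) VALUES (?, ?) "
        "ON CONFLICT(day) DO UPDATE SET total = excluded.total",
        rows,
    )
    conn.commit()


def write_atomically(path, text):
    """Write to a staging file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, staging_path = tempfile.mkstemp(dir=directory, suffix=".staging")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace overwrites the target in a single rename step.
        os.replace(staging_path, path)
    except BaseException:
        # Clean up the staging file if anything failed before the swap.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise


# Simulate a task that runs, then gets retried with the same input.
conn = sqlite3.connect(":memory:")
rows = [("2024-01-01", 10), ("2024-01-02", 20)]
upsert_daily_totals(conn, rows)
upsert_daily_totals(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "report.csv")
    write_atomically(target, "day,total\n2024-01-01,10\n")
    write_atomically(target, "day,total\n2024-01-01,10\n")
    with open(target, encoding="utf-8") as handle:
        final_text = handle.read()
    leftover_files = os.listdir(workdir)
```

The staging file is created in the target's own directory because a rename is only atomic within a single filesystem. Object stores generally have no equivalent rename, so the swap step would need a different mechanism there.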

Avoid dynamic DAGs where possible

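DAGs generated at parse time from database queries slow the scheduler, because the query runs again on every parse of the file. Prefer static DAGs and use dynamic task mapping (Airflow 2.3+) with expand(), which creates the mapped task instances at run time.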
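As a rough illustration, a statically defined DAG that fans out with expand() might look like the sketch below. The DAG name, task names, and partition values are hypothetical. It assumes Airflow 2.4 or newer for the schedule argument (2.3 uses schedule_interval) and Python 3.9 or newer for the list[str] annotation.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def process_partitions():
    @task
    def list_partitions() -> list[str]:
        # Runs inside a task at execution time, not at parse time,
        # so a database lookup here does not slow the scheduler.
        return ["a", "b", "c"]  # placeholder values

    @task
    def process(partition: str) -> str:
        return f"processed {partition}"

    # One mapped task instance is created per element of the upstream result.
    process.expand(partition=list_partitions())


process_partitions()
```

The DAG file itself stays cheap to parse: the number of mapped instances is decided only when list_partitions has finished in a given run.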

Backfilling

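Give every DAG a fixed start_date and filter queries on logical_date (formerly execution_date) or the run's data interval, never on the current wall-clock time. Each run then processes only its own slice of data, which enables clean backfills. With the Airflow 2.x CLI:

  airflow dags backfill -s 2024-01-01 -e 2024-03-01 my_dag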
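The sketch below shows one way a task body could bound its query by the run's interval. Airflow 2.2+ exposes data_interval_start and data_interval_end in the task context; here they are passed as plain datetime arguments so the example runs without Airflow. The events table and the count_events helper are hypothetical, and SQLite stands in for the real warehouse.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def count_events(conn, data_interval_start, data_interval_end):
    """Count events in the half-open window [start, end)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events "
        "WHERE created_at >= ? AND created_at < ?",
        (data_interval_start.isoformat(), data_interval_end.isoformat()),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("2024-01-01T08:00:00+00:00",),
        ("2024-01-01T23:59:59+00:00",),
        ("2024-01-02T00:00:00+00:00",),
    ],
)

day_one = datetime(2024, 1, 1, tzinfo=timezone.utc)
day_two = day_one + timedelta(days=1)
day_three = day_two + timedelta(days=1)

# Re-running any single day always selects the same rows.
jan_1_count = count_events(conn, day_one, day_two)
jan_2_count = count_events(conn, day_two, day_three)
```

The half-open window means an event stamped exactly at midnight belongs to one run only, so adjacent backfilled runs neither overlap nor leave gaps.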

SLA monitoring

Set sla (a timedelta) on tasks that have latency commitments. When a task misses its SLA, Airflow calls the sla_miss_callback defined on the DAG; wire this to Slack or PagerDuty.
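A callback along these lines could be used, assuming the five-argument signature that Airflow 2.x passes to sla_miss_callback. The send_alert function is a hypothetical placeholder for a Slack webhook or PagerDuty client; here it only records the message so the sketch runs without network access or an Airflow install.

```python
from types import SimpleNamespace

sent_messages = []


def send_alert(message):
    """Hypothetical transport: replace with a Slack or PagerDuty call."""
    sent_messages.append(message)


def notify_sla_miss(dag, task_list, blocking_task_list, slas, blocking_tis):
    """Build one alert per SLA check and hand it to the transport."""
    message = f"SLA missed in DAG {dag.dag_id}:\n{task_list}"
    if blocking_task_list:
        message += f"\nBlocked by:\n{blocking_task_list}"
    send_alert(message)


# Wiring in a DAG file (shown as comments; assumes Airflow 2.x):
#   DAG("my_dag", ..., sla_miss_callback=notify_sla_miss)
#   some_operator = SomeOperator(..., sla=timedelta(hours=1))

# Local smoke check with a stand-in object for the DAG.
notify_sla_miss(
    SimpleNamespace(dag_id="my_dag"),
    "load on 2024-01-01T00:00:00+00:00",
    "",
    [],
    [],
)
```

Keeping the message building separate from the transport makes the callback easy to exercise locally before connecting it to a real alerting channel.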

Common pitfalls

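  • Top-level code in DAG files runs at every parse cycle (every 30 seconds by default, set by min_file_process_interval), so never make DB calls at import time.
  • Do not use depends_on_past=True without understanding its backfill implications: each task instance waits for the same task in the previous run to have succeeded (or been skipped), so one failed run blocks every later run until it is fixed or cleared.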
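The first pitfall can be illustrated without Airflow. In this sketch, fetch_table_names is a hypothetical stand-in for a database query, and the counter only exists to show when the call happens.

```python
query_count = 0


def fetch_table_names():
    """Hypothetical expensive lookup standing in for a database query."""
    global query_count
    query_count += 1
    return ["orders", "customers"]


# Anti-pattern (left commented out): this line would run on every parse
# of the DAG file, not only when a task executes.
# TABLES = fetch_table_names()


def export_tables():
    """Task callable: the lookup happens only when the task runs."""
    return [f"exported {name}" for name in fetch_table_names()]


# Importing (parsing) the module has triggered no lookups so far.
calls_after_import = query_count

# Executing the task performs exactly one lookup.
result = export_tables()
```

In a real DAG file, export_tables would be passed to an operator or decorated as a task, so the scheduler's parse loop never touches the database.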