Core concepts

  • DAG: Directed Acyclic Graph. A set of tasks and the dependencies between them, defined in a Python file.
  • Operator: a template for a single task, such as PythonOperator, BashOperator, or SQLExecuteQueryOperator.
  • Task instance: one execution of one task for one DAG run.
  • Scheduler: creates DAG runs according to each DAG's schedule and queues task instances once their upstream dependencies are met.
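
To make the "acyclic" and "dependencies" parts of these definitions concrete, here is a small sketch that uses only Python's standard library, not Airflow. It is not how Airflow is implemented; the task names and the helpers run_order and is_acyclic are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_order(deps):
    """Return one valid execution order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph has no cycle, so it is a valid DAG."""
    try:
        run_order(deps)
        return True
    except CycleError:
        return False


print(run_order(dependencies))
print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

A graph with a cycle (a depends on b, b depends on a) has no valid execution order, which is why a workflow has to be acyclic.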

A minimal DAG

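from datetime import datetime

from airflow.decorators import dag, task


@dag(
    schedule="0 6 * * *",  # cron expression: every day at 06:00
    start_date=datetime(2024, 1, 1),
    catchup=False,  # do not backfill runs between start_date and today
)
def my_pipeline():
    @task
    def extract():
        # fetch_data() is a placeholder for your own extraction logic
        return fetch_data()

    @task
    def load(data):
        # write_to_warehouse() is a placeholder for your own load logic
        write_to_warehouse(data)

    # Passing extract()'s result to load() makes load depend on extract
    load(extract())


my_pipeline()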

Production tips

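  • Use the TaskFlow API (decorators) for new DAGs: return values are passed between tasks through XCom automatically and dependencies follow from the function calls, so there is less boilerplate than with classic operators.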
  • Store secrets in Airflow Connections/Variables or a secrets backend (AWS Secrets Manager, Vault).
  • Set max_active_runs=1 on pipelines that are not safe to run concurrently.
  • Use on_failure_callback to send alerts to Slack or PagerDuty.
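
The max_active_runs and on_failure_callback tips could be combined roughly as follows. This is a sketch, not a definitive setup: format_failure_alert, notify_failure, build_dag, and guarded_pipeline are illustrative names, and print() stands in for a real Slack or PagerDuty client call. Putting the callback in default_args attaches it to every task in the DAG.

```python
from datetime import datetime


def format_failure_alert(context):
    """Build a one-line alert from the context dict Airflow passes to callbacks."""
    ti = context["task_instance"]
    return f"Task {ti.dag_id}.{ti.task_id} failed: {context.get('exception')}"


def notify_failure(context):
    # Placeholder delivery: swap print() for a Slack or PagerDuty client call.
    print(format_failure_alert(context))


def build_dag():
    # Imported here so the helpers above can be tested without Airflow installed.
    from airflow.decorators import dag, task

    @dag(
        schedule="0 6 * * *",
        start_date=datetime(2024, 1, 1),
        catchup=False,
        max_active_runs=1,  # never run two DAG runs of this pipeline at once
        default_args={"on_failure_callback": notify_failure},  # applied to every task
    )
    def guarded_pipeline():
        @task
        def step():
            return "ok"

        step()

    return guarded_pipeline()
```

In a real DAG file, build_dag() would be called at module level (or the decorator used at top level, as in the minimal DAG) so that Airflow can discover the DAG. Keeping the message formatting in its own function means it can be unit-tested with a fake context.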