What is a data pipeline?
A data pipeline is a series of processing steps that move data from one or more sources to a destination — transforming, validating, and enriching it along the way.
Core stages
- Ingest — pull or receive data from sources (APIs, databases, files, streams).
- Validate — check schema, nulls, ranges, and referential integrity.
- Transform — clean, normalise, join, aggregate.
- Load — write to the destination (warehouse, lake, cache).
- Monitor — track latency, row counts, and data quality metrics.
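The stages above can be sketched end to end as plain functions. This is a minimal illustration, not a production design; the record fields (`id`, `amount`) and the hard-coded ingest source are assumptions made for the example.

```python
# Minimal sketch of the five pipeline stages: ingest -> validate ->
# transform -> load -> monitor. Each stage is a plain function so the
# flow is easy to follow and test.

def ingest():
    # Stand-in for pulling from an API, database, file, or stream.
    return [
        {"id": 1, "amount": "10.50"},
        {"id": 2, "amount": None},   # will fail validation
        {"id": 3, "amount": "7.25"},
    ]

def validate(records):
    # Basic null checks; real pipelines also verify schema, ranges,
    # and referential integrity.
    valid, rejected = [], []
    for r in records:
        (valid if r.get("id") is not None and r.get("amount") is not None
         else rejected).append(r)
    return valid, rejected

def transform(records):
    # Clean and normalise: cast string amounts to floats.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in records]

def load(records, destination):
    # Stand-in for a warehouse, lake, or cache write.
    destination.extend(records)

def monitor(ingested, rejected, loaded):
    # Row-count checks are the simplest data quality metric.
    print(f"ingested={ingested} rejected={rejected} loaded={loaded}")

warehouse = []
raw = ingest()
valid, rejected = validate(raw)
load(transform(valid), warehouse)
monitor(len(raw), len(rejected), len(warehouse))
```

Keeping stages as separate functions makes each one independently testable and makes it obvious where to add retries or dead-letter handling later.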
Build vs buy
Use managed tools (e.g. Fivetran, Airbyte) when you mainly need standard, off-the-shelf connectors. Build custom pipelines when sources are proprietary, latency requirements are tight, or transformation logic is complex enough that a generic tool becomes a liability.
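When you do build custom, a common shape is an incremental pull keyed on a cursor, so repeated runs fetch only new records. A hedged sketch under stated assumptions: `fetch_since`, the `seq` cursor field, and the in-memory `state` dict are all hypothetical stand-ins for a real source API and a durable state store.

```python
# Incremental custom pipeline sketch: track a cursor (here, the max
# "seq" seen) and pull only records newer than it on each run.

state = {"cursor": 0}  # stand-in for a durable state store (DB, file, etc.)

def fetch_since(cursor):
    # Stand-in for a proprietary source API that supports filtering.
    data = [{"seq": 1, "v": "a"}, {"seq": 2, "v": "b"}, {"seq": 3, "v": "c"}]
    return [r for r in data if r["seq"] > cursor]

def run_once(sink):
    # One pipeline run: fetch new records, load them, advance the cursor.
    batch = fetch_since(state["cursor"])
    if batch:
        sink.extend(batch)
        state["cursor"] = max(r["seq"] for r in batch)
    return len(batch)
```

Advancing the cursor only after a successful load keeps reruns safe: a failed run re-fetches the same batch instead of silently skipping it.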