Data Warehouse
Structured, schema-on-write: data is modelled before loading, so queries are fast SQL and data quality is enforced at ingest. Examples: Snowflake, BigQuery, Redshift. Best when: BI and reporting are the primary use case and data is structured.
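A minimal sketch of what schema-on-write means in practice: the schema is declared up front and nonconforming rows are rejected before they ever reach the table. The `SCHEMA` dict and `load` function are illustrative stand-ins, not a real warehouse API.

```python
# Toy illustration of schema-on-write: records are validated against a
# declared schema *before* loading, so bad rows never reach the table.
# (A real warehouse enforces this at ingest; this is just the shape of it.)
SCHEMA = {"order_id": int, "amount": float}

def load(table, record):
    """Reject any record that does not match SCHEMA, then append it."""
    for col, typ in SCHEMA.items():
        if not isinstance(record.get(col), typ):
            raise TypeError(f"{col!r} must be {typ.__name__}")
    table.append(record)

table = []
load(table, {"order_id": 1, "amount": 19.99})       # conforms: loaded
try:
    load(table, {"order_id": 2, "amount": "oops"})  # wrong type: rejected
except TypeError as e:
    print("rejected:", e)
```

The cost of this model is rigidity: every new field requires a schema change before any data carrying it can land, which is exactly the friction the lake avoids.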
Data Lake
Schema-on-read: structure is applied at query time, not at ingest. Raw files (Parquet, JSON, CSV) stored in object storage (S3, GCS). Cheap at scale. Best when: ML training data, logs, and semi-structured data are the primary use case.
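By contrast, a sketch of schema-on-read: the lake stores raw, heterogeneous records as-is, and each reader imposes its own structure at query time. The JSON-lines input and `read_latencies` reader are hypothetical examples, not part of any lake API.

```python
import json
import io

# Toy illustration of schema-on-read: raw records are stored untouched;
# structure (and the decision about malformed rows) belongs to the reader.
raw = io.StringIO(
    '{"user": "a", "ms": 120}\n'
    '{"user": "b"}\n'                          # missing field: stored anyway
    '{"user": "c", "ms": "95", "extra": 1}\n'  # wrong type, extra field: stored anyway
)

def read_latencies(lines):
    """Apply a schema at read time, coercing or skipping as needed."""
    for line in lines:
        rec = json.loads(line)
        try:
            yield rec["user"], int(rec["ms"])
        except (KeyError, ValueError):
            continue  # the reader, not the writer, decides what to drop

print(list(read_latencies(raw)))  # → [('a', 120), ('c', 95)]
```

The flexibility cuts both ways: ingestion never blocks, but every consumer must handle missing fields and type drift on its own.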
Lakehouse
Combines both. Open table formats (Delta Lake, Apache Iceberg, Apache Hudi) add ACID transactions, schema enforcement, and time travel to a data lake. Query engines like Spark, Trino, and DuckDB read them directly. Examples: Databricks, Apache Iceberg on S3.
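A toy sketch of the core idea behind these table formats: every commit produces a new immutable snapshot, so writes are atomic and readers can query any past version. The `VersionedTable` class is a deliberate simplification; real formats (Delta Lake, Iceberg, Hudi) track snapshots through metadata and manifest files over Parquet in object storage.

```python
# Toy sketch of commit/snapshot semantics in an open table format:
# each commit yields a new immutable snapshot; old snapshots stay
# readable, which is what makes "time travel" possible.
class VersionedTable:
    def __init__(self):
        self.snapshots = [()]  # version 0: empty table

    def commit(self, rows):
        """Append rows atomically as a new snapshot; old ones are untouched."""
        self.snapshots.append(self.snapshots[-1] + tuple(rows))

    def read(self, version=None):
        """Read the latest snapshot, or time-travel to an older version."""
        idx = len(self.snapshots) - 1 if version is None else version
        return list(self.snapshots[idx])

t = VersionedTable()
t.commit([{"id": 1}])
t.commit([{"id": 2}])
print(t.read())           # latest: [{'id': 1}, {'id': 2}]
print(t.read(version=1))  # time travel: [{'id': 1}]
```

Because a query engine only ever sees a committed snapshot, concurrent readers and writers never observe a half-applied write, which is the ACID guarantee the lakehouse adds on top of plain files.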
Recommendation
Start with a warehouse if your team primarily writes SQL and builds dashboards. Add a lake layer when you have ML workloads, need raw data retention, or storage cost becomes a constraint. The lakehouse pattern is where most mature data platforms end up converging.