What Iceberg solves
Traditional Hive tables on S3 offer no ACID transactions, only limited schema evolution, and poor performance on large datasets, because query planning must list entire partitions on object storage. Iceberg adds all of this while keeping data in open Parquet/ORC files.
Snapshots and time travel
Every write creates a new snapshot, and any historical snapshot can be queried. In Spark SQL:
SELECT * FROM orders
TIMESTAMP AS OF '2024-01-15 00:00:00';
The exact clause is engine-specific: Trino uses FOR TIMESTAMP AS OF, Flink uses FOR SYSTEM_TIME AS OF.
Snapshots also make concurrent access safe: readers see a consistent snapshot and are never blocked by writers, while writers commit via optimistic concurrency.
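The snapshot history itself is queryable. A sketch in Spark SQL, assuming an Iceberg catalog is configured; the snapshot ID below is a placeholder, and the columns come from Iceberg's snapshots metadata table:

```sql
-- List every snapshot of the table, newest last.
SELECT committed_at, snapshot_id, operation
FROM orders.snapshots;

-- Read the table exactly as it was at a specific snapshot ID
-- (substitute an ID returned by the query above).
SELECT * FROM orders VERSION AS OF 123456789;
```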
Schema evolution
Add, rename, or drop columns without rewriting data files. Iceberg tracks column IDs, not names, so renaming a column does not break readers of old files.
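In Spark SQL these are ordinary ALTER TABLE statements; a minimal sketch, with illustrative column names:

```sql
-- All three are metadata-only changes: no data files are rewritten.
ALTER TABLE orders ADD COLUMN discount_pct double;
ALTER TABLE orders RENAME COLUMN cust_id TO customer_id;
ALTER TABLE orders DROP COLUMN legacy_flag;
```

Because old data files still carry the original column IDs, queries through the renamed column resolve correctly against both old and new files.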
Partition evolution
Change the partitioning strategy without rewriting data. Old data retains its old partitioning; new data uses the new partitioning. Queries read both transparently.
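With Iceberg's Spark SQL extensions enabled, a partition spec change might look like this (assuming a hypothetical order_ts timestamp column; this only updates table metadata, and existing files keep their old layout):

```sql
-- Switch new writes from day-level to hour-level partitioning.
ALTER TABLE orders DROP PARTITION FIELD days(order_ts);
ALTER TABLE orders ADD PARTITION FIELD hours(order_ts);
```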
Compaction
Streaming writes produce many small files. Run compaction periodically to merge them into larger files for faster reads: CALL system.rewrite_data_files('orders').
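The rewrite_data_files procedure also takes tuning options. A hedged example using Iceberg's Spark procedure syntax; the target file size here is illustrative, not a recommendation:

```sql
CALL system.rewrite_data_files(
  table => 'orders',
  options => map('target-file-size-bytes', '536870912')  -- aim for ~512 MB files
);
```

Compaction runs against a snapshot and commits a new one, so it can proceed while readers and streaming writers keep working.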