Data Platform Cost Optimization Strategies

Reducing Snowflake, S3, Spark, and Kafka spend without sacrificing performance.

Updated May 24, 2026

Snowflake

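Set warehouses to auto-suspend after 60 seconds of idle time. This often cuts compute spend by 40–60%. Each resume is billed for at least 60 seconds, so very bursty workloads may gain less.
Rely on the result cache: an identical query rerun within 24 hours uses no warehouse compute, provided the underlying data has not changed.
Profile expensive queries with Query Profile. A 10-minute query can often be brought down to about 30 seconds with filter pushdown or a materialised table.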
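
To see why a short auto-suspend timeout matters, here is a rough Python sketch that estimates billed warehouse seconds for a set of queries. It is a simplified model, not Snowflake's exact billing logic: it assumes the warehouse suspends exactly `auto_suspend_s` seconds after the last query ends and that every resume is billed for at least 60 seconds. The function name and the sample workload are hypothetical.

```python
def billed_seconds(queries, auto_suspend_s, min_billed_s=60):
    """Estimate billed seconds for one warehouse.

    queries: list of (start_s, end_s) tuples, in seconds.
    Simplifying assumption: the warehouse suspends exactly
    auto_suspend_s seconds after the last running query ends.
    """
    total = 0
    run_start = run_end = None
    for start, end in sorted(queries):
        if run_start is None:
            run_start, run_end = start, end
        elif start <= run_end + auto_suspend_s:
            # Warehouse is still up when this query arrives: same billing run.
            run_end = max(run_end, end)
        else:
            # Warehouse suspended in between: close the run, start a new one.
            total += max(run_end + auto_suspend_s - run_start, min_billed_s)
            run_start, run_end = start, end
    if run_start is not None:
        total += max(run_end + auto_suspend_s - run_start, min_billed_s)
    return total


# Hypothetical workload: three 30-second queries, ten minutes apart.
workload = [(0, 30), (600, 630), (1200, 1230)]
print(billed_seconds(workload, auto_suspend_s=600))  # 1830
print(billed_seconds(workload, auto_suspend_s=60))   # 270
```

In this toy workload the 10-minute timeout keeps the warehouse running between queries, while the 60-second timeout bills only 90 seconds per query. Real savings depend on how queries are actually spaced.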

S3

Compress with Snappy or ZSTD: a 5–8× size reduction from raw CSV is typical.
Right-size storage classes with lifecycle policies, for example Standard-IA for data accessed less than once a month.
Enforce column pruning: store data in a columnar format such as Parquet or ORC and make sure query engines read only the columns they need.
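
A lifecycle policy for the Standard-IA tip could look roughly like the sketch below. The helper names, rule ID, and the `logs/` prefix are hypothetical. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, and the 30-day floor reflects S3's minimum age before a transition to Standard-IA.

```python
def build_lifecycle_config(prefix, ia_after_days=30):
    """Build a lifecycle configuration that moves objects under
    `prefix` to Standard-IA after `ia_after_days` days."""
    if ia_after_days < 30:
        # S3 does not transition objects to Standard-IA before 30 days.
        raise ValueError("Standard-IA transitions require at least 30 days")
    return {
        "Rules": [
            {
                "ID": f"to-standard-ia-{ia_after_days}d",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration. Needs boto3 and AWS credentials.
    Note: this call replaces any existing lifecycle configuration on the bucket."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = build_lifecycle_config("logs/", ia_after_days=30)
print(config["Rules"][0]["Transitions"])
```

Standard-IA adds a per-GB retrieval charge, so it is worth checking access logs before moving a prefix.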

Spark

Use Spot or preemptible instances for batch jobs; they are typically 60–80% cheaper. Add checkpointing so jobs can recover when an instance is reclaimed.
Right-size executors: over-provisioned executor memory wastes money. Profile with the Spark UI.
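
As a starting point for executor sizing, the following sketch applies a common rule of thumb, not a prescription from this article: reserve one core and 1 GB per node for the OS, use about five cores per executor, and leave room for memory overhead, which Spark defaults to the larger of 384 MiB or 10% of executor memory. The function name and default values are assumptions; treat the result as a first guess to refine with the Spark UI.

```python
import math


def size_executors(node_cores, node_mem_gb, cores_per_executor=5,
                   reserved_cores=1, reserved_mem_gb=1.0,
                   overhead_fraction=0.10, min_overhead_gb=0.384):
    """Rule-of-thumb executor sizing for a single worker node."""
    executors = (node_cores - reserved_cores) // cores_per_executor
    if executors < 1:
        raise ValueError("node too small for the requested cores per executor")
    # Memory available to each executor container (heap + overhead).
    container_gb = (node_mem_gb - reserved_mem_gb) / executors
    heap_gb = container_gb / (1 + overhead_fraction)
    if heap_gb * overhead_fraction < min_overhead_gb:
        # For small executors the fixed minimum overhead applies instead.
        heap_gb = container_gb - min_overhead_gb
    return {
        "executors_per_node": executors,
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": f"{math.floor(heap_gb)}g",
    }


# Hypothetical 16-core, 64 GB worker node.
print(size_executors(16, 64))
```

If the Spark UI shows peak executor memory well below the configured value, the setting can usually be lowered further.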

Kafka

Set retention per topic: not everything needs 7-day retention. Use compacted topics for reference data.
Enable compression (lz4 or snappy) at the producer level to reduce network and storage cost.
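
The two Kafka tips map onto a handful of configuration keys. The sketch below builds them as plain dictionaries; `retention.ms`, `cleanup.policy`, and `compression.type` are standard Kafka settings, while the helper function and topic names are hypothetical. How the dictionaries are applied depends on the client or admin tooling in use.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def topic_config(retention_days=None, compacted=False):
    """Return topic-level overrides for either time-based retention
    or log compaction."""
    if compacted:
        # Keep only the latest record per key; suits reference data.
        return {"cleanup.policy": "compact"}
    if retention_days is None or retention_days <= 0:
        raise ValueError("retention_days must be positive for non-compacted topics")
    return {
        "cleanup.policy": "delete",
        "retention.ms": str(int(retention_days * MS_PER_DAY)),
    }


# Hypothetical topics: short-lived events vs. reference data.
topics = {
    "clickstream.raw": topic_config(retention_days=1),
    "customer.reference": topic_config(compacted=True),
}

# Producer-side compression; batches are compressed before leaving the client.
producer_config = {"compression.type": "lz4"}

for name, cfg in topics.items():
    print(name, cfg)
```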