What Trino does
Trino is a distributed SQL query engine that can query data in place across Hive/Iceberg tables on S3, PostgreSQL, MySQL, Kafka, Elasticsearch, and more — without moving data into a central warehouse.
Architecture
- Coordinator — parses queries, builds execution plans, assigns tasks.
- Workers — execute plan fragments in parallel. Scale horizontally by adding workers.
- Connectors — translate Trino SQL into source-specific operations.
Federation use case
-- Join a Postgres operational table with an Iceberg data lake table
SELECT o.order_id, c.segment, o.amount
FROM postgres.public.orders o
JOIN iceberg.analytics.customers c ON o.customer_id = c.id
WHERE o.created_at > DATE '2024-01-01';
Performance tips
- Partition pruning — filter on partition columns early.
- Push predicates to connectors — Trino pushes filter conditions to the connector where possible, reducing data transfer.
- Cost-based optimizer (CBO) — run
ANALYZEon tables so Trino has statistics to build better plans.