What Trino does

Trino is a distributed SQL query engine that can query data in place across Hive/Iceberg tables on S3, PostgreSQL, MySQL, Kafka, Elasticsearch, and more — without moving data into a central warehouse.

Architecture

  • Coordinator — parses queries, builds execution plans, assigns tasks.
  • Workers — execute plan fragments in parallel. Scale horizontally by adding workers.
  • Connectors — translate Trino SQL into source-specific operations.

Federation use case

-- Join a Postgres operational table with an Iceberg data lake table
SELECT o.order_id, c.segment, o.amount
FROM postgres.public.orders o
JOIN iceberg.analytics.customers c ON o.customer_id = c.id
WHERE o.created_at > DATE '2024-01-01';

Performance tips

  • Partition pruning — filter on partition columns early.
  • Push predicates to connectors — Trino pushes filter conditions to the connector where possible, reducing data transfer.
  • Cost-based optimizer (CBO) — run ANALYZE on tables so Trino has statistics to build better plans.