Search results
13 resultsPrivacy-First Data Design — PII Handling Patterns
Tokenisation, pseudonymisation, encryption at rest, and right-to-deletion workflows.
Getting Started with dbt (data build tool)
Models, tests, documentation, and the dbt workflow for transforming warehouse data.
DuckDB — Blazing Fast Local Analytics
When to reach for DuckDB instead of Spark, and how to use it effectively.
API Documentation Best Practices
What makes documentation useful, tooling, and keeping docs accurate.
Snowflake Best Practices for Cost and Performance
Virtual warehouses, clustering, query optimization, and controlling spend.
Trino (formerly PrestoSQL) — Federated SQL Across Data Sources
Architecture, connectors, query federation, and performance tuning.
Designing a Data Lake on AWS S3
Folder structure, naming conventions, lifecycle policies, and access patterns.
Real-Time Analytics Architecture Patterns
Lambda, Kappa, HTAP, and choosing the right pattern for sub-second analytics.
ETL vs ELT — Which Pattern Should You Use?
Understand the difference between Extract-Transform-Load and Extract-Load-Transform and when each fits.
Time-Series Databases — InfluxDB vs TimescaleDB vs ClickHouse
Comparing purpose-built and general-purpose solutions for time-series data.
Parquet vs CSV — Why Columnar Storage Matters
How Parquet's columnar format reduces storage costs and speeds up analytical queries.
Building a Data Catalog with DataHub
Ingestion, metadata, search, and making your catalog actually useful.
PostgreSQL Replication — Streaming, Logical, and Read Replicas
Set up read replicas, understand WAL, and choose between streaming and logical replication.