Search results
16 results
Data Governance — Principles and Practical Implementation
Ownership, cataloguing, lineage tracking, and access control at scale.
Privacy-First Data Design — PII Handling Patterns
Tokenisation, pseudonymisation, encryption at rest, and right-to-deletion workflows.
REST API Versioning Strategies
URL path, header, and query-param versioning compared with real-world tradeoffs.
Getting Started with dbt (data build tool)
Models, tests, documentation, and the dbt workflow for transforming warehouse data.
DuckDB — Blazing Fast Local Analytics
When to reach for DuckDB instead of Spark, and how to use it effectively.
Snowflake Best Practices for Cost and Performance
Virtual warehouses, clustering, query optimization, and controlling spend.
Designing a Data Lake on AWS S3
Folder structure, naming conventions, lifecycle policies, and access patterns.
The Twelve-Factor App — Principles for Modern Services
How the twelve factors apply to real production services today.
GraphQL vs REST — When to Use Each
Comparing query flexibility, over-fetching, tooling, and operational complexity.
API Testing Strategy — Unit, Integration, Contract, and E2E
Building a test pyramid that catches real bugs without slowing delivery.
Building a Data Catalog with DataHub
Ingestion, metadata, search, and making your catalog actually useful.
API Documentation Best Practices
What makes documentation useful, tooling, and keeping docs accurate.
ETL vs ELT — Which Pattern Should You Use?
Understand the difference between Extract-Transform-Load and Extract-Load-Transform and when each fits.
Parquet vs CSV — Why Columnar Storage Matters
How Parquet's columnar format reduces storage costs and speeds up analytical queries.
Load Testing with k6
Script a realistic load test, interpret results, and find bottlenecks before they find users.
Trino (formerly PrestoSQL) — Federated SQL Across Data Sources
Architecture, connectors, query federation, and performance tuning.