Search results
12 results
Introduction to Data Pipelines
What a data pipeline is, the core stages, and when to build vs buy.
Getting Started with dbt (data build tool)
Models, tests, documentation, and the dbt workflow for transforming warehouse data.
DuckDB — Blazing Fast Local Analytics
When to reach for DuckDB instead of Spark, and how to use it effectively.
Feature Stores — Bridging Data Engineering and ML
What a feature store is, online vs offline stores, and when to build vs buy.
API Gateway — Responsibilities and Implementation Patterns
Authentication, rate limiting, routing, request aggregation, and when not to use a gateway.
Testing Strategy for Data Pipelines
Unit tests, integration tests, data contract tests, and regression testing for pipelines.
Implementing Data Lineage Tracking
Column-level lineage, tools, and why it is critical for debugging and compliance.
ETL vs ELT — Which Pattern Should You Use?
Understand the difference between Extract-Transform-Load and Extract-Load-Transform and when each fits.
Apache Spark — Core Concepts and When to Use It
RDDs, DataFrames, Spark SQL, and the use cases where Spark is the right tool.
Event Sourcing and CQRS — Practical Implementation
Event store design, projection rebuilding, and operational realities.
Running Data Workloads on Kubernetes
Spark on K8s, Airflow on K8s, resource requests, and storage patterns.
Serverless Architecture — When Functions Work and When They Don't
Cold starts, event-driven patterns, cost model, and the right use cases.