Search results
12 resultsIntroduction to Data Pipelines
What a data pipeline is, the core stages, and when to build vs buy.
Apache Spark — Core Concepts and When to Use It
RDDs, DataFrames, Spark SQL, and the use cases where Spark is the right tool.
Running Data Workloads on Kubernetes
Spark on K8s, Airflow on K8s, resource requests, and storage patterns.
Feature Stores — Bridging Data Engineering and ML
What a feature store is, online vs offline stores, and when to build vs buy.
Testing Strategy for Data Pipelines
Unit tests, integration tests, data contract tests, and regression testing for pipelines.
Serverless Architecture — When Functions Work and When They Don't
Cold starts, event-driven patterns, cost model, and the right use cases.
Event Sourcing and CQRS — Practical Implementation
Event store design, projection rebuilding, and operational realities.
ETL vs ELT — Which Pattern Should You Use?
Understand the difference between Extract-Transform-Load and Extract-Load-Transform and when each fits.
API Gateway — Responsibilities and Implementation Patterns
Authentication, rate limiting, routing, request aggregation, and when not to use a gateway.
Getting Started with dbt (data build tool)
Models, tests, documentation, and the dbt workflow for transforming warehouse data.
DuckDB — Blazing Fast Local Analytics
When to reach for DuckDB instead of Spark, and how to use it effectively.
Implementing Data Lineage Tracking
Column-level lineage, tools, and why it is critical for debugging and compliance.