Knowledge Base

Results for "Spark SQL"

Articles, FAQs, project case studies, and service deep-dives.

25 results
Article Product Engineering ★ Featured

Database Schema Migration Strategies

Expand-contract pattern, zero-downtime migrations, and tooling.

database migration expand-contract zero downtime Flyway Liquibase
47 views May 24, 2026
Article Data & Platform ★ Featured

Graph Databases — When to Use Neo4j Over Relational

Nodes, edges, Cypher queries, and use cases where graph beats SQL.

Neo4j graph database Cypher knowledge graph fraud detection
46 views May 24, 2026
Article Data & Platform ★ Featured

Choosing a vector database: pgvector vs Pinecone vs Weaviate

A practical comparison across dimensions that matter for production RAG systems.

vector database pgvector Pinecone Weaviate embeddings
51 views May 24, 2026
Article Product Engineering ★ Featured

SQL Query Optimisation — Indexes, Execution Plans, and N+1

Practical techniques for making slow queries fast.

SQL query optimisation indexes N+1 EXPLAIN
45 views May 24, 2026
Article Product Engineering ★ Featured

Secure Coding — OWASP Top 10 for Backend Engineers

Injection, broken auth, XSS, IDOR, and how to prevent each.

OWASP security SQL injection XSS IDOR
46 views May 19, 2026
Article Data & Platform

Designing a Data Lake on AWS S3

Folder structure, naming conventions, lifecycle policies, and access patterns.

S3 data lake AWS partitioning lifecycle
42 views May 24, 2026
Article Data & Platform

Data Lake vs Data Warehouse vs Lakehouse

Practical comparison of the three architectures and how to choose.

data lake data warehouse lakehouse Delta Lake Iceberg
50 views May 24, 2026
Article Data & Platform

Batch vs Streaming Pipelines — Choosing the Right Pattern

Lambda architecture, Kappa architecture, and practical guidance for choosing.

batch streaming Lambda architecture Kappa architecture Flink
52 views May 24, 2026
Article Data & Platform

Getting Started with dbt (data build tool)

Models, tests, documentation, and the dbt workflow for transforming warehouse data.

dbt data build tool ELT SQL transformation
53 views May 24, 2026
Article Data & Platform

Apache Spark — Core Concepts and When to Use It

RDDs, DataFrames, Spark SQL, and the use cases where Spark is the right tool.

Spark Apache Spark DataFrames distributed compute Spark SQL
44 views May 24, 2026
Article Data & Platform

ETL vs ELT — Which Pattern Should You Use?

Understand the difference between Extract-Transform-Load and Extract-Load-Transform and when each fits.

ETL ELT data warehouse dbt Snowflake
44 views May 24, 2026
Article Data & Platform

Trino (formerly PrestoSQL) — Federated SQL Across Data Sources

Architecture, connectors, query federation, and performance tuning.

Trino Presto federated query SQL Iceberg
48 views May 24, 2026
Article Data & Platform

Real-Time Analytics Architecture Patterns

Lambda, Kappa, HTAP, and choosing the right pattern for sub-second analytics.

real-time analytics ClickHouse Druid Flink HTAP
44 views May 24, 2026
Article Data & Platform

Time-Series Databases — InfluxDB vs TimescaleDB vs ClickHouse

Comparing purpose-built and general-purpose solutions for time-series data.

time-series InfluxDB TimescaleDB ClickHouse metrics
40 views May 24, 2026
Article Data & Platform

Running Data Workloads on Kubernetes

Spark on K8s, Airflow on K8s, resource requests, and storage patterns.

Kubernetes K8s Spark Airflow KubernetesExecutor
49 views May 24, 2026
Article Data & Platform

DuckDB — Blazing Fast Local Analytics

When to reach for DuckDB instead of Spark, and how to use it effectively.

DuckDB analytics local Parquet S3
44 views May 24, 2026
Article Product Engineering

API Pagination — Cursor, Offset, and Keyset Patterns

When each method works, performance tradeoffs, and implementation details.

pagination cursor offset keyset API design
44 views May 24, 2026
Article Data & Platform

Stream Processing with Apache Flink

Event time vs processing time, windows, stateful operators, and production deployment.

Flink stream processing event time watermarks windows
44 views May 24, 2026
Article Data & Platform

BigQuery Cost and Performance Optimization

Partitioned tables, clustered tables, slot usage, and avoiding full scans.

BigQuery GCP partitioning clustering cost optimization
43 views May 24, 2026
Article Data & Platform

Orchestrating Pipelines with Apache Airflow

DAGs, operators, scheduling, and production best practices for Airflow.

Airflow orchestration DAG scheduling pipeline
45 views May 23, 2026
Article Product Engineering

Database Connection Patterns in PHP

PDO, prepared statements, connection pooling, and transaction management.

PHP PDO prepared statements transactions database
40 views May 23, 2026
Article Data & Platform

Migrating from MySQL to PostgreSQL

Schema translation, data migration, and common incompatibilities to address.

MySQL PostgreSQL migration pgloader schema translation
45 views May 23, 2026
Article Data & Platform

Data Platform Cost Optimization Strategies

Reducing Snowflake, S3, Spark, and Kafka spend without sacrificing performance.

cost optimization Snowflake S3 Spark Kafka
45 views May 22, 2026
Article Data & Platform

Implementing Data Lineage Tracking

Column-level lineage, tools, and why it is critical for debugging and compliance.

data lineage OpenLineage DataHub dbt column lineage
44 views May 19, 2026
Article Product Engineering

Implementing Search — From Basic SQL to Elasticsearch

Full-text search progression from LIKE queries to dedicated search engines.

search full-text search Elasticsearch PostgreSQL vector search
43 views May 18, 2026