Why IaC for data platforms

Data platforms span dozens of cloud resources: S3 buckets, IAM roles, Redshift clusters, Glue jobs, and Lambda functions. Without IaC, dev, staging, and production drift apart as manual console changes accumulate, and recreating a production environment after a disaster can take days of manual work.

Terraform basics

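# Environment name (e.g. "dev", "prod"), used in resource names and tags
variable "env" {
  type = string
}

resource "aws_s3_bucket" "data_lake" {
  # S3 bucket names are globally unique; the env suffix keeps environments apart
  bucket = "my-data-lake-${var.env}"
  tags = {
    Environment = var.env
    Owner       = "data-platform"
  }
}

# Since AWS provider v4, versioning is configured through its own resource
resource "aws_s3_bucket_versioning" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id
  versioning_configuration {
    status = "Enabled"
  }
}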
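The resources above also need the AWS provider declared so that terraform init can download it. The sketch below is a minimal assumption, not part of the original setup: the region is a placeholder, and the version constraint only reflects that the standalone aws_s3_bucket_versioning resource arrived with provider v4.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # aws_s3_bucket_versioning exists as a separate resource from v4 onward
      version = ">= 4.0"
    }
  }
}

provider "aws" {
  # Placeholder region; set this to wherever the data platform runs
  region = "us-east-1"
}
```

With this in place, running terraform init once, then terraform plan -var="env=dev" and terraform apply -var="env=dev", previews and creates the dev bucket; passing a different env value creates an independent copy for another environment.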

State management

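Store Terraform state remotely in S3 and use a DynamoDB table for state locking, so that two concurrent terraform apply runs cannot corrupt it. Never commit state files to git: Terraform records resource attributes in state as plain text, including secrets such as generated passwords.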
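A minimal sketch of that setup follows. The bucket name, key, region, and table name are hypothetical. Backend blocks cannot reference variables, so their values are literals. The S3 backend expects the lock table to have a string partition key named LockID. Because the backend cannot store state in resources it has not created yet, the state bucket and lock table are usually created once by a separate bootstrap configuration. Recent Terraform releases also offer S3-native locking (use_lockfile), so check the S3 backend documentation for your version.

```hcl
# Backend configuration used by each environment (names are hypothetical)
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "data-platform/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true # server-side encryption for the state object
  }
}

# Lock table, typically created once from a separate bootstrap configuration
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # key name the S3 backend writes lock entries under

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Adding *.tfstate, *.tfstate.* and .terraform/ to .gitignore guards against committing local state by accident.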

Module pattern

Create reusable modules for common patterns: modules/data-lake-bucket encapsulates the bucket, versioning, lifecycle rules, and IAM policies. Each environment then instantiates the same module with its own variable values, so dev and prod share one definition.
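A condensed sketch of what modules/data-lake-bucket might contain appears below. It makes several assumptions. The inputs env and noncurrent_version_days are hypothetical names. The lifecycle rule only expires old object versions. A single read-only IAM policy stands in for the module's IAM policies. In a real module these pieces would normally be split across variables.tf, main.tf, and outputs.tf.

```hcl
# modules/data-lake-bucket, condensed into one file for illustration

variable "env" {
  type = string
}

# Hypothetical input: how long to keep non-current object versions
variable "noncurrent_version_days" {
  type    = number
  default = 90
}

resource "aws_s3_bucket" "this" {
  bucket = "my-data-lake-${var.env}"
  tags = {
    Environment = var.env
    Owner       = "data-platform"
  }
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Expire old versions so versioning does not grow storage without bound
resource "aws_s3_bucket_lifecycle_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  # Ordering hint: enable versioning before rules that act on noncurrent versions
  depends_on = [aws_s3_bucket_versioning.this]

  rule {
    id     = "expire-noncurrent-versions"
    status = "Enabled"

    filter {} # empty filter: the rule applies to every object

    noncurrent_version_expiration {
      noncurrent_days = var.noncurrent_version_days
    }
  }
}

# Read-only policy that consumer roles (e.g. Glue or Redshift) can attach
data "aws_iam_policy_document" "read" {
  statement {
    actions   = ["s3:GetObject", "s3:ListBucket"]
    resources = [aws_s3_bucket.this.arn, "${aws_s3_bucket.this.arn}/*"]
  }
}

resource "aws_iam_policy" "read" {
  name   = "data-lake-read-${var.env}"
  policy = data.aws_iam_policy_document.read.json
}

output "bucket_name" {
  value = aws_s3_bucket.this.id
}

output "read_policy_arn" {
  value = aws_iam_policy.read.arn
}
```

An environment directory (here a hypothetical environments/prod) would then call the module with its own values:

```hcl
module "data_lake" {
  source = "../../modules/data-lake-bucket"

  env                     = "prod"
  noncurrent_version_days = 30
}
```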