Why IaC for data platforms
Data platforms span dozens of cloud resources: S3 buckets, IAM roles, Redshift clusters, Glue jobs, Lambda functions. Without IaC, environments drift apart as manual changes accumulate, and recreating a production environment after a disaster can take days of manual work.
Terraform basics
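# Deployment environment name, interpolated into the bucket name and tags.
variable "env" {
  type = string
}

# The data lake bucket, named per environment.
resource "aws_s3_bucket" "data_lake" {
  bucket = "my-data-lake-${var.env}"
  tags = {
    Environment = var.env
    Owner       = "data-platform"
  }
}

# Versioning is configured as a separate resource in AWS provider v4 and later.
resource "aws_s3_bucket_versioning" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id
  versioning_configuration {
    status = "Enabled"
  }
}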
State management
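Store Terraform state remotely in an S3 bucket, with a DynamoDB table for state locking so that two concurrent runs cannot write to the same state. Never commit state files to git: state records resource attributes in plain text, which can include secrets.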
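A minimal sketch of such a backend configuration follows. The bucket name, key, region, and table name are placeholders chosen for illustration, not values from this platform. It assumes the state bucket and the lock table already exist, and that the table has a string partition key named LockID, which the S3 backend expects.

```hcl
terraform {
  backend "s3" {
    # Hypothetical names: replace with your own state bucket and lock table.
    bucket         = "example-terraform-state"
    key            = "data-platform/terraform.tfstate"
    region         = "us-east-1"

    # DynamoDB table used for state locking (partition key: LockID, type string).
    dynamodb_table = "example-terraform-locks"

    # Encrypt the state object at rest in S3.
    encrypt        = true
  }
}
```

Backend blocks cannot reference variables, so per-environment values are usually supplied at init time, for example with terraform init -backend-config=<file>.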
Module pattern
Create reusable modules for common patterns. For example, modules/data-lake-bucket encapsulates the bucket together with its versioning, lifecycle rules, and IAM policies. Each environment instantiates the module with its own variable values.
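As a rough illustration, an environment's root configuration might call the module as below. The directory layout (environments/prod) and the env input variable are assumptions made for this sketch; the real module may expose different inputs.

```hcl
# environments/prod/main.tf (hypothetical layout)
module "data_lake" {
  # Relative path to the shared module described above.
  source = "../../modules/data-lake-bucket"

  # Assumed input variable; the module would declare it with a variable "env" block.
  env = "prod"
}
```

A dev environment would contain the same block with env = "dev", so the two environments differ only in their inputs and not in their resource definitions.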