PII classification
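Classify data before designing storage. Direct identifiers (name, email, SSN) identify a person on their own and require the highest protection. Quasi-identifiers (zip, DOB, gender) do not identify anyone individually but can re-identify a person when combined, so restrict and review them as a set rather than column by column.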
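To make the point about combinations concrete, here is a minimal sketch that tags columns by class and measures how small the groups become once quasi-identifiers are combined. The `COLUMN_CLASSES` labels, the helper names, and the sample rows are all illustrative assumptions, not part of any particular tool.

```python
from collections import Counter

# Hypothetical column classification; labels and the "plan" column are illustrative.
COLUMN_CLASSES = {
    "name": "direct",
    "email": "direct",
    "ssn": "direct",
    "zip": "quasi",
    "dob": "quasi",
    "gender": "quasi",
    "plan": "non_pii",
}


def quasi_columns(classes):
    """Return the columns tagged as quasi-identifiers, in declaration order."""
    return [col for col, cls in classes.items() if cls == "quasi"]


def smallest_group(rows, columns):
    """Size of the smallest group of rows sharing the same values in `columns`.

    A result of 1 means at least one person is unique on that combination.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


# Made-up sample data: direct identifiers are already stripped.
rows = [
    {"zip": "02139", "dob": "1990-04-01", "gender": "F", "plan": "pro"},
    {"zip": "02139", "dob": "1990-04-01", "gender": "M", "plan": "free"},
    {"zip": "02139", "dob": "1985-11-23", "gender": "F", "plan": "free"},
    {"zip": "02139", "dob": "1985-11-23", "gender": "F", "plan": "pro"},
]

print(smallest_group(rows, ["zip"]))                         # 4
print(smallest_group(rows, quasi_columns(COLUMN_CLASSES)))   # 1
```

On its own, zip leaves every person hidden in a group of four. Combining zip, DOB, and gender singles out individual rows, which is why the three columns need to be assessed together.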
Pseudonymisation
Replace identifying values with tokens that carry no meaning on their own. A mapping table, kept in a restricted system, allows re-identification when legally necessary. Analytics datasets use tokens only, so an analyst can segment by user without seeing email addresses.
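A minimal sketch of the idea, assuming an in-memory dictionary stands in for the restricted mapping table. The `TokenVault` class, its method names, and the `tok_` prefix are hypothetical. A real vault would live in a separate, access-controlled system.

```python
import secrets


class TokenVault:
    """In-memory stand-in for the restricted mapping table (hypothetical)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        """Return the existing token for `value`, or mint a new random one."""
        token = self._token_by_value.get(value)
        if token is None:
            token = "tok_" + secrets.token_hex(16)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return token

    def reidentify(self, token):
        """Look up the original value; only the restricted system should call this."""
        return self._value_by_token[token]


vault = TokenVault()
events = [
    ("alice@example.com", "login"),
    ("bob@example.com", "login"),
    ("alice@example.com", "purchase"),
]

# The analytics dataset carries tokens only, never the email itself.
analytics = [(vault.tokenize(email), action) for email, action in events]
```

The same email always maps to the same token, so per-user segmentation still works. The tokens here are random rather than derived from the email, so they cannot be recomputed by hashing a guessed address. Only the mapping table links a token back to a person.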
Tokenisation in practice
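-- Production table (column types and keys here are illustrative)
CREATE TABLE users (id UUID PRIMARY KEY, email_token TEXT, created_at TIMESTAMP);
-- Restricted PII vault (separate DB, separate access)
CREATE TABLE pii_vault (id UUID PRIMARY KEY, email_token TEXT, email_plain TEXT, created_at TIMESTAMP);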
Right to deletion
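GDPR Article 17 gives individuals the right to erasure. Design deletion from day one: maintain a list of all tables and columns containing a given user's PII. When deletion is requested, delete the user's row from the PII vault and clear every other location on that list. With the mapping gone, tokens in analytics tables can no longer be resolved to the person through the vault, but the records are anonymous only if the remaining columns (for example quasi-identifiers such as zip, DOB, and gender) cannot re-identify the user in combination.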
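One way to keep the inventory actionable is to store it as data and drive deletion from it. The sketch below assumes an in-memory SQLite database standing in for the restricted vault. `PII_TABLES`, `delete_user_pii`, and the `shipping_addresses` table are hypothetical names added for illustration; only `pii_vault` and its columns come from the schema above.

```python
import sqlite3

# Hypothetical inventory of vault tables that hold a user's PII, paired with
# the column used to locate that user's rows.
PII_TABLES = [
    ("pii_vault", "email_token"),
    ("shipping_addresses", "email_token"),  # illustrative extra table
]


def delete_user_pii(vault_conn, email_token):
    """Delete every inventoried PII row for one token; return rows removed."""
    removed = 0
    with vault_conn:  # commits on success, rolls back if any delete fails
        for table, key_column in PII_TABLES:
            # Table and column names come from the fixed inventory above,
            # never from user input, so formatting them into the SQL is safe.
            cur = vault_conn.execute(
                f"DELETE FROM {table} WHERE {key_column} = ?", (email_token,)
            )
            removed += cur.rowcount
    return removed


# Demo data in an in-memory database (a stand-in for the separate vault DB).
vault = sqlite3.connect(":memory:")
vault.execute(
    "CREATE TABLE pii_vault (id TEXT, email_token TEXT, email_plain TEXT, created_at TEXT)"
)
vault.execute("CREATE TABLE shipping_addresses (email_token TEXT, address TEXT)")
vault.execute("INSERT INTO pii_vault VALUES ('1', 'tok_a', 'alice@example.com', '2024-01-01')")
vault.execute("INSERT INTO pii_vault VALUES ('2', 'tok_b', 'bob@example.com', '2024-01-02')")
vault.execute("INSERT INTO shipping_addresses VALUES ('tok_a', '1 Main St')")
vault.commit()

removed = delete_user_pii(vault, "tok_a")
remaining = vault.execute("SELECT email_token FROM pii_vault").fetchall()
```

Running all deletes in one transaction means a failure partway through leaves no table half-cleaned. Adding a new PII table then requires only a new inventory entry, not a change to the deletion code.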
Encryption at rest
Snowflake and BigQuery encrypt data at rest by default, which protects the storage layer but does not hide individual fields from anyone with query access. For column-level encryption of high-risk fields, encrypt in the application before writing, with keys managed in AWS KMS or HashiCorp Vault.