PII classification

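Classify data before designing storage. Direct identifiers (name, email, SSN) identify a person on their own and require the highest protection. Quasi-identifiers (zip, DOB, gender) are harmless one at a time but can re-identify a person when combined, so assess and restrict them as a set rather than column by column.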
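
One way to see the quasi-identifier risk is to count how many rows are unique on a given set of columns. The following is a minimal sketch, assuming a small in-memory sample; the rows and the unique_combinations helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical sample rows: quasi-identifiers only, no direct identifiers.
rows = [
    {"zip": "02139", "dob": "1985-03-14", "gender": "F"},
    {"zip": "02139", "dob": "1990-07-01", "gender": "M"},
    {"zip": "94105", "dob": "1985-03-14", "gender": "M"},
    {"zip": "94105", "dob": "1990-07-01", "gender": "F"},
]

def unique_combinations(rows, columns):
    """Return the value combinations that occur exactly once in rows."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]

# Each column alone is shared by two people, so no single column singles anyone out.
unique_on_zip = unique_combinations(rows, ["zip"])

# Combined, every row is unique: each person could be re-identified.
unique_on_all = unique_combinations(rows, ["zip", "dob", "gender"])
```

Here no single column isolates a row, while the three columns together isolate every row. This is why the columns need to be classified as a group.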

Pseudonymisation

Replace identifying values with tokens. A mapping table (kept in a restricted system) allows re-identification when legally necessary. Analytics datasets use tokens only — an analyst can segment by user without seeing email addresses.
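
A minimal sketch of the idea, assuming random tokens and an in-memory dictionary standing in for the restricted mapping table. The TokenVault class, its method names, and the tok_ prefix are hypothetical, not part of any particular product.

```python
import secrets

class TokenVault:
    """In-memory stand-in for the restricted mapping table (hypothetical)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenise(self, value: str) -> str:
        # Reuse the existing token so the same email always maps to one token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # The token is random, so it cannot be reversed without the mapping.
        token = "tok_" + secrets.token_hex(16)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def reidentify(self, token: str) -> str:
        # In a real system this lookup would sit behind strict access control.
        return self._value_by_token[token]

vault = TokenVault()
events = [
    ("alice@example.com", "login"),
    ("alice@example.com", "purchase"),
    ("bob@example.com", "login"),
]
# The analytics dataset carries tokens only.
analytics_rows = [(vault.tokenise(email), action) for email, action in events]
```

Both of Alice's events share one token, so an analyst can still group by user, while the email itself is only recoverable through the vault.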

Tokenisation in practice

-- Production table
users: id (UUID), email_token, created_at

-- Restricted PII vault (separate DB, separate access)
pii_vault: id, email_token, email_plain, created_at

Right to deletion

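GDPR Article 17 gives data subjects the right to erasure. Design deletion from day one: maintain an inventory of every table and column that holds PII. When deletion is requested, wipe the user's PII vault row. Once the mapping is gone, tokens in analytics tables can no longer be resolved to a person. Those records are only truly anonymous, however, if the columns that remain (quasi-identifiers such as zip, DOB, gender) cannot re-identify the user in combination.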
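
The deletion flow might look like the sketch below. It assumes two in-memory SQLite databases standing in for the PII vault and the analytics warehouse; the PII_TABLES inventory, the events table, and the delete_user function are hypothetical names used for illustration.

```python
import sqlite3

# Hypothetical inventory: (table, column used to find a user's rows) for
# every table in the vault database that holds PII.
PII_TABLES = [("pii_vault", "email_token")]

vault = sqlite3.connect(":memory:")      # stand-in for the restricted vault DB
analytics = sqlite3.connect(":memory:")  # stand-in for the analytics warehouse
vault.execute(
    "CREATE TABLE pii_vault (id TEXT, email_token TEXT, email_plain TEXT, created_at TEXT)"
)
analytics.execute("CREATE TABLE events (email_token TEXT, action TEXT)")
vault.execute(
    "INSERT INTO pii_vault VALUES ('1', 'tok_a', 'alice@example.com', '2024-01-01')"
)
analytics.executemany(
    "INSERT INTO events VALUES (?, ?)", [("tok_a", "login"), ("tok_a", "purchase")]
)

def delete_user(conn, email_token: str) -> int:
    """Delete the user's rows from every inventoried table; return rows removed."""
    deleted = 0
    for table, key_column in PII_TABLES:
        # Table and column names come from the trusted inventory, never user input.
        cur = conn.execute(f"DELETE FROM {table} WHERE {key_column} = ?", (email_token,))
        deleted += cur.rowcount
    conn.commit()
    return deleted

deleted = delete_user(vault, "tok_a")
remaining_mappings = vault.execute(
    "SELECT COUNT(*) FROM pii_vault WHERE email_token = 'tok_a'"
).fetchone()[0]
orphaned_events = analytics.execute(
    "SELECT COUNT(*) FROM events WHERE email_token = 'tok_a'"
).fetchone()[0]
```

After the call, the analytics events still exist but their token no longer maps to anything in the vault. Driving the delete from an inventory means a new PII table only has to be registered in one place.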

Encryption at rest

Snowflake and BigQuery encrypt all data at rest by default. That protects the storage layer, but anyone with query access still sees plaintext. For column-level encryption of high-risk fields, use application-layer encryption with keys managed in AWS KMS or HashiCorp Vault.
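
A common shape for application-layer encryption is envelope encryption: each value is encrypted with a fresh data key, and the data key is itself encrypted ("wrapped") by a master key. The sketch below assumes the third-party cryptography package and uses a locally generated Fernet key as a stand-in for the master key, which in a real deployment would stay inside AWS KMS or HashiCorp Vault. The encrypt_field and decrypt_field helpers are hypothetical.

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Assumption: a local key stands in for the master key. In production the
# wrap and unwrap steps would be calls to the key management service.
master = Fernet(Fernet.generate_key())

def encrypt_field(plaintext: str) -> tuple[bytes, bytes]:
    """Encrypt one value with a fresh data key; return (wrapped_key, ciphertext)."""
    data_key = Fernet.generate_key()
    ciphertext = Fernet(data_key).encrypt(plaintext.encode("utf-8"))
    # Only the wrapped data key is stored next to the ciphertext.
    wrapped_key = master.encrypt(data_key)
    return wrapped_key, ciphertext

def decrypt_field(wrapped_key: bytes, ciphertext: bytes) -> str:
    """Unwrap the data key with the master key, then decrypt the value."""
    data_key = master.decrypt(wrapped_key)
    return Fernet(data_key).decrypt(ciphertext).decode("utf-8")

wrapped_key, ciphertext = encrypt_field("123-45-6789")
```

The warehouse then stores only wrapped_key and ciphertext. A reader with query access but no permission to use the master key cannot recover the field.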