Three pillars
- Logs — structured JSON to your log aggregator (Loki, CloudWatch, Datadog).
- Metrics — RED metrics (Rate, Errors, Duration) on every endpoint via Prometheus-compatible exporters.
- Traces — OpenTelemetry spans end-to-end so you can see exactly where latency lives.
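To make the three signals concrete, here is a minimal standard-library sketch of what gets recorded per request. It is an illustration only: `RedMetrics`, `log_line`, and the field names are hypothetical, and a real setup would use a Prometheus client library and an OpenTelemetry SDK rather than in-memory dictionaries.

```python
import json
from collections import defaultdict


class RedMetrics:
    """In-memory RED counters per endpoint (hypothetical helper, not a real exporter)."""

    def __init__(self):
        self.requests = defaultdict(int)        # Rate: total requests seen
        self.errors = defaultdict(int)          # Errors: requests that failed
        self.duration_sum = defaultdict(float)  # Duration: total seconds spent

    def observe(self, endpoint, seconds, ok):
        """Record one finished request for the given endpoint."""
        self.requests[endpoint] += 1
        if not ok:
            self.errors[endpoint] += 1
        self.duration_sum[endpoint] += seconds

    def error_ratio(self, endpoint):
        """Fraction of requests that failed; 0.0 when nothing was observed."""
        total = self.requests[endpoint]
        return self.errors[endpoint] / total if total else 0.0


def log_line(endpoint, status, seconds, trace_id):
    """Render one structured JSON log record (field names are assumptions)."""
    record = {
        "endpoint": endpoint,
        "status": status,
        "duration_s": round(seconds, 4),
        "trace_id": trace_id,  # lets a log line be joined to its trace
    }
    return json.dumps(record, sort_keys=True)
```

Carrying the trace ID in each log record is one way to tie the pillars together: a spike in the error ratio points to the log lines, and the log lines point to the trace.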
Alerting
We define SLOs before launch and wire alerts to them, not to raw metrics. You get paged when users are hurting, not when CPU usage ticks up 5%.
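One common way to alert on an SLO rather than a raw metric is to page on error-budget burn rate. The sketch below assumes that approach, which the text above does not specify. The function names are hypothetical, and the 14.4 default is an assumed threshold (it corresponds to spending 2% of a 30-day budget in one hour), not a value taken from this document.

```python
def burn_rate(bad_events, total_events, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on budget.

    slo_target is a fraction such as 0.999 (assumed availability-style SLO).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_events == 0:
        return 0.0  # no traffic, so no budget was spent
    error_budget = 1.0 - slo_target
    observed_error_ratio = bad_events / total_events
    return observed_error_ratio / error_budget


def should_page(short_window_burn, long_window_burn, threshold=14.4):
    """Page only when both windows burn fast, so brief blips do not wake anyone."""
    return short_window_burn >= threshold and long_window_burn >= threshold
```

For example, with a 99% SLO, 20 failed requests out of 1,000 gives a burn rate of about 2: the budget is being spent twice as fast as planned, which in this sketch is well below the paging threshold.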