Logs
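Write logs as structured JSON, not free-form text. Every log line should be parseable by a machine, so that fields such as request_id can be filtered and aggregated without pattern matching on message strings.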
{
  "timestamp": "2024-03-01T14:22:01Z",
  "level": "error",
  "service": "orders-api",
  "request_id": "req_01hx9k3p2m",
  "user_id": 4821,
  "message": "Payment gateway timeout",
  "duration_ms": 5003
}
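One way to produce log lines with these fields is a custom formatter on Python's standard logging module. This is a minimal sketch, not a prescribed implementation: the JsonFormatter class, the build_logger helper, and the choice of context fields are assumptions made for illustration, and it emits each entry on a single line rather than pretty-printed.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Hypothetical context fields, mirroring the sample log entry.
    CONTEXT_FIELDS = ("request_id", "user_id", "duration_ms")

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for name in self.CONTEXT_FIELDS:
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        return json.dumps(entry)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("orders-api")
    log.error(
        "Payment gateway timeout",
        extra={"request_id": "req_01hx9k3p2m", "user_id": 4821, "duration_ms": 5003},
    )
```

Passing context through `extra=` keeps the message text constant, which makes it easier to group identical errors while still filtering by request or user.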
Metrics — RED method
- Rate — requests per second.
- Errors — error rate (%).
- Duration — latency percentiles (P50, P95, P99).
Instrument every HTTP endpoint, queue consumer, and background job with RED metrics. Together, these three signals give a quick read on whether a service is healthy from the caller's point of view.
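To make the three RED numbers concrete, the sketch below computes them from a window of raw request records. The Request type, the red_summary function, and the nearest-rank percentile rule are assumptions chosen for illustration; a real service would more likely record these through a metrics library with histograms than keep every request in memory.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    ok: bool


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_summary(requests, window_seconds):
    """Summarise Rate, Errors, and Duration for one observation window."""
    if not requests:
        raise ValueError("need at least one request to summarise")
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if not r.ok)
    return {
        "rate_rps": len(requests) / window_seconds,   # Rate
        "error_pct": 100 * errors / len(requests),    # Errors
        "p50_ms": percentile(durations, 50),          # Duration
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


if __name__ == "__main__":
    # Synthetic window: 100 requests over 10 seconds, the 5 slowest failing.
    window = [Request(duration_ms=d, ok=d <= 95) for d in range(1, 101)]
    print(red_summary(window, window_seconds=10))
```

Percentiles are used instead of an average because a handful of very slow requests can hide behind a healthy-looking mean.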
Traces
A trace follows a request across service boundaries. Each step is a span with start time, duration, and metadata. When a request is slow, the trace shows exactly which service and which operation consumed the time.
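The idea of spans nested inside a trace can be sketched with a small data structure. Everything here is hypothetical: the Span class, the service and operation names, and the timings are invented for illustration, and the self-time calculation assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    service: str
    start_ms: float
    duration_ms: float
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def child(self, name, service, start_ms, duration_ms, **attributes):
        """Create a span nested under this one and return it."""
        span = Span(name, service, start_ms, duration_ms, attributes)
        self.children.append(span)
        return span

    def self_time_ms(self):
        """Time spent in this span itself, excluding its direct children."""
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def walk(span):
    """Yield the span and all of its descendants, depth first."""
    yield span
    for c in span.children:
        yield from walk(c)


def slowest_operation(root):
    """Return the span with the largest self time in the trace."""
    return max(walk(root), key=Span.self_time_ms)


def example_trace():
    """Build a made-up trace for one slow order request."""
    root = Span("POST /orders", "orders-api", start_ms=0, duration_ms=5200)
    root.child("validate cart", "orders-api", start_ms=5, duration_ms=40)
    charge = root.child("charge card", "payments-api", start_ms=50, duration_ms=5050)
    charge.child("gateway call", "payments-api", start_ms=60, duration_ms=5003,
                 gateway="example-gateway")
    root.child("save order", "orders-db", start_ms=5110, duration_ms=60)
    return root


if __name__ == "__main__":
    culprit = slowest_operation(example_trace())
    print(culprit.service, culprit.name, culprit.self_time_ms())
```

Ranking by self time rather than total duration points at the operation that actually consumed the time, instead of at every parent span that merely waited on it.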
OpenTelemetry
Use the OpenTelemetry SDK for all three signals: logs, metrics, and traces. It is vendor-neutral, so you can export to Jaeger, Tempo, Datadog, Honeycomb, or any OTLP-compatible backend by changing exporter configuration rather than application code.