Logs

Emit structured JSON logs rather than free-form text. Every log line should be parseable by a machine.

{
  "timestamp": "2024-03-01T14:22:01Z",
  "level": "error",
  "service": "orders-api",
  "request_id": "req_01hx9k3p2m",
  "user_id": 4821,
  "message": "Payment gateway timeout",
  "duration_ms": 5003
}
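
One way to produce log lines shaped like the example above is a custom formatter on Python's standard logging module. This is a minimal sketch, not a prescribed implementation: the JsonFormatter class, the build_logger helper, and the "context" key used to pass extra fields are all hypothetical names chosen for illustration.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Assumption: callers pass request-scoped fields as
        # extra={"context": {...}}, which logging attaches to the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(service: str, stream) -> logging.Logger:
    """Return a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep the root logger from re-emitting as text
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    return logger


logger = build_logger("orders-api", sys.stdout)
logger.error(
    "Payment gateway timeout",
    extra={"context": {"request_id": "req_01hx9k3p2m",
                       "user_id": 4821,
                       "duration_ms": 5003}},
)
```

Keeping request_id and similar fields as top-level keys, rather than interpolating them into the message string, is what lets a log pipeline filter and aggregate on them.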

Metrics — RED method

  • Rate — requests per second.
  • Errors — error rate (%).
  • Duration — latency percentiles (P50, P95, P99).

Instrument every HTTP endpoint, queue consumer, and background job with RED metrics. These three numbers tell you whether a service is healthy.
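
To make the three RED numbers concrete, the sketch below computes them from raw observations collected over one fixed time window. It is an illustration only: RedWindow and percentile are hypothetical names, and the nearest-rank percentile over raw samples is an assumption made for simplicity. A production service would more likely record into a metrics library's counters and histograms than keep every sample in memory.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


@dataclass
class RedWindow:
    """Request outcomes observed during one fixed time window."""
    window_seconds: float
    durations_ms: List[float] = field(default_factory=list)
    errors: int = 0

    def observe(self, duration_ms: float, ok: bool) -> None:
        """Record one finished request (or job, or consumed message)."""
        self.durations_ms.append(duration_ms)
        if not ok:
            self.errors += 1

    def summary(self) -> Dict[str, float]:
        total = len(self.durations_ms)
        if total == 0:
            # No traffic: rate and errors are zero, latency is undefined.
            return {"rate_per_s": 0.0, "error_pct": 0.0}
        ordered = sorted(self.durations_ms)
        return {
            "rate_per_s": total / self.window_seconds,   # Rate
            "error_pct": 100 * self.errors / total,      # Errors
            "p50_ms": percentile(ordered, 50),           # Duration
            "p95_ms": percentile(ordered, 95),
            "p99_ms": percentile(ordered, 99),
        }
```

Calling observe() once per completed request and summary() at the end of each window yields the rate, error percentage, and latency percentiles for that window.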

Traces

A trace follows a request across service boundaries. Each step is a span with start time, duration, and metadata. When a request is slow, the trace shows exactly which service and which operation consumed the time.
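
The span model described above can be sketched as a small data structure plus a helper that finds where the time went. Everything here is hypothetical and for illustration: the Span fields, the service and operation names, and the example timings are invented, and self_times assumes that child spans run one after another without overlapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One step of a request: who did what, when, and for how long."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]   # None for the root span
    service: str
    operation: str
    start_ms: float
    duration_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Time spent in each span itself, excluding its direct children.

    Assumes children of a span do not overlap in time.
    """
    own = {s.span_id: s.duration_ms for s in spans}
    for s in spans:
        if s.parent_id in own:
            own[s.parent_id] -= s.duration_ms
    return own


def slowest_operation(spans: List[Span]) -> Span:
    """Return the span that consumed the most time on its own."""
    own = self_times(spans)
    return max(spans, key=lambda s: own[s.span_id])


# A made-up trace: one request crossing two services.
example_trace = [
    Span("t1", "a", None, "orders-api", "POST /orders", 0, 5050),
    Span("t1", "b", "a", "orders-api", "db.insert_order", 5, 30),
    Span("t1", "c", "a", "payments", "gateway.charge", 40, 5003),
]

culprit = slowest_operation(example_trace)
print(culprit.service, culprit.operation)
```

Subtracting child durations matters because the root span always has the longest total duration, which says nothing about which step was actually slow.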

OpenTelemetry

Use the OpenTelemetry SDK for all three signals: logs, metrics, and traces. It is vendor-neutral, so you can export to Jaeger, Tempo, Datadog, Honeycomb, or any other OTLP-compatible backend by changing exporter configuration rather than instrumentation code.