Timeout
Every external call must have a timeout. Without one, a slow downstream holds threads or goroutines indefinitely. Set timeouts aggressively: it is better to fail fast and return an error than to degrade silently.
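As a minimal sketch of the idea, the call below runs on a caller-supplied executor and is abandoned once the deadline passes. The names `TimeoutCall`, `callWithTimeout`, and `CallTimedOutException` are hypothetical and exist only for this illustration. In real code, the client library's own connect and read timeouts are usually the first thing to configure.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutCall {
    /** Hypothetical unchecked exception signalling that the deadline was exceeded. */
    static final class CallTimedOutException extends RuntimeException {
        CallTimedOutException(String message, Throwable cause) { super(message, cause); }
    }

    /** Runs the call on the given pool and stops waiting once the timeout elapses. */
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> call, Duration timeout) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request interruption of the worker; the task must respond to
            // interrupts for its thread to actually be released.
            future.cancel(true);
            throw new CallTimedOutException(
                "call exceeded " + timeout.toMillis() + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("downstream call failed", e.getCause());
        }
    }
}
```

The caller gets an error after a bounded wait instead of blocking for as long as the downstream stays slow.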
Retry with exponential backoff
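Retry only transient failures (network timeouts, 503s). Do not retry immediately: use exponential backoff with jitter so that many clients do not retry in lockstep. For 429 responses, honor the Retry-After header when it is present. Do not retry other 4xx client errors; the request itself is at fault, so sending it again will not help.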
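The backoff rule can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the class `Retry` and its methods are hypothetical names, the jitter strategy is "full jitter" (a uniform random delay between zero and the exponential ceiling), and Retry-After parsing is left out.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class Retry {
    /** Only 429 and 503 are treated as retryable here; other 4xx codes are not. */
    static boolean isRetryableStatus(int status) {
        return status == 429 || status == 503;
    }

    /** Delay ceiling before retry number `attempt` (0-based): base * 2^attempt, capped at max. */
    static long backoffCeilingMillis(int attempt, long baseMillis, long maxMillis) {
        int shift = Math.min(attempt, 20); // bound the shift; assumes a small base delay
        return Math.min(baseMillis << shift, maxMillis);
    }

    /** Full jitter: a uniformly random delay in [0, ceiling]. */
    static long jitteredDelayMillis(int attempt, long baseMillis, long maxMillis) {
        long ceiling = backoffCeilingMillis(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    /** Calls `call` up to maxAttempts times, sleeping between transient failures. */
    static <T> T withRetry(Supplier<T> call, Predicate<RuntimeException> isTransient,
                           int maxAttempts, long baseMillis, long maxMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                boolean lastAttempt = attempt + 1 >= maxAttempts;
                if (lastAttempt || !isTransient.test(e)) {
                    throw e; // give up: out of attempts, or not worth retrying
                }
                sleepMillis(jitteredDelayMillis(attempt, baseMillis, maxMillis));
            }
        }
    }

    private static void sleepMillis(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }
}
```

With a base of 100 ms and a cap of 5 s, the delay ceilings grow as 100, 200, 400, 800 ms and so on until they reach the cap. The jitter spreads the actual delays across that range.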
Circuit breaker
A circuit breaker tracks the error rate of calls to a downstream. When the rate exceeds a threshold (e.g., 50% errors in 10 seconds), the circuit opens: subsequent calls fail immediately without hitting the downstream. After a cooldown period, it enters the half-open state and allows one test request through. If that request succeeds, the circuit closes again; if it fails, the circuit reopens.
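// Resilience4j (Java); Polly (.NET) and gobreaker (Go) offer equivalents.
// Create the breaker once and reuse it: its state must persist across calls.
CircuitBreaker breaker = CircuitBreaker.of("payment-service", config);
breaker.executeSupplier(() -> paymentClient.charge(request));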
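To make the state transitions concrete, here is a small self-contained sketch of a breaker in plain Java. It is not how Resilience4j or any other library is implemented. `SimpleCircuitBreaker` and all of its members are hypothetical. As a simplification, it measures the error rate over a tumbling window that resets every `windowMillis`, rather than a true sliding window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    /** Thrown when a call is rejected without reaching the downstream. */
    static final class CircuitOpenException extends RuntimeException {
        CircuitOpenException(String message) { super(message); }
    }

    private final double failureRateThreshold; // e.g. 0.5 for 50%
    private final int minimumCalls;            // do not trip on a tiny sample
    private final long windowMillis;
    private final long cooldownMillis;
    private final LongSupplier clock;          // injectable so the sketch can be tested

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long windowStart;
    private long openedAt;

    SimpleCircuitBreaker(double failureRateThreshold, int minimumCalls,
                         long windowMillis, long cooldownMillis, LongSupplier clock) {
        this.failureRateThreshold = failureRateThreshold;
        this.minimumCalls = minimumCalls;
        this.windowMillis = windowMillis;
        this.cooldownMillis = cooldownMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized State state() { return state; }

    <T> T execute(Supplier<T> call) {
        acquirePermission(); // rejections are thrown here and are not counted as failures
        try {
            T result = call.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            throw e;
        }
    }

    private synchronized void acquirePermission() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < cooldownMillis) {
                throw new CircuitOpenException("circuit open");
            }
            state = State.HALF_OPEN; // cooldown elapsed: admit exactly one probe
        } else if (state == State.HALF_OPEN) {
            throw new CircuitOpenException("probe request already in flight");
        }
    }

    private synchronized void record(boolean failed) {
        long now = clock.getAsLong();
        if (state == State.HALF_OPEN) {
            if (failed) trip(now); else close(now); // the probe decides the next state
            return;
        }
        if (state == State.OPEN) return; // late result from a call admitted before the trip
        if (now - windowStart >= windowMillis) { // start a fresh measurement window
            calls = 0; failures = 0; windowStart = now;
        }
        calls++;
        if (failed) failures++;
        if (calls >= minimumCalls && failures >= failureRateThreshold * calls) trip(now);
    }

    private void trip(long now) { state = State.OPEN; openedAt = now; calls = 0; failures = 0; }

    private void close(long now) { state = State.CLOSED; windowStart = now; calls = 0; failures = 0; }
}
```

A production breaker also needs metrics, per-exception classification, and careful handling of results that arrive late from calls admitted before a state change. The library implementations cover these.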
Bulkhead
Give each downstream dependency its own thread pool. If the payment service is slow and the pool reserved for payment calls is exhausted, calls to the inventory service still run on their own pool. Failures are contained.
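One way to sketch this in Java is a bounded `ThreadPoolExecutor` per dependency, with a bounded queue and a rejecting policy so that a saturated pool fails fast instead of queueing without limit. The `Bulkheads` class and the `boundedPool` helper are hypothetical names, and the pool sizes in the usage note are arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class Bulkheads {
    /**
     * Builds a pool with a fixed number of threads and a bounded queue.
     * Once both are full, submit() throws RejectedExecutionException.
     */
    static ExecutorService boundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
            threads, threads,                        // core size == max size: fixed pool
            0L, TimeUnit.MILLISECONDS,               // keep-alive for threads beyond core; none exist here
            new ArrayBlockingQueue<>(queueCapacity), // bounded backlog
            new ThreadPoolExecutor.AbortPolicy());   // reject rather than block or grow
    }
}
```

A service would then create, for example, `paymentPool = Bulkheads.boundedPool(10, 20)` and `inventoryPool = Bulkheads.boundedPool(10, 20)` and submit each dependency's calls only to its own pool. When payment calls stall, `paymentPool` fills up and rejects new work, while `inventoryPool` is unaffected.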