Overload & Backpressure

One-line summary: How systems handle more load than capacity, and mechanisms to prevent cascading failures.

Prerequisites: Queueing Theory, understanding of request/response model.


Mental Model

What is Overload?

Overload occurs when a system receives more requests than it can process. Without proper handling, overload leads to:

- Increased latency
- Timeouts
- Cascading failures
- Complete system failure

The Cascade

```mermaid
flowchart TD
    Load[High Load] --> Queue[Queue Grows]
    Queue --> Latency[Latency Increases]
    Latency --> Timeout[Timeouts]
    Timeout --> Retry[Clients Retry]
    Retry --> Load
    style Load fill:#ff9999
    style Queue fill:#ff9999
    style Latency fill:#ff9999
```

Key insight: Without backpressure, overload creates a positive feedback loop that makes things worse.
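This feedback loop can be made concrete with a toy model. Everything below is illustrative and invented for the sketch: it assumes each timed-out request is retried exactly once in the next round, and that requests above capacity all time out. Even though client demand is constant, naive retries keep increasing the offered load.

```python
# Toy model of the retry feedback loop (illustrative, not a real system).
def offered_load(base_rps: float, capacity_rps: float, rounds: int) -> float:
    """Total offered load after `rounds` of naive client retries."""
    load = base_rps
    for _ in range(rounds):
        timeouts = max(0.0, load - capacity_rps)  # requests the server can't serve
        load = base_rps + timeouts                # clients re-send every timeout
    return load

# A server with 100 rps capacity receiving 120 rps of new work:
print(offered_load(120, 100, rounds=5))  # → 220.0
```

With demand only 20% over capacity, the offered load still climbs every round; below capacity (say 90 rps) it stays flat, which is the asymmetry backpressure has to break.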

Backpressure

Backpressure is a mechanism where a system signals upstream components to slow down when it's overloaded.

Principle: It's better to reject some requests gracefully than to accept all requests and fail catastrophically.


Internals & Architecture

Overload Detection

Signals that indicate overload:

- Queue depth: the queue grows faster than it drains
- Latency: p99 latency climbs as requests wait longer
- Error rate: timeouts and overload errors increase
- Resource utilization: CPU, memory, or connection pools approach saturation

Backpressure Mechanisms

1. Explicit Backpressure

TCP Flow Control:
- Receiver advertises a window size
- Sender limits data in flight to that window
- Use case: Network-level backpressure

HTTP 429 (Too Many Requests):
- Server returns a 429 status code
- Client backs off and retries
- Use case: Application-level backpressure

gRPC Flow Control:
- gRPC inherits HTTP/2 flow control
- Limits in-flight requests and buffered data
- Use case: RPC-level backpressure
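The same idea, capping in-flight work, can be sketched at the application level with a semaphore. The limit of 2 and the names below are arbitrary, chosen only to keep the example small:

```python
import threading

MAX_IN_FLIGHT = 2
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def handle(request, fn):
    """Run fn(request), but refuse new work once MAX_IN_FLIGHT is reached."""
    if not slots.acquire(blocking=False):
        raise RuntimeError("overloaded: too many in-flight requests")
    try:
        return fn(request)
    finally:
        slots.release()  # free the slot whether fn succeeded or failed
```

Rejecting at admission time keeps the refused request cheap; the expensive failure mode is accepting work you cannot finish.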

2. Implicit Backpressure

Blocking:
- Server blocks when its queue is full
- Client waits (implicit backpressure)
- Problem: Can cause cascading failures

Dropping Requests:
- Server drops requests when overloaded
- Client gets errors (implicit backpressure)
- Problem: Poor user experience
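Both implicit mechanisms can be sketched with a bounded stdlib queue; the tiny maxsize below is chosen only to keep the example small:

```python
import queue

work = queue.Queue(maxsize=2)  # bounded: the limit is what creates backpressure

def submit_blocking(item, timeout=0.01):
    """Blocking: caller waits until space frees up (raises queue.Full on timeout)."""
    work.put(item, timeout=timeout)

def submit_or_drop(item) -> bool:
    """Dropping: reject immediately when the queue is full."""
    try:
        work.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller sees an error instead of waiting

submit_or_drop("a")
submit_or_drop("b")
print(submit_or_drop("c"))  # False: queue is full, request is shed
```

Note that an unbounded queue avoids both problems only superficially: it accepts everything and converts overload into ever-growing latency instead.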

Load Shedding Strategies

1. Random Drop: shed a fixed fraction of requests, chosen uniformly at random

2. Priority-Based: shed low-priority requests first, protecting critical traffic

3. Client-Based: throttle the heaviest (or least trusted) clients first

4. Request Type-Based: shed expensive or optional request types before cheap or essential ones

5. Adaptive: adjust the shed rate continuously based on measured load or latency
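As one concrete example, priority-based shedding (strategy 2) might look like the sketch below. The priority levels and utilization thresholds are invented for illustration:

```python
# Illustrative priority tiers; real systems often carry these on the request.
CRITICAL, NORMAL, BEST_EFFORT = 0, 1, 2

def should_shed(priority: int, utilization: float) -> bool:
    """Drop lower-priority work first as utilization climbs."""
    if utilization < 0.8:
        return False                      # healthy: accept everything
    if utilization < 0.9:
        return priority >= BEST_EFFORT    # shed best-effort traffic only
    if utilization < 1.0:
        return priority >= NORMAL         # shed all but critical traffic
    return True                           # saturated: shed everything
```

The key property is monotonicity: as load rises, progressively more important traffic is shed, and the decision is cheap enough to make on every request.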


Failure Modes & Blast Radius

Overload Scenarios

10× Normal Load: auto-scaling plus load shedding should absorb this; degrade non-essential features if needed

100× Normal Load: shed aggressively; serve only the highest-priority traffic, falling back to cached or degraded responses

1000× Normal Load (DDoS): cannot be absorbed by the service itself; must be filtered at the edge (rate limiting, DDoS protection)

Cascading Failures

Scenario: Service A calls Service B

1. Service B becomes overloaded
2. Service B's latency increases
3. Service A times out waiting for B
4. Service A retries, increasing load on B
5. Service B fails completely
6. Service A fails (no responses from B)
7. Failure cascades to other services

Prevention:
- Circuit breakers: Stop calling failing services
- Timeouts: Fail fast instead of waiting
- Retry limits: Cap the number of retry attempts
- Exponential backoff: Space out retries
- Load shedding: Drop requests before overload
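A circuit breaker, the first item in the prevention list, can be sketched in a few lines. The class shape, thresholds, and error message below are illustrative, not a production implementation:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated failures instead of hammering a sick dependency."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after    # seconds before trying again
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None         # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                 # success closes the circuit again
        return result
```

While the circuit is open, Service A's calls to B fail immediately, which both keeps A's threads free and gives B a chance to recover.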


Observability Contract

Metrics to Track

Load Metrics: requests per second, queue depth, concurrent (in-flight) requests

Latency Metrics: p50/p95/p99 latency, time spent queued

Error Metrics: timeout rate, rejected/dropped request counts, 5xx and 429 rates

Resource Metrics: CPU, memory, connection pool, and thread pool utilization

Logs

Log events:
- Requests dropped due to overload
- Backpressure signals sent
- Circuit breaker state changes
- Load shedding decisions

Traces

Trace:
- End-to-end request latency
- Time spent in queues
- Backpressure delays
- Retry attempts

Alerts

Critical alerts:
- Queue depth > threshold
- P99 latency > threshold
- Error rate > threshold
- Resource utilization > 90%

Warning alerts:
- Queue depth trending up
- Latency trending up
- Resource utilization > 80%


Change Safety

Implementing Backpressure

1. Add Queue Limits: bound every queue; an unbounded queue just converts overload into latency

2. Implement Load Shedding: decide which requests to reject, and reject them as early (and cheaply) as possible

3. Add Circuit Breakers: stop calling dependencies that are already failing

4. Set Appropriate Timeouts: every remote call needs a timeout shorter than the caller's own deadline

Testing Strategy

  1. Load testing: Test behavior under various load levels
  2. Stress testing: Push system beyond capacity
  3. Chaos testing: Inject delays and failures
  4. Backpressure testing: Verify backpressure works correctly

Security Boundaries

Overload itself isn't a security issue, but:

- DDoS attacks: Can cause overload
- Resource exhaustion: Attackers can fill queues
- Mitigation: Rate limiting, DDoS protection, authentication
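Rate limiting, the first mitigation above, is commonly implemented as a token bucket. A minimal sketch; the rates, capacities, and class name are illustrative:

```python
import time

class TokenBucket:
    """Allow `rate` requests/second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity            # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                      # over the limit: reject this request
```

Keeping one bucket per client (or per API key) turns this into the client-based shedding strategy described earlier.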


Tradeoffs

What We Gain with Backpressure

- Stability: the system stays up under overload instead of collapsing
- Predictable latency: requests that are accepted are served within bounds
- Graceful degradation: failure is partial and controlled, not total

What We Lose

- Some requests are rejected even when most could have been served
- Added complexity: limits, priorities, and thresholds must be tuned
- Clients must handle rejection and retry correctly

When to Use Backpressure

- Any service whose load it does not fully control; in practice, nearly every production service facing external clients

Alternatives

If backpressure is too complex:

- Over-provision: Always have excess capacity (expensive)
- Fail fast: Return errors immediately (poor UX)
- Accept failures: Let system fail (unreliable)


Operational Considerations

Capacity Planning

Calculate capacity needed:

1. Determine expected peak load
2. Add a safety margin (2-3×)
3. Plan for auto-scaling
4. Plan for load shedding
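A worked example of these steps; all traffic numbers are invented for illustration, so substitute measurements from your own system:

```python
import math

expected_peak_rps = 500              # step 1: measured or forecast peak load
safety_margin = 2.5                  # step 2: headroom within the 2-3x range
per_instance_rps = 100               # sustained throughput of one instance

provisioned_rps = expected_peak_rps * safety_margin        # 1250.0 rps
instances = math.ceil(provisioned_rps / per_instance_rps)  # 13 instances
```

Steps 3 and 4 then cover the residual risk: auto-scaling handles load above the margin that grows slowly, and load shedding handles spikes that arrive faster than scaling can react.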

Monitoring & Debugging

Monitor:
- Queue depth over time
- Latency over time
- Error rate over time
- Resource utilization over time

Debug overload:

1. Check queue depth: Is the queue growing?
2. Check latency: Is latency increasing?
3. Check error rate: Are errors increasing?
4. Check resource utilization: Are resources saturated?
5. Check downstream services: Are they failing?

Incident Response

Common incidents:
- Overload detected
- Cascading failures
- Circuit breaker trips

Response:

1. Scale up (if possible)
2. Load shed (drop low-priority requests)
3. Circuit break (stop calling failing services)
4. Investigate the root cause


What Staff Engineers Ask in Reviews

Design Questions: Where is backpressure applied? What happens when a queue fills? Which requests are shed first, and who decides?

Scale Questions: What happens at 10× expected load? At 100×? Do client retries amplify load, and how is that bounded?

Operational Questions: How is overload detected before it cascades? Which alerts fire, and what does the runbook say to do?


Further Reading

Comprehensive Guide: Further Reading: Overload & Backpressure

Quick Links:
- "The Tail at Scale" (Dean & Barroso, 2013)
- "Site Reliability Engineering" (Google SRE Book)
- "Why Do Internet Services Fail?" (Oppenheimer et al., 2003)
- Load Shedding & Circuit Breakers
- Back to Distributed Systems


Exercises

  1. Design backpressure: Design a system that handles 10× load gracefully. What mechanisms do you use?

  2. Prevent cascades: Service A calls Service B. How do you prevent B's failure from cascading to A?

  3. Load shedding strategy: Design a load shedding strategy for an API that handles both read and write requests. Which requests do you drop first?

Answer Key: View Answers