Consensus & Leases

One-line summary: Understanding consensus algorithms (Paxos, Raft) and leases for coordination in distributed systems.

Prerequisites: Time, Ordering, Causality, understanding of distributed coordination.


Mental Model

Consensus Problem

Consensus: Multiple nodes agree on a single value.

Requirements:
- Agreement: All nodes decide on the same value
- Validity: The decided value must have been proposed by some node
- Termination: Every non-faulty node eventually decides
- Integrity: Each node decides at most once
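The four requirements can be made concrete as a checker over the outcome of a finished run. This is a minimal sketch for illustration only: the function name `violated_properties` and the shape of its inputs (a set of proposed values and a per-node list of decisions) are assumptions, not part of any particular protocol.

```python
from typing import Dict, Hashable, List, Set


def violated_properties(
    proposed: Set[Hashable],
    decisions: Dict[str, List[Hashable]],
) -> List[str]:
    """Return the names of the consensus requirements that a run violated.

    `decisions` maps each node id to the list of values that node decided,
    in order. A correct run has exactly one entry per node.
    """
    violations = []
    all_decided = [v for values in decisions.values() for v in values]

    # Agreement: no two decisions anywhere may differ.
    if len(set(all_decided)) > 1:
        violations.append("agreement")

    # Validity: every decided value must have been proposed.
    if any(v not in proposed for v in all_decided):
        violations.append("validity")

    # Termination: every node must have decided by the end of the run.
    if any(len(values) == 0 for values in decisions.values()):
        violations.append("termination")

    # Integrity: no node may decide more than once.
    if any(len(values) > 1 for values in decisions.values()):
        violations.append("integrity")

    return violations
```

Termination can only be checked this way on a finite, completed run; in a live system it is a liveness property and cannot be confirmed by inspecting a snapshot.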

graph LR
    Proposer[Proposer] --> Nodes[Nodes]
    Nodes --> Consensus["Consensus<br/>Agreed Value"]
    style Proposer fill:#99ccff
    style Consensus fill:#ffcc99

Key insight: Consensus enables coordination without a single point of failure, but it is expensive: every decision requires communication with a majority of nodes.

Leases

Lease: Time-limited exclusive access to a resource.

Properties:
- Exclusive: Only one holder at a time
- Time-limited: Expires after duration
- Renewable: Can be renewed before expiry

Use case: Coordination that is lighter-weight than running consensus for every operation.
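The lease properties can be sketched as a small immutable record. The `Lease` class, its field names, and the convention that times are seconds on a single monotonic clock are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    holder: str        # exclusive: a lease names exactly one holder
    expires_at: float  # time-limited: seconds on a monotonic clock

    def is_valid(self, now: float) -> bool:
        # The lease is valid strictly before its expiry time.
        return now < self.expires_at

    def renewed(self, now: float, duration: float) -> "Lease":
        # Renewable: only a still-valid lease can be extended.
        if not self.is_valid(now):
            raise ValueError("cannot renew an expired lease")
        return Lease(self.holder, now + duration)
```

Making the record immutable means a renewal produces a new lease, so a stale reference can never silently appear valid for longer than it was granted.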


Internals & Architecture

Paxos

Paxos: Classic consensus algorithm.

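Phases:
1. Prepare: Proposer sends a prepare request with a proposal number to the acceptors
2. Promise: Acceptors promise not to accept lower-numbered proposals, and report any value they have already accepted
3. Accept: Proposer sends an accept request carrying the value of the highest-numbered proposal reported in the promises, or its own value if none was reported
4. Accepted: Acceptors accept the value unless they have since promised a higher-numbered proposal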
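The four phases can be sketched as one round of single-decree Paxos. This is a minimal in-process sketch under stated assumptions, not a networked implementation: the `Acceptor` class, the `propose` function, synchronous method calls in place of messages, and plain integers as proposal numbers are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                   # highest proposal number promised
    accepted_n: int = -1                 # number of the accepted proposal, if any
    accepted_value: Optional[str] = None

    def on_prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Phases 1-2: promise to ignore lower numbers, report prior acceptance.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, self.accepted_n, self.accepted_value

    def on_accept(self, n: int, value: str) -> bool:
        # Phases 3-4: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1

    replies = [a.on_prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None

    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own value.
    prev_n, prev_value = max(granted, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_value

    acks = sum(a.on_accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

For example, with three fresh acceptors, `propose(acceptors, 1, "x")` chooses `"x"`, and a later `propose(acceptors, 2, "y")` also returns `"x"`: the second proposer learns of the earlier acceptance in the promise phase and must carry it forward, which is what keeps a chosen value stable.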

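Properties:
- Safety: Only one value can be chosen
- Liveness: A value is eventually chosen if a majority is available and proposers do not keep preempting each other (competing proposers can stall progress, which is why practical deployments elect a distinguished proposer)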

Use case: Foundation for many consensus implementations (e.g., Spanner).

Raft

Raft: Consensus algorithm designed for understandability.

Components:
- Leader: Single leader handles all client requests
- Followers: Replicate leader's log
- Candidate: Node seeking to become leader

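Phases:
1. Leader election: Elect a new leader when none exists or the current one fails
2. Log replication: Leader replicates its log to followers
3. Safety: Ensure consistency, for example by only electing candidates whose logs are at least as up-to-date as a majority of nodes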
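The election and safety steps meet in the vote-request handler: a node grants at most one vote per term, and only to a candidate whose log is at least as up-to-date as its own. The sketch below follows the rules described in the Raft paper, but the `RaftNode` class, its field names, and the in-memory log of `(term, command)` tuples are assumptions for illustration; persistence, timers, and networking are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RaftNode:
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)

    def last_log(self) -> Tuple[int, int]:
        """Return (term of last entry, 1-based index of last entry)."""
        if not self.log:
            return (0, 0)
        return (self.log[-1][0], len(self.log))

    def on_request_vote(
        self,
        term: int,
        candidate_id: str,
        last_log_term: int,
        last_log_index: int,
    ) -> bool:
        # Reject candidates from an older term.
        if term < self.current_term:
            return False

        # A newer term resets our vote for that term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None

        # Up-to-date check: a later last term wins; ties go to the longer log.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()

        if up_to_date and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False
```

Because a candidate needs votes from a majority, and every committed entry is stored on a majority, this check means the winner's log already contains every committed entry.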

Properties:
- Easier to understand: Decomposes consensus into leader election, log replication, and safety
- Same guarantees: Safety and liveness equivalent to Paxos

Use case: Used in etcd, Consul, and many other systems.

Leases

Lease mechanism:
1. Acquire: Request lease from lease server
2. Hold: Use resource while lease valid
3. Renew: Renew lease before expiry
4. Release: Release lease when done

Lease server: Manages leases and ensures exclusivity.
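The acquire, renew, and release steps can be sketched as a single-process lease server. This is an illustrative sketch only: the `LeaseServer` class, its method names, and the injectable `clock` parameter are assumptions, and a production server would also need persistence and replication.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseServer:
    def __init__(self, duration: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._duration = duration
        self._clock = clock
        # resource name -> (holder, expiry time on the server's clock)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def _current_holder(self, resource: str) -> Optional[str]:
        entry = self._leases.get(resource)
        if entry is None:
            return None
        holder, expires_at = entry
        if self._clock() >= expires_at:
            # Expired leases are dropped lazily on the next lookup.
            del self._leases[resource]
            return None
        return holder

    def acquire(self, resource: str, client: str) -> bool:
        holder = self._current_holder(resource)
        if holder is not None and holder != client:
            return False  # exclusivity: someone else holds a valid lease
        self._leases[resource] = (client, self._clock() + self._duration)
        return True

    def renew(self, resource: str, client: str) -> bool:
        if self._current_holder(resource) != client:
            return False  # only the current holder may renew
        self._leases[resource] = (client, self._clock() + self._duration)
        return True

    def release(self, resource: str, client: str) -> bool:
        if self._current_holder(resource) != client:
            return False
        del self._leases[resource]
        return True
```

Passing the clock in as a parameter keeps expiry logic testable without sleeping, and defaulting to `time.monotonic` avoids leases being shortened or extended by wall-clock adjustments.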

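Properties:
- Lighter-weight: Less overhead than running consensus for every operation
- Time-based: Correctness relies on bounded clock drift between holder and server, not on agreement among nodes
- Fault-tolerant: Lease server can be replicated (often using consensus internally)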
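Because leases rely on time, a holder should stop using the resource slightly before the server considers the lease expired. One common conservative rule is sketched below; the function name `safe_local_deadline` and the `max_drift_rate` parameter (an assumed upper bound on how much slower the holder's clock may run relative to the server's) are hypothetical and chosen for this example.

```python
def safe_local_deadline(
    request_sent_at: float,
    lease_duration: float,
    max_drift_rate: float,
) -> float:
    """Return the local time after which the holder must stop using the lease.

    The deadline is measured from when the request was sent, not when the
    reply arrived, because the server cannot have started the lease earlier
    than that. The duration is then shortened by the assumed drift bound so
    that a slow local clock does not outlive the server's view of the lease.
    """
    if not 0.0 <= max_drift_rate < 1.0:
        raise ValueError("max_drift_rate must be in [0, 1)")
    return request_sent_at + lease_duration * (1.0 - max_drift_rate)
```

With a 10-second lease requested at local time 100 and an assumed drift bound of 1%, the holder would stop at roughly local time 109.9 rather than 110. The guarantee is only as good as the drift bound: if real drift exceeds it, two holders can overlap.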

Use case: Coordination when consensus is too expensive.


Failure Modes & Blast Radius

Consensus Failures

Scenario 1: Split-Brain

Scenario 2: Leader Failure

Lease Failures

Scenario 1: Lease Expiry


Observability Contract

Metrics

Alerts


Change Safety

Consensus Configuration Changes


Tradeoffs

Consensus vs Leases

Consensus:
- Pros: Strong guarantees, no single point of failure
- Cons: Higher latency, more complex

Leases:
- Pros: Lower latency, simpler
- Cons: Weaker guarantees, relies on time


Operational Considerations

Best Practices

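  1. Choose the right mechanism: Consensus where strong guarantees are required, leases where lighter-weight coordination suffices
  2. Monitor consensus: Track consensus health, such as leader changes, commit latency, and quorum availability
  3. Handle failures: Design for fast recovery, such as prompt leader re-election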
  4. Prevent split-brain: Require majority for consensus
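The majority requirement prevents split-brain because any two majorities of the same cluster share at least one node, so two disjoint groups can never both make decisions. The helper names below are assumptions for illustration; the overlap check is a brute-force demonstration suitable only for small cluster sizes.

```python
from itertools import combinations


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can fail while a majority remains reachable."""
    return cluster_size - majority(cluster_size)


def majorities_always_overlap(cluster_size: int) -> bool:
    """Check exhaustively that every pair of majority quorums intersects."""
    nodes = range(cluster_size)
    quorum = majority(cluster_size)
    return all(
        set(a) & set(b)
        for a in combinations(nodes, quorum)
        for b in combinations(nodes, quorum)
    )
```

Note that a 4-node cluster needs 3 nodes for a majority and tolerates only 1 failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual choice.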

What Staff Engineers Ask in Reviews


Further Reading

Comprehensive Guide: Further Reading: Consensus & Leases

Quick Links:
- "The Part-Time Parliament" (Lamport, 1998) - Paxos paper
- "In Search of an Understandable Consensus Algorithm" (Ongaro & Ousterhout, 2014) - Raft paper
- Replication Strategies
- Spanner: Consistency & Performance


Exercises

  1. Choose consensus: When do you use consensus vs leases? What are the tradeoffs?

  2. Handle leader failure: Your consensus system loses its leader. How does it recover?

  3. Design leases: Design a lease system for coordinating access to a shared resource.

Answer Key: View Answers