Spanner: Consistency & Performance

One-line summary: Deep dive into Spanner's TrueTime, distributed transactions, external consistency guarantees, and how to design schemas for scale.

Prerequisites: Consensus & Leases, Replication Strategies, Basic database concepts (ACID, transactions).


Mental Model

Spanner Architecture

flowchart TB
  Client[Client] --> LB[Load Balancer]
  LB --> Spanner[Spanner Instance]
  Spanner --> Zone1[Zone 1 Replicas]
  Spanner --> Zone2[Zone 2 Replicas]
  Spanner --> Zone3[Zone 3 Replicas]
  Zone1 --> Paxos1[Paxos Consensus]
  Zone2 --> Paxos2[Paxos Consensus]
  Zone3 --> Paxos3[Paxos Consensus]
  Paxos1 --> TrueTime[TrueTime Clock]
  Paxos2 --> TrueTime
  Paxos3 --> TrueTime
  style Spanner fill:#99ccff
  style TrueTime fill:#ffcc99
  style Paxos1 fill:#99ff99
  style Paxos2 fill:#99ff99
  style Paxos3 fill:#99ff99

Key insight: Spanner provides external consistency (stronger than serializability) using TrueTime and Paxos. Understanding TrueTime is critical for understanding Spanner's guarantees.

Consistency Model

External Consistency: The strongest consistency guarantee.
- Definition: All transactions appear to execute in a single global order
- Guarantee: If transaction T1 commits before T2 starts, T2 sees T1's writes
- Use case: Financial transactions, inventory management

Serializability: The standard database guarantee.
- Definition: Transactions appear to execute in some serial order
- Guarantee: No anomalies (dirty reads, lost updates)
- Use case: General-purpose applications


Internals & Architecture

TrueTime

TrueTime: Spanner's distributed clock.

Components:
- GPS receivers: Time references synchronized to GPS satellites
- Atomic clocks: Independent high-precision clocks that guard against GPS failures
- Time uncertainty: Bounded clock uncertainty (ε)

Time representation: TTinterval = [earliest, latest]
- earliest: Earliest possible current time
- latest: Latest possible current time
- ε: Clock uncertainty (typically < 7 ms)

Why TrueTime?
- External consistency: Enables external consistency via commit wait (hold a transaction's locks until its commit timestamp is definitely in the past on every clock)
- Global ordering: Provides a global transaction ordering
- Low latency: Bounds the wait to roughly 2ε instead of requiring expensive global coordination
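The commit-wait idea can be sketched in a few lines. This is an illustrative simulation, not Spanner's implementation: `EPSILON`, `TTInterval`, `tt_now`, `tt_after`, and `commit_wait` are invented names, and the uncertainty bound is assumed constant.

```python
import time

EPSILON = 0.007  # assumed clock uncertainty bound (~7 ms)

class TTInterval:
    """A TrueTime-style interval [earliest, latest] around the true time."""
    def __init__(self, earliest, latest):
        self.earliest = earliest
        self.latest = latest

def tt_now():
    """Return an uncertainty interval around the local clock reading."""
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)

def tt_after(t):
    """True once t is definitely in the past on every clock."""
    return tt_now().earliest > t

def commit_wait(commit_ts):
    """Commit wait: block until commit_ts has definitely passed, so any
    transaction that starts afterwards must get a later timestamp."""
    while not tt_after(commit_ts):
        time.sleep(0.001)

start = time.time()
ts = tt_now().latest      # pick commit timestamp at the top of the interval
commit_wait(ts)           # wait out the uncertainty (about 2 * EPSILON)
assert tt_after(ts)
assert time.time() - start >= 2 * EPSILON
```

The point of the final assertion: the wait costs roughly 2ε of latency per read-write commit, which is why Spanner works hard to keep ε small.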

Paxos Consensus

Paxos: Consensus algorithm for replication.

How it works:
1. Leader election: Elect a leader for each Paxos group
2. Proposal: Leader proposes writes
3. Voting: Replicas vote on proposals
4. Commit: Commit when a majority agrees

Replication:
- Multi-region: Replicas across regions
- Quorum: A majority of replicas must agree
- Durability: Data is durable once committed
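The majority-quorum rule behind voting and durability can be shown directly. A toy sketch; the 5-replica group and helper name are invented:

```python
def is_committed(accepts, group_size):
    """A write is durable once a strict majority of replicas accepted it."""
    return accepts > group_size // 2

assert is_committed(3, 5)        # 3 of 5 is a quorum
assert not is_committed(2, 5)    # 2 of 5 is not

# Why a majority: any two quorums of the same group overlap in at least
# one replica, so a new leader always learns every committed write.
q1 = {"r1", "r2", "r3"}
q2 = {"r3", "r4", "r5"}
assert q1 & q2                   # quorums share at least one replica
```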

Distributed Transactions

Two-Phase Commit (2PC)

Process:
1. Prepare phase: Coordinator asks participants to prepare
2. Commit phase: Coordinator commits if all prepared, aborts otherwise

Spanner's 2PC:
- Paxos per shard: Each participant shard is itself a Paxos group, so prepare/commit records are replicated and survive node failures
- Coordinator: One participant group acts as coordinator across shards
- TrueTime: Commit timestamps come from TrueTime
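A minimal sketch of the two phases, assuming in-memory participants (the `Shard` class and vote model are invented for illustration; real participants are replicated Paxos groups):

```python
def two_phase_commit(participants):
    """Phase 1: ask everyone to prepare. Phase 2: commit only if all voted yes."""
    votes = [p.prepare() for p in participants]      # prepare phase
    decision = "commit" if all(votes) else "abort"   # unanimous yes required
    for p in participants:
        p.finish(decision)                           # commit/abort phase
    return decision

class Shard:
    def __init__(self, can_prepare):
        self.can_prepare = can_prepare
        self.state = None
    def prepare(self):
        return self.can_prepare
    def finish(self, decision):
        self.state = decision

ok = [Shard(True), Shard(True)]
assert two_phase_commit(ok) == "commit"

mixed = [Shard(True), Shard(False)]
assert two_phase_commit(mixed) == "abort"      # one "no" vote aborts all
assert all(s.state == "abort" for s in mixed)
```

Classic 2PC blocks if the coordinator dies between the phases; Spanner sidesteps this by making the coordinator itself a Paxos group.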

Read-Write Transactions

Process:
1. Read: Acquire read locks and read the latest data at the Paxos leader
2. Write: Buffer writes locally until commit
3. Commit: Two-phase commit with a TrueTime commit timestamp
4. External consistency: Commit wait ensures the timestamp order matches real-time order

Example:

T1: Write X=1, commit at t1
T2: Starts at t2 > t1, reads X, sees X=1
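The buffer-then-commit behavior above can be sketched with a toy in-memory store (the `Transaction` class is hypothetical; real Spanner commits go through Paxos and 2PC):

```python
class Transaction:
    """Buffers writes locally; nothing is visible until commit applies
    the whole buffer atomically."""
    def __init__(self, store, commit_ts):
        self.store, self.buffer, self.commit_ts = store, {}, commit_ts
    def read(self, key):
        # A transaction sees its own buffered writes first.
        return self.buffer.get(key, self.store.get(key))
    def write(self, key, value):
        self.buffer[key] = value          # buffered, not yet visible
    def commit(self):
        self.store.update(self.buffer)    # all writes apply together

store = {}
t1 = Transaction(store, commit_ts=100)
t1.write("X", 1)
assert store.get("X") is None             # invisible until commit
t1.commit()

t2 = Transaction(store, commit_ts=101)    # starts after t1 committed
assert t2.read("X") == 1                  # sees t1's write, as in the example
```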

Read-Only Transactions

Process:
1. Snapshot: Read at a timestamp from a consistent snapshot
2. No locks: No locks needed (read-only)
3. Low latency: Can read from a nearby replica
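Lock-free snapshot reads fall out of multi-versioning. A minimal sketch, assuming an in-memory versioned store (all names and data are invented): each key maps to (timestamp, value) versions, and a read at time t returns the newest version at or before t.

```python
# key -> list of (commit_timestamp, value) versions
versions = {"X": [(10, "a"), (20, "b"), (30, "c")]}

def snapshot_read(key, read_ts):
    """Return the newest value committed at or before read_ts; no locks,
    because committed versions are immutable."""
    visible = [(ts, v) for ts, v in versions.get(key, []) if ts <= read_ts]
    return max(visible)[1] if visible else None

assert snapshot_read("X", 25) == "b"   # sees the version committed at t=20
assert snapshot_read("X", 5) is None   # nothing committed yet at t=5
```

Because the answer depends only on the chosen timestamp, any replica that has caught up to that timestamp can serve the read, which is what makes nearby-replica reads safe.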

Use case: Analytics, reporting, read-heavy workloads

Schema Design

Tables

Tables: Collections of rows.

Schema:

CREATE TABLE Users (
  UserId INT64 NOT NULL,
  Name STRING(MAX),
  Email STRING(MAX),
  CreatedAt TIMESTAMP,
) PRIMARY KEY (UserId);

Primary key: Required, determines data distribution.
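One consequence is worth sketching: Spanner range-shards rows by primary-key order, so monotonically increasing keys (timestamps, sequences) concentrate all inserts on the last split. A toy illustration; the split boundaries and the hash-scrambling scheme are invented:

```python
import bisect
import hashlib

split_points = [1000, 2000, 3000]   # splits: (-inf,1000), [1000,2000), ...

def split_for(key):
    """Which range split a key lands in (keys are range-sharded in order)."""
    return bisect.bisect_right(split_points, key)

# Sequential keys all land on the last split -- a write hot spot.
sequential = list(range(3001, 3021))
assert len({split_for(k) for k in sequential}) == 1

def scrambled(key):
    """Deterministically scramble the key (e.g., a hash prefix) to spread load."""
    return int(hashlib.sha256(str(key).encode()).hexdigest(), 16) % 4000

# Scrambled keys spread across multiple splits.
assert len({split_for(scrambled(k)) for k in sequential}) > 1
```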

Interleaving

Interleaved tables: Co-locate related data.

Example:

CREATE TABLE Orders (
  OrderId INT64 NOT NULL,
  UserId INT64 NOT NULL,
  Total FLOAT64,
) PRIMARY KEY (UserId, OrderId),
  INTERLEAVE IN PARENT Users ON DELETE CASCADE;
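The co-location effect can be visualized by sorting composite keys: a child row's key is prefixed by its parent's key, so parent and children are adjacent in key order. A toy sketch; the key encoding is invented, not Spanner's actual on-disk format:

```python
# (table, user_id, order_id) -- Users rows vs interleaved Orders rows
rows = [
    ("Users", 1, None),
    ("Orders", 1, 101),
    ("Orders", 1, 102),
    ("Users", 2, None),
    ("Orders", 2, 201),
]

def sort_key(row):
    table, user_id, order_id = row
    # Parent key prefix first; the parent row sorts just before its children.
    return (user_id, 0 if table == "Users" else 1, order_id or 0)

stored_order = sorted(rows, key=sort_key)
# User 1 is immediately followed by user 1's orders in storage order:
assert [r[:2] for r in stored_order[:3]] == [
    ("Users", 1), ("Orders", 1), ("Orders", 1),
]
```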

Benefits: - Co-location: Related data on same server - Performance: Faster joins, better locality - Transactions: Can update parent and child in same transaction

Secondary Indexes

Indexes: Improve query performance.

Types:
- Storing indexes: Store additional columns (covering indexes)
- Null-filtered indexes: Exclude rows with NULL values in indexed columns
- Interleaved indexes: Indexes interleaved in a parent table for locality
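What a storing (covering) index buys can be sketched with plain dicts (illustrative only; names and data are invented): a covered query is answered from the index entry alone, while an uncovered one needs a second lookup into the base table.

```python
# Base table: UserId -> row
users = {1: {"Name": "Ada", "Email": "ada@example.com", "CreatedAt": "2024-01-01"}}

# Index on Email, STORING (Name): email -> (primary key, stored columns)
email_index = {"ada@example.com": (1, {"Name": "Ada"})}

def name_by_email(email):
    """Covered query: served entirely from the index, no base-table read."""
    user_id, stored = email_index[email]
    return stored["Name"]

def created_by_email(email):
    """Not covered: needs a back-join to the base table for CreatedAt."""
    user_id, stored = email_index[email]
    return users[user_id]["CreatedAt"]

assert name_by_email("ada@example.com") == "Ada"
assert created_by_email("ada@example.com") == "2024-01-01"
```

The stored copy of Name is what makes the index bigger and every write to Name more expensive, which is the tradeoff listed below.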

Tradeoffs:
- Write cost: Indexes increase write cost
- Storage cost: Indexes consume storage
- Query performance: Indexes improve query performance

Performance Characteristics

Latency

Read latency:
- Single region: P95 < 10 ms
- Multi-region: P95 < 50 ms (depends on distance)

Write latency:
- Single region: P95 < 10 ms
- Multi-region: P95 < 100 ms (2PC overhead)

Throughput

Read throughput:
- Scales linearly with the number of nodes
- Bottleneck: Usually network bandwidth

Write throughput:
- Scales linearly with the number of nodes
- Bottleneck: Usually 2PC coordination overhead

Scalability

Limits:
- Database size: Petabytes
- Nodes: Thousands of nodes
- QPS: Millions of queries per second
- Regions: Global distribution


Failure Modes & Blast Radius

Spanner Failures

Scenario 1: Regional Outage

Scenario 2: Leader Failure

Scenario 3: TrueTime Failure

Consistency Failures

Scenario 1: Clock Skew

Scenario 2: Network Partition

Performance Failures

Scenario 1: Hot Spots

Scenario 2: Transaction Conflicts

Overload Scenarios

10× Normal Load

100× Normal Load


Observability Contract

Metrics to Track

Database Metrics

Node Metrics

Replication Metrics

Logs

Spanner logs:
- Query logs (if enabled)
- Transaction logs
- Error logs
- Admin activity logs

Alerts

Critical alerts:
- Database unavailable
- High error rate (> 1%)
- High latency (> threshold)
- Replica failures

Warning alerts:
- High abort rate
- Hot spots detected
- Replica lag increasing
- Node resource exhaustion


Change Safety

Schema Changes

Adding Tables

Adding Columns

Changing Primary Key

Index Changes

Adding Indexes

Removing Indexes

Configuration Changes

Changing Replication

Scaling Nodes


Security Boundaries

Access Control

Encryption

At rest:
- Google-managed keys: Default encryption
- Customer-managed keys: Keys held in Cloud KMS

In transit:
- TLS: All connections use TLS; data is encrypted in transit

Data Protection


Tradeoffs

Consistency: External vs Serializability

External consistency:
- Pros: Strongest guarantee, no anomalies
- Cons: Higher latency (TrueTime commit wait), higher cost

Serializability:
- Pros: Standard guarantee, lower latency
- Cons: Weaker than external consistency (the real-time order of transactions is not guaranteed)

Replication: Multi-Region vs Regional

Multi-region:
- Pros: Highest availability, disaster recovery
- Cons: Higher latency, higher cost

Regional:
- Pros: Lower latency, lower cost
- Cons: A single-region failure affects availability

Schema: Normalized vs Denormalized

Normalized:
- Pros: Less storage, easier updates
- Cons: More joins, potentially slower queries

Denormalized:
- Pros: Faster queries, fewer joins
- Cons: More storage, harder updates


Operational Considerations

Capacity Planning

Storage:
- Growth: Plan for storage growth
- Backups: Plan for backup storage
- Indexes: Plan for index storage

Compute:
- Nodes: Plan for node capacity
- QPS: Plan for query throughput
- Scaling: Plan for auto-scaling

Monitoring & Debugging

Monitor:
- Query performance
- Transaction performance
- Node health
- Replication health

Debug issues:
1. Check query performance (slow queries)
2. Check transaction conflicts
3. Check hot spots
4. Check node health
5. Check replication lag
6. Review logs

Incident Response

Common incidents:
- High latency
- Transaction conflicts
- Hot spots
- Replica failures

Response:
1. Check database health
2. Check query performance
3. Check transaction conflicts
4. Check hot spots
5. Scale if needed
6. Contact support if the issue persists


What Staff Engineers Ask in Reviews

Design Questions

Scale Questions

Consistency Questions

Operational Questions


Further Reading

Comprehensive Guide: Further Reading: Spanner

Quick Links:
- Spanner Documentation
- "Spanner: Google's Globally-Distributed Database" (Corbett et al., 2012)
- "Spanner, TrueTime and The CAP Theorem" (Google Research)
- Schema Design
- Back to GCP Core Building Blocks


Exercises

  1. Design schema: Design a schema for an e-commerce application. What tables? What indexes? How do you handle interleaving?

  2. Handle consistency: Your application needs external consistency. How do you ensure this with Spanner?

  3. Optimize performance: Your queries are slow. How do you optimize them? What indexes do you add?

Answer Key: View Answers