Bigtable: Design & Tradeoffs

One-line summary: Understanding Bigtable's wide-column store design, key structure, performance characteristics, and how to avoid hot spots.

Prerequisites: Sharding & Partitioning, Basic NoSQL concepts (key-value stores, column families).


Mental Model

Bigtable Architecture

flowchart TB
    Client[Client] --> LB[Load Balancer]
    LB --> Bigtable[Bigtable Instance]
    Bigtable --> Tablet1["Tablet 1<br/>Shard"]
    Bigtable --> Tablet2["Tablet 2<br/>Shard"]
    Bigtable --> Tablet3["Tablet 3<br/>Shard"]
    Tablet1 --> SSTable1[SSTable Files]
    Tablet2 --> SSTable2[SSTable Files]
    Tablet3 --> SSTable3[SSTable Files]
    SSTable1 --> GCS["Cloud Storage<br/>Backup"]
    SSTable2 --> GCS
    SSTable3 --> GCS
    style Bigtable fill:#99ccff
    style Tablet1 fill:#ffcc99
    style Tablet2 fill:#ffcc99
    style Tablet3 fill:#ffcc99

Key insight: Bigtable is a wide-column store optimized for high-throughput, low-latency reads and writes. Understanding key design is critical for avoiding hot spots and achieving good performance.

Data Model

Table: Collection of rows.

Row: Identified by a row key (an arbitrary byte string). Rows are stored sorted lexicographically by row key.

Column Family: Group of columns (logical grouping).

Column: Identified by column family:column qualifier.

Cell: Value at (row, column, timestamp).

Example:

Row Key: user:123
  Column Family: profile
    Column: name → "Alice"
    Column: email → "alice@example.com"
  Column Family: activity
    Column: last_login → 2024-01-01T10:00:00Z
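
As an illustration of the model above, a table can be thought of as a map from (row key, column, timestamp) to a value. The sketch below is only an in-memory picture of that logical model, not Bigtable's storage format or client API; the integer timestamps and the `read_row` helper are assumptions made for the example.

```python
# Logical model only: (row key, "family:qualifier", timestamp) -> value.
cells = {
    ("user:123", "profile:name", 1): "Alice",
    ("user:123", "profile:email", 1): "alice@example.com",
    ("user:123", "activity:last_login", 1): "2024-01-01T10:00:00Z",
}

def read_row(cells, row_key):
    """Return {column: value} for one row, keeping the newest timestamp per column."""
    latest = {}
    for (row, column, ts), value in cells.items():
        if row != row_key:
            continue
        if column not in latest or ts > latest[column][0]:
            latest[column] = (ts, value)
    return {column: value for column, (ts, value) in latest.items()}

row = read_row(cells, "user:123")
```

Here `row` holds one entry per column of `user:123`, and a row key that was never written yields an empty mapping.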

Internals & Architecture

Key Design

Row Key Structure

Row key: Determines data distribution and access patterns.

Design principles:
- Even distribution: Distribute load evenly across tablets
- Locality: Related data should be co-located
- Access patterns: Optimize for common access patterns

Good key design:

user:{userId}:{timestamp}
  - Spreads load across users (provided user IDs are not assigned sequentially)
  - Co-locates each user's data
  - Supports time-range queries within a user

Bad key design:

{timestamp}:user:{userId}
  - All recent data on one tablet (hot spot)
  - Poor distribution
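
The difference between the two layouts can be seen in a small simulation. This is a sketch under simplifying assumptions: tablets are modelled as equal-sized ranges of the sorted existing keys, and the user counts, timestamps, and helper names are all made up for the example.

```python
import bisect
from collections import Counter

def user_first(user, ts):
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"user:{user:04d}:{ts:010d}"

def time_first(user, ts):
    return f"{ts:010d}:user:{user:04d}"

def tablet_boundaries(existing_keys, num_tablets):
    """Pick split points so each tablet holds an equal share of the sorted keys."""
    keys = sorted(existing_keys)
    step = len(keys) // num_tablets
    return [keys[i * step] for i in range(1, num_tablets)]

def writes_per_tablet(make_key, num_users=100, num_tablets=4):
    # Existing data: every user wrote at timestamps 1000..1099.
    existing = [make_key(u, t) for u in range(num_users) for t in range(1000, 1100)]
    boundaries = tablet_boundaries(existing, num_tablets)
    # New traffic: every user writes at the ten most recent timestamps.
    new_keys = [make_key(u, t) for u in range(num_users) for t in range(1100, 1110)]
    # bisect_right gives the index of the key range that contains each new key.
    return Counter(bisect.bisect_right(boundaries, k) for k in new_keys)

good = writes_per_tablet(user_first)  # new writes spread over all four tablets
bad = writes_per_tablet(time_first)   # every new write lands on the last tablet
```

With the user-first key, each of the four simulated tablets receives a quarter of the new writes. With the timestamp-first key, all of them fall into the last key range.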

Key Components

Components:
- Prefix: Common prefix for related data
- Hash: Hash component for distribution
- Suffix: Additional components for ordering

Example:

{prefix}:{hash}:{suffix}
  user:abc123:20240101
  - prefix: user (logical grouping)
  - hash: abc123 (distribution)
  - suffix: 20240101 (ordering)
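
A key of this shape could be assembled as follows. This is a minimal sketch: the function name, the choice of SHA-256, and the six-character hash length are assumptions for illustration, not something prescribed by Bigtable.

```python
import hashlib

def build_row_key(prefix, entity_id, suffix, hash_len=6):
    """Build `{prefix}:{hash}:{suffix}`, deriving the hash from entity_id."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).hexdigest()[:hash_len]
    return f"{prefix}:{digest}:{suffix}"

key = build_row_key("user", "123", "20240101")
```

Because the hash is derived deterministically from the entity ID, a reader who knows the ID can recompute the same key. A random value in that position would spread writes equally well but would make point lookups impossible.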

Tablet Distribution

Tablets: Contiguous row-key ranges of a table. The tablet is the unit of sharding and load balancing.

Distribution:
- Automatic: Bigtable automatically distributes tablets
- Key-based: Distribution based on row key ranges
- Dynamic: Tablets split and merge automatically

Splitting:
- Trigger: Tablet size (or load) exceeds threshold
- Process: Split tablet into two tablets
- Result: Better load distribution

Merging:
- Trigger: Tablet size below threshold
- Process: Merge adjacent tablets
- Result: Fewer tablets, simpler management
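
The split and merge rules can be sketched on a toy model in which a tablet is a sorted list of row keys. Real thresholds are based on stored bytes and load rather than row counts, so the thresholds and function names here are hypothetical.

```python
def split_tablets(tablets, max_rows):
    """Halve any tablet holding more than max_rows until all fit."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    result = []
    pending = list(reversed(tablets))  # stack; leftmost tablet is popped first
    while pending:
        tablet = pending.pop()
        if len(tablet) > max_rows:
            mid = len(tablet) // 2
            pending.append(tablet[mid:])  # right half is handled after the left half
            pending.append(tablet[:mid])
        else:
            result.append(tablet)
    return result

def merge_tablets(tablets, min_rows, max_rows):
    """Merge a tablet into its left neighbour when either is undersized and the result fits."""
    result = []
    for tablet in tablets:
        undersized = result and (len(tablet) < min_rows or len(result[-1]) < min_rows)
        if undersized and len(result[-1]) + len(tablet) <= max_rows:
            result[-1] = result[-1] + tablet
        else:
            result.append(tablet)
    return result

keys = [f"user:{i:03d}" for i in range(10)]
after_split = split_tablets([keys], max_rows=3)
after_merge = merge_tablets(after_split, min_rows=3, max_rows=5)
```

Both operations only move boundaries between adjacent key ranges, so the overall key order is preserved.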

Column Families

Column families: Logical grouping of columns.

Design:
- Group related columns: Columns accessed together
- Separate access patterns: Different access patterns in different families
- Limit families: Too many families hurt performance

Example:

Column Family: profile
  - name, email, phone (frequently accessed together)
Column Family: activity
  - last_login, page_views (less frequently accessed)

Timestamps

Timestamps: Version data in cells.

Use cases:
- Time-series data: Store historical data
- Versioning: Track changes over time
- TTL: Automatic expiration of old data through garbage-collection policies (maximum age or maximum number of versions)

Example:

Row: user:123
  Column: profile:name
    Timestamp: 2024-01-01T10:00:00Z → "Alice"
    Timestamp: 2024-01-02T10:00:00Z → "Alice Smith"
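
How versions and expiration interact can be sketched with a small helper. The `apply_gc` function and its parameters are hypothetical; in Bigtable, rules of this kind are configured per column family rather than applied by client code.

```python
from datetime import datetime, timedelta, timezone

def apply_gc(versions, max_versions=None, max_age=None, now=None):
    """Return the (timestamp, value) cells that survive both limits, newest first."""
    if now is None:
        now = datetime.now(timezone.utc)
    kept = sorted(versions, key=lambda cell: cell[0], reverse=True)
    if max_age is not None:
        kept = [(ts, value) for ts, value in kept if now - ts <= max_age]
    if max_versions is not None:
        kept = kept[:max_versions]
    return kept

name_versions = [
    (datetime(2024, 1, 1, 10, tzinfo=timezone.utc), "Alice"),
    (datetime(2024, 1, 2, 10, tzinfo=timezone.utc), "Alice Smith"),
]

# Keep only the newest version of the cell.
latest_only = apply_gc(name_versions, max_versions=1)

# Keep only versions written in the last 12 hours, as seen at noon on 2024-01-02.
recent = apply_gc(
    name_versions,
    max_age=timedelta(hours=12),
    now=datetime(2024, 1, 2, 12, tzinfo=timezone.utc),
)
```

In this example both policies leave only the "Alice Smith" version, for different reasons: one limits the count, the other limits the age.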

SSTable Storage

SSTable: Sorted String Table (immutable data files).

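Structure:
- Sorted: Rows sorted by key
- Immutable: Files never modified (only created/deleted)
- Compressed: Data compressed for storage efficiency

Compaction (terminology from Chang et al., 2006):
- Minor compaction: Flush the in-memory memtable to a new SSTable
- Merging compaction: Merge a few SSTables and the memtable into one new SSTable
- Major compaction: Rewrite all SSTables of a tablet into a single SSTable, dropping deleted data
- Benefits: Reduce file count, improve read performance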
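
Because every SSTable is already sorted, merging several of them is a streaming merge of sorted runs. The sketch below assumes each run is a list of (key, value) pairs with unique keys, that runs are listed oldest first, and that the newest value wins; deletion markers are left out for brevity. The `compact` name is an assumption for the example.

```python
import heapq

def compact(sstables):
    """Merge sorted runs (oldest first) into one sorted run; the newest run wins on ties."""
    # Tag each entry with its negated run index so that, for equal keys,
    # the entry from the newest run sorts first.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(sstables)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # a newer version of this key was already emitted
        merged.append((key, value))
    return merged

older = [("user:001", "a"), ("user:003", "c")]
newer = [("user:002", "b"), ("user:003", "c2")]
compacted = compact([older, newer])
```

The inputs are never modified. Compaction produces a new file and the old ones can then be deleted, which matches the immutability property listed above.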

Performance Characteristics

Latency

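Read latency:
- Single row: P95 < 10 ms as a typical target on a well-provisioned cluster
- Range scan: Depends on range size
- Bottleneck: Usually disk I/O

Write latency:
- Single row: P95 < 10 ms as a typical target on a well-provisioned cluster
- Batch writes: Lower cost per row, because request overhead is amortized across the batch (the batch as a whole does not complete faster than a single write)
- Bottleneck: Usually WAL (Write-Ahead Log)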

Throughput

Read throughput:
- Scales linearly: With number of nodes
- Bottleneck: Usually disk I/O or network

Write throughput:
- Scales linearly: With number of nodes
- Bottleneck: Usually WAL or disk I/O

Scalability

Limits:
- Table size: Petabytes
- Nodes: Thousands of nodes
- QPS: Millions of queries per second
- Regions: Multi-region support


Failure Modes & Blast Radius

Bigtable Failures

Scenario 1: Regional Outage

Scenario 2: Tablet Server Failure

Scenario 3: Hot Spot

Performance Failures

Scenario 1: Slow Compaction

Scenario 2: WAL Overload

Overload Scenarios

10× Normal Load

100× Normal Load


Observability Contract

Metrics to Track

Table Metrics

Node Metrics

Tablet Metrics

Logs

Bigtable logs:
- Query logs (if enabled)
- Error logs
- Admin activity logs
- Compaction logs

Alerts

Critical alerts:
- Table unavailable
- High error rate (> 1%)
- High latency (> threshold)
- Hot spots detected

Warning alerts:
- High compaction lag
- WAL queue full
- Node resource exhaustion
- Tablet count increasing
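
Two of the critical alerts can be expressed as simple checks. This is a sketch only: the input shapes, the function names, and the "three times the mean" rule for hot spots are assumptions, and a real deployment would evaluate such rules in its monitoring system rather than in application code.

```python
def error_rate_exceeded(errors, total, threshold=0.01):
    """True when the error rate is above the threshold (1% by default)."""
    return total > 0 and errors / total > threshold

def find_hot_tablets(requests_per_tablet, factor=3.0):
    """Return tablets whose request count exceeds `factor` times the mean."""
    if not requests_per_tablet:
        return []
    mean = sum(requests_per_tablet.values()) / len(requests_per_tablet)
    return sorted(t for t, count in requests_per_tablet.items() if count > factor * mean)

sample = {"tablet-1": 100, "tablet-2": 120, "tablet-3": 90, "tablet-4": 2000}
hot = find_hot_tablets(sample)
```

Comparing each tablet against the mean rather than a fixed number keeps the rule meaningful as overall traffic grows, at the cost of missing the case where every tablet is overloaded at once.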


Change Safety

Schema Changes

Adding Column Families

Changing Row Key Structure

Configuration Changes

Scaling Nodes

Changing Replication


Security Boundaries

Access Control

Encryption

At rest:
- Google-managed keys: Default encryption
- Customer-managed keys: Cloud KMS keys

In transit:
- TLS: All connections use TLS, so data is encrypted in transit

Data Protection


Tradeoffs

Consistency: Strong vs Eventual

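Strong consistency:
- Where it applies: Reads and writes served by a single cluster
- Pros: Always see latest data
- Cons: On a replicated instance, all requests must be routed to one cluster, which gives up automatic failover and can raise latency for distant clients

Eventual consistency:
- Where it applies: Replicated instances, where a write to one cluster takes time to reach the others
- Pros: Better performance, higher throughput
- Cons: May see stale data briefly when reading from a different cluster than the one written to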

Key Design: Sequential vs Hashed

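Sequential keys:
- Pros: Supports range scans, ordered access
- Cons: May cause hot spots

Hashed keys:
- Pros: Even distribution, far fewer hot spots (a single very popular row can still be hot)
- Cons: No meaningful range scans over the hashed portion; point lookups only, unless reads fan out across hash buckets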
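
A common middle ground is to prefix a sequential key with a small hash bucket (a "salt"), so that writes spread over several key ranges while range scans remain possible as one scan per bucket. The sketch below is an assumption-laden illustration: the bucket count, key layout, and helper names are hypothetical, and a sorted Python list stands in for the table.

```python
import bisect
import zlib

NUM_BUCKETS = 4  # hypothetical; more buckets spread writes further but cost more scans

def salted_key(event_id, timestamp):
    """`{bucket}#{timestamp}#{event_id}`, with the bucket derived from the event ID."""
    bucket = zlib.crc32(event_id.encode("utf-8")) % NUM_BUCKETS
    return f"{bucket:02d}#{timestamp:010d}#{event_id}"

def time_range_scans(start_ts, end_ts):
    """One (start_key, end_key) pair per bucket; the end key is exclusive."""
    return [
        (f"{b:02d}#{start_ts:010d}", f"{b:02d}#{end_ts:010d}")
        for b in range(NUM_BUCKETS)
    ]

def scan(sorted_keys, start_key, end_key):
    """Return keys in [start_key, end_key) from a sorted list."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, end_key)
    return sorted_keys[lo:hi]

# One event per timestamp from 1000 to 1099.
keys = sorted(salted_key(f"e{i}", 1000 + i) for i in range(100))

# A time-range query fans out into NUM_BUCKETS scans whose results are combined.
found = [k for start, end in time_range_scans(1010, 1020) for k in scan(keys, start, end)]
```

The combined result contains every event in the requested time window, but it arrives grouped by bucket, so a caller that needs global time order has to merge the per-bucket results.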

Column Families: Many vs Few

Many families:
- Pros: Better organization, separate access patterns
- Cons: More overhead, worse performance

Few families:
- Pros: Better performance, simpler
- Cons: Less organization, mixed access patterns


Operational Considerations

Capacity Planning

Storage:
- Growth: Plan for storage growth
- Backups: Plan for backup storage
- Compaction: Plan for compaction overhead

Compute:
- Nodes: Plan for node capacity
- QPS: Plan for query throughput
- Scaling: Plan for auto-scaling
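
If throughput scales roughly linearly with node count, a first-pass node estimate is simple arithmetic. In the sketch below, the per-node QPS figure and the headroom fraction are placeholders that would need to come from load tests of the actual workload; they are not published Bigtable numbers.

```python
import math

def nodes_needed(target_qps, per_node_qps, headroom=0.3):
    """Nodes required to serve target_qps while leaving `headroom` of each node unused."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = per_node_qps * (1 - headroom)
    return math.ceil(target_qps / usable_per_node)

# Hypothetical inputs: 100,000 QPS target, 10,000 QPS measured per node.
estimate = nodes_needed(100_000, 10_000, headroom=0.3)
```

The headroom term leaves spare capacity for compaction, traffic spikes, and node failures. The linear model stops holding once a hot spot concentrates load on a single node.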

Monitoring & Debugging

Monitor:
- Query performance
- Hot spots
- Node health
- Compaction lag

Debug issues:
1. Check query performance (slow queries)
2. Check hot spots
3. Check node health
4. Check compaction lag
5. Review logs

Incident Response

Common incidents:
- High latency
- Hot spots
- Compaction lag
- Node failures

Response:
1. Check table health
2. Check query performance
3. Check hot spots
4. Check node health
5. Scale if needed
6. Contact support if persistent


What Staff Engineers Ask in Reviews

Design Questions

Scale Questions

Performance Questions

Operational Questions


Further Reading

Comprehensive Guide: Further Reading: Bigtable

Quick Links:
- Bigtable Documentation
- "Bigtable: A Distributed Storage System for Structured Data" (Chang et al., 2006)
- Schema Design
- Performance Best Practices


Exercises

  1. Design keys: Design row keys for a time-series application. How do you avoid hot spots?

  2. Handle hot spots: Your table has hot spots. How do you fix them? What key changes do you make?

  3. Optimize performance: Your queries are slow. How do you optimize them? What column families do you use?
