AlloyDB: PostgreSQL-Compatible Database

One-line summary: Deep dive into AlloyDB's architecture, compute-storage separation, PostgreSQL compatibility, and performance optimizations.

Prerequisites: Spanner: Consistency & Performance; basic PostgreSQL concepts (ACID, transactions, replication).


Mental Model

AlloyDB Architecture

```mermaid
flowchart TB
    Client[Client] --> LB[Load Balancer]
    LB --> Primary[Primary Instance<br/>Compute]
    Primary --> ReadPool[Read Pool<br/>Replicas]
    Primary --> Storage[Storage Layer<br/>Separated]
    ReadPool --> Storage
    Storage --> LogStorage[Log Storage]
    Storage --> DataStorage[Data Storage]
    Storage --> Backup[Backup Storage<br/>Cloud Storage]
    style Primary fill:#99ccff
    style Storage fill:#ffcc99
    style ReadPool fill:#99ff99
```

Key insight: AlloyDB separates compute (PostgreSQL instances) from storage (distributed storage layer). This architecture enables independent scaling, better performance, and high availability.

Compute-Storage Separation

Compute Layer: PostgreSQL instances (primary + read replicas).
- Primary: handles writes, coordinates replication
- Read replicas: handle reads, scale independently
- Scaling: compute scales independently of storage

Storage Layer: distributed storage shared by all instances.
- Log storage: write-ahead log (WAL) storage
- Data storage: table data storage
- Scaling: storage scales independently of compute

Benefits:
- Independent scaling: compute and storage scale separately
- Better performance: a storage layer purpose-built for PostgreSQL
- High availability: storage redundancy and automatic failover
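The separation can be pictured as a toy model in which resizing one layer never touches the other. This is purely an illustrative sketch, not an AlloyDB API; the class and method names are invented.

```python
# Toy model of compute-storage separation: compute (replicas, vCPUs) and the
# shared storage layer scale independently of each other. Illustrative only.

class Cluster:
    def __init__(self):
        self.compute_vcpus = 8     # primary instance size
        self.read_replicas = 1
        self.storage_gb = 100      # shared storage, grows on its own

    def scale_reads(self, replicas):
        # Adding replicas touches only the compute layer; storage is shared.
        self.read_replicas += replicas

    def grow_storage(self, gb):
        # Storage grows without resizing or restarting any compute instance.
        self.storage_gb += gb

c = Cluster()
c.scale_reads(2)       # read spike: add compute only
c.grow_storage(400)    # data growth: add storage only
print(c.read_replicas, c.storage_gb)   # -> 3 500
```

In a coupled architecture, either change would force resizing the whole node; here each dimension is a separate knob.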


Internals & Architecture

Storage Layer

Log Storage

Write-Ahead Log (WAL): Transaction log for durability.

Architecture:
- Distributed: the WAL is replicated across storage nodes
- Durability: writes are durable once replicated
- Performance: low-latency log appends

How it works:
1. Write: the primary appends the transaction to the WAL
2. Replication: the WAL entry is replicated to storage nodes
3. Durability: the write is confirmed once replication completes
4. Replay: read replicas replay the WAL to stay consistent
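The four steps above can be sketched as a small quorum-replicated log. This is a simplified model under assumed semantics (ack-counting quorum, replay from one node), not AlloyDB's actual storage protocol; all names are invented.

```python
# Minimal sketch of the WAL write path: the primary appends an entry,
# replicates it to storage nodes, and confirms the write once a quorum
# has acknowledged; replicas replay the log to catch up.

class StorageNode:
    def __init__(self):
        self.wal = []              # replicated log entries

    def append(self, entry):
        self.wal.append(entry)
        return True                # acknowledgement

class Primary:
    def __init__(self, nodes, quorum):
        self.nodes, self.quorum = nodes, quorum

    def write(self, entry):
        # Steps 1-2: write the WAL entry and replicate it to storage nodes.
        acks = sum(node.append(entry) for node in self.nodes)
        # Step 3: the write is confirmed only once a quorum has acked.
        return acks >= self.quorum

class ReadReplica:
    def __init__(self, node):
        self.node, self.applied = node, 0

    def replay(self):
        # Step 4: replay WAL entries not yet applied on this replica.
        new = self.node.wal[self.applied:]
        self.applied = len(self.node.wal)
        return new

nodes = [StorageNode() for _ in range(3)]
primary = Primary(nodes, quorum=2)
assert primary.write({"txid": 1, "op": "INSERT"})
replica = ReadReplica(nodes[0])
print(replica.replay())            # the replica sees the committed entry
```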

Data Storage

Data storage holds table data, serving row-oriented access for OLTP with an additional columnar cache for analytics.

Architecture:
- Columnar cache: keeps frequently accessed table data in columnar format for analytics
- Row storage: standard row-oriented storage for OLTP
- Optimization: tuned for both OLTP and OLAP workloads

Features:
- Vectorized execution: queries operate on column batches rather than one row at a time
- Columnar format: analytical scans read only the columns they need
- Compression: columnar data compresses well, reducing I/O
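The row-vs-columnar distinction is easy to see in miniature. The sketch below contrasts tuple-at-a-time aggregation with a columnar pass; real engines additionally use SIMD and compressed batches, which this illustration omits.

```python
# Illustrative contrast between row-at-a-time and columnar aggregation.
# Both produce the same answer; the access pattern is what differs.

rows = [{"id": i, "amount": i * 10} for i in range(1000)]

# Row store / tuple-at-a-time: touch every column of every row.
row_total = 0
for row in rows:
    row_total += row["amount"]

# Columnar: hold each column as a contiguous array and aggregate in one pass
# over only the column the query needs.
columns = {"id": [r["id"] for r in rows],
           "amount": [r["amount"] for r in rows]}
col_total = sum(columns["amount"])

assert row_total == col_total
print(col_total)   # -> 4995000
```

A vectorized engine runs the columnar pass in fixed-size batches, keeping data in CPU caches and amortizing per-tuple overhead.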

Compute Layer

Primary Instance

Primary: handles writes and coordinates replication.

Functions:
- Write handling: processes write transactions
- Replication: coordinates WAL replication to read replicas
- Consistency: ensures replicas converge on the same state

Scaling:
- Vertical: scale CPU/memory
- Horizontal: not directly (use read replicas to scale reads)

Read Replicas

Read replicas: handle reads and scale independently.

Functions:
- Read handling: processes read-only queries
- Replication: receives the WAL from the primary
- Consistency: serves reads from a consistent snapshot

Scaling:
- Horizontal: add replicas to scale read throughput
- Vertical: scale CPU/memory per replica

Replication:
- Streaming: the WAL is streamed to replicas
- Lag: replication lag is typically low (< 100 ms)
- Consistency: each replica reads from a consistent snapshot
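Low but nonzero replication lag is why read routing often checks freshness. The sketch below is one common client-side pattern, lag-aware routing with a primary fallback; the names and the 100 ms budget are illustrative, not an AlloyDB feature.

```python
# Sketch of lag-aware read routing: send reads to the freshest replica,
# and fall back to the primary if every replica exceeds a lag budget.

MAX_LAG_MS = 100   # assumed staleness budget for this application

replicas = [
    {"name": "replica-1", "lag_ms": 40},
    {"name": "replica-2", "lag_ms": 250},
]

def route_read(replicas, max_lag_ms=MAX_LAG_MS):
    fresh = [r for r in replicas if r["lag_ms"] <= max_lag_ms]
    if fresh:
        # Pick the replica with the lowest lag.
        return min(fresh, key=lambda r: r["lag_ms"])["name"]
    return "primary"   # no replica is fresh enough; read from the primary

print(route_read(replicas))   # -> replica-1
```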

PostgreSQL Compatibility

SQL Compatibility

Compatibility: AlloyDB is compatible with PostgreSQL at both the SQL and wire-protocol level.

Features:
- SQL: standard PostgreSQL SQL
- Extensions: many PostgreSQL extensions are supported
- Functions: PostgreSQL built-in functions are supported

Migration:
- Easy migration: migrate from PostgreSQL with minimal application changes
- Compatibility: standard PostgreSQL drivers and tools continue to work
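In practice, wire compatibility means an application connects with an ordinary PostgreSQL driver; only the host changes. The helper below builds a standard libpq-style connection string; the host, database, and credential values are placeholders.

```python
# Because AlloyDB speaks the PostgreSQL wire protocol, standard drivers
# (psycopg2, asyncpg, JDBC, etc.) connect to it unchanged.

def alloydb_dsn(host, dbname, user, password, port=5432):
    """Build a libpq-style DSN usable by any PostgreSQL client."""
    return (f"host={host} port={port} dbname={dbname} "
            f"user={user} password={password} sslmode=require")

# Placeholder values: in practice the host is the instance's private IP
# or the address exposed by the AlloyDB Auth Proxy.
dsn = alloydb_dsn("10.0.0.5", "appdb", "app_user", "secret")
print(dsn)

# With psycopg2 this is simply:
#   import psycopg2
#   conn = psycopg2.connect(dsn)
```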

Performance Optimizations

Vectorized execution: processes data in column batches rather than one row at a time, improving throughput for scans and aggregations.

Columnar cache: caches frequently accessed table data in columnar form to accelerate analytical queries.

Query optimization: an advanced query optimizer.

Benefits:
- Better performance: up to 4× faster than standard PostgreSQL
- Analytics: better performance for analytical workloads
- OLTP: maintains OLTP performance

High Availability

Automatic Failover

Failover: AlloyDB automatically fails over from the primary to a replica.

Process:
1. Detection: the primary's failure is detected
2. Election: a read replica is promoted to primary
3. Switchover: traffic is redirected to the new primary
4. Recovery: the old primary rejoins as a replica

Recovery time:
- RTO (Recovery Time Objective): < 60 seconds
- RPO (Recovery Point Objective): < 1 second
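The four failover steps can be modeled as a tiny state machine. This is an illustrative toy, not AlloyDB's control plane; node names and the health map are invented.

```python
# Toy state machine for failover: detect the failure, promote a healthy
# replica, switch traffic, and reattach the old primary as a replica.

cluster = {
    "primary": "node-a",
    "replicas": ["node-b", "node-c"],
    "healthy": {"node-a": False, "node-b": True, "node-c": True},
}

def failover(cluster):
    old = cluster["primary"]
    if cluster["healthy"][old]:
        return old                                   # 1. no failure detected
    # 2. promote the first healthy replica
    new = next(r for r in cluster["replicas"] if cluster["healthy"][r])
    cluster["replicas"].remove(new)
    # 3. switch traffic to the new primary
    cluster["primary"] = new
    # 4. reattach the old primary as a replica once it recovers
    cluster["replicas"].append(old)
    return new

print(failover(cluster))   # -> node-b
```

Because the durable state lives in the shared storage layer, promotion does not require copying data to the new primary, which is what keeps the RTO low.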

Multi-Zone Deployment

Zones: deploy instances across multiple zones within a region.

Benefits:
- Availability: higher availability (99.99%)
- Disaster recovery: survives zone-level failures
- Performance: low-latency access from multiple zones


Failure Modes & Blast Radius

AlloyDB Failures

Scenario 1: Primary Failure

Scenario 2: Storage Layer Failure

Scenario 3: Read Replica Failure

Performance Failures

Scenario 1: Storage I/O Bottleneck

Scenario 2: Compute Exhaustion

Overload Scenarios

10× Normal Load

100× Normal Load


Observability Contract

Metrics to Track

Database Metrics

Compute Metrics

Storage Metrics

Logs

AlloyDB logs:
- Query logs (if enabled)
- Error logs
- Admin activity logs
- Replication logs

Alerts

Critical alerts:
- Database unavailable
- High error rate (> 1%)
- High latency (above threshold)
- Primary failure

Warning alerts:
- High replication lag
- High storage I/O
- Compute resource exhaustion
- Connection pool exhaustion
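Alert policies like these boil down to comparing sampled metrics against thresholds. The sketch below shows the logic; the metric names and limits are illustrative, not Cloud Monitoring syntax.

```python
# Sketch of evaluating the alert thresholds above against sampled metrics.
# Thresholds: (limit, severity); a metric missing from the sample is
# treated as zero (i.e., healthy).

THRESHOLDS = {
    "error_rate": (0.01, "critical"),        # > 1% errors
    "replication_lag_ms": (100, "warning"),  # assumed lag budget
    "p99_latency_ms": (500, "critical"),     # placeholder latency budget
}

def evaluate(metrics):
    alerts = []
    for name, (limit, severity) in THRESHOLDS.items():
        if metrics.get(name, 0) > limit:
            alerts.append((severity, name))
    return alerts

print(evaluate({"error_rate": 0.02, "replication_lag_ms": 40}))
# -> [('critical', 'error_rate')]
```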


Change Safety

Instance Changes

Scaling Compute

Adding Read Replicas

Changing Storage Size

Configuration Changes

Changing Backup Schedule

Enabling High Availability


Security Boundaries

Access Control

Encryption

At rest:
- Google-managed keys: default encryption
- Customer-managed keys: via Cloud KMS

In transit:
- TLS: all connections are encrypted with TLS

Data Protection


Tradeoffs

Compute-Storage Separation: Benefits vs Complexity

Benefits:
- Independent scaling: compute and storage scale separately
- Better performance: optimized storage layer
- High availability: storage redundancy

Complexity:
- More components to manage
- Compute and storage must be coordinated

Replication: Synchronous vs Asynchronous

Synchronous:
- Pros: strong consistency, no data loss on failover
- Cons: higher write latency, lower throughput

Asynchronous:
- Pros: lower latency, higher throughput
- Cons: possible data loss on failover, replicas may lag

Compatibility: PostgreSQL vs Optimizations

PostgreSQL compatibility:
- Pros: easy migration, familiar SQL
- Cons: may not exploit every AlloyDB optimization

Optimizations:
- Pros: better performance, advanced features
- Cons: may require application changes


Operational Considerations

Capacity Planning

Storage:
- Growth: plan for data growth
- Backups: plan for backup storage
- Scaling: plan for storage scaling

Compute:
- Instances: plan instance capacity
- QPS: plan for query throughput
- Scaling: plan for compute scaling

Monitoring & Debugging

Monitor:
- Query performance
- Instance health
- Storage health
- Replication lag

Debug issues:
1. Check query performance (identify slow queries)
2. Check instance health
3. Check storage health
4. Check replication lag
5. Review logs

Incident Response

Common incidents:
- High latency
- Primary failure
- Storage I/O bottleneck
- Replication lag

Response:
1. Check database health
2. Check query performance
3. Check instance health
4. Check storage health
5. Scale if needed
6. Contact support if the issue persists


What Staff Engineers Ask in Reviews

Design Questions

Scale Questions

Performance Questions

Operational Questions


Further Reading

Comprehensive Guide: Further Reading: AlloyDB

Quick Links:
- AlloyDB Documentation
- PostgreSQL Compatibility
- Performance Best Practices
- High Availability
- Back to GCP Core Building Blocks


Exercises

  1. Design architecture: Design an AlloyDB architecture for a high-traffic application. How many read replicas? What's the failover strategy?

  2. Handle scaling: Your application needs to scale reads. How do you scale with AlloyDB? What's the strategy?

  3. Optimize performance: Your queries are slow. How do you optimize them? What AlloyDB features do you use?

Answer Key: View Answers