Case Study: Customer Cohort Analysis

Advanced SQL Window Functions Business Intelligence

This case study demonstrates how to perform customer cohort analysis using advanced SQL techniques to understand customer retention and behavior patterns over time.

Business Context

An e-commerce company wants to understand customer retention patterns to improve marketing strategies. They need to analyze how customers acquired in different time periods (cohorts) behave over time.

Technical Challenge

Calculate monthly retention rates for customers grouped by their acquisition month (cohort), showing how many customers from each cohort continue to make purchases over subsequent months.

Implementation Approach

We'll use Common Table Expressions (CTEs) and window functions to:

  • Identify the cohort (acquisition month) for each customer
  • Calculate the period number since acquisition
  • Count active customers in each cohort-period combination
  • Calculate retention rates as percentages

SQL Implementation

-- CTE to identify customer acquisition cohort and activity periods
WITH customer_activities AS (
  SELECT
    customer_id,
    DATE_TRUNC('month', MIN(order_date))::date AS cohort_date,
    DATE_TRUNC('month', order_date)::date AS activity_month
  FROM orders
  GROUP BY customer_id, DATE_TRUNC('month', order_date)
),

-- CTE to assign period numbers relative to cohort start
cohort_analysis AS (
  SELECT
    customer_id,
    cohort_date,
    activity_month,
    1 AS has_activity,
    (EXTRACT(YEAR FROM activity_month) - EXTRACT(YEAR FROM cohort_date)) * 12 +
    (EXTRACT(MONTH FROM activity_month) - EXTRACT(MONTH FROM cohort_date)) + 1 AS period_number
  FROM customer_activities
),

-- Calculate metrics by cohort and period
cohort_metrics AS (
  SELECT
    cohort_date,
    period_number,
    COUNT(DISTINCT customer_id) AS active_customers
  FROM cohort_analysis
  GROUP BY cohort_date, period_number
),

-- Calculate cohort sizes (customers in period 1)
cohort_sizes AS (
  SELECT
    cohort_date,
    active_customers AS cohort_size
  FROM cohort_metrics
  WHERE period_number = 1
)

-- Final result with retention rates
SELECT
  cm.cohort_date,
  cm.period_number,
  cm.active_customers,
  cs.cohort_size,
  ROUND((cm.active_customers * 100.0 / cs.cohort_size), 2) AS retention_rate
FROM cohort_metrics cm
JOIN cohort_sizes cs ON cm.cohort_date = cs.cohort_date
ORDER BY cm.cohort_date, cm.period_number;

Results Analysis

The query produces results that show:

  • Each cohort (acquisition month) in the first column
  • Period number (1 = acquisition month, 2 = month after, etc.)
  • Number of active customers in each cohort-period combination
  • Retention rate as a percentage of the original cohort size
Cohort Month Period Active Customers Cohort Size Retention Rate
2023-01 1 1000 1000 100.00%
2023-01 2 750 1000 75.00%
2023-01 3 520 1000 52.00%
2023-02 1 850 850 100.00%
2023-02 2 620 850 72.94%

Key SQL Concepts Demonstrated

Common Table Expressions (CTEs)

Multiple CTEs decompose complex logic into manageable steps, improving readability and maintainability.

Date Functions

DATE_TRUNC and EXTRACT functions for cohort period calculations and time-based analysis.

Window Functions

Concept of period calculation relative to a baseline date for cohort analysis.

JOIN Operations

Combining metrics with cohort sizes to calculate retention percentages.

Performance Considerations

For large datasets, consider:

  • Indexing on date and customer_id columns
  • Partitioning by date for time-series data
  • Using materialized views for frequently accessed results
  • Applying appropriate data types (DATE vs TIMESTAMP)

Business Impact

This analysis helps the business:

  • Understand customer retention patterns across different acquisition periods
  • Identify which marketing campaigns produce long-term customers
  • Optimize customer acquisition spend based on long-term value
  • Target interventions to reduce churn in specific customer segments