Case Study: Customer Cohort Analysis
This case study demonstrates how to perform customer cohort analysis using advanced SQL techniques to understand customer retention and behavior patterns over time.
Business Context
An e-commerce company wants to understand customer retention patterns to improve marketing strategies. They need to analyze how customers acquired in different time periods (cohorts) behave over time.
Technical Challenge
Calculate monthly retention rates for customers grouped by their acquisition month (cohort), showing how many customers from each cohort continue to make purchases over subsequent months.
Implementation Approach
We'll use Common Table Expressions (CTEs) and window functions to:
- Identify the cohort (acquisition month) for each customer
- Calculate the period number since acquisition
- Count active customers in each cohort-period combination
- Calculate retention rates as percentages
SQL Implementation
WITH customer_activities AS (
SELECT
customer_id,
DATE_TRUNC('month', MIN(order_date))::date AS cohort_date,
DATE_TRUNC('month', order_date)::date AS activity_month
FROM orders
GROUP BY customer_id, DATE_TRUNC('month', order_date)
),
-- CTE to assign period numbers relative to cohort start
cohort_analysis AS (
SELECT
customer_id,
cohort_date,
activity_month,
1 AS has_activity,
(EXTRACT(YEAR FROM activity_month) - EXTRACT(YEAR FROM cohort_date)) * 12 +
(EXTRACT(MONTH FROM activity_month) - EXTRACT(MONTH FROM cohort_date)) + 1 AS period_number
FROM customer_activities
),
-- Calculate metrics by cohort and period
cohort_metrics AS (
SELECT
cohort_date,
period_number,
COUNT(DISTINCT customer_id) AS active_customers
FROM cohort_analysis
GROUP BY cohort_date, period_number
),
-- Calculate cohort sizes (customers in period 1)
cohort_sizes AS (
SELECT
cohort_date,
active_customers AS cohort_size
FROM cohort_metrics
WHERE period_number = 1
)
-- Final result with retention rates
SELECT
cm.cohort_date,
cm.period_number,
cm.active_customers,
cs.cohort_size,
ROUND((cm.active_customers * 100.0 / cs.cohort_size), 2) AS retention_rate
FROM cohort_metrics cm
JOIN cohort_sizes cs ON cm.cohort_date = cs.cohort_date
ORDER BY cm.cohort_date, cm.period_number;
Results Analysis
The query produces results that show:
- Each cohort (acquisition month) in the first column
- Period number (1 = acquisition month, 2 = month after, etc.)
- Number of active customers in each cohort-period combination
- Retention rate as a percentage of the original cohort size
| Cohort Month | Period | Active Customers | Cohort Size | Retention Rate |
|---|---|---|---|---|
| 2023-01 | 1 | 1000 | 1000 | 100.00% |
| 2023-01 | 2 | 750 | 1000 | 75.00% |
| 2023-01 | 3 | 520 | 1000 | 52.00% |
| 2023-02 | 1 | 850 | 850 | 100.00% |
| 2023-02 | 2 | 620 | 850 | 72.94% |
Key SQL Concepts Demonstrated
Common Table Expressions (CTEs)
Multiple CTEs decompose complex logic into manageable steps, improving readability and maintainability.
Date Functions
DATE_TRUNC and EXTRACT functions for cohort period calculations and time-based analysis.
Window Functions
Concept of period calculation relative to a baseline date for cohort analysis.
JOIN Operations
Combining metrics with cohort sizes to calculate retention percentages.
Performance Considerations
For large datasets, consider:
- Indexing on date and customer_id columns
- Partitioning by date for time-series data
- Using materialized views for frequently accessed results
- Applying appropriate data types (DATE vs TIMESTAMP)
Business Impact
This analysis helps the business:
- Understand customer retention patterns across different acquisition periods
- Identify which marketing campaigns produce long-term customers
- Optimize customer acquisition spend based on long-term value
- Target interventions to reduce churn in specific customer segments