Dimensional modeling is a data design technique used for data warehouses and data marts, optimized for analytical querying and reporting. It structures data into "facts" (measures) and "dimensions" (context).
Key Concepts:
- Fact Tables:
Contain quantitative measurements (metrics) and foreign keys to dimension tables. Facts are typically numeric and additive.
- Additive Facts: Can be summed across all dimensions (e.g., sales amount, quantity).
- Semi-Additive Facts: Can be summed across some dimensions but not all (e.g., account balance - sum across customers, but not across time).
- Non-Additive Facts: Cannot be summed across any dimension (e.g., unit price, ratios). Require specific aggregations like average, min, max.
- Dimension Tables:
Contain descriptive attributes that provide context to the facts. They answer "who, what, where, when, why, how."
- Slowly Changing Dimensions (SCDs): Handle changes in dimension attributes over time.
- SCD Type 0 (Retain Original): Attribute never changes.
- SCD Type 1 (Overwrite): Old value is overwritten by new value (no history).
- SCD Type 2 (Add New Row): A new row is added for each change, preserving full history. Most common.
- SCD Type 3 (Add New Column): A new column is added to store the old value.
- SCD Type 4 (History Table): Current attributes in dimension, history in separate table.
- SCD Type 6 (Hybrid): Combines Type 1, 2, and 3 (e.g., current value, historical value, and effective dates).
- Conformed Dimensions: Dimensions that are shared across multiple fact tables or data marts, ensuring consistent reporting and integration across business areas.
- Slowly Changing Dimensions (SCDs): Handle changes in dimension attributes over time.
- Granularity: The level of detail stored in a fact table. It's crucial to define the lowest level of detail for which data is captured (e.g., sales per item per day per store).
- Surrogate Keys: Simple, system-generated integer keys used in dimension tables instead of natural keys. They provide performance benefits, handle SCDs, and isolate the data warehouse from source system key changes.
Schema Types:
- Star Schema:
The simplest and most common dimensional model. A central fact table surrounded by denormalized dimension tables. Optimized for query performance and ease of understanding.
[Time Dimension] | | [Product Dimension] --- [Fact Table] --- [Customer Dimension] | | [Store Dimension] - Snowflake Schema:
An extension of the star schema where dimensions are normalized into multiple related tables. Reduces data redundancy but can increase query complexity due to more joins.
[Time Dimension] | | [Product Category] --- [Product Dimension] --- [Fact Table] --- [Customer Dimension] | | [Store Location] --- [Store Dimension]