Case Study: Unified Analytics Platform for a CPG Company

How a global consumer packaged goods (CPG) company built a unified analytics platform with Azure Synapse to analyze sales, marketing, and supply chain data.

The Challenge

A global CPG company was struggling to get a unified view of its operations. Its data was fragmented across numerous systems:

  • An on-premises SQL Server for sales and transactional data.
  • A separate cloud data warehouse for marketing campaign data.
  • Multiple ERP systems (like SAP) for supply chain and inventory management.
  • Third-party market share data delivered as flat files.

This fragmentation made it incredibly difficult to answer critical business questions, such as "How does a social media campaign in a specific region impact store-level inventory?" The goal was to create a single, unified analytics platform to provide a 360-degree view of the business.

The Architecture

```mermaid
graph TD
    subgraph "Data Sources"
        A[On-Prem SQL]
        B[SaaS Platforms]
        C[ERP Systems]
    end
    subgraph "Unified Analytics Platform"
        A & B & C --> D{Synapse Pipelines}
        D --> E[ADLS Gen2 Data Lake]
        E --> F(Synapse Spark Pools)
        E --> G(Synapse Serverless SQL Pools)
        F --> H(Synapse Dedicated SQL Pools)
        G --> H
    end
    subgraph "Business Intelligence & ML"
        F --> I{Azure ML}
        H --> J[Power BI]
    end
```

Azure Synapse Analytics was selected to break down the data silos and provide a unified workspace for all data professionals:

  1. Data Integration Hub: Synapse Pipelines (powered by Azure Data Factory) act as the central ingestion engine, using pre-built connectors to pull data from the on-premises SQL Server (via a self-hosted integration runtime), the marketing warehouse, and the ERP systems, and landing it all in Azure Data Lake Storage (ADLS) Gen2.
  2. The Enterprise Data Lake: ADLS Gen2 serves as the single source of truth, storing all raw and processed data in a hierarchical namespace optimized for analytics.
  3. Multi-Modal Compute: This is where Synapse shines. Different teams use different compute engines on the same data lake, all within the Synapse Studio workspace:
    • Data Engineers use Synapse Spark Pools to run large-scale data transformation jobs in Scala and Python, cleaning and modeling the data into a star schema.
    • Data Analysts use Synapse Serverless SQL Pools for ad-hoc, exploratory queries directly on the files in the data lake, paying only for the queries they run.
    • BI Professionals use Synapse Dedicated SQL Pools (a provisioned MPP data warehouse) to host the final, curated data models for high-performance dashboards.
  4. Machine Learning & BI: Data scientists use the same Synapse Spark Pools to train demand forecasting models. The results are written back to the data lake and served to business users via interactive Power BI reports that are seamlessly integrated into Synapse Studio.
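In production, the star-schema modeling in step 3 runs as PySpark or Scala jobs on Synapse Spark Pools. As a minimal, framework-free sketch of that step (plain Python dicts stand in for DataFrames, and all column names are hypothetical):

```python
# Minimal sketch of the star-schema modeling step the data engineers run.
# In the real platform this is a Spark job; plain Python stands in here.
# All field names (sku, category, store_id, ...) are hypothetical.

def build_star_schema(raw_sales):
    """Split denormalized sales rows into a product dimension and a fact table."""
    dim_product = {}   # sku -> dimension row with a surrogate key
    fact_sales = []    # fact rows referencing the dimension by surrogate key

    for row in raw_sales:
        sku = row["sku"]
        if sku not in dim_product:
            dim_product[sku] = {
                "product_key": len(dim_product) + 1,
                "sku": sku,
                "category": row["category"],
            }
        fact_sales.append({
            "product_key": dim_product[sku]["product_key"],
            "store_id": row["store_id"],
            "units": row["units"],
            "revenue": row["units"] * row["unit_price"],
        })
    return list(dim_product.values()), fact_sales

raw = [
    {"sku": "A1", "category": "Snacks", "store_id": 7, "units": 3, "unit_price": 2.5},
    {"sku": "A1", "category": "Snacks", "store_id": 9, "units": 1, "unit_price": 2.5},
    {"sku": "B2", "category": "Drinks", "store_id": 7, "units": 2, "unit_price": 1.0},
]
dims, facts = build_star_schema(raw)
print(len(dims), len(facts))  # 2 dimension rows, 3 fact rows
```

The same shape applies at scale: deduplicate descriptive attributes into conformed dimensions, keep the high-volume measures in narrow fact rows keyed by surrogate keys.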
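The demand-forecasting models in step 4 are trained on Spark and Azure ML; the details are beyond this case study, but a moving-average baseline illustrates the shape of the problem (history in, next-period forecast out). The numbers below are made up for illustration:

```python
# Toy stand-in for the demand-forecasting step. The real models are trained
# on Synapse Spark Pools / Azure ML; a moving-average baseline just shows
# the input/output shape of a demand forecast.

def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = history[-window:]
    return sum(recent) / window

weekly_units = [120, 135, 128, 142, 138]  # hypothetical store-level unit sales
print(moving_average_forecast(weekly_units))  # mean of [128, 142, 138] = 136.0
```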

Key Technical Details

  • Unified Workspace with Synapse Studio: The ability for data engineers, analysts, and data scientists to collaborate in a single web UI was a major productivity booster. It eliminated the need to switch between different tools for data integration, data warehousing, and machine learning.
  • Cost Optimization with Hybrid Compute: The company heavily utilized the serverless SQL pools for data exploration, which significantly reduced costs compared to running a dedicated data warehouse 24/7. Dedicated pools were reserved for the most critical, high-concurrency BI workloads.
  • Code-Free and Code-First Flexibility: Data engineers used the visual, drag-and-drop Data Flows within Synapse Pipelines for standard ETL tasks, which accelerated development. For more complex transformations, they used Spark notebooks, providing the flexibility to switch between low-code and code-first approaches.
  • End-to-End Security and Governance: The entire platform is secured with Azure Active Directory. Synapse's built-in capabilities for column-level security (CLS) and row-level security (RLS) ensure that different user groups (e.g., marketing, finance) can only see the data they are authorized to access, even when querying the same underlying tables.
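The hybrid-compute savings come down to simple arithmetic: serverless SQL is billed per terabyte scanned, while a dedicated pool bills for every hour it is provisioned. A back-of-the-envelope comparison, using illustrative prices rather than current Azure list prices:

```python
# Back-of-the-envelope cost model behind the hybrid-compute decision.
# Prices below are illustrative assumptions, NOT current Azure list prices:
# serverless SQL bills per TB of data scanned; dedicated pools bill per hour.

SERVERLESS_PRICE_PER_TB = 5.00     # assumed $/TB processed
DEDICATED_PRICE_PER_HOUR = 1.50    # assumed $/hour for a small dedicated pool

def serverless_monthly_cost(tb_scanned_per_month):
    return tb_scanned_per_month * SERVERLESS_PRICE_PER_TB

def dedicated_monthly_cost(hours_running=730):  # ~hours in a month
    return hours_running * DEDICATED_PRICE_PER_HOUR

# An exploratory workload scanning ~20 TB/month on serverless...
print(serverless_monthly_cost(20))   # 100.0
# ...versus keeping even a small dedicated pool running 24/7:
print(dedicated_monthly_cost())      # 1095.0
```

Under these assumptions, intermittent exploration is an order of magnitude cheaper on serverless, which is why dedicated pools were reserved for the high-concurrency BI workloads that actually need always-on capacity.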
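In Synapse, row-level security is enforced by a T-SQL filter predicate bound to the table via a security policy; the sketch below shows the concept in plain Python, with a hypothetical group-to-region mapping standing in for the real predicate function:

```python
# Conceptual sketch of row-level security (RLS). In Synapse this is a T-SQL
# inline table-valued function bound to the table by CREATE SECURITY POLICY;
# here a Python predicate per user group illustrates the filtering behavior.
# The group names, regions, and columns are hypothetical.

ROW_FILTERS = {
    "marketing": lambda row: row["region"] in {"EMEA", "NA"},  # regional scope
    "finance":   lambda row: True,                             # sees every row
}

def apply_rls(rows, user_group):
    """Return only the rows the given group's predicate allows."""
    predicate = ROW_FILTERS[user_group]
    return [row for row in rows if predicate(row)]

sales = [
    {"region": "EMEA", "revenue": 10},
    {"region": "APAC", "revenue": 20},
]
print(len(apply_rls(sales, "marketing")))  # 1 (the APAC row is filtered out)
print(len(apply_rls(sales, "finance")))    # 2
```

Both groups query the same underlying table; the predicate, not a separate copy of the data, determines what each group sees.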