Data Modelling : Star vs Snowflake Schema

Introduction

When building a data warehouse, the foundation lies in how we organize and connect data. Two of the most common data modeling techniques are the Star Schema and the Snowflake Schema. These aren’t databases themselves, but rather ways of arranging fact and dimension tables to make analysis faster, cleaner, and more reliable.

Before we compare the two, let’s clarify what fact tables and dimension tables are.

Fact Table:

A fact table stores the measurable business data—for example, sales, profit, revenue, or quantity sold. Alongside these numeric metrics, it includes foreign keys that link to descriptive attributes in dimension tables. Think of it as the “numbers table” at the center of your reporting.

Example: In a retail dataset, a fact table might store each order line with columns for Order ID, Product ID, Customer ID, Sales Amount, Profit, and Quantity.
Dimension Table:

A dimension table provides context to the facts. It contains descriptive, categorical attributes that explain the numbers in the fact table. These tables answer the “who, what, where, when” questions.

Example: A Product dimension might store details such as Product Name, Category, and Subcategory. A Customer dimension might hold Customer Name, Segment, and Region.

By combining fact and dimension tables, analysts can slice and dice metrics across any angle—like profit by region, sales by customer segment, or orders by category over time.

This relationship between facts (measures) and dimensions (descriptions) is the backbone of both the Star Schema and the Snowflake Schema. The difference lies in how those dimensions are structured—either kept simple and denormalized (Star) or normalized into multiple related tables (Snowflake).

Star Schema

The Star Schema is the most common and intuitive way to design a data warehouse. It’s called a star because of its shape: the fact table sits at the center, and all the dimension tables connect directly to it, like points of a star.

Structure: One central fact table with several surrounding dimension tables.
Example: In the Superstore dataset, the Fact_Sales table (with sales, profit, and quantity) links directly to dimensions like Customer, Product, Date, and Region.
Advantages:
- Easy to understand for business users.
- Faster queries (fewer joins).
- Well-suited for dashboards and ad-hoc analysis.
Disadvantages:
- Redundant data in dimensions (e.g., state repeated in every city row).
- Slightly larger storage footprint.

Digram created using draw.io