← All writing
Star Schema7 min read · Blog

Designing dimensions people actually enjoy querying

Facts get the glory — they hold the numbers everyone wants. But the daily experience of using a warehouse is mostly an experience of its dimensions: they're the words in every filter, the labels on every axis, the "group by" in every question. Good dimensions make a model feel obvious. Thin ones make every query a scavenger hunt through join logic. Here's what separates the two.

Wide and denormalized, on purpose

The instinct drilled into application developers is to normalize — factor everything into lookup tables, never repeat a value. In a dimension, resist it. A dimension should be wide: a customer dimension carries name, segment, industry, region, tier, signup cohort, and every other descriptive attribute, all in one table, even though "region" repeats across thousands of rows.

That repetition is the feature. It means one join gets you everything about the customer, the query reads like a sentence, and the engine isn't chasing a chain of lookups. Storage is cheap; a confusing model that nobody can query is expensive. Denormalize dimensions for the human, not the disk.

The date dimension earns its keep first

If you build one dimension well, make it date. Nearly every fact joins to it, and a real date dimension — one row per day, with columns for day-of-week, month, quarter, fiscal period, holiday flags, "is weekend," week-of-year — turns a whole class of painful date math into a simple filter. "Weekday sales this fiscal quarter" becomes a WHERE clause instead of a tangle of date functions every analyst re-derives slightly differently.

Build it once, populate it for a few decades, and every time-based question across the warehouse gets easier. It's the highest-leverage table you'll ever make.

A few patterns worth knowing

  • Role-playing dimensions. One date dimension, joined multiple times under different roles — order date, ship date, delivery date. One physical table, several logical lenses, via aliased joins.
  • Degenerate dimensions. Some identifiers (an order number) belong in the fact table itself — there are no attributes to hang on them, but you still want to group by them. Keep them on the fact; don't build a hollow dimension for them.
  • Junk dimensions. A scatter of low-cardinality flags (order type, channel, priority) doesn't each need its own table. Collapse them into one small "junk" dimension of the combinations that occur. It tidies the model without losing the flags.

Handle change deliberately

Dimensions change — a customer re-segments, a product gets recategorized. Whether those changes overwrite the past or preserve it is the slowly changing dimension decision, and it's worth making attribute by attribute rather than reaching for a default. A dimension that remembers the right history is what keeps point-in-time questions honest.

Get dimensions right — wide, well-labeled, conformed, deliberately versioned — and the warehouse stops feeling like a database and starts feeling like it was built for the questions people actually ask.

Also asListensoonSlidessoonPodcastsoonVideosoon

Have data that should be doing more?

Tell me about the pipeline that breaks, the metric nobody trusts, or the analysis stuck in a notebook. Let's operationalize it.