← All writing
Star Schema7 min read · Blog

The three kinds of fact table

Once you've fixed a process and its grain, there's a quieter decision hiding inside: what kind of fact table you're building. There are only three, and almost every modeling tangle I've untangled came down to someone forcing a question onto the wrong one. Knowing the three — and which questions each answers — saves you from that.

Transaction facts: one row per event

The default, and the most atomic. One row each time the thing happens: an order line placed, a payment captured, a button clicked. The measures describe that single event — quantity, amount, duration.

Transaction facts are the most flexible table you can build. Because each row is one real event, you can aggregate them any way later — by day, by customer, by product, by all three. They answer "how much / how many" across any slice. If you're not sure which kind you need, you almost always want this one. Capture the atomic events; everything else can be derived.

Periodic snapshot facts: one row per entity per period

Some questions aren't about events — they're about state over time. Account balance at month-end. Inventory on hand each day. Headcount per quarter. You can't answer "what was inventory on the 14th" by summing transactions cheaply, and some states (a balance) aren't even a clean sum of events.

So you snapshot: one row per entity per regular interval, recording the state as of that moment. The grain is "one row per product per store per day," and the measures are levels, not events. Snapshots trade storage for instant answers to "what was true on date X" — they're how you get trend lines for things that are balances, not flows.

Accumulating snapshot facts: one row per pipeline instance

The third kind models a process with a defined lifecycle — a series of milestones a thing passes through. An order moving from placed → picked → packed → shipped → delivered. A loan from application → review → approval → funding.

Here the grain is "one row per instance of the pipeline" — one row per order — and that row gets updated as the order hits each milestone, filling in a column of dates (and the durations between them) as it goes. This is the one fact type you deliberately update in place. It's purpose-built for the questions snapshots and transactions answer awkwardly: "how long does each stage take, and where do things get stuck?"

Picking the right one

Match the table to the shape of the question:

  • "How much happened?" — flows, counts, sums across slices → transaction.
  • "What was the level at a point in time?" — balances, on-hand, trends of state → periodic snapshot.
  • "How long between milestones, and where's the bottleneck?" — lifecycle and stage durations → accumulating snapshot.

The mistake is reaching for the one you built last instead of the one the question wants. Many mature warehouses carry all three for the same process — atomic transactions for flexibility, a daily snapshot for trends, an accumulating snapshot for cycle-time — each cheap to build once the grain is clear, and each answering what the others can't.

Also asListensoonSlidessoonPodcastsoonVideosoon

Have data that should be doing more?

Tell me about the pipeline that breaks, the metric nobody trusts, or the analysis stuck in a notebook. Let's operationalize it.