Ideas, written down.
Articles and field notes on operationalizing data — turning analyses into reliable systems, and metrics into decisions people trust.
AI agents won't replace your analysts — they'll kill the ad-hoc queue
The real win from agentic analytics isn't replacing people. It's retiring the endless backlog of "can you pull this?" — and freeing analysts for the work that compounds.
The data model is the product
Dashboards, metrics, and AI agents are all just views of the model underneath them. Modeling is the highest-leverage work in analytics — and the first thing teams skip.
Designing a star schema, step by step
The case for dimensional modeling is easy to make and hard to execute. Here's the actual four-step method — process, grain, dimensions, facts — that turns the idea of a star schema into one that holds up.
The semantic layer is the missing piece for trustworthy AI analysis
Point an LLM at raw tables and it will confidently average the wrong column. A semantic layer is what turns a fluent guesser into a grounded analyst.
Why the star schema still wins
Columnar warehouses were supposed to kill dimensional modeling. Thirty years on, the Kimball star is still the most legible, durable shape for analytics — and the reasons are worth understanding.
The three kinds of fact table
Transaction, periodic snapshot, accumulating snapshot. Most modeling mistakes come from forcing one kind of question onto the wrong kind of fact table — here's how to tell them apart.
Giving an agent a map: grounding LLMs in your data model
An agent is only as good as what you let it see. A practical look at the context, tools, and retrieval that turn a data model into something an LLM can navigate.
Declare the grain before you write a line of SQL
The single most important modeling decision is what one row of your fact table means. Get it fuzzy and every metric built on top is subtly, expensively wrong.
Designing dimensions people actually enjoy querying
Facts get the attention; dimensions decide whether the warehouse is a pleasure or a chore to use. Wide, denormalized, richly attributed dimensions are the difference — and the date dimension is where to start.
From question to query to action: building an agentic analytics pipeline
A reliable agentic analytics flow looks a lot like a data pipeline: discrete, observable stages — plan, retrieve, query, validate, answer, act — not one magic prompt.
Slowly changing dimensions, in plain English
A customer moves from Texas to New York. Do last year's sales move with them? That's the entire question slowly changing dimensions answer, and defaulting to the wrong type silently rewrites your history.
Use surrogate keys, not the source system's IDs
It's tempting to join facts to dimensions on the natural key from the source system. It's also the decision that quietly blocks history tracking, breaks on re-platforming, and slows every query. Here's why a meaningless integer wins.
Guardrails: letting an AI agent touch production data safely
The leap from an agent that answers questions to one that takes actions is where the risk lives. These are the controls that let you make that leap without losing sleep.
Metrics that survive reorgs
Every reorg arrives with a new VP who has opinions about what "active user" means. A model built on conformed dimensions and stable grain outlives the org chart that questioned it.
Five ways a star schema goes wrong
A star schema can be technically a star and still be a mess. The five failure modes I see most — mixed grain, over-snowflaking, non-conformed dimensions, natural keys, and fact-to-fact joins — and how to avoid each.
Why your dashboards aren't driving decisions
Most BI tools answer questions nobody is asking. The fix isn't a better chart — it's tying every metric to an owner, a decision, and a threshold, then deleting the rest.
The hidden cost of untyped pipelines
An upstream column silently changes type and three dashboards start lying for a week before anyone notices. Schema drift is a tax you pay in 3am pages — and data contracts pay it down.
From notebook to production
The analysis works on your laptop. Now it has to run at 6am, survive a dependency bump, and not page you in the process. Here's the short checklist that gets a one-off running reliably, without you.
Have data that should be doing more?
Tell me about the pipeline that breaks, the metric nobody trusts, or the analysis stuck in a notebook. Let's operationalize it.