Guardrails: letting an AI agent touch production data safely

An agent that reads is useful. An agent that acts (updates a segment, issues a refund, reorders inventory) is transformative and genuinely dangerous. The gap between the two isn't model capability. It's the guardrails you put around the agent. Here's the set I won't ship an acting agent without.

Least privilege, always

The agent gets its own service identity with the narrowest possible grants. Read paths run against a read replica with row- and column-level security so PII never enters context unless the use case demands it. Write paths are exposed only as specific, audited actions — never a general database connection. An agent should never hold credentials that can do something you wouldn't let an intern do unsupervised.
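A minimal sketch of "specific, audited actions" as the only write surface. Everything named here (`AgentIdentity`, `ActionRegistry`, the scope strings, the handlers) is hypothetical and for illustration; a real setup would back this with your identity provider and internal services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical service identity holding only narrow grants."""
    name: str
    scopes: FrozenSet[str]


class ActionRegistry:
    """The only write surface the agent sees: named actions, no raw connection."""

    def __init__(self) -> None:
        self._actions: Dict[str, Tuple[str, Callable[..., object]]] = {}

    def register(self, name: str, required_scope: str,
                 handler: Callable[..., object]) -> None:
        self._actions[name] = (required_scope, handler)

    def invoke(self, identity: AgentIdentity, name: str, **params):
        # Unknown actions and missing scopes are both refused outright.
        if name not in self._actions:
            raise PermissionError(f"unknown action: {name}")
        required_scope, handler = self._actions[name]
        if required_scope not in identity.scopes:
            raise PermissionError(f"{identity.name} lacks scope {required_scope}")
        return handler(**params)


# Hypothetical handlers; real ones would call audited internal services.
def update_segment(segment_id: str, label: str) -> dict:
    return {"segment_id": segment_id, "label": label}


def issue_refund(order_id: str, amount: float) -> dict:
    return {"order_id": order_id, "amount": amount}


registry = ActionRegistry()
registry.register("update_segment", "segment:update", update_segment)
registry.register("issue_refund", "refund:issue", issue_refund)

# This agent may update segments but holds no refund scope.
agent = AgentIdentity(name="analytics-agent", scopes=frozenset({"segment:update"}))
```

With this shape, `registry.invoke(agent, "update_segment", ...)` succeeds, while `registry.invoke(agent, "issue_refund", ...)` raises `PermissionError` because the identity was never granted that scope.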

Validate the plan, not just the output

Because the agent composes structured query/action specs (not raw SQL), you can inspect and reject a plan before it executes: block writes outside an allowlist, cap the rows an action can affect, require a filter on every write (the spec's equivalent of a WHERE clause), refuse anything that touches a restricted table. Validating the plan is cheaper and safer than cleaning up after the fact.
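The checks above can be sketched as a single validation pass over a plan. This assumes a hypothetical `ActionSpec` shape; the table names, the allowlist, and the 1,000-row cap are placeholders, not recommendations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical policy values, for illustration only.
ALLOWED_WRITES = {("customer_segments", "update")}
RESTRICTED_TABLES = {"payment_methods"}
MAX_AFFECTED_ROWS = 1_000


@dataclass
class ActionSpec:
    """Assumed shape of a structured plan the agent emits instead of raw SQL."""
    table: str
    operation: str                                  # "select", "update", "delete", ...
    filters: Dict[str, object] = field(default_factory=dict)
    estimated_rows: Optional[int] = None            # e.g. from a dry-run count


def validate_plan(spec: ActionSpec) -> List[str]:
    """Return every violation found; an empty list means the plan may run."""
    violations: List[str] = []
    if spec.table in RESTRICTED_TABLES:
        violations.append(f"table '{spec.table}' is restricted")
    if spec.operation != "select":
        if (spec.table, spec.operation) not in ALLOWED_WRITES:
            violations.append(
                f"write '{spec.operation}' on '{spec.table}' is not allowlisted")
        if not spec.filters:
            violations.append("write has no filter")
        if spec.estimated_rows is None:
            violations.append("write has no row estimate")
        elif spec.estimated_rows > MAX_AFFECTED_ROWS:
            violations.append(
                f"{spec.estimated_rows} rows exceeds cap of {MAX_AFFECTED_ROWS}")
    return violations
```

Returning the full list of violations, rather than failing on the first one, gives the agent (and the audit log) a complete picture of why a plan was rejected.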

Human-in-the-loop where it counts

Tier actions by blast radius. Reading a number? Fully autonomous. Updating one record? Maybe autonomous with logging. Mutating 40,000 rows or moving money? Human approval, every time — with the agent presenting its plan and reasoning in plain language so the human can actually judge it. The goal isn't to slow everything down; it's to spend approval budget only where it matters.
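One way to encode those tiers, as a rough sketch. The `Approval` enum, the function signature, and the one-row threshold are assumptions chosen to mirror the examples above; your own thresholds will differ.

```python
from enum import Enum


class Approval(Enum):
    AUTONOMOUS = "autonomous"
    AUTONOMOUS_LOGGED = "autonomous_logged"
    HUMAN_REQUIRED = "human_required"


def approval_tier(is_write: bool, affected_rows: int, moves_money: bool,
                  row_threshold: int = 1) -> Approval:
    """Map an action's blast radius to an approval tier.

    row_threshold is a hypothetical cutoff: writes touching at most this
    many rows run autonomously with logging; anything larger needs a human.
    """
    if moves_money:
        return Approval.HUMAN_REQUIRED      # money always gets a human
    if not is_write:
        return Approval.AUTONOMOUS          # pure reads run freely
    if affected_rows <= row_threshold:
        return Approval.AUTONOMOUS_LOGGED   # small writes: run, but log
    return Approval.HUMAN_REQUIRED          # large writes: ask first
```

Under these assumptions a read is `AUTONOMOUS`, a single-record update is `AUTONOMOUS_LOGGED`, and a 40,000-row mutation or any refund is `HUMAN_REQUIRED`.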

Budgets and circuit breakers

Give the agent hard ceilings: token spend, warehouse cost, number of actions per session, rate limits per tool. A runaway loop should hit a wall, not your month-end bill. Trip a circuit breaker on anomalies — a sudden spike in failed validations or affected rows pauses the agent and pages a human.
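A sketch of both ideas, assuming in-process counters. `SessionBudget`, `CircuitBreaker`, and their limits are hypothetical; a production version would likely persist state and page someone when the breaker opens.

```python
from collections import deque
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would push the session past a hard ceiling."""


@dataclass
class SessionBudget:
    max_tokens: int
    max_actions: int
    tokens_used: int = 0
    actions_used: int = 0

    def charge(self, tokens: int = 0, actions: int = 0) -> None:
        # Check before spending so the ceiling is never crossed.
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token ceiling reached")
        if self.actions_used + actions > self.max_actions:
            raise BudgetExceeded("action ceiling reached")
        self.tokens_used += tokens
        self.actions_used += actions


class CircuitBreaker:
    """Opens when too many of the last `window` validations failed."""

    def __init__(self, max_failures: int, window: int) -> None:
        self._recent = deque(maxlen=window)  # sliding window of recent outcomes
        self.max_failures = max_failures
        self.open = False

    def record(self, validation_failed: bool) -> None:
        self._recent.append(validation_failed)
        if sum(self._recent) >= self.max_failures:
            self.open = True  # stays open until a human calls reset()

    def allow(self) -> bool:
        return not self.open

    def reset(self) -> None:
        self._recent.clear()
        self.open = False
```

The agent loop would call `budget.charge(...)` before each model call or action and check `breaker.allow()` before executing anything. The breaker deliberately does not close on its own, so a human has to look before the agent resumes.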

Observability is non-negotiable

Every prompt, retrieval, plan, query, and action goes into an append-only audit log: who asked, what the agent saw, what it did, what it cost, whether a human approved. When something goes wrong — and it will — you need to reconstruct the decision exactly. This log is also how you improve: the failures show you the missing definitions and the ambiguous metrics to fix upstream.
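A small sketch of an append-only log. Hash-chaining each entry to the previous one is an addition of mine, not something the approach above requires; it is one common way to make after-the-fact edits detectable. The `AuditLog` class and the event fields are hypothetical, and a real log would live in durable, write-once storage rather than in memory.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, event: dict) -> dict:
        """Add an event; there is deliberately no update or delete method."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry = {"event": event, "prev_hash": prev_hash,
                 "hash": _digest(prev_hash, event)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"actor": "alice", "kind": "prompt", "text": "refresh the VIP segment"})
log.append({"actor": "agent", "kind": "action", "name": "update_segment",
            "approved_by": None, "cost_usd": 0.02})
```

Each event here carries the fields the paragraph lists (who asked, what was done, cost, approval); extend the dict with retrieved context and the validated plan as needed.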

The trust ladder

You don't flip a switch from "demo" to "acts on production." You climb a ladder: read-only → suggest-and-confirm → autonomous on low-risk actions → autonomous on more, as the audit log earns it. Each rung is a decision backed by evidence, not a leap of faith.
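The ladder can be made explicit in code, so the agent's current rung is a stored setting rather than an unwritten understanding. This is only a sketch: the `Rung` names mirror the rungs above, while `recommended_rung` and its thresholds are hypothetical, and the output is a recommendation for a human to accept or reject, not an automatic promotion.

```python
from enum import IntEnum


class Rung(IntEnum):
    READ_ONLY = 0
    SUGGEST_AND_CONFIRM = 1
    AUTONOMOUS_LOW_RISK = 2
    AUTONOMOUS_BROADER = 3


def recommended_rung(current: Rung, clean_actions: int, incidents: int,
                     min_clean_actions: int = 500) -> Rung:
    """Suggest the next rung only when the audit log supports it.

    clean_actions and incidents are assumed to be counted from the audit
    log since the last promotion; min_clean_actions is a placeholder.
    """
    at_top = current == Rung.AUTONOMOUS_BROADER
    if not at_top and incidents == 0 and clean_actions >= min_clean_actions:
        return Rung(current + 1)  # move up exactly one rung at a time
    return current
```

For example, an agent on `SUGGEST_AND_CONFIRM` with 600 clean actions and no incidents would be recommended for `AUTONOMOUS_LOW_RISK`; with a single incident it stays where it is.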

Done right, an acting agent feels less like a loose cannon and more like a well-supervised teammate: it does the rote work fast, it shows its reasoning, and it knows exactly where its authority ends. That boundary — not the model — is the product.