Why don't normal security scanners catch this?

Because the bug isn't in a line of code. It's in the seam between the model, the tool, and the database. A scanner sees an authenticated request hitting an authenticated endpoint with valid SQL. It can't tell that the agent widened the query on the way through.

Does fixing this mean rebuilding the app?

No. The fix is to treat the tool layer like an API. Tenant id comes from the verified session, every tool call gets re-scoped on the server, the database enforces row isolation underneath, and every model call is logged. That's a rescue, not a rebuild.

Where should tenant context come from in an agent app?

From the verified session on the server, never from the prompt and never from the client. The model's job is to suggest what to ask for. The tool layer's job is to decide who's allowed to ask.

AI Agent Security: Tenant Isolation in AI-Built SaaS

Key takeaway: AI-tool-built apps usually fail in production not because the code is bad, but because the agent layer was wired with no tenant context, no tool-call boundaries, and no audit trail. The fix is to treat the tool layer like an API: tenant id from the session, every call re-scoped on the server, isolation in the database underneath.

An app gets built fast with an AI tool. It works in the demo. A user types a vague question into the chat, the agent answers, everyone is happy. Six months later, in production, a different user types something vague and gets back a row that belongs to another customer.

Nothing was attacked. The agent just helpfully widened the read.

This is the failure mode I've found most often in AI-tool-built products this year. It isn't really a code bug. It's a wiring choice nobody made on purpose.

Where the bug actually lives

The agent layer was wired to trust the model to decide the scope of the read, instead of enforcing the tenant boundary on every tool call.

In a normal app, a request comes in, the server checks who you are, the server decides what query to run, and the database returns the rows. The boundary lives in the server.

In an agent app, a request comes in as natural language. The model rewrites it into a structured tool call. The tool call hits the database. If the tool layer was wired the obvious way, by passing the model's filter straight through, then the model is deciding the scope of the read. The boundary moved from the server into the prompt, and the prompt is the one thing you can't trust.

A user asks for "recent activity." The model, doing its job well, produces a broad query. The tool runs it. Nothing on that path re-asked who's allowed to see what.
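To make that path concrete, here is a minimal sketch of the obvious wiring. Everything in it is assumed for illustration: the `activity` table, the tenant names `acme` and `globex`, and the `naive_activity_tool` function are hypothetical, and SQLite stands in for whatever database the app really uses.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table where two tenants share one table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity (id INTEGER PRIMARY KEY, tenant_id TEXT, event TEXT)"
    )
    conn.executemany(
        "INSERT INTO activity (tenant_id, event) VALUES (?, ?)",
        [
            ("acme", "invoice sent"),
            ("acme", "user invited"),
            ("globex", "contract signed"),
        ],
    )
    return conn


def naive_activity_tool(conn: sqlite3.Connection, model_args: dict) -> list[tuple]:
    """ANTI-PATTERN: the model's arguments alone decide the scope of the read."""
    tenant = model_args.get("tenant_id")  # only present if the model chose to send it
    if tenant is None:
        # A vague request such as "recent activity" arrives with no tenant filter.
        return conn.execute(
            "SELECT tenant_id, event FROM activity ORDER BY id"
        ).fetchall()
    return conn.execute(
        "SELECT tenant_id, event FROM activity WHERE tenant_id = ? ORDER BY id",
        (tenant,),
    ).fetchall()


conn = make_demo_db()
# Hypothetical tool call a model might emit for "recent activity".
rows = naive_activity_tool(conn, {"limit": 10})
# The caller is logged in as "acme", yet a "globex" row comes back too.
leaked = [row for row in rows if row[0] != "acme"]
```

Every statement here is valid and every value is parameterized, which is why nothing flags it. The only thing missing is the question of who is asking.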

Why the demo never caught it

The demo only ever had one tenant, friendly inputs, and a model that produced sensible queries, so the boundary was never tested.

The demo had one account, one dataset, and prompts the founder typed themselves. Of course it worked. The model produced sensible queries because the inputs were sensible. The tool layer never had to defend anything because there was nothing to defend against.

Production is different. Production has two tenants whose data happens to live in the same table. Production has a user who phrases something ambiguously. Production has prompts the founder never imagined. The boundary only gets tested when something stresses it, and the happy path never stresses it.
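One way to stress the boundary before production does is a two-tenant test that feeds a tool deliberately awkward arguments. The sketch below is an assumption-heavy illustration, not a full test suite: `HOSTILE_ARGS`, `cross_tenant_leaks`, `leaky_tool`, and `scoped_tool` are hypothetical names, and the argument sets are hand-written stand-ins for what a model might emit.

```python
import sqlite3
from typing import Callable


def seed_two_tenants() -> sqlite3.Connection:
    """Two tenants in one table, which is the case the demo never had."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (tenant_id TEXT, event TEXT)")
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?)",
        [("acme", "invoice sent"), ("globex", "contract signed")],
    )
    return conn


# Hand-written stand-ins for tool arguments a model might produce.
HOSTILE_ARGS = [
    {},                       # vague request, no filter at all
    {"tenant_id": "globex"},  # another tenant named outright
    {"event_like": "%"},      # wildcard that matches everything
]


def cross_tenant_leaks(tool: Callable, conn, session_tenant: str) -> list[dict]:
    """Return each argument set for which the tool returned another tenant's rows."""
    leaks = []
    for args in HOSTILE_ARGS:
        rows = tool(conn, session_tenant, args)
        if any(row[0] != session_tenant for row in rows):
            leaks.append(args)
    return leaks


def leaky_tool(conn, session_tenant: str, args: dict) -> list[tuple]:
    """Trusts the model's tenant_id and ignores the session."""
    pattern = args.get("event_like", "%")
    tenant = args.get("tenant_id")
    if tenant is None:
        return conn.execute(
            "SELECT tenant_id, event FROM activity WHERE event LIKE ?", (pattern,)
        ).fetchall()
    return conn.execute(
        "SELECT tenant_id, event FROM activity WHERE tenant_id = ? AND event LIKE ?",
        (tenant, pattern),
    ).fetchall()


def scoped_tool(conn, session_tenant: str, args: dict) -> list[tuple]:
    """Always filters by the session's tenant, whatever the model sent."""
    pattern = args.get("event_like", "%")
    return conn.execute(
        "SELECT tenant_id, event FROM activity WHERE tenant_id = ? AND event LIKE ?",
        (session_tenant, pattern),
    ).fetchall()


conn = seed_two_tenants()
leaky_result = cross_tenant_leaks(leaky_tool, conn, "acme")    # all three leak
scoped_result = cross_tenant_leaks(scoped_tool, conn, "acme")  # none leak
```

A single-tenant fixture would pass both tools, which is the point: the second tenant is what makes the test worth running.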

What to actually wire

Treat the tool layer like an API: tenant from the session, server-side re-scoping, database isolation, and an audit trail of what the model asked for.

The shape that holds up looks like this:

Tenant context comes from the verified session. Not from the prompt, not from a field the client sends, not from something the model decided. The server already knows who's logged in. Use that.
Every tool call re-scopes the query on the server. The model can suggest a filter. The tool layer adds the tenant scope on top of it, every time, with no opt-out. An interceptor is the right place for this, so a developer can't forget it on the next tool.
The database enforces isolation underneath. Row-level security on every table that holds tenant data. If the interceptor ever has a bug, the database refuses the read instead of leaking it. A missed check fails closed.
Every model call is logged. The user's words, the model's rewritten tool call, the parameters that actually ran, the rows returned. When something does cross a line, you can tell where it broke. Without this, you're guessing.
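The application-side pieces of that list (tenant from the session, server-side re-scoping, and the audit record) can be sketched as a small interceptor. This is a minimal illustration under stated assumptions: `Session`, `tenant_scoped`, `list_activity`, and `AUDIT_LOG` are hypothetical names, the in-memory list stands in for durable log storage, and SQLite stands in for the real database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Session:
    """Stands in for a session the server has already verified (hypothetical type)."""
    user_id: str
    tenant_id: str


AUDIT_LOG: list[dict] = []  # assumption: a real system would write to durable storage


def tenant_scoped(tool: Callable) -> Callable:
    """Interceptor: overwrite any tenant the model supplied, run the tool, log the call."""
    def wrapper(conn, session: Session, user_text: str, model_args: dict):
        # The session's tenant always wins. There is no opt-out parameter.
        effective = {**model_args, "tenant_id": session.tenant_id}
        rows = tool(conn, effective)
        AUDIT_LOG.append({
            "user_text": user_text,        # the user's words
            "model_args": model_args,      # what the model asked for
            "effective_args": effective,   # what actually ran
            "rows": rows,                  # what came back
        })
        return rows
    return wrapper


@tenant_scoped
def list_activity(conn, args: dict) -> list[tuple]:
    """Tool body: filters by the tenant_id the interceptor injected."""
    sql = "SELECT tenant_id, event FROM activity WHERE tenant_id = ?"
    params = [args["tenant_id"]]
    if "event_like" in args:  # the model may narrow the read, never widen it
        sql += " AND event LIKE ?"
        params.append(args["event_like"])
    return conn.execute(sql + " ORDER BY id", params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, tenant_id TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO activity (tenant_id, event) VALUES (?, ?)",
    [("acme", "invoice sent"), ("acme", "user invited"), ("globex", "contract signed")],
)

session = Session(user_id="u1", tenant_id="acme")
# Model output that names another tenant, whether by accident or by injection.
rows = list_activity(conn, session, "show recent activity", {"tenant_id": "globex"})
```

The interceptor guarantees the right tenant id reaches every tool, but it can't stop a developer from writing a tool body that ignores it. That gap is what the database layer is there to close.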

This is the same defense-in-depth pattern as a normal multi-tenant API. The reason it gets skipped in agent apps is that the model in the middle makes it feel like the rules are different. They aren't. The same boundary shows up one layer out when you expose a platform to agents through an MCP server.
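For the database layer of that pattern, here is a rough sketch assuming PostgreSQL, a text `tenant_id` column, and a custom setting I've named `app.tenant_id`. The helper `rls_statements` and the policy name `tenant_isolation` are hypothetical, and the `%s` placeholder assumes a psycopg-style driver. The Python only builds the statements, so adapt it to however migrations are run in your app.

```python
def rls_statements(table: str, setting: str = "app.tenant_id") -> list[str]:
    """Build PostgreSQL statements that turn on row-level security for one table."""
    if not table.isidentifier():
        # Table names can't be bound as parameters, so refuse anything suspicious.
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        # Turn policies on for the table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # Apply policies to the table owner too, not only to other roles.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # Only rows matching the tenant set for this transaction are visible.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{setting}'))",
    ]


# Run once per request, inside the transaction, before any tool query.
# The value must come from the verified session, never from the model.
# The third argument (true) limits the setting to the current transaction.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"

statements = rls_statements("activity")
```

With a policy like this in place, a tool query that forgets its tenant filter should see only the current tenant's rows, and a request that never set the tenant should see none. Superusers and roles with `BYPASSRLS` still skip policies, so the application should connect as an ordinary role.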

The audit angle

Reading these seams, between the model, the tools, and the data, is most of what an audit of an agent app actually is.

A scanner won't catch this. It sees an authenticated request hitting an authenticated endpoint with valid SQL, and it shrugs. The problem isn't in any one line. It's in the relationship between three layers that were each, on their own, doing roughly what they were told.

Reading those seams means answering four questions. Where does the tenant context come from on each tool call? What happens if the model produces a filter the developer didn't anticipate? What does the database do if the application layer forgets? Is there a record of the actual call that ran?

If you've shipped a product with an agent inside it and you're not sure where the boundary actually lives, that's the read an audit is for. The fix is almost never a rebuild. It's wiring the seams the way you'd wire any other API, and then proving, with a log, that the wiring holds.

