Key takeaway: When AI agents reach your data through an MCP server, they compose calls a human UI never would, so isolation enforced only in the UI or REST layer leaks. Tenant context must be injected at the tool boundary from the verified token, never from the prompt or a tool argument, and the SQL tool must take a tenant-scoped handle that can't address other tenants by construction. Row-level security in the database is a backstop, not the boundary.
On a capital-markets analytics platform we worked on, the question that took the most care wasn't in the REST API. It was one layer up, in the surface the AI agents actually talked to.
The platform exposes itself to LLM agents through a Model Context Protocol server. The agents get a set of tools: SQL analytics, the data catalog, saved views, documentation search, a few more. A user asks a question in chat, the agent picks tools, composes calls, and answers.
That tool surface is the part of the system most teams haven't fully sat with yet, because it behaves differently from a frontend.
The frontend is a known caller
With a React app you control the shape of every request, so auth and tenant checks on fixed endpoints, backed by row-level rules, are enough.
When your React app calls your API, you know the shape of every request before it leaves the browser. The pages were designed. The forms validate. The endpoints check auth and tenant on the way in. If anything tries to cross a tenant boundary, several layers refuse it before the database is even touched.
Multi-tenant isolation in that world is well understood. A guard reads the tenant from the verified token. Queries get scoped automatically. Row-level rules in Postgres back it up. The frontend can't compose its way past any of that, because the frontend isn't composing; it's calling fixed endpoints you wrote.
An agent is not a known caller
An LLM agent composes its own calls, so a helpful widening of a read can land on another tenant's data with a query that looks completely legitimate.
An LLM agent given a SQL tool and a catalog tool is composing. It reads the user's question, decides what to fetch, drafts a query, calls a tool, looks at the result, tries again. Helpful behaviour, and most of the time exactly what you want.
The failure mode we kept in front of us through the build was this. A user asks something ambiguous. The agent, trying to be helpful, widens the read to gather more context. If the SQL tool takes a raw query string and a database connection, that widened read can land on data from a tenant the user has no business seeing. Nothing about the call looks malicious. The query is well-formed. It just answers a question the user didn't quite ask, in a scope the user doesn't have.
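To make that failure mode concrete, here is a minimal sketch of the unsafe tool shape. It is an illustration and not the platform's code: the `positions` table, the tenant names `acme` and `globex`, and the function `unsafe_sql_tool` are all made up, and an in-memory SQLite database stands in for the real store.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database holding rows for two made-up tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE positions (tenant_id TEXT, instrument TEXT, notional REAL)"
    )
    conn.executemany(
        "INSERT INTO positions VALUES (?, ?, ?)",
        [
            ("acme", "UST-10Y", 5_000_000.0),
            ("acme", "BUND-5Y", 2_000_000.0),
            ("globex", "UST-10Y", 9_000_000.0),
        ],
    )
    return conn


def unsafe_sql_tool(conn: sqlite3.Connection, query: str) -> list:
    """Anti-pattern: run whatever the agent drafted on a shared connection."""
    return conn.execute(query).fetchall()


conn = make_db()

# The agent's first draft for a user at acme, correctly scoped.
narrow = unsafe_sql_tool(
    conn,
    "SELECT tenant_id, instrument FROM positions "
    "WHERE tenant_id = 'acme' AND instrument = 'UST-10Y'",
)

# The "helpful" widening: drop the tenant filter to gather more context.
wide = unsafe_sql_tool(
    conn,
    "SELECT tenant_id, instrument FROM positions WHERE instrument = 'UST-10Y'",
)

# Tenants that showed up in the result but are not the caller's tenant.
leaked = {row[0] for row in wide} - {"acme"}
```

Both queries are well-formed and both succeed. The only difference is a missing predicate, and nothing in the tool's signature gives it a way to notice.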
The human UI would never have let that happen, because no page in the app addresses cross-tenant data. The agent doesn't know that, because the agent doesn't use pages.
Where isolation actually has to live
Move the boundary into the tool definitions: inject tenant from the verified token, hand the SQL tool a tenant-scoped handle, contract-test every schema, and keep row-level security as the backstop.
The practical answer is that the boundary has to move into the tool definitions themselves. The tool contract is what the agent calls, so the tool contract is what has to refuse the unsafe shape.
A few concrete moves came out of the work.
- Tenant context is injected at the tool boundary from the verified token. Never read from the prompt. Never read from a tool argument. The agent cannot pass a tenant id, because tenant id is not a parameter the tool exposes.
- The SQL tool does not take a raw connection. It takes a tenant-scoped catalog handle that, by construction, can only address that tenant's data. A query that tries to reach across is not refused at the database; it is unrepresentable at the tool level.
- Every tool schema has contract tests. The point is to catch the silent widening: a future refactor that adds a field to a return shape, or relaxes a filter, and quietly gives the agent access to something it didn't have last week. The tests fail if the contract drifts.
- The database is still the backstop. Row-level isolation in Postgres still runs on every query. But row-level rules can only filter against the identity the session carries. If the agent reaches the database with a query that looks legitimate for the wrong tenant, the backstop won't fire, because to the database the call looks like a normal request from an authorized session. The earlier boundary has to do the work.
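The first two moves can be sketched together. This is a simplified illustration under stated assumptions, not the platform's implementation: the token is a toy HMAC-signed string standing in for whatever the real auth system issues, `CATALOG`, `TenantHandle`, and `positions_tool` are hypothetical names, and the handle accepts a structured select rather than free-form SQL to keep the example short.

```python
import hashlib
import hmac
import sqlite3
from typing import Optional

SIGNING_KEY = b"demo-only-key"  # assumption: a real server verifies a JWT or session
CATALOG = {"positions": ("instrument", "notional")}  # hypothetical agent-visible columns

DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE positions (tenant_id TEXT, instrument TEXT, notional REAL)")
DB.executemany("INSERT INTO positions VALUES (?, ?, ?)", [
    ("acme", "UST-10Y", 5_000_000.0),
    ("acme", "BUND-5Y", 2_000_000.0),
    ("globex", "UST-10Y", 9_000_000.0),
])


def sign(tenant_id: str) -> str:
    """Issue a toy token '<tenant>.<hmac>' (stand-in for the real auth system)."""
    mac = hmac.new(SIGNING_KEY, tenant_id.encode(), hashlib.sha256).hexdigest()
    return f"{tenant_id}.{mac}"


def verified_tenant(token: str) -> str:
    """Return the tenant named in the token, or raise if the signature is wrong."""
    tenant_id, _, mac = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, tenant_id.encode(), hashlib.sha256).hexdigest()
    if not tenant_id or not hmac.compare_digest(mac.encode(), expected.encode()):
        raise PermissionError("invalid token")
    return tenant_id


class TenantHandle:
    """Query handle bound to one tenant when it is created."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def select(self, table: str, columns: list, where: Optional[dict] = None) -> list:
        allowed = CATALOG.get(table)
        if allowed is None:
            raise ValueError(f"unknown table: {table}")
        filters = dict(where or {})
        # tenant_id is not in the catalog, so it can be neither selected nor filtered.
        bad = [c for c in list(columns) + list(filters) if c not in allowed]
        if bad or not columns:
            raise ValueError(f"columns not exposed: {bad}")
        # The tenant predicate is always present and always bound from the handle.
        clauses = ["tenant_id = ?"] + [f"{c} = ?" for c in filters]
        sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {' AND '.join(clauses)}"
        return self._conn.execute(sql, [self._tenant_id, *filters.values()]).fetchall()


def positions_tool(token: str, args: dict) -> list:
    """Tool entry point: tenant comes from the token; args carry no tenant field."""
    handle = TenantHandle(DB, verified_tenant(token))
    return handle.select("positions", args["columns"], args.get("where"))


rows = positions_tool(sign("acme"), {"columns": ["instrument", "notional"]})
```

The agent controls `args` and nothing else. There is no key in `args` that selects a tenant, and an attempt to filter on `tenant_id` fails the catalog check before any SQL is built. Python's underscore attributes are only a convention, so the guarantee here lives in the tool's argument shape, not in the language.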
What to ask before you ship an agent
Ask where the tenant comes from on every tool call, and what stops the agent from composing a call the human UI never would.
If you're putting an AI agent in front of multi-tenant data, two questions cover most of the risk.
- Where does the tenant come from on every tool call? If the answer is anywhere downstream of the prompt, that is the bug.
- What stops the agent from composing a call the human UI would never make? If the only thing stopping it is that no human would ask for it, that is also the bug, because the agent will, eventually, by accident.
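Part of the first question can be turned into a mechanical check that runs in CI. The sketch below assumes tool inputs are described as JSON Schema objects, which is how MCP tool definitions publish them; the tool names, the `TENANT_LIKE` markers, and the pinned-fingerprint approach are our own illustration, and it covers input schemas only, not return shapes.

```python
import hashlib
import json

# Hypothetical tool input schemas, in the JSON Schema form a server might publish.
TOOL_SCHEMAS = {
    "sql_analytics": {
        "type": "object",
        "properties": {
            "columns": {"type": "array", "items": {"type": "string"}},
            "where": {"type": "object"},
        },
        "required": ["columns"],
        "additionalProperties": False,
    },
    "saved_views": {
        "type": "object",
        "properties": {"view_name": {"type": "string"}},
        "required": ["view_name"],
        "additionalProperties": False,
    },
}

# Assumed naming patterns for anything that could select a tenant.
TENANT_LIKE = ("tenant", "org_id", "account_id", "workspace")


def tenant_like_params(schema: dict) -> list:
    """Return top-level input properties whose names suggest a tenant selector."""
    return [
        name
        for name in schema.get("properties", {})
        if any(marker in name.lower() for marker in TENANT_LIKE)
    ]


def fingerprint(schema: dict) -> str:
    """Stable hash of a schema, so any change to the contract changes the value."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def check_contracts(schemas: dict, pinned: dict) -> list:
    """Collect readable contract violations; an empty list means the check passes."""
    problems = []
    for name, schema in schemas.items():
        for param in tenant_like_params(schema):
            problems.append(f"{name}: exposes tenant-like parameter '{param}'")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{name}: accepts undeclared arguments")
        if pinned.get(name) != fingerprint(schema):
            problems.append(f"{name}: schema drifted from pinned contract")
    return problems


# In a real repository the pinned values would sit in a reviewed file.
# Here they are computed once from the current schemas for the demo.
PINNED = {name: fingerprint(schema) for name, schema in TOOL_SCHEMAS.items()}
problems = check_contracts(TOOL_SCHEMAS, PINNED)
```

A name-based scan like this is a tripwire, not proof. It catches the refactor that quietly adds a `tenant_id` argument or loosens a schema, and it forces any contract change through a deliberate update of the pinned value.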
Reading the tool layer with those two questions in mind is most of what an audit of an AI-native system actually is, and the kind of review a security team will expect before your first enterprise customer. If you're about to expose your platform to agents and you haven't had that read done by someone outside the team that built it, that's the moment for one.
Frequently Asked Questions
Why isn't database row-level security enough for AI agents?
Because if an agent composes a well-formed query for the wrong tenant, the database sees a normal request from an authorized session and the row-level backstop won't fire. The boundary has to move earlier, into the tool definitions the agent calls.
Where should tenant context come from on an agent tool call?
From the verified token, injected at the tool boundary. Never from the prompt and never from a tool argument. The agent shouldn't be able to pass a tenant id, because tenant id isn't a parameter the tool exposes.
What is the main risk of giving an LLM agent a SQL tool?
An agent trying to be helpful can widen a read to gather context and land on another tenant's data. The query looks legitimate, so nothing flags it. A tenant-scoped handle that can't address other tenants by construction prevents it at the tool level.
