How Serpin designs AI agent systems to minimise fabrication, and catch what gets through.
Thirty minutes with me to talk through your AI agent systems.
We work through where these patterns would apply, and what to put in place next.
These patterns form a layered defence that follows how an AI workflow operates: the AI takes in inputs, reasons over them, produces an output, and finally takes an action in a real system.
Each layer is a point where a specific kind of fabrication can happen. Pattern 01 is the foundation: it sets the principle for how to split work between the AI and the code that surrounds it.
Use the AI for things that need judgement. Use code for anything that can be checked mechanically, like counts, formats, or whether an action was actually completed. Asking the AI to verify its own work is asking for fabrication.
Restrict what the AI is free to invent. Limit the information it can see to what the current task needs. Where you can, give it a fixed list to pick from rather than asking it to generate from scratch.
If the same AI both fetches data and reasons over it, you cannot independently check what it actually retrieved. Have one system fetch, and a different one reason on what was fetched.
When the AI is required to produce a complete answer, it may invent the missing parts rather than admit it could not find them. Design the task so the AI has a clear way to report what it could not do.
The AI can produce a citation or quote that sounds real but does not appear anywhere in the source material it was given. Match every claim back to the actual source rather than just checking that the citation looks plausible.
A single accuracy score tells a reviewer nothing about which facts are unreliable. Tag each fact in the AI’s output as source-confirmed, inferred from the source, or assumed by the model, so a reviewer can challenge each one individually.
Before the AI takes an action in a real system or sends something to a person, run two checks: a code check for the rules that must hold, and a second AI looking specifically for fabrications the first AI might have missed. This is the last line of defence.
Use the AI for judgement. Use code for anything that can be checked mechanically. The AI is not a reliable verifier of its own work.
Some things the AI does well. It is good at judgement: is this source credible, does this argument flow, is the tone right. These are tasks where the model’s training adds real value, and where we want it making the call.
Some things the AI does badly. Counting. Format checking. Verifying that a value falls in a range. Confirming that a system action actually happened. The same AI that wrote an article with seven citations could tell you, when asked, that it cited eight. The same AI that updated a CRM record incorrectly could tell you it updated it correctly. These are not lies. They are completions. The model produces the response that fits the structure of the question, not the response that reflects what actually happened.
Code does not have this failure mode. Code can count. Code can verify a range. Code can match a citation against the source character by character. Code can read the CRM record back and confirm what was actually written.
The rule that follows is simple. For anything that can be checked mechanically, use code, not the AI. For anything that needs judgement, use the AI. Asking the AI to verify its own counts, formats, or actions is asking for fabrication.
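A minimal sketch of this rule in Python, assuming the AI hands back a draft plus its own claims about it: how many citations it used, and which CRM fields it says it updated. The citation-marker format, the function names and the plain dict standing in for the CRM are hypothetical, not a real CRM API.

```python
import re

# Hypothetical convention: citation markers appear in the draft as [1], [2], ...
CITATION_MARKER = re.compile(r"\[(\d+)\]")


def count_citation_markers(text: str) -> int:
    """Count distinct citation markers mechanically rather than asking the AI."""
    return len(set(CITATION_MARKER.findall(text)))


def verify_claims(draft: str, claimed_citations: int,
                  crm: dict, record_id: str, claimed_fields: dict) -> list:
    """List every place where the AI's account of its work differs from reality."""
    problems = []

    actual = count_citation_markers(draft)
    if actual != claimed_citations:
        problems.append(
            f"AI claimed {claimed_citations} citations; code counted {actual}"
        )

    # Read the record back from the system of record; never trust the AI's summary.
    record = crm.get(record_id, {})
    for field, claimed_value in claimed_fields.items():
        if record.get(field) != claimed_value:
            problems.append(
                f"{record_id}.{field}: AI claimed {claimed_value!r}, "
                f"record holds {record.get(field)!r}"
            )
    return problems


if __name__ == "__main__":
    crm = {"acme-opp": {"stage": "Stage 2"}}  # what the CRM actually holds
    draft = "Cycle times fell [1][2]. Costs rose [3]. See also [2]."
    for issue in verify_claims(draft, 4, crm, "acme-opp", {"stage": "Stage 3"}):
        print(issue)
```

The AI never grades itself here: every number and every field it reports is re-derived or read back by code, and any mismatch is surfaced rather than accepted.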
For every claim the AI makes about its own output, including actions it says it has taken in a system, is there code that independently verifies that claim?
Reduce the AI’s freedom to invent. Limit what it can see, and where possible give it a fixed list to pick from rather than open-ended text to generate.
There are two ways to reduce fabrication at the input side. The first is to limit what information the AI sees. Give it only what the current task needs, when it needs it. A model handed irrelevant context is more likely to use it.
The second is to limit the form of what the AI is asked to produce. When the AI needs to make a choice between known options, give it a list and have it pick. The model returns an identifier from the list; code then looks up the real content. The model has no opportunity to invent an option that does not exist.
The AI may invent a product that does not exist, complete with features and a price point. It sounds right. It is not.
The model returns an ID. Code looks up the real product. The model cannot invent a product at this step.
In one production system Serpin designed, this single change took the fabrication rate from above 40% to under 10%. The model could no longer invent options. It could only select from a registered set, and code did the lookup.
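A minimal sketch of constrained selection in Python. The catalogue, the `P-` ID format and the function names are illustrative assumptions; the call to the model itself is left out, since only the prompt it sees and the reply it returns matter here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_id: str
    name: str
    price_gbp: float


# Hypothetical registered set: the only options the model may choose from.
CATALOGUE = {
    "P-101": Product("P-101", "Starter plan", 49.0),
    "P-202": Product("P-202", "Team plan", 199.0),
}


def build_prompt(question: str) -> str:
    """Show the model IDs and names only; it must answer with one ID."""
    options = "\n".join(f"{p.product_id}: {p.name}" for p in CATALOGUE.values())
    return f"{question}\nReply with exactly one ID from this list:\n{options}"


def resolve_selection(model_reply: str) -> Product:
    """Code, not the model, turns the chosen ID into real product content."""
    product_id = model_reply.strip()
    if product_id not in CATALOGUE:
        # An unregistered ID is rejected outright; nothing invented gets through.
        raise ValueError(f"Model returned unregistered ID {product_id!r}")
    return CATALOGUE[product_id]
```

The model can still choose the wrong item, which is a judgement error for later layers to catch. What it can no longer do is return a product, feature list or price that is not in the registered set.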
What does the AI see, and what shape of output is it free to produce? Have we limited both to what the task actually needs?
If the same AI both fetches data and reasons over it, you cannot independently check what it actually retrieved.
When an AI agent fetches its own data, you have to take its word for what it retrieved. The retrieval is invisible. That trust is unsafe, because the same model that may fabricate in its output may also fabricate in what it reports having found.
The pattern is to split the work. One system fetches: either plain code, or a tightly controlled agent with no creative latitude. A different system reasons on what was fetched. The retrieved content is recorded as the canonical input, available for audit. The reasoning model never goes to look itself, so it has no opportunity to invent a source.
The agent that reasons never goes to fetch. The system that fetches has no creative latitude. Two separate jobs.
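A minimal sketch of the split in Python. The fetcher is plain code over a document store, the retrieved content is hashed and logged as the canonical input, and the reasoning step is passed in as a function so it has no way to fetch. The store, the log format and the names used here are hypothetical.

```python
import hashlib
import json
from typing import Callable


def fetch_documents(doc_ids: list[str], store: dict[str, str]) -> dict[str, str]:
    """Plain-code fetcher: no model involved, no creative latitude."""
    return {doc_id: store[doc_id] for doc_id in doc_ids if doc_id in store}


def record_canonical_input(docs: dict[str, str], audit_log: list) -> str:
    """Store exactly what was retrieved, with a content hash for later audit."""
    payload = json.dumps(docs, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    audit_log.append({"sha256": digest, "documents": docs})
    return digest


def run_task(doc_ids: list[str], store: dict[str, str],
             reason: Callable[[dict[str, str]], str],
             audit_log: list) -> tuple[str, str]:
    docs = fetch_documents(doc_ids, store)
    digest = record_canonical_input(docs, audit_log)
    # The reasoning step only ever sees the recorded documents; it cannot fetch.
    return reason(docs), digest
```

Because the reasoning step only ever receives `docs`, any source it mentions can be checked against the logged payload for that run.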
This is one of the named anti-fabrication functions in the Bounded Agency framework: making sure that what the agent says it has retrieved is actually what it has retrieved.
Who fetches the data, the reasoning AI itself, or a separate system? How do we verify what was actually retrieved?
When the AI is required to produce a complete answer, it may invent the missing parts rather than admit it could not find them.
The AI is trained to complete the task you give it. If you ask for five sources and only three exist, the model may produce five by inventing two. Completing the task is its default behaviour. That completion pressure is the failure mode.
The fix is to design the task so the AI has a clear way to report what it could not do. “Find between three and six sources. If you cannot find three with appropriate evidence, return what you have, with a flag and a short note explaining why you could not find more.” The model now has a structured way to say “I cannot fully complete this.”
Where there is no permission to report a partial answer, the AI may fail by fabricating.
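A minimal sketch of that output contract in Python, assuming the AI is asked to reply in JSON with `sources`, `incomplete` and `note` fields. The field names are an assumption; the three-to-six bounds come from the example instruction above.

```python
import json

MIN_SOURCES, MAX_SOURCES = 3, 6


def validate_source_reply(raw: str) -> dict:
    """Accept a full answer, or a flagged partial answer with an explanation."""
    reply = json.loads(raw)
    sources = reply.get("sources", [])
    incomplete = bool(reply.get("incomplete", False))
    note = (reply.get("note") or "").strip()

    if len(sources) > MAX_SOURCES:
        raise ValueError(f"Too many sources: {len(sources)}")
    if len(sources) < MIN_SOURCES and not (incomplete and note):
        # A short list is valid only when the model flags it and says why.
        raise ValueError("Short answer without an incomplete flag and a note")
    return {"sources": sources, "incomplete": incomplete, "note": note}
```

Because a flagged short answer passes validation, the task no longer forces the model to pad the list to reach three; downstream code can route the flagged case to a person or a wider search.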
Can the AI return a partial result, flag uncertainty, or escalate? Or does the task force a complete answer every time?
A citation that exists is not a citation that traces. Check it against the sources you gave the AI, not the AI’s word that it traces.
The AI may produce a citation marker [7], a real-sounding publication name, a plausible article title, and a believable statistic, and none of it actually appears in the source corpus the AI was given.
The pattern is provenance, not presence. After the AI produces an output that references sources, run a verification step that matches each citation against the actual corpus. If a citation does not trace to a source in the verified-sources list, it is rejected. If a quote is claimed, character-level matching confirms the quote actually appears in the source.
Each citation is matched against the corpus the AI was given. No match means rejection, regardless of how plausible the citation reads.
This is the layer that catches the plausible-sounding fabrication, which is the most dangerous kind.
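A minimal sketch of the provenance check in Python, assuming each citation the AI emits carries a source ID and, optionally, a verbatim quote. `Citation`, `check_provenance` and the whitespace normalisation are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    source_id: str
    quote: Optional[str] = None


def normalise(text: str) -> str:
    # Collapse runs of whitespace so line breaks in the source do not cause misses.
    return " ".join(text.split())


def check_provenance(citations: list[Citation], corpus: dict[str, str]) -> list[str]:
    """Reject any citation that does not trace to the corpus the AI was given."""
    rejections = []
    for c in citations:
        source_text = corpus.get(c.source_id)
        if source_text is None:
            rejections.append(f"{c.source_id}: not in verified-sources list")
        elif c.quote is not None and normalise(c.quote) not in normalise(source_text):
            rejections.append(f"{c.source_id}: quote not found in source")
    return rejections
```

Collapsing whitespace is a deliberate, narrow relaxation so that line breaks in the stored source do not cause false rejections; every other character must match. Whether to also normalise case or quotation marks is a policy choice.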
When the AI cites a source, do we verify it against the actual sources we gave it, or only that the citation exists?
A single accuracy score tells a reviewer nothing about which facts are unreliable. Tag each fact individually so the gaps are visible and challengeable.
Aggregate quality scores, such as “this output is 85% accurate”, are themselves fabrications. The model that produced an 85 could have produced a 73 or a 92 with equal confidence. There is no defensible derivation.
The architectural alternative is per-element provenance. Every fact in the AI’s output carries one of three labels. Source-confirmed: the source explicitly states this. Inferred: the source implies this through reasoning the model is confident in. Assumed: the model is filling in a likely-but-not-supported detail.
Indefensible. No reviewer knows which 15% is unreliable, where the gaps were filled, or which facts to push back on.
A reviewer sees that revenue figures were source-confirmed, growth rates were inferred, market sizing was assumed. Each label is challengeable.
A reviewer can see which specific facts were inferred or assumed, and challenge them individually. The model is also less likely to assume aggressively when it knows the assumption will be visibly labelled. The label is a structural disincentive against silent gap-filling.
This builds trust precisely because the system surfaces its own uncertainty rather than hiding it under a single number.
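A minimal sketch of per-element provenance in Python, assuming the AI returns each fact as a small record carrying one of the three labels. The `Fact` shape, the `source_id` field and the helper names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    SOURCE_CONFIRMED = "source-confirmed"  # the source explicitly states this
    INFERRED = "inferred"                  # the source implies this
    ASSUMED = "assumed"                    # model filled in an unsupported detail


@dataclass
class Fact:
    claim: str
    provenance: Provenance
    source_id: Optional[str] = None


def parse_fact(raw: dict) -> Fact:
    """Reject facts with a missing or unknown label, or a confirmation with no source."""
    provenance = Provenance(raw.get("provenance"))  # ValueError if missing or unknown
    fact = Fact(raw["claim"], provenance, raw.get("source_id"))
    if provenance is Provenance.SOURCE_CONFIRMED and not fact.source_id:
        raise ValueError(f"Source-confirmed fact names no source: {fact.claim!r}")
    return fact


def review_queue(facts: list) -> dict:
    """Group claims by label so a reviewer can challenge each one individually."""
    queue = {p.value: [] for p in Provenance}
    for fact in facts:
        queue[fact.provenance.value].append(fact.claim)
    return queue
```

The parser refuses unlabelled facts and any source-confirmed fact with no source attached, so a source-confirmed label always names a source that can then be matched against the corpus.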
Does our output show which facts came directly from the source and which the AI filled in? Or do we collapse everything into one confidence score?
Before the AI takes an action in a real system, two checks sit between the AI and that system: a code check for the rules that must hold, and a second AI looking specifically for fabrications.
Before any API call, database write, system action, or external communication, the proposed output gets checked one more time against policy.
The check has two parts. First, code-enforced rules, the deterministic floor. The output must satisfy specified properties before it leaves the system. Second, an independent verifier, typically a separate agent with a different prompt or model, looking specifically for fabrications the earlier stages may have missed.
Two checks, not one. Code catches the structural failures the AI cannot reliably check. The independent verifier catches plausible-sounding fabrications code cannot distinguish from real content.
Neither check alone is sufficient. Together they form the last gate before the AI’s output affects the real world. When the gate fires, the system corrects or escalates. Correction may mean re-prompting the AI with the specific constraint, regenerating within budget, or routing to an alternative path. Escalation means handing the case to a human, with the relevant evidence and a specific decision to make.
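A minimal sketch of the gate in Python, using a proposed ledger posting as the action. The rules, the approval limit, `ProposedPosting`, the retry budget and the `propose` and `verifier` callables are all hypothetical; in a real system `verifier` would be a separate agent with a different prompt or model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedPosting:
    account: str
    amount: float
    memo: str


# Hypothetical deterministic floor: (description, predicate) pairs.
ALLOWED_ACCOUNTS = {"6400", "6410"}
RULES = [
    ("account is in the chart of accounts", lambda p: p.account in ALLOWED_ACCOUNTS),
    ("amount is positive and within the approval limit", lambda p: 0 < p.amount <= 25_000),
    ("memo is not empty", lambda p: bool(p.memo.strip())),
]


def gate(propose: Callable[[str], ProposedPosting],
         verifier: Callable[[ProposedPosting], list],
         task: str, max_attempts: int = 2) -> tuple:
    """Run code rules, then an independent verifier; correct, then escalate."""
    feedback = ""
    for _ in range(max_attempts):
        posting = propose(task + feedback)
        failed = [desc for desc, rule in RULES if not rule(posting)]
        concerns = [] if failed else verifier(posting)
        if not failed and not concerns:
            return "approved", posting
        # Correction: re-prompt with the specific constraints that failed.
        feedback = "\nFix: " + "; ".join(failed + concerns)
    # Escalation: hand a human the evidence and a specific decision to make.
    return "escalated", {"posting": posting, "issues": failed + concerns}
```

The verifier only runs once the code rules pass, so cheap deterministic failures are corrected first. Note that a rule like “account is in the chart of accounts” cannot catch a valid but wrong account number; that plausible kind of error is what the independent verifier is for.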
AgentSpec (ICSE 2026) reports over 90% prevention of unsafe agent actions at runtime, with overhead measured in milliseconds. The control layer is fast. It is also the final line of defence before the AI’s output affects the real world.
What sits between the AI’s proposed action and the external system it would affect? Is the check both code-enforced and independently verified, with a defined correction or escalation path when it fires?
AI agents will sometimes make things up. Right now, it is a feature of the technology. Their remarkable ability to produce useful work comes with the risk that they will sometimes fabricate, and that fabrication is sometimes called hallucination.
The job of senior leaders is not to wait for fabrication-free AI. It is to design AI agent systems that prevent fabrication where they can, and catch anything that gets through.
We use the word fabrication. The AI is not perceiving falsely. It is completing the structure of a plausible response when it does not have the data. The output sounds right. It may or may not be right.
Three real-shaped examples. Each one passes the eye test. Each one is a fabrication the system reports as a success.
“As Greenfield & Brar (2024) note in the Journal of Operational Strategy, agentic systems show a 32% reduction in cycle time across mid-cap firms.”
The journal exists. The authors do not. The article does not. The percentage was invented by the AI to fit the surrounding argument.
“Updated next-step field on Acme Industries opportunity. Status moved to Stage 3. Logged activity note.”
The AI confirms it did all of this. The opportunity field still reads “Stage 2”. The activity note was never created. The update never happened.
“Posted entry to account 6400 (Marketing Expenses). Amount: £12,840. Memo: Q4 campaign creative.”
The entry posted, but to the wrong account. The ledger now overstates marketing expenses by £12,840 and understates the correct account by the same amount.
All three pass every check that only looks at the surface. The seven patterns are designed against this.
Layered architecture, not a checklist.
What you have just read is one example of what Serpin does: design and build AI agent systems that ship, run reliably, and stand up to scrutiny. Anti-fabrication is one strand. Access control, data architecture, escalation, audit, and governance are the others.
If your team can answer all seven questions with a real “yes, and here is how”, the system is calibrated. Any “no” or “we don’t know” is where fabrication enters.
Bounded Agency is Serpin’s model for designing AI agents that are reliable, secure, and deliver real value.
Anti-fabrication is one strand. The framework covers the controls senior teams need across the four levels of agent autonomy: access control, context management, data architecture, memory management, guardrails, audit, escalation, and governance.
The First AI Agent Playbook covers the decision before the architecture: which AI agent project to build first. The full Bounded Agency framework (31 pages) is available on request.