Anti-fabrication patterns

7 patterns to stop your AI making things up.

How Serpin designs AI agent systems to minimise fabrication, and catch what gets through.

7 patterns · 4 layers · ~15 min read

Want help applying these patterns?

Thirty minutes with me to talk through your AI agent systems.

We work through where these patterns would apply, and what to put in place next.

 

Book a call → 30 mins · Google Meet · Free

Seven patterns to prevent AI fabrication.

These patterns form a layered defence that follows how an AI workflow operates: the AI takes in inputs, reasons over them, produces an output, and finally takes an action in a real system.

Each layer is a point where a specific kind of fabrication can happen. Pattern 01 is the foundation: it sets the principle for how to split work between the AI and the code that surrounds it.

Pattern 01 · Foundation
Agents decide. Code enforces.
Use the AI for things that need judgement. Use code for anything that can be checked mechanically, like counts, formats, or whether an action was actually completed. Asking the AI to verify its own work is asking for fabrication.

Pattern 02 · Input layer
Give the AI fewer ways to make things up.
Restrict what the AI is free to invent. Limit the information it can see to what the current task needs. Where you can, give it a fixed list to pick from rather than asking it to generate from scratch.

Pattern 03 · Reasoning layer
Separate the system that retrieves from the system that reasons.
If the same AI both fetches data and reasons over it, you cannot independently check what it actually retrieved. Have one system fetch, and a different one reason on what was fetched.

Pattern 04 · Reasoning layer
Let the AI say “I don’t know.”
When the AI is required to produce a complete answer, it may invent the missing parts rather than admit it could not find them. Design the task so the AI has a clear way to report what it could not do.

Pattern 05 · Output layer
Verify every output against the original source.
The AI can produce a citation or quote that sounds real but does not appear anywhere in the source material it was given. Match every claim back to the actual source; do not just check that the citation looks plausible.

Pattern 06 · Output layer
Label every fact: source-confirmed, inferred, or assumed.
A single accuracy score tells a reviewer nothing about which facts are unreliable. Tag each fact in the AI’s output as source-confirmed, inferred from the source, or assumed by the model, so a reviewer can challenge each one individually.

Pattern 07 · Containment layer
Build a final control layer before any external action.
Before the AI takes an action in a real system or sends something to a person, run two checks: a code check for the rules that must hold, and a second AI looking specifically for fabrications the first AI might have missed. This is the last line of defence.

How the patterns fit across each layer of an AI workflow.

[Diagram: flow of agent work. Input: 02 Give the AI fewer ways to make things up. Reasoning: 03 Split retrieval from reasoning; 04 Permit safe failure. Output: 05 Verify against the source; 06 Label every fact. Containment: 07 Final control layer before external action. Foundation, underpinning all layers: 01 Agents decide. Code enforces.]

The seven patterns in detail.

Foundation

01 Agents decide. Code enforces.

Use the AI for judgement. Use code for anything that can be checked mechanically. The AI is not a reliable verifier of its own work.

Some things the AI does well. It is good at judgement: is this source credible, does this argument flow, is the tone right. These are tasks where the model’s training adds real value, and where we want it making the call.

Some things the AI does badly. Counting. Format checking. Verifying that a value falls in a range. Confirming that a system action actually happened. The same AI that wrote an article with seven citations could tell you, when asked, that it cited eight. The same AI that updated a CRM record incorrectly could tell you it updated it correctly. These are not lies. They are completions. The model produces the response that fits the structure of the question, not the response that reflects what actually happened.

Code does not have this failure mode. Code can count. Code can verify a range. Code can match a citation against the source character by character. Code can read the CRM record back and confirm what was actually written.

The rule that follows is simple. For anything that can be checked mechanically, use code, not the AI. For anything that needs judgement, use the AI. Asking the AI to verify its own counts, formats, or actions is asking for fabrication.
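
As a minimal sketch of this rule, the Python below checks two of the AI's claims mechanically: a citation count, and a CRM update. The `[n]` marker format, the function names, and the dictionary standing in for a CRM read-back are assumptions made for illustration, not a description of any particular system.

```python
import re


def count_citations(text: str) -> int:
    """Count distinct [n] citation markers (the marker format is an assumption)."""
    return len(set(re.findall(r"\[(\d+)\]", text)))


def verify_claimed_count(text: str, claimed: int) -> bool:
    """Compare the count the AI reports with the count code actually finds."""
    return count_citations(text) == claimed


def verify_write(store: dict, record_id: str, field: str, expected: str) -> bool:
    """Read the record back and confirm the field holds the expected value."""
    return store.get(record_id, {}).get(field) == expected


article = "Costs fell [1], output rose [2], and churn held steady [3]."
print(verify_claimed_count(article, 4))  # False: the AI said four, code finds three

crm = {"0067": {"stage": "Stage 2"}}  # hypothetical stand-in for a CRM read-back
print(verify_write(crm, "0067", "stage", "Stage 3"))  # False: the update never landed
```

In both cases the AI's own report is treated as a claim to be tested, never as evidence.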

What to ask your team

For every claim the AI makes about its own output, including actions it says it has taken in a system, is there code that independently verifies that claim?

Input layer

02 Give the AI fewer ways to make things up.

Reduce the AI’s freedom to invent. Limit what it can see, and where possible give it a fixed list to pick from rather than open-ended text to generate.

There are two ways to reduce fabrication at the input side. The first is to limit what information the AI sees. Give it only what the current task needs, when it needs it. A model handed irrelevant context may use it anyway.

The second is to limit the form of what the AI is asked to produce. When the AI needs to make a choice between known options, give it a list and have it pick. The model returns an identifier from the list; code then looks up the real content. The model has no opportunity to invent an option that does not exist.

Open prompt
“Suggest the best product for this customer.”

The AI may invent a product that does not exist, complete with features and a price point. It sounds right. It is not.

Constrained pick
“Pick one product ID from this list of seventeen.”

The model returns an ID. Code looks up the real product and rejects any ID that is not on the list, so an invented product cannot get through this step.

In one production system Serpin designed, this single change took the fabrication rate from above 40% to under 10%. The model could no longer invent options. It could only select from a registered set, and code did the lookup.
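
A constrained pick might look something like the sketch below. The catalogue contents, the product IDs, and the function names are hypothetical; the point is only that the model's reply is treated as a key, and code supplies everything else.

```python
# Hypothetical registered set; in practice this would come from the product system.
CATALOGUE = {
    "P-001": {"name": "Starter plan", "price_gbp": 20},
    "P-002": {"name": "Team plan", "price_gbp": 80},
}


def build_prompt(customer_note: str) -> str:
    """Offer the model only the registered IDs to choose from."""
    ids = ", ".join(sorted(CATALOGUE))
    return f"Customer: {customer_note}\nReply with exactly one product ID from: {ids}"


def resolve_pick(model_reply: str) -> dict:
    """Look up the real product; refuse any ID that is not registered."""
    product_id = model_reply.strip()
    if product_id not in CATALOGUE:
        raise ValueError(f"Unregistered product ID: {product_id!r}")
    return CATALOGUE[product_id]


print(resolve_pick(" P-002\n")["name"])  # Team plan
```

Names, features, and prices never pass through the model, so there is nothing at this step for it to invent.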

What to ask your team

What does the AI see, and what shape of output is it free to produce? Have we limited both to what the task actually needs?

Reasoning layer

03 Separate the system that retrieves from the system that reasons.

If the same AI both fetches data and reasons over it, you cannot independently check what it actually retrieved.

When an AI agent fetches its own data, you have to take its word for what it retrieved. The retrieval is invisible. That trust is unsafe, because the same model that may fabricate in its output may also fabricate in what it reports having found.

The pattern is to split the work. One system fetches: either plain code, or a tightly-controlled agent with no creative latitude. A different system reasons on what was fetched. The retrieved content is recorded as the canonical input, available for audit. The reasoning model never goes to look itself, so it has no opportunity to invent a source.

How the split works
[Diagram: retrieval is split from reasoning. The retriever (code or a controlled agent, no creative output) fetches the verified content (canonical record, auditable lookup), which supplies the reasoner (an AI that works only with what it was given).]

The agent that reasons never goes to fetch. The system that fetches has no creative latitude. Two separate jobs.

This is one of the named anti-fabrication functions in the Bounded Agency framework: making sure that what the agent says it has retrieved is actually what it has retrieved.
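
One way to picture the split is the sketch below, which assumes a plain-code retriever over an in-memory store. The `RetrievedDoc` record, the SHA-256 fingerprint, and the function names are illustrative choices, not part of the Bounded Agency framework as published.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedDoc:
    """Canonical record of what was fetched, kept for audit."""
    source_id: str
    content: str
    sha256: str


def retrieve(source_ids: list, store: dict) -> list:
    """Plain-code retriever: fetches and fingerprints, with no creative output."""
    docs = []
    for sid in source_ids:
        text = store[sid]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        docs.append(RetrievedDoc(sid, text, digest))
    return docs


def build_reasoner_input(question: str, docs: list) -> str:
    """The reasoner receives only the recorded documents and has no fetch tool."""
    context = "\n\n".join(f"[{d.source_id}]\n{d.content}" for d in docs)
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {question}"


def audit(docs: list, store: dict) -> bool:
    """Re-hash the stored text and confirm it matches what was recorded."""
    return all(
        hashlib.sha256(store[d.source_id].encode("utf-8")).hexdigest() == d.sha256
        for d in docs
    )
```

Because the reasoner's input is assembled by code from recorded documents, an auditor can later check exactly what the model was shown.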

What to ask your team

Who fetches the data: the reasoning AI itself, or a separate system? How do we verify what was actually retrieved?

Reasoning layer

04 Let the AI say “I don’t know.”

When the AI is required to produce a complete answer, it may invent the missing parts rather than admit it could not find them.

The AI is trained to complete the task you give it. If you ask for five sources and only three exist, the model may produce five by inventing two. Completing the task is its default behaviour. That completion pressure is the failure mode.

The fix is to design the task so the AI has a clear way to report what it could not do. “Find between three and six sources. If you cannot find three with appropriate evidence, return what you have, with a flag and a short note explaining why you could not find more.” The model now has a structured way to say “I cannot fully complete this.”

Where there is no permission to report a partial answer, the AI may fail by fabricating.
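
The three-to-six sources instruction above could be backed by an output shape like the following sketch. The `SourceResult` fields and the validation messages are assumptions for illustration.

```python
from dataclasses import dataclass

MIN_SOURCES, MAX_SOURCES = 3, 6


@dataclass
class SourceResult:
    """Output shape that gives the model a structured way to fall short."""
    sources: list
    incomplete: bool = False
    note: str = ""


def validate(result: SourceResult) -> list:
    """Return a list of problems; an empty list means the result is acceptable."""
    problems = []
    if len(result.sources) > MAX_SOURCES:
        problems.append("more than six sources")
    if len(result.sources) < MIN_SOURCES and not result.incomplete:
        problems.append("fewer than three sources without an incomplete flag")
    if result.incomplete and not result.note.strip():
        problems.append("incomplete flag set but no explanatory note")
    return problems


partial = SourceResult(["S1", "S2"], incomplete=True, note="Only two sources met the evidence bar.")
print(validate(partial))  # []
```

A flagged partial result passes validation, while a short list with no flag does not, so falling short honestly is always an accepted outcome.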

What to ask your team

Can the AI return a partial result, flag uncertainty, or escalate? Or does the task force a complete answer every time?

Output layer

05 Verify every output against the original source.

A citation that exists is not a citation that traces. Check it against the sources you gave the AI; do not take the AI’s word that it traces.

The AI may produce a citation marker [7], a real-sounding publication name, a plausible article title, and a believable statistic, and none of it actually appears in the source corpus the AI was given.

The pattern is provenance, not presence. After the AI produces an output that references sources, run a verification step that matches each citation against the actual corpus. If a citation does not trace to a source in the verified-sources list, it is rejected. If a quote is claimed, character-level matching confirms the quote actually appears in the source.

Citation verification flow
[Diagram: the AI output (cites a source and claims a statistic) goes through a corpus match (character-level verification step). Pass: the citation traces. Reject: the citation is fabricated.]

Each citation is matched against the corpus the AI was given. No match means rejection, regardless of how plausible the citation reads.

This is the layer that catches the plausible-sounding fabrication, which is the most dangerous kind.
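
A verification step of this kind could be sketched as follows. The corpus is assumed to be a dictionary of verified source IDs to text, and collapsing whitespace before matching is an added assumption so that line wrapping does not cause false rejections; otherwise the quote must appear character for character.

```python
import re


def normalise(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def verify_citation(source_id: str, quote: str, corpus: dict) -> str:
    """Pass only if the source is in the verified list and the quote appears in it."""
    if source_id not in corpus:
        return "REJECT: source not in verified list"
    if quote and normalise(quote) not in normalise(corpus[source_id]):
        return "REJECT: quote not found in source"
    return "PASS"


# Hypothetical one-document corpus for illustration.
corpus = {"S1": "Cycle time fell by 12%\nacross the pilot group."}
print(verify_citation("S1", "fell by 12% across the pilot", corpus))  # PASS
print(verify_citation("S1", "fell by 32%", corpus))  # REJECT: quote not found in source
print(verify_citation("S7", "fell by 12%", corpus))  # REJECT: source not in verified list
```

The check never asks whether a citation looks plausible, only whether it traces.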

What to ask your team

When the AI cites a source, do we verify it against the actual sources we gave it, or only check that the citation looks plausible?

Output layer

06 Label every fact: source-confirmed, inferred, or assumed.

A single accuracy score tells a reviewer nothing about which facts are unreliable. Tag each fact individually so the gaps are visible and challengeable.

Aggregate quality scores, such as “this output is 85% accurate”, are themselves fabrications. The model that produced an 85 could have produced a 73 or a 92 with equal confidence. There is no defensible derivation.

The architectural alternative is per-element provenance. Every fact in the AI’s output carries one of three labels. Source-confirmed: the source explicitly states this. Inferred: the source implies this through reasoning the model is confident in. Assumed: the model is filling in a likely-but-not-supported detail.

Single score
“This report is 85% accurate.”

Indefensible. No reviewer knows which 15% is unreliable, where the gaps were filled, or which facts to push back on.

Per-fact labels
Source-confirmed · Inferred · Assumed

A reviewer sees that revenue figures were source-confirmed, growth rates were inferred, market sizing was assumed. Each label is challengeable.

A reviewer can see which specific facts were inferred or assumed, and challenge them individually. The model is also less likely to assume aggressively when it knows the assumption will be visibly labelled. The label is a structural disincentive against silent gap-filling.

This builds trust precisely because the system surfaces its own uncertainty rather than hiding it under a single number.
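
Per-fact labels can be carried as plain data, as in the sketch below. The `Fact` record, the `evidence` pointer, and the rule that a source-confirmed fact must carry one are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    SOURCE_CONFIRMED = "source-confirmed"
    INFERRED = "inferred"
    ASSUMED = "assumed"


@dataclass(frozen=True)
class Fact:
    statement: str
    provenance: Provenance
    evidence: str = ""  # assumed field: where in the source the fact is stated


def needs_review(facts: list) -> list:
    """Facts a reviewer may want to challenge: anything not source-confirmed."""
    return [f for f in facts if f.provenance is not Provenance.SOURCE_CONFIRMED]


def unsupported_confirmations(facts: list) -> list:
    """Code check: a source-confirmed label with no evidence pointer is suspect."""
    return [
        f.statement
        for f in facts
        if f.provenance is Provenance.SOURCE_CONFIRMED and not f.evidence
    ]


report = [
    Fact("Revenue was £4.2m", Provenance.SOURCE_CONFIRMED, evidence="report.pdf, p4"),
    Fact("Growth is about 8% a year", Provenance.INFERRED),
    Fact("The market is worth £50m", Provenance.ASSUMED),
]
print([f.statement for f in needs_review(report)])
```

The figures in `report` are invented placeholders. In use, the labels travel with the output, so a reviewer sees the inferred and assumed facts without having to hunt for them.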

What to ask your team

Does our output show which facts came directly from the source and which the AI filled in? Or do we collapse everything into one confidence score?

Containment layer

07 Build a final control layer before any external action.

Before the AI takes an action in a real system, two checks sit between the AI and that system: a code check for the rules that must hold, and a second AI looking specifically for fabrications.

Before any API call, database write, system action, or external communication, the proposed output gets checked one more time against policy.

The check has two parts. First, code-enforced rules, the deterministic floor. The output must satisfy specified properties before it leaves the system. Second, an independent verifier, typically a separate agent with a different prompt or model, looking specifically for fabrications the earlier stages may have missed.

The final gate
[Diagram: a proposed action from the AI enters the final control layer, which combines code-enforced rules (the deterministic floor) and an independent verifier (separate prompt or model). Outcomes: the action is emitted to the external system; correct (re-prompt or regenerate); or escalate (hand to a human).]

Two checks, not one. Code catches the structural failures the AI cannot reliably check. The independent verifier catches plausible-sounding fabrications code cannot distinguish from real content.

Neither check alone is sufficient. Together they form the last gate before the AI’s output affects the real world. When the gate fires, the system corrects or escalates. Correction may mean re-prompting the AI with the specific constraint, regenerating within budget, or routing to an alternative path. Escalation means handing the case to a human, with the relevant evidence and a specific decision to make.
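
The gate described above can be sketched as a single function. Everything named here is hypothetical: the rule set, the account codes, and the `verifier` callable, which stands in for a call to a second model. Running the verifier only after the code rules pass is a design choice in this sketch, not a requirement of the pattern.

```python
from typing import Callable

Rule = Callable[[dict], bool]


def final_gate(
    action: dict,
    rules: dict,
    verifier: Callable[[dict], bool],
    attempts_left: int,
) -> tuple:
    """Return (decision, reasons), where decision is EMIT, CORRECT, or ESCALATE."""
    # Deterministic floor: every named rule must hold.
    failed = [name for name, rule in rules.items() if not rule(action)]
    # Independent verifier: consulted only once the code rules have passed.
    if not failed and verifier(action):
        return "EMIT", []
    if not failed:
        failed = ["independent verifier flagged a possible fabrication"]
    # Correct while there is retry budget; otherwise hand to a human.
    decision = "CORRECT" if attempts_left > 0 else "ESCALATE"
    return decision, failed


VALID_ACCOUNTS = {"6400", "6410"}  # hypothetical chart of accounts
rules = {
    "account is in the chart of accounts": lambda a: a.get("account") in VALID_ACCOUNTS,
    "amount is positive": lambda a: a.get("amount", 0) > 0,
}

entry = {"account": "9999", "amount": 12840}
print(final_gate(entry, rules, verifier=lambda a: True, attempts_left=1))
# ('CORRECT', ['account is in the chart of accounts'])
```

The returned reasons give a re-prompt the specific constraint that failed, and give a human the evidence for the decision they are being asked to make.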

AgentSpec (ICSE 2026) reports over 90% prevention of unsafe agent actions at runtime, with overhead measured in milliseconds. The control layer is fast. It is also the final line of defence before the AI’s output affects the real world.

What to ask your team

What sits between the AI’s proposed action and the external system it would affect? Is the check both code-enforced and independently verified, with a defined correction or escalation path when it fires?

The fabrications that pass every surface check.

AI agents will sometimes make things up. Right now, it is a feature of the technology. Their remarkable ability to produce useful work comes with the risk that they will sometimes fabricate, and that fabrication is sometimes called hallucination.

The job of senior leaders is not to wait for fabrication-free AI. It is to design AI agent systems that prevent fabrication where they can, and catch anything that gets through.

A note on terminology

We use the word fabrication. The AI is not perceiving falsely. It is completing the structure of a plausible response when it does not have the data. The output sounds right. It may or may not be right.

Three numbers set the scene.

19.9%
Of citations generated by GPT-4o were entirely fabricated. The publication names existed. The articles did not.
Peer-reviewed study · 2025

1,000+
Documented cases worldwide where AI-generated content has been used in court filings and found to contain fabricated material.
Charlotin database

78%
Of executives lack confidence they could pass an independent AI governance audit within 90 days.
Grant Thornton 2026

The dangerous fabrications are the plausible ones.

Three real-shaped examples. Each one passes the eye test. Each one is a fabrication the system reports as a success.

Source · report.pdf, p4

“As Greenfield & Brar (2024) note in the Journal of Operational Strategy, agentic systems show a 32% reduction in cycle time across mid-cap firms.”

The journal exists. The authors do not. The article does not. The percentage was invented by the AI to fit the surrounding argument.

A made-up citation that sounds real.

Salesforce · opportunity 0067

“Updated next-step field on Acme Industries opportunity. Status moved to Stage 3. Logged activity note.”

The AI confirms it did all of this. The opportunity field still reads “Stage 2”. The activity note was never created. The update never happened.

A confirmation of an action that never actually took place.

NetSuite · journal entry JE-0142

“Posted entry to account 6400 (Marketing Expenses). Amount: £12,840. Memo: Q4 campaign creative.”

The entry posted. The account number was wrong. The real total now overstates marketing by £12,840 and understates the right account by the same amount.

A real action that did the wrong thing, reported as correct.

All three pass every check that only looks at the surface. The seven patterns are designed against this.

Every point where the AI generates text is a trust boundary.

The seven patterns enforce a check at each one.

The patterns operating together.

Layered architecture, not a checklist.

How the layers work

  • Pattern 01 sits underneath as the foundation. Use code for what can be checked mechanically, the AI for what needs judgement.
  • Patterns 02, 03, and 04 prevent fabrication at the source. They limit what the AI can see and produce, split retrieval from reasoning, and let the AI report when it could not complete the task.
  • Patterns 05 and 06 verify the output before it lands. They match each claim back to the source, and label every fact as source-confirmed, inferred, or assumed.
  • Pattern 07 stops anything that slips through. A final gate combining a code check and a second AI verifier, with a defined correction or escalation path when either fires.

What you have just read is one example of what Serpin does: design and build AI agent systems that ship, run reliably, and stand up to scrutiny. Anti-fabrication is one strand. Access control, data architecture, escalation, audit, and governance are the others.

Take this to your team

The seven questions, one printable check.

If your team can answer all seven with a real “yes, and here is how”, the system is calibrated. Any “no” or “we don’t know” is where fabrication enters.

Pattern 01
For every claim the AI makes about its own output, including actions it says it has taken in a system, is there code that independently verifies that claim?
Pattern 02
What does the AI see, and what shape of output is it free to produce? Have we limited both to what the task actually needs?
Pattern 03
Who fetches the data: the reasoning AI itself, or a separate system? How do we verify what was actually retrieved?
Pattern 04
Can the AI return a partial result, flag uncertainty, or escalate? Or does the task force a complete answer every time?
Pattern 05
When the AI cites a source, do we match the citation back to the actual source material, or only check that the citation looks plausible?
Pattern 06
Does our output show which facts came directly from the source and which the AI filled in? Or do we collapse everything into one confidence score?
Pattern 07
What sits between the AI’s proposed action and the system it would affect? Is the check both a code check and a second AI check, with a defined correction or escalation path?

Where this fits in the wider framework.

Bounded Agency is Serpin’s model for designing AI agents that are reliable, secure, and deliver real value.

Anti-fabrication is one strand. The framework covers the controls senior teams need across the four levels of agent autonomy: access control, context management, data architecture, memory management, guardrails, audit, escalation, and governance.

The First AI Agent Playbook covers the decision before the architecture: which AI agent project to build first. The full Bounded Agency framework (31 pages) is available on request.
