72%
reduction in report preparation time
AI Risk & Audit Documentation System for EY
EY (Ernst & Young)
72%
Report prep time saved
50+
Concurrent engagements
<2%
Escalation rate
0
Hallucination incidents
Context
The business context
Big 4 audit practices live under unrelenting regulatory pressure. Every engagement demands exhaustive documentation - risk matrices, compliance cross-references, workpapers, and partner-ready summaries - before a single judgment call can be made. At EY, that documentation work was falling on senior auditors who should have been spending their hours on the high-stakes reasoning that clients pay for. The problem wasn't a shortage of talent. It was that talent was buried under assembly work.
The problem
5 specific problems that needed solving
Senior auditors spending 40+ hours per engagement on manual document assembly across 8 disconnected systems
Risk assessment drafts taking 24–36 hours before a partner could begin review - compressing decision windows during busy season
No single source of truth: compliance frameworks lived in PDFs, spreadsheets, and two separate legacy databases
Existing rule-based tools generated uncited outputs, making AI-assisted drafts unusable for regulated client work
Seasonal spikes (year-end audit season) created a 3× workload surge that existing headcount couldn't absorb
Our approach
Trustability first. Speed second.
Every prior attempt at AI in audit had failed for the same reason: hallucinated outputs that auditors couldn't trust or cite. Our core design principle was citation-first - every claim the system makes must link directly to a source document, regulatory reference, or data record. We spent the first two weeks not building software, but mapping exactly which document types lived where, who was authorised to access what, and what a 'good enough to hand to a partner' draft actually looked like. We then designed a human-in-the-loop architecture where AI handles assembly and cross-referencing, and every human touch is on judgment - not data retrieval.
Built evaluation harnesses before building the product - defined what 'correct' looked like with audit experts upfront
Chose LangGraph over a simple prompt chain to give each reasoning step an explicit, auditable state graph
Operated entirely inside EY's existing Azure tenancy - zero data left EY's perimeter at any point
Designed escalation triggers: if confidence on any claim fell below threshold, the agent flagged for human review rather than guessing
What we built
A multi-agent audit intelligence system
The system is a LangGraph-based multi-agent pipeline with four specialised agents. A Data Retrieval Agent queries all 8 source systems via secure API connectors; a Regulatory Mapping Agent cross-references extracted data against IFRS, SOX, and local GAAP frameworks stored in a structured knowledge base; a Draft Generation Agent writes sections of the risk assessment with inline citations; and a Review Routing Agent scores output confidence and routes low-confidence sections directly to a named partner queue. Everything is logged to an immutable audit trail in Azure Blob Storage.
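The production pipeline runs on a LangGraph state graph we can't reproduce here, but as a rough stdlib sketch (all names, data, and the threshold are hypothetical, not EY's actual schema), the four-agent flow reduces to:

```python
from dataclasses import dataclass

@dataclass
class Section:
    text: str
    citations: list
    confidence: float

def retrieve(engagement_id: str) -> dict:
    # Data Retrieval Agent: in production, queries all 8 source systems.
    return {"engagement": engagement_id, "records": ["rec-001", "rec-002"]}

def map_regulations(data: dict) -> dict:
    # Regulatory Mapping Agent: cross-references records against IFRS/SOX/GAAP.
    data["frameworks"] = ["IFRS 9", "SOX 404"]
    return data

def draft(data: dict) -> Section:
    # Draft Generation Agent: writes a section with inline citation IDs.
    return Section(text="Revenue recognition risk assessed.",
                   citations=data["records"], confidence=0.91)

def route(section: Section, threshold: float = 0.85) -> str:
    # Review Routing Agent: low-confidence sections go to a partner queue.
    return "auto_draft" if section.confidence >= threshold else "partner_queue"

destination = route(draft(map_regulations(retrieve("ENG-2024-017"))))
```

In the real system each step is a node in an explicit state graph rather than a plain function call, which is what makes every transition auditable.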
Data connectors
Secure REST connectors to all 8 source systems - ERPs, legacy workpaper tools, and the firm's proprietary client data platform - with field-level access control per engagement team.
Regulatory knowledge base
Structured vector store of IFRS, SOX, and 12 local GAAP frameworks with versioning, so the system always cites the correct year of the standard and flags when a referenced standard has been updated since last engagement.
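As an illustration of the versioning behaviour (the schema and document IDs below are invented for this sketch), a lookup resolves the version of a standard in force for the engagement year and flags any later revision:

```python
# Hypothetical versioned knowledge base: standard -> {year: document ID}.
STANDARDS = {
    "IFRS 9": {"2022": "doc-ifrs9-2022", "2024": "doc-ifrs9-2024"},
}

def cite(standard: str, engagement_year: str) -> dict:
    versions = STANDARDS[standard]
    # Version in force: the latest edition not newer than the engagement year.
    effective = max(v for v in versions if v <= engagement_year)
    latest = max(versions)
    return {
        "doc_id": versions[effective],
        "updated_since": latest > effective,  # flag if the standard has changed
    }

cite("IFRS 9", "2023")
# → {'doc_id': 'doc-ifrs9-2022', 'updated_since': True}
```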
Citation engine
Every sentence in a generated draft carries a citation ID linking back to a specific source document and page. Reviewers click any sentence to see its evidence chain instantly.
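A minimal sketch of the citation-first rule (record names and IDs hypothetical): every sentence carries a citation ID that resolves to a document and page, and sentences whose evidence doesn't resolve are caught before a draft ships:

```python
# Hypothetical evidence store: citation IDs map to a source document and page.
EVIDENCE = {
    "cit-0042": {"document": "FY24-workpaper-07.pdf", "page": 13},
}

def evidence_chain(sentence: dict) -> dict:
    # What a reviewer sees when clicking a sentence in the draft.
    return EVIDENCE[sentence["citation_id"]]

def uncited(sentences: list) -> list:
    # Citation-first gate: return every sentence whose evidence doesn't resolve.
    return [s["text"] for s in sentences if s.get("citation_id") not in EVIDENCE]
```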
Confidence routing
Any output section scoring below the defined confidence threshold is automatically routed to the relevant human reviewer with the agent's uncertainty reasoning pre-populated - so partners see exactly why the AI wasn't sure.
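Sketched in plain Python (the threshold value and field names are assumptions for illustration), the routing payload carries the agent's own uncertainty notes so the reviewer sees why it hesitated:

```python
THRESHOLD = 0.85  # illustrative; the real threshold was agreed with EY's risk team

def route_section(section: dict, reviewer: str) -> dict:
    if section["confidence"] >= THRESHOLD:
        return {"status": "accepted", "section": section}
    return {
        "status": "needs_review",
        "assigned_to": reviewer,
        "reasoning": section["uncertainty_notes"],  # pre-populated for the reviewer
        "section": section,
    }
```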
Audit trail
Every agent decision, API call, and human override is written to an immutable log in Azure Blob Storage. The entire decision history for a client report can be exported as a single compliance record.
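The production trail relies on Azure Blob Storage's immutability guarantees; as a self-contained illustration of why an append-only log is tamper-evident, a hash-chained version (hypothetical schema) looks like:

```python
import hashlib
import json

def append_entry(log: list, event: dict) -> list:
    # Each entry hashes the previous entry's hash plus its own payload,
    # so editing any earlier record breaks every hash after it.
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    log.append({
        "event": event,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256((prev_hash + payload).encode()).hexdigest(),
    })
    return log

def verify(log: list) -> bool:
    # Recompute the chain from the start; any mismatch means tampering.
    for i, entry in enumerate(log):
        prev = log[i - 1]["hash"] if i else "0" * 64
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
    return True
```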
Impact
What changed in production
The numbers only tell part of the story. The bigger shift was cultural: auditors started treating AI output as a first draft they refined, not a risk they managed around.
72% reduction in report preparation time. Zero hallucination incidents. 50+ concurrent engagements handled across 6 countries with full audit trails.
“This system handles the most time-consuming part of our audit cycle. What used to take two senior staff three days now takes hours - with full traceability. Our partners now spend their time on the work that actually requires their judgment.”
Senior Audit Director - EY
Learnings
What we took away from this project
Citation architecture is the product
We thought the hard part would be getting accurate AI outputs. The real challenge was building a citation system trustworthy enough that a senior partner would stake their sign-off on it. Every architectural decision - from the knowledge base schema to the confidence thresholds - was driven by that requirement.
Change management is as important as the build
The first version of the system was technically solid but adoption was slow. Auditors didn't distrust the technology - they distrusted change during the highest-stakes period of their year. We ran structured onboarding sessions, kept human override frictionless, and made the first two months a 'co-pilot mode' where auditors could see AI drafts alongside their manual work. Adoption accelerated sharply once they had direct comparison evidence.
Evaluation harnesses are non-negotiable in compliance
We built a suite of 200+ test cases before shipping to production - covering edge cases in regulatory cross-referencing, ambiguous jurisdiction overlaps, and known failure modes from previous AI attempts at EY. This harness now runs on every model update, giving the engineering team and EY's risk team a shared quality gate.
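The harness itself is EY-internal; a toy sketch of its shape (the case data and system stub are invented) shows the shared quality-gate idea:

```python
# Invented cases for illustration; the real suite covers 200+ regulatory edge cases.
CASES = [
    {"input": "Cross-reference lease data against IFRS 16",
     "expect_citations": {"IFRS 16"}},
    {"input": "Assess internal controls over financial reporting",
     "expect_citations": {"SOX 404"}},
]

def run_harness(system, cases) -> list:
    # Runs on every model update; any failure blocks the release.
    failures = []
    for case in cases:
        result = system(case["input"])
        if not case["expect_citations"] <= set(result["citations"]):
            failures.append(case["input"])
    return failures
```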
Build something similar?
We've solved this category of problem before. Let's scope yours.
Start a conversation