Thinkscoop
AI Risk & Audit Documentation System for EY
Professional Services · 12 weeks · AI Agents · Enterprise Solutions · AI Integration

72%

reduction in report preparation time

EY (Ernst & Young)

72%

Report prep time saved

50+

Concurrent engagements

<2%

Escalation rate

0

Hallucination incidents

Context

The business context

Big 4 audit practices live under unrelenting regulatory pressure. Every engagement demands exhaustive documentation - risk matrices, compliance cross-references, workpapers, and partner-ready summaries - before a single judgment call can be made. At EY, that documentation work was falling on senior auditors who should have been spending their hours on the high-stakes reasoning that clients pay for. The problem wasn't a shortage of talent. It was that talent was buried under assembly work.

The problem

5 specific problems that needed solving

Senior auditors spending 40+ hours per engagement on manual document assembly across 8 disconnected systems

Risk assessment drafts taking 24–36 hours before a partner could begin review - compressing decision windows during busy season

No single source of truth: compliance frameworks lived in PDFs, spreadsheets, and two separate legacy databases

Existing rule-based tools generated uncited outputs, making AI-assisted drafts unusable for regulated client work

Seasonal spikes (year-end audit season) created a 3× workload surge that existing headcount couldn't absorb

Our approach

Trustability first. Speed second.

Every prior attempt at AI in audit had failed for the same reason: hallucinated outputs that auditors couldn't trust or cite. Our core design principle was citation-first - every claim the system makes must link directly to a source document, regulatory reference, or data record. We spent the first two weeks not building software, but mapping exactly which document types lived where, who was authorised to access what, and what a 'good enough to hand to a partner' draft actually looked like. We then designed a human-in-the-loop architecture where AI handles assembly and cross-referencing, and every human touchpoint is a judgment call - not data retrieval.

Built evaluation harnesses before building the product - defined what 'correct' looked like with audit experts upfront

Chose LangGraph over a simple prompt chain to give each reasoning step an explicit, auditable state graph

Operated entirely inside EY's existing Azure tenancy - zero data left EY's perimeter at any point

Designed escalation triggers: if confidence on any claim fell below threshold, the agent flagged for human review rather than guessing

What we built

A multi-agent audit intelligence system

The system is a LangGraph-based multi-agent pipeline with four specialised agents: a Data Retrieval Agent that queries all 8 source systems via secure API connectors, a Regulatory Mapping Agent that cross-references extracted data against IFRS, SOX, and local GAAP frameworks stored in a structured knowledge base, a Draft Generation Agent that writes sections of the risk assessment with inline citations, and a Review Routing Agent that scores output confidence and routes low-confidence sections directly to a named partner queue. Everything is logged to an immutable audit trail in Azure Blob Storage.
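The four-agent flow can be pictured as an explicit state graph in which every transition is recorded. The sketch below is illustrative only - the node names, `EngagementState` fields, and hard-coded data are invented stand-ins, not the production LangGraph schema:

```typescript
// Minimal state-graph sketch of the four-agent pipeline.
// All names and data here are illustrative, not EY's schema.

interface EngagementState {
  rawRecords: string[];
  mappedRefs: string[];
  draftSections: { text: string; confidence: number }[];
  reviewQueue: string[];
  trace: string[]; // auditable record of every step taken
}

type GraphNode = (s: EngagementState) => string | null;

const graph: Record<string, GraphNode> = {
  retrieve: (s) => {
    s.rawRecords = ["record-1", "record-2"]; // stand-in for the 8-system queries
    return "map";
  },
  map: (s) => {
    s.mappedRefs = s.rawRecords.map((r) => `${r}:IFRS-9`);
    return "draft";
  },
  draft: (s) => {
    s.draftSections = s.mappedRefs.map((ref) => ({ text: ref, confidence: 0.95 }));
    return "route";
  },
  route: (s) => {
    s.reviewQueue = s.draftSections
      .filter((d) => d.confidence < 0.9)
      .map((d) => d.text);
    return null; // terminal node
  },
};

function run(state: EngagementState, start = "retrieve"): EngagementState {
  let node: string | null = start;
  while (node !== null) {
    state.trace.push(node); // every transition is logged before it executes
    node = graph[node](state);
  }
  return state;
}
```

Modelling each reasoning step as a named node is what makes the pipeline auditable: the `trace` array is a complete, ordered record of the path taken for a given engagement.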

1

Data connectors

Secure REST connectors to all 8 source systems - ERPs, legacy workpaper tools, and the firm's proprietary client data platform - with field-level access control per engagement team.
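Field-level access control amounts to filtering each record against the requesting team's permission set before it ever reaches an agent. A minimal sketch, with invented field names and team IDs:

```typescript
// Illustrative field-level access filter; field names and team IDs are invented.
const permissions: Record<string, Set<string>> = {
  "team-a": new Set(["client_name", "risk_score"]),
  "team-b": new Set(["client_name"]),
};

function filterRecord(record: Record<string, unknown>, team: string) {
  const allowed = permissions[team] ?? new Set<string>(); // unknown team sees nothing
  return Object.fromEntries(
    Object.entries(record).filter(([field]) => allowed.has(field))
  );
}
```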

2

Regulatory knowledge base

Structured vector store of IFRS, SOX, and 12 local GAAP frameworks with versioning, so the system always cites the correct year of the standard and flags when a referenced standard has been updated since last engagement.
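Version-aware citation boils down to resolving the latest revision in force for the engagement year, then flagging if a newer revision exists. The framework ID and years below are invented for illustration:

```typescript
// Versioned-lookup sketch: standards keyed by framework and revision year.
// "IFRS-16" and the years shown are illustrative entries, not the real KB.
const standards: Record<string, Record<number, string>> = {
  "IFRS-16": { 2022: "Leases (2022 revision)", 2024: "Leases (2024 revision)" },
};

function cite(framework: string, engagementYear: number) {
  const years = Object.keys(standards[framework]).map(Number).sort((a, b) => a - b);
  // latest revision published on or before the engagement year
  const applicable = Math.max(...years.filter((y) => y <= engagementYear));
  // flag when the standard has been revised since that version
  const updatedSince = years.some((y) => y > applicable);
  return { text: standards[framework][applicable], year: applicable, updatedSince };
}
```

This is what lets the system cite the correct year of a standard while still warning reviewers that a newer revision has landed since the last engagement.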

3

Citation engine

Every sentence in a generated draft carries a citation ID linking back to a specific source document and page. Reviewers click any sentence to see its evidence chain instantly.
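Conceptually, the citation engine is an index from sentence-level IDs to evidence pointers. The ID format and field names in this sketch are illustrative:

```typescript
// Sentence-level citation index sketch; the "c0000" ID format is invented.
interface Evidence {
  doc: string;
  page: number;
}

function buildIndex(drafted: { sentence: string; source: Evidence }[]) {
  const index = new Map<string, Evidence>();
  const sections = drafted.map((d, i) => {
    const id = `c${String(i).padStart(4, "0")}`;
    index.set(id, d.source); // click-through: ID -> source document and page
    return { id, sentence: d.sentence };
  });
  return { sections, index };
}
```

A reviewer clicking a sentence resolves its ID through the index to the exact document and page it was drawn from.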

4

Confidence routing

Any output section scoring below the defined confidence threshold is automatically routed to the relevant human reviewer with the agent's uncertainty reasoning pre-populated - so partners see exactly why the AI wasn't sure.
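The routing rule itself is simple; the value is in carrying the agent's own uncertainty reasoning into the reviewer's queue. A sketch, where the 0.85 threshold and queue names are assumptions for illustration:

```typescript
// Routing sketch: the 0.85 threshold and queue names are illustrative.
const THRESHOLD = 0.85;

interface Section {
  text: string;
  confidence: number;
  uncertaintyNote: string; // the agent's explanation of why it wasn't sure
}

function routeSection(section: Section) {
  if (section.confidence < THRESHOLD) {
    // low confidence: escalate with the agent's reasoning pre-populated
    return { queue: "partner-review", reason: section.uncertaintyNote };
  }
  return { queue: "auto-approve", reason: null };
}
```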

5

Audit trail

Every agent decision, API call, and human override is written to an immutable log in Azure Blob Storage. The entire decision history for a client report can be exported as a single compliance record.
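One common way to make such a log tamper-evident is hash chaining: each entry hashes the previous entry's hash, so any retroactive edit breaks the chain. This is a stdlib-only sketch of that idea; in the real system, Blob Storage immutability policies enforce the write-once guarantee:

```typescript
import { createHash } from "node:crypto";

// Hash-chained log sketch. Any retroactive edit to an entry invalidates
// every hash after it, which verify() detects.
interface Entry {
  event: string;
  prev: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

function append(log: Entry[], event: string): Entry[] {
  const prev = log.length ? log[log.length - 1].hash : GENESIS;
  const hash = createHash("sha256").update(prev + event).digest("hex");
  log.push({ event, prev, hash });
  return log;
}

function verify(log: Entry[]): boolean {
  let prev = GENESIS;
  for (const e of log) {
    const expected = createHash("sha256").update(prev + e.event).digest("hex");
    if (e.prev !== prev || e.hash !== expected) return false;
    prev = e.hash;
  }
  return true;
}
```

Exporting the chain as a single file yields the "entire decision history as one compliance record" described above.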

Impact

What changed in production

The numbers only tell part of the story. The bigger shift was cultural: auditors started treating AI output as a first draft they refined, not a risk they managed around.

72% reduction in report preparation time. Zero hallucination incidents. 50+ concurrent engagements handled across 6 countries with full audit trails.

This system handles the most time-consuming part of our audit cycle. What used to take two senior staff three days now takes hours - with full traceability. Our partners now spend their time on the work that actually requires their judgment.
Senior Audit Director - EY

Learnings

What we took away from this project

Citation architecture is the product

We thought the hard part would be getting accurate AI outputs. The real challenge was building a citation system trustworthy enough that a senior partner would stake their sign-off on it. Every architectural decision - from the knowledge base schema to the confidence thresholds - was driven by that requirement.

Change management is as important as the build

The first version of the system was technically solid but adoption was slow. Auditors didn't distrust the technology - they distrusted change during the highest-stakes period of their year. We ran structured onboarding sessions, kept human override frictionless, and made the first two months a 'co-pilot mode' where auditors could see AI drafts alongside their manual work. Adoption accelerated sharply once they had direct comparison evidence.

Evaluation harnesses are non-negotiable in compliance

We built a suite of 200+ test cases before shipping to production - covering edge cases in regulatory cross-referencing, ambiguous jurisdiction overlaps, and known failure modes from previous AI attempts at EY. This harness now runs on every model update, giving the engineering team and EY's risk team a shared quality gate.
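The shape of one such harness case can be sketched as an input plus a required citation, checked against whatever the drafting step produces. The case contents and the `generate` stub below are invented for illustration:

```typescript
// Illustrative harness case: the generated draft must cite the expected
// standard. Case data and the generate() signature are assumptions.
interface HarnessCase {
  input: string;
  mustCite: string;
}

const cases: HarnessCase[] = [
  { input: "lease with bargain purchase option", mustCite: "IFRS-16" },
];

function runCase(
  generate: (input: string) => { citations: string[] },
  c: HarnessCase
): boolean {
  return generate(c.input).citations.includes(c.mustCite);
}
```

Because the cases are pure data, the same suite runs unchanged against every model update, giving both teams the shared quality gate described above.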


At a glance

Client: EY (Ernst & Young)
Industry: Professional Services
Timeline: 12 weeks

Tech stack

GPT-4o · LangGraph · Azure OpenAI · Azure Blob Storage · TypeScript · Node.js · React

Capabilities

AI Agents
Enterprise Solutions
AI Integration

Build something similar?

We've solved this category of problem before. Let's scope yours.

Start a conversation · View related service