78%
reduction in processing time
AI Reconciliation Agent Cuts FinTech Processing Time by 78%
Global FinTech (Confidential)
78%
Processing time reduction
<5%
Escalation rate
35hr
Saved per week
200k+
Cases processed without incident
Context
The business context
In fintech operations, reconciliation is the work that never ends. Every transaction that doesn't match across systems becomes a discrepancy. Every discrepancy becomes a manual investigation. Every manual investigation takes 20–45 minutes of an ops analyst's time - pulling data from three systems, cross-referencing timestamps, checking counterparty records, and drafting a resolution action for compliance sign-off. At this company, 40+ hours per week of senior analyst time was consumed by a process that was fundamentally pattern-matching: most discrepancies had a known root cause, a known resolution, and a known compliance requirement. The team had tried rule-based automation once before. It had broken under the weight of edge cases and been abandoned.
The problem
5 specific problems that needed solving
40+ hours per week of senior ops analyst time spent on manual reconciliation - work that was largely pattern-matching on known discrepancy types
Previous rule-based automation abandoned after six months: edge cases (unusual counterparty formats, timing discrepancies, partial settlements) broke rules constantly
Three separate source systems with inconsistent data schemas - cross-referencing required manual translation that rules couldn't encode reliably
Compliance requirement for human sign-off on all resolutions: full automation wasn't viable, but the preparation work could be automated
Growing transaction volume meant the manual workload was increasing every quarter - the team needed to scale resolution capacity without scaling headcount
Our approach
Automate the reasoning, not just the rules.
The previous rule-based system had failed because rules can't handle ambiguity. Real-world reconciliation discrepancies don't follow a fixed taxonomy - they're edge cases, partial matches, timing anomalies, and data format inconsistencies that require judgment. We proposed a different framing: instead of trying to automate the resolution decision, automate the investigation and evidence assembly, and keep the resolution decision with a human reviewer. A well-designed system would reduce the time to resolution from 30 minutes to 2 minutes - not by eliminating the human, but by doing all the preparatory work the human was doing manually. That reframing unlocked the project: the compliance team approved it immediately because human sign-off was preserved.
Automate the investigation, not the decision: human reviewers approve every resolution, not just escalated exceptions - keeping compliance requirements intact
LangGraph multi-step agent: each step in the investigation has an explicit state and can be inspected, replayed, or overridden by a reviewer
Evaluation harness built before the agent: defined what 'correctly resolved' meant across 15 discrepancy categories before writing production code
Weekly accuracy and escalation rate reporting to the ops team - building trust incrementally rather than asking for full adoption on day one
What we built
A reasoning agent for financial discrepancy resolution
The system is a LangGraph-based multi-step agent with direct API access to all three source systems. When a reconciliation discrepancy is flagged (by existing batch jobs or real-time triggers), the agent begins a structured investigation: it retrieves the transaction record from each of the three systems, normalises the data into a canonical schema, identifies the discrepancy type, queries relevant counterparty and timing records, and generates a draft resolution action with supporting evidence. The completed package - discrepancy summary, investigation trace, recommended resolution, and compliance documentation - is routed to a human reviewer who can approve in a single click or override with a note. Every decision is logged to an immutable audit trail in PostgreSQL.
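The investigation pipeline above can be sketched as explicit, replayable state transitions. This is an illustrative stand-in using only the standard library (the production system uses LangGraph); every name, field, and stub value here is an assumption, not the client's schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Illustrative sketch of the investigation as explicit state transitions.
# Each step mutates a shared state object and logs itself, so a reviewer
# can inspect, replay, or override any step. All names are hypothetical.

@dataclass
class InvestigationState:
    discrepancy_id: str
    raw_records: dict = field(default_factory=dict)    # one record per source system
    canonical: dict = field(default_factory=dict)      # normalised transaction
    category: str | None = None                        # one of the 15 categories
    evidence: list = field(default_factory=list)
    draft_resolution: str | None = None
    trace: list = field(default_factory=list)          # replayable step log

def run_step(state: InvestigationState, name: str, fn) -> InvestigationState:
    """Run one step and record it in the audit trace."""
    fn(state)
    state.trace.append(name)
    return state

# Stubbed step bodies; in production each queries a real source system or model.
def retrieve(state):  state.raw_records = {"ledger": {}, "payments": {}, "bank": {}}
def normalise(state): state.canonical = {"amount": 100.0, "currency": "USD"}
def classify(state):  state.category = "timing_mismatch"
def gather(state):    state.evidence.append({"source": "bank", "window": "T+1"})
def draft(state):     state.draft_resolution = "Match on value date; no action."

STEPS = [("retrieve", retrieve), ("normalise", normalise),
         ("classify", classify), ("gather", gather), ("draft", draft)]

state = InvestigationState("DISC-001")
for name, fn in STEPS:
    run_step(state, name, fn)

print(state.trace)  # the full step log shown to the human reviewer
```

In the real system each step is a LangGraph node, which is what makes the "inspected, replayed, or overridden" property a first-class feature rather than a logging convention.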
Multi-source data retrieval
Secure API connectors to all three source systems with schema normalisation. The agent translates each system's native data format into a canonical transaction schema before any reasoning step - eliminating the manual translation that made rule-based approaches brittle.
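The normalisation layer can be pictured as one small adapter per source system, each mapping that system's native record into the canonical schema before any reasoning runs. The field names and the two example formats below are assumptions for illustration, not the client's actual schemas.

```python
from datetime import datetime, timezone

# Hypothetical adapters: each source system's native format is translated
# into one canonical transaction schema. Field names are illustrative.

def from_ledger(rec: dict) -> dict:
    return {"txn_id": rec["id"],
            "amount": float(rec["amt"]),
            "currency": rec["ccy"],
            "ts": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc)}

def from_payments(rec: dict) -> dict:
    return {"txn_id": rec["reference"],
            "amount": rec["value"]["amount"],
            "currency": rec["value"]["currency"],
            "ts": datetime.fromisoformat(rec["created_at"])}

ADAPTERS = {"ledger": from_ledger, "payments": from_payments}

def normalise(system: str, record: dict) -> dict:
    """Translate a native record into the canonical schema before any reasoning."""
    return ADAPTERS[system](record)

print(normalise("ledger", {"id": "T1", "amt": "100.50",
                           "ccy": "USD", "epoch": 1700000000}))
```

Because every downstream step sees only the canonical schema, adding a fourth source system means writing one adapter, not touching the reasoning logic.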
Discrepancy classification
The agent's first reasoning step classifies the discrepancy into one of 15 defined categories (timing mismatch, partial settlement, counterparty format error, currency conversion delta, etc.) using a combination of rule-based classifiers and GPT-4o for ambiguous cases. Classification drives the investigation path - different categories trigger different evidence-gathering steps.
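The two-stage classifier described above can be sketched as cheap deterministic rules first, with the LLM consulted only when the rules disagree or match nothing. The rule conditions and category names below are illustrative; the production taxonomy has 15 categories.

```python
# Hypothetical two-stage classifier: deterministic rules handle the clear
# cases; zero or multiple rule hits mark the case ambiguous and defer to
# an LLM fallback (GPT-4o in production). Conditions are illustrative.

RULES = [
    ("timing_mismatch",    lambda d: d["amount_delta"] == 0 and d["days_apart"] > 0),
    ("partial_settlement", lambda d: 0 < d["settled"] < d["expected"]),
    ("currency_delta",     lambda d: d["ccy_a"] != d["ccy_b"]),
]

def classify(disc: dict, llm_fallback=None) -> tuple:
    matches = [name for name, rule in RULES if rule(disc)]
    if len(matches) == 1:
        return matches[0], "rule"           # unambiguous: no model call needed
    if llm_fallback is not None:
        return llm_fallback(disc), "llm"    # ambiguous: defer to the model
    return "unclassified", "escalate"       # no fallback available: human triage

disc = {"amount_delta": 0, "days_apart": 2, "settled": 100, "expected": 100,
        "ccy_a": "USD", "ccy_b": "USD"}
print(classify(disc))  # → ('timing_mismatch', 'rule')
```

The returned label then selects the investigation path, so a misclassification is caught early: the wrong path assembles evidence that fails its confidence checks and escalates.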
Evidence assembly
For each discrepancy category, the agent follows a defined investigation protocol: which counterparty records to retrieve, which timing windows to check, which reconciliation rules apply. The assembled evidence package includes all retrieved data, the agent's reasoning trace, and confidence scores for each conclusion.
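A minimal sketch of the protocol idea: each category maps to the ordered evidence-gathering steps it requires, and each retrieved item carries a confidence score. Step names and the fetch interface are assumptions for illustration.

```python
# Hypothetical per-category investigation protocols. Each category lists
# the evidence-gathering steps it requires; the assembled package records
# each item with a confidence score. Step names are illustrative.

PROTOCOLS = {
    "timing_mismatch":    ["counterparty_record", "settlement_window_T+2"],
    "partial_settlement": ["counterparty_record", "fill_history", "netting_rules"],
}

def assemble_evidence(category: str, fetch) -> list:
    """Run the category's protocol; fetch(step) returns (data, confidence)."""
    package = []
    for step in PROTOCOLS.get(category, []):
        data, confidence = fetch(step)
        package.append({"step": step, "data": data, "confidence": confidence})
    return package

# Stand-in fetcher; in production each step queries a source system.
fake_fetch = lambda step: ({"found": True}, 0.9)
print(assemble_evidence("timing_mismatch", fake_fetch))
```

Keeping the protocols as data rather than code means the ops team can review exactly what the agent will look at for each category, without reading the agent's implementation.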
Draft resolution generation
GPT-4o generates a structured resolution recommendation in the format required by the compliance team - including the root cause classification, the resolution action, the regulatory basis for the resolution, and any follow-up requirements. Reviewers see the full evidence chain and can approve with one click or override with a free-text note.
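The shape of the reviewer-facing package can be sketched as a fixed schema that the model must fill. The field names below mirror the requirements described above but are illustrative, and the stub values stand in for what GPT-4o generates in production.

```python
import json

# Illustrative shape of the resolution package routed to the reviewer.
# In production GPT-4o fills these fields; the values here are stubs.

def draft_resolution(category: str, evidence: list) -> dict:
    return {
        "root_cause": category,
        "resolution_action": "match_and_close",          # stub; model-generated in prod
        "regulatory_basis": "internal reconciliation policy (illustrative)",
        "follow_up": [],
        "evidence_chain": evidence,                      # full trace shown to reviewer
        "status": "pending_review",                      # human must approve or override
    }

pkg = draft_resolution("timing_mismatch", [{"step": "counterparty_record"}])
print(json.dumps(pkg, indent=2))
```

Constraining generation to a fixed schema is what makes one-click approval possible: the reviewer always sees the same fields in the same order, whatever the model wrote.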
Evaluation harness
A suite of 500+ test cases (built from 6 months of historical reconciliation records) validates agent performance on every deployment. The harness measures accuracy by discrepancy category, escalation rate, and false negative rate. Any model update that degrades accuracy on the test suite is blocked from production.
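The deployment gate can be sketched as two small functions: score the candidate on the historical suite per category, then block the release if any category regresses below the baseline. Thresholds and category names are assumptions.

```python
# Minimal sketch of the evaluation gate: per-category accuracy on the
# historical test suite, compared against the current baseline. Any
# regression blocks deployment. Numbers and names are illustrative.

def evaluate(cases: list, predict) -> dict:
    """Return accuracy per discrepancy category over the test suite."""
    totals, correct = {}, {}
    for case in cases:
        cat = case["category"]
        totals[cat] = totals.get(cat, 0) + 1
        if predict(case["input"]) == case["expected"]:
            correct[cat] = correct.get(cat, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}

def deploy_allowed(new_acc: dict, baseline: dict, tolerance: float = 0.0) -> bool:
    """Block deployment if any category falls below its baseline accuracy."""
    return all(new_acc.get(cat, 0.0) >= base - tolerance
               for cat, base in baseline.items())

baseline = {"timing_mismatch": 0.98, "partial_settlement": 0.95}
print(deploy_allowed({"timing_mismatch": 0.99,
                      "partial_settlement": 0.90}, baseline))  # False
```

Gating on per-category accuracy rather than a single aggregate matters: a model update can raise the average while silently breaking one rare but high-stakes category.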
Impact
What changed in production
The 78% reduction in processing time wasn't achieved by replacing human judgment - it was achieved by eliminating the 90% of each case that was data gathering and formatting, not judgment.
Processing time dropped 78%. Escalation rate held below 5%. 35 hours per week returned to the team. 200,000+ cases processed without a data incident.
Learnings
What we took away from this project
Framing for compliance teams is a design decision
The first version of our proposal described the system as 'automated reconciliation.' The compliance team rejected it immediately - their mandate required human sign-off on every resolution. We reframed the system as 'automated investigation with human-approved resolution' and the same compliance team approved it the next day. The technical architecture was identical. The framing determined whether the project happened. Understanding what compliance teams can and cannot approve is as important as understanding what the technology can do.
Test suite quality determines production confidence
The evaluation harness was built from 6 months of historical cases - including every case the team could recall where a rule-based system had failed or where a human had overridden an automated suggestion. Building the test suite from real failure modes, not invented edge cases, gave the ops team something concrete to review. When they could see the agent handling the specific cases that had previously defeated automation, adoption followed naturally.
Escalation rate is a leading indicator of agent health
We tracked escalation rate weekly from day one - not as a failure metric but as a health signal. A rising escalation rate indicates data drift (source systems have changed in ways the agent hasn't adapted to) or scope creep (new discrepancy types are appearing that the agent wasn't trained to handle). Keeping escalation rate below 5% as a weekly SLA gave the team a clear, actionable signal to trigger investigation before accuracy degraded in production.
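The weekly health check reduces to a few lines: compute the escalation rate over the week's cases and compare it against the 5% SLA. The data shape and the signal strings below are illustrative.

```python
# Sketch of the weekly escalation-rate health check. A rate at or above
# the 5% SLA triggers investigation for data drift or scope creep before
# accuracy degrades in production. Data shape is illustrative.

SLA = 0.05

def weekly_escalation_rate(cases: list) -> float:
    escalated = sum(1 for c in cases if c["escalated"])
    return escalated / len(cases) if cases else 0.0

def health_signal(rate: float, sla: float = SLA) -> str:
    return "ok" if rate < sla else "investigate: possible drift or scope creep"

week = [{"escalated": False}] * 97 + [{"escalated": True}] * 3
rate = weekly_escalation_rate(week)
print(rate, health_signal(rate))  # 0.03 ok
```

The point of the signal is its timing: escalation rate moves as soon as unfamiliar cases appear, weeks before an accuracy metric computed on resolved cases would show the same drift.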
At a glance
Tech stack
Capabilities
Build something similar?
We've solved this category of problem before. Let's scope yours.
Start a conversation
View related service