RAG was hyped as the answer to everything in 2023. In reality, it was an excellent fit for a specific class of problems: question-answering and drafting on top of large, changing bodies of text. What changed between the first prototypes and stable production deployments was not the headline pattern, but the implementation details that nobody talked about in the demos.
Across RAG deployments in legal, banking, and B2B SaaS between 2022 and 2024, we saw the same architectural lessons appear independently in each domain. The surface details differed - different document types, different access controls, different user expectations - but the success patterns were strikingly similar. Here is what carried over.
What Stayed the Same Across Domains
Whether we were working with policy documents in a bank, contracts in a legal team, or product docs in a SaaS company, three ingredients showed up over and over: carefully chosen chunking strategies, aggressively used metadata filters, and user experiences built around citations.
The citation effect
The simple act of linking every key statement to a source document changed how quickly users trusted the system - especially lawyers and compliance staff. Trust that took months to build through accuracy alone could be established in weeks when users could verify every claim instantly.
Chunking: Where Most RAG Systems Go Wrong First
The default chunking strategy - split on fixed token boundaries - produced poor retrieval quality in almost every domain-specific deployment we encountered. Contracts, policies, and technical documentation have semantic structure that matters: a clause is a unit, a section is a unit, a procedure step is a unit. Splitting across those boundaries produces chunks that are syntactically intact but semantically incomplete.
- Legal: chunk by clause and provision, not by token count - contract clauses are the natural unit of retrieval
- Banking: chunk by policy section with explicit metadata for effective date, jurisdiction, and policy version
- SaaS docs: chunk by heading level, keeping examples with the text that explains them
- All domains: include a short document-level summary as a prefix on every chunk - it dramatically improves relevance of retrieved passages
- Test chunking quality directly: retrieve against 20–30 known-answer questions before investing in evaluation infrastructure
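To make the structure-aware chunking concrete, here is a minimal sketch of heading-based chunking with a document-level summary prefix, as used for the SaaS docs case. The function name, the summary text, and the sample document are illustrative, not from any specific deployment.

```python
import re

def chunk_by_heading(doc_text: str, doc_summary: str) -> list[str]:
    """Split a markdown-style document on heading boundaries and prefix
    each chunk with a short document-level summary for extra context."""
    # Split at zero-width positions just before each heading line
    # (e.g. "## Retries"), so headings stay attached to their body text.
    sections = re.split(r"(?m)^(?=#{1,6} )", doc_text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # The summary prefix gives the retriever document-level context
        # even when the chunk itself is narrow.
        chunks.append(f"[Document summary: {doc_summary}]\n{section}")
    return chunks

doc = "# Webhooks\nIntro text.\n## Retries\nFailed deliveries retry 3 times."
chunks = chunk_by_heading(doc, "Product docs for the webhooks API")
```

The same split-then-prefix shape applies to clause-level chunking for contracts; only the boundary pattern changes.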
Hybrid Search: Why Pure Vector Search Was Not Enough
Pure vector search underperformed in every enterprise domain we worked in. The reason was consistent: enterprise knowledge is full of precise identifiers - policy numbers, contract clause references, product codes, regulatory citations - that matter exactly, not approximately. Vector search treats these as semantic signals. Keyword search treats them as exact matches. You need both.
The pattern that worked was hybrid retrieval: dense vector search for semantic relevance combined with BM25 or keyword search for exact identifier matching, merged with a Reciprocal Rank Fusion step before the re-ranker. Adding this hybrid layer improved retrieval precision by 15–30% on domain-specific benchmarks in the deployments we measured.
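The Reciprocal Rank Fusion merge step is simple enough to show in full. This is a generic sketch of the standard RRF formula, not the code from any particular deployment; the document ids are made up, and `k=60` is the constant from the original RRF paper.

```python
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with Reciprocal Rank Fusion.

    Each inner list holds document ids, best match first. A document's
    fused score is the sum of 1 / (k + rank) across all lists it
    appears in, so items ranked highly by multiple retrievers win.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two retrievers before fusion.
vector_hits = ["doc_policy_12", "doc_faq_3", "doc_policy_7"]
bm25_hits = ["doc_policy_7", "doc_policy_12", "doc_appendix_9"]
merged = rrf_merge([vector_hits, bm25_hits])
```

Documents found by both retrievers (here `doc_policy_12` and `doc_policy_7`) float to the top of the merged list, which is exactly the behaviour you want before handing candidates to a re-ranker.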
Metadata Filters: The Underused Performance Lever
Metadata-based pre-filtering before vector search was one of the highest-leverage improvements we made repeatedly across domains. In banking, filtering by jurisdiction and effective date before running semantic search dramatically reduced irrelevant results from outdated or non-applicable policies. In legal, filtering by document type and matter meant the search space was always domain-appropriate.
The investment required was metadata curation at ingestion - tagging every document with the right attributes. That investment paid off many times over in retrieval quality and user trust. Teams that skipped it spent months debugging retrieval issues that metadata would have prevented.
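A minimal sketch of the banking-style pre-filter, assuming chunks carry a `meta` dict with jurisdiction and effective-date fields (a hypothetical schema; real deployments pushed this filter down into the vector store rather than doing it in Python):

```python
from datetime import date

def prefilter(chunks: list[dict], jurisdiction: str, as_of: date) -> list[dict]:
    """Narrow the candidate set by metadata before any vector scoring.

    Only chunks from the right jurisdiction that were in effect on the
    query date survive; everything else never reaches semantic search.
    """
    return [
        c for c in chunks
        if c["meta"]["jurisdiction"] == jurisdiction
        and c["meta"]["effective_from"] <= as_of
        and (c["meta"]["effective_to"] is None
             or as_of <= c["meta"]["effective_to"])
    ]

# Illustrative corpus: one live UK policy, one expired, one German.
chunks = [
    {"id": 1, "meta": {"jurisdiction": "UK", "effective_from": date(2022, 1, 1), "effective_to": None}},
    {"id": 2, "meta": {"jurisdiction": "UK", "effective_from": date(2019, 1, 1), "effective_to": date(2021, 12, 31)}},
    {"id": 3, "meta": {"jurisdiction": "DE", "effective_from": date(2022, 1, 1), "effective_to": None}},
]
live_uk = prefilter(chunks, "UK", date(2024, 6, 1))
```

The expired and out-of-jurisdiction policies are excluded before embedding similarity is ever computed, which is why this lever is cheap relative to its impact.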
Access Controls: The Non-Negotiable
In every regulated domain, access control was not a nice-to-have - it was a prerequisite for going live. The RAG system needed to know which user was querying and what documents they were authorised to see, and to filter retrieval results accordingly at query time - not just at ingestion time. Users who could not see a document in the source system should not be able to retrieve it via the AI assistant.
Implementing this correctly required early alignment between the AI team and the IAM and data governance teams. Banks needed per-document authorisation tags synced from their DLP systems. Legal teams needed matter-level access controls. SaaS companies needed tenant isolation. The architecture differed, but the requirement was identical: the RAG system must respect the same access controls as the underlying documents.
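The query-time enforcement point itself is a small piece of code; the hard work is keeping the allow-list in sync with the source system. This sketch assumes the user's authorised document ids have already been resolved (from IAM, matter membership, or tenant scoping) before retrieval results come back:

```python
def filter_by_acl(results: list[dict], user_allowed_docs: set[str]) -> list[dict]:
    """Drop retrieved chunks the querying user may not see.

    Applied at query time, so a permission revoked an hour ago takes
    effect immediately - ingestion-time tagging alone would go stale.
    """
    return [r for r in results if r["doc_id"] in user_allowed_docs]

# Hypothetical retrieval results and a matter-level allow-list.
retrieved = [
    {"doc_id": "matter_41_contract", "text": "Termination clause..."},
    {"doc_id": "matter_99_contract", "text": "Indemnity clause..."},
]
allowed = {"matter_41_contract"}  # synced from the source system's ACLs
visible = filter_by_acl(retrieved, allowed)
```

In production this check usually runs as a metadata filter inside the vector store so unauthorised chunks never leave it, but the invariant is the same: retrieval output must be a subset of what the user could open in the source system.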
Domain-Specific Evaluators Mattered More Than the Embedding Model
One of the most important and least intuitive lessons from cross-domain RAG work was that the choice of evaluation set and evaluator mattered more than the choice of embedding model. A domain expert who could identify when a retrieved passage was technically correct but contextually irrelevant caught failure modes that no automated metric found. Building a small team of domain evaluators into the development process was as important as building the technical stack.
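Human evaluators scale further when paired with a small automated harness over the known-answer questions mentioned earlier. Here is a minimal recall@k sketch; the toy retriever, question set, and gold chunk ids are invented for illustration, and a real `retrieve` function would call the production search stack.

```python
from typing import Callable

def recall_at_k(eval_set: list[tuple[str, str]],
                retrieve: Callable[[str], list[str]],
                k: int = 5) -> float:
    """Fraction of known-answer questions whose gold chunk appears in
    the top-k retrieved results."""
    hits = 0
    for question, gold_chunk_id in eval_set:
        if gold_chunk_id in retrieve(question)[:k]:
            hits += 1
    return hits / len(eval_set)

# Toy lookup standing in for the real hybrid retriever.
index = {"notice period": ["c7", "c2"], "governing law": ["c9"]}
eval_set = [("notice period", "c7"), ("governing law", "c1")]
score = recall_at_k(eval_set, lambda q: index.get(q, []), k=5)
```

A number like this catches regressions between chunking or retrieval changes; it does not replace the domain expert who notices that a retrieved passage is technically correct but contextually irrelevant.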
Key takeaways
- Different domains required different access controls, but similar retrieval logic
- Hybrid search plus strong metadata beat pure vector search in most cases
- Inline citations dramatically improved trust across all user groups
- Domain-specific evaluators mattered more than the embedding model choice
- Early investment in ingestion pipelines paid off for every new use case