Multimodal RAG Needs An Evidence Router

Builders can use Google's File Search update as a concrete checklist for designing retrieval systems that are scoped, inspectable, and useful for serious work.

The next useful retrieval product will not just find the right document. It will find the right slice of evidence, under the right scope, in the right format, with enough provenance for a person to trust it.

That is the practical signal in Google's May 5 Gemini API File Search update. Google says File Search now supports three capabilities that belong together: multimodal retrieval, custom metadata filtering, and page-level citations.

The thesis: multimodal RAG is becoming an evidence router, not a search box.

Why This Matters Now

Most production knowledge systems were designed as if the source of truth were text. That was always incomplete. Product truth lives in screenshots. Legal truth lives in PDFs. Engineering truth lives in diagrams. Support truth lives in images, transcripts, attachments, and old incident notes. Operations truth is scattered across the messy formats people actually use.

Google says File Search can now process images and text together, powered by Gemini Embedding 2. Its developer docs also describe File Search as managed RAG infrastructure: upload or import data, chunk it, embed it, index it, then use retrieval as model context.
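To make the managed pipeline concrete, here is a minimal, text-only sketch of the same stages: chunk, embed, index, retrieve. It is a toy stand-in, not Google's implementation or API. The bag-of-words "embedding" and every function name below are assumptions made for illustration, since a real system would call an embedding model at that step.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(files):
    """files maps file name to text. Returns (file name, chunk text, vector) rows."""
    index = []
    for name, text in files.items():
        for piece in chunk(text):
            index.append((name, piece, embed(piece)))
    return index

def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query, with their source file."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(name, piece) for name, piece, _ in ranked[:k]]

# Hypothetical sample files.
files = {
    "returns.txt": "Damaged shipments can be returned within 30 days with a photo of the package.",
    "billing.txt": "Invoices are issued monthly and payment is due within 15 days.",
}
index = build_index(files)
top = retrieve(index, "how do I return a damaged shipment", k=1)
print(top[0][0])
```

The retrieved chunks, not the whole files, are what would be passed to the model as context. Everything the rest of this piece asks for (scope, citations, lifecycle) is extra information attached to those index rows.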

That changes the design question. The hard part is no longer only "Can the system retrieve something relevant?" The better question is "Can it route a query to evidence that is scoped, inspectable, and useful for the decision being made?"

The Evidence Router Framework

An evidence router has four jobs.

1. Modality routing. The system should know when the answer may live outside text. A support agent investigating a damaged shipment may need a customer photo. A developer agent may need an architecture diagram. A procurement tool may need a PDF page, not a summary of the whole contract.

2. Scope routing. Retrieval needs boundaries. Google's update adds custom metadata filtering, which lets developers attach labels such as department, status, author, year, customer, or workflow and filter on them at query time. Metadata filtering does not replace access control, but it is a useful control surface: an answer about a finalized policy should not be grounded in a draft by accident.

3. Citation routing. The answer should point back to the evidence unit. Google's docs say File Search responses can include page numbers for documents and media IDs for image chunks in grounding metadata. That matters because "based on the file" is too vague for serious work. Users need to inspect the page, image, or source fragment that shaped the answer.

4. Lifecycle routing. Retrieved knowledge has a shelf life. Google's docs say File Search store embeddings persist until they are manually deleted or the model is deprecated, while raw uploaded files are deleted after 48 hours. Builders still need product-level policies for deletion, re-indexing, retention, and stale data.
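The four jobs can be read as fields on a single evidence record. The sketch below is one hypothetical way to model that in application code. `EvidenceUnit`, `route`, `citation_label`, `is_stale`, and the 180-day freshness window are all assumptions for illustration, not part of the File Search API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class EvidenceUnit:
    file_name: str
    modality: str                                  # modality routing: "text" or "image"
    metadata: dict = field(default_factory=dict)   # scope routing: labels to filter on
    page: Optional[int] = None                     # citation routing: page for documents
    media_id: Optional[str] = None                 # citation routing: id for image chunks
    indexed_at: datetime = field(default_factory=datetime.now)  # lifecycle routing

def citation_label(unit):
    """Build a human-readable pointer to the exact evidence unit."""
    if unit.modality == "image" and unit.media_id:
        return f"{unit.file_name} (image {unit.media_id})"
    if unit.page is not None:
        return f"{unit.file_name}, p. {unit.page}"
    return unit.file_name

def is_stale(unit, now, max_age_days=180):
    """Assumed product policy: evidence older than max_age_days needs re-indexing."""
    return now - unit.indexed_at > timedelta(days=max_age_days)

def route(units, modalities, scope, now):
    """Keep units with a wanted modality, matching scope labels, and fresh enough."""
    kept = []
    for u in units:
        if u.modality not in modalities:
            continue
        if any(u.metadata.get(key) != value for key, value in scope.items()):
            continue
        if is_stale(u, now):
            continue
        kept.append(u)
    return kept

# Hypothetical sample data.
now = datetime(2025, 6, 1)
units = [
    EvidenceUnit("policy.pdf", "text", {"status": "final"}, page=4, indexed_at=datetime(2025, 5, 1)),
    EvidenceUnit("policy_draft.pdf", "text", {"status": "draft"}, page=2, indexed_at=datetime(2025, 5, 20)),
    EvidenceUnit("damage.jpg", "image", {"status": "final"}, media_id="img-17", indexed_at=datetime(2025, 5, 25)),
    EvidenceUnit("old_policy.pdf", "text", {"status": "final"}, page=1, indexed_at=datetime(2023, 1, 1)),
]
routed = route(units, {"text", "image"}, {"status": "final"}, now)
labels = [citation_label(u) for u in routed]
print(labels)  # ['policy.pdf, p. 4', 'damage.jpg (image img-17)']
```

The draft is dropped by scope and the 2023 file is dropped by age, so neither can ground an answer by accident.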

What Builders Should Copy

Do not copy the feature list. Copy the product pattern.

If a workflow touches mixed media, stop forcing everything through a text-only pipeline. Transcribing, captioning, or manually describing every asset can be useful, but it also strips context. A native multimodal retrieval path can preserve visual and document evidence that text summaries miss.

If a workflow has roles, versions, departments, customers, or approval states, treat metadata as part of the retrieval contract. The useful primitive is not "search all files." It is "search the approved customer onboarding docs from the legal store, published after this date, visible to this role."
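One way to treat metadata as a contract is to write the scope down as a typed object and fail closed when a label is missing. The sketch below encodes the example sentence above as a predicate over a file's metadata. The field names (`store`, `status`, `doc_type`, `published`, `visible_to`) and the `RetrievalContract` class are hypothetical, and the role check here is a retrieval filter only, not a substitute for real access control.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class RetrievalContract:
    store: str
    status: str
    doc_type: str
    published_after: date
    role: str

def in_scope(meta, contract):
    """Return True only if every clause holds. Missing labels fail closed."""
    published = meta.get("published")
    return (
        meta.get("store") == contract.store
        and meta.get("status") == contract.status
        and meta.get("doc_type") == contract.doc_type
        and published is not None
        and published > contract.published_after
        and contract.role in meta.get("visible_to", ())
    )

# "Approved customer onboarding docs from the legal store,
#  published after this date, visible to this role."
contract = RetrievalContract(
    store="legal",
    status="approved",
    doc_type="customer_onboarding",
    published_after=date(2025, 1, 1),
    role="support_agent",
)

# Hypothetical metadata records keyed by file name.
docs = {
    "onboarding_v3.pdf": {"store": "legal", "status": "approved", "doc_type": "customer_onboarding",
                          "published": date(2025, 3, 10), "visible_to": {"support_agent", "legal"}},
    "onboarding_v4_draft.pdf": {"store": "legal", "status": "draft", "doc_type": "customer_onboarding",
                                "published": date(2025, 4, 2), "visible_to": {"legal"}},
    "onboarding_v1.pdf": {"store": "legal", "status": "approved", "doc_type": "customer_onboarding",
                          "published": date(2023, 6, 1), "visible_to": {"support_agent"}},
    "unlabeled.pdf": {"store": "legal"},
}
allowed = [name for name, meta in docs.items() if in_scope(meta, contract)]
print(allowed)  # ['onboarding_v3.pdf']
```

Failing closed on missing labels is a deliberate choice in this sketch: an unlabeled file is treated as out of scope rather than silently searchable.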

If a workflow requires trust, show the evidence unit. A citation should not be decorative. It should help a user jump to the source, verify the page, inspect the image, and decide whether the model's answer is supported.

If a workflow will run repeatedly, measure retrieval quality separately from answer quality. Track whether the system retrieved the right asset, whether filters excluded the wrong material, whether citations pointed to the useful page, and whether users still had to search manually after the answer.
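The four checks above map onto four simple rates that can be computed from a labeled evaluation set, independent of how good the final answer reads. This is a minimal sketch. The `EvalCase` fields, the metric names, and the sample cases are assumptions for illustration, and a real evaluation would need reviewer-labeled data.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    retrieved: list          # ranked evidence ids returned by retrieval
    relevant: set            # ids a reviewer marked as the right evidence
    forbidden: set           # ids the filters should have excluded
    cited: str               # evidence id the answer cited, e.g. "file#page"
    searched_manually: bool  # did the user still have to search afterwards?

def retrieval_report(cases, k=3):
    """Summarize retrieval quality separately from answer quality."""
    n = len(cases)
    if n == 0:
        raise ValueError("need at least one evaluation case")
    hits = sum(1 for c in cases if set(c.retrieved[:k]) & c.relevant)
    leaks = sum(1 for c in cases if set(c.retrieved) & c.forbidden)
    good_cites = sum(1 for c in cases if c.cited in c.relevant)
    manual = sum(1 for c in cases if c.searched_manually)
    return {
        "hit_rate_at_k": hits / n,          # right asset retrieved in the top k
        "filter_leak_rate": leaks / n,      # out-of-scope material got through
        "citation_accuracy": good_cites / n,  # citation pointed at useful evidence
        "manual_search_rate": manual / n,   # user still had to go searching
    }

# Hypothetical evaluation cases.
cases = [
    EvalCase(["policy#4", "faq#1"], {"policy#4"}, {"draft#2"}, "policy#4", False),
    EvalCase(["draft#2", "policy#4"], {"policy#4"}, {"draft#2"}, "draft#2", True),
    EvalCase(["faq#1"], {"damage.jpg"}, set(), "faq#1", True),
    EvalCase(["damage.jpg"], {"damage.jpg"}, set(), "damage.jpg", False),
]
report = retrieval_report(cases)
print(report)
```

The second case is the one to watch for: the right page was retrieved, but a draft leaked through the filter and became the citation, which answer-level scoring alone could easily miss.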

Where This Can Break

Evidence routing can still fail.

Metadata can be wrong. Old files can remain indexed. Page citations can point to a source that was retrieved but misinterpreted. Multimodal embeddings can miss a critical visual detail. A managed RAG service can simplify infrastructure while adding vendor, retention, and compliance questions.

The lesson is not that retrieval is solved. The lesson is that serious AI products need retrieval controls that match how work actually happens.

The Takeaway

RAG is moving from "give the model more context" to "route the model to verifiable evidence."

That shift is healthy. It pushes AI products away from magic answers and toward inspectable systems. For founders and operators, the next step is concrete: map the evidence types in the workflow, label the boundaries, expose the citation unit, and evaluate retrieval before trusting the final answer.

The winners will not be the products that search the most data. They will be the products that route the right evidence to the right task at the right moment.