
RAG is having its spreadsheet moment. Not in the sense of “it’s revolutionary”, because spreadsheets were never revolutionary in a shiny way. They were revolutionary in the way a simple tool quietly colonizes every corner of an organization. They made analysis accessible. They also created a parallel universe of logic that nobody fully controlled, everybody relied on, and auditors eventually learned to fear.

Retrieval-Augmented Generation is following the same path, with the same promise and the same trap. Give people the ability to ask questions in natural language, point the model at “company knowledge”, and suddenly the friction of finding information collapses. It feels like productivity. It feels like empowerment. It feels like progress.

And then, a few months later, it starts feeling like noise. Retrieval is not the hard part anymore. The industry has largely solved the mechanics. You can embed content, store vectors, run semantic search, add reranking, build a chat interface, and ship something that demos beautifully. You can do it with a dozen vendors, or open source, or a managed cloud stack. The technology is available, the patterns are known, and the barrier to entry is falling fast.
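To make "the mechanics are solved" concrete, here is a minimal sketch of the retrieve-then-generate core in Python. The bag-of-words embedding is a toy stand-in for a real embedding model, and the document names are illustrative, but the shape (embed, index, rank, return top-k) is exactly the part every vendor has commoditized.

```python
# A minimal retrieve-then-generate core. embed() is a toy
# bag-of-words stand-in for a real embedding model; reranking and
# the chat layer would sit on top of retrieve().
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [
    "Travel expenses must be approved by your line manager.",
    "The 2019 travel policy was replaced in March 2024.",
    "Remote work requests go through the HR portal.",
]
index = [(doc, embed(doc)) for doc in corpus]  # "embed content, store vectors"

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("who approves travel expenses?"))
```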

The hard part is everything the spreadsheet never solved either: what data you are allowed to use, how trustworthy it is, who owns it, and why people behave the way they do around it. Most enterprise GenAI will be mediocre for one simple reason. Enterprises will treat RAG like an IT integration problem, when it is actually an information supply chain problem. The difference matters.

In a demo, the content is clean. The permissions are simple. The question is polite. The answer is impressive. In real life, the content is messy, duplicative, contradictory, half outdated, and scattered across repositories that have grown organically for a decade. There is no single “truth”. There are five versions, each written by a different team for a different audience, each with slightly different intent. The model dutifully retrieves something relevant, and then confidently summarizes the wrong thing.

That is not a failure of embeddings. It is a failure of information hygiene. Permissions make it worse. In a real company, access is not binary. It is layered by geography, role, project, sensitivity, union agreements, regulatory constraints, and sometimes political reality. “Everyone can see it” is rarely true. “Nobody can see it” is also rarely true. Most of the time, access is complicated in ways that only make sense if you lived through the organizational history that created it.

RAG systems tend to ignore that complexity at the beginning because it slows down delivery. Then they hit production and discover that “knowledge” is not a neutral asset. It is a controlled substance. And even if you solve content and permissions, incentives will still sabotage you.

People do not withhold information because they are villains. They do it because information is power, because speed beats documentation, because teams optimize locally, because nobody is rewarded for maintaining the boring master version, and because “I’ll just save my own copy” feels safe. The result is predictable. Your RAG system becomes a mirror of your organizational entropy. It will happily retrieve a policy that was replaced six months ago and still lives in someone’s folder called “final_v7_really_final”.

Mediocre RAG looks like this: it answers quickly, often plausibly, sometimes correctly, and rarely with the level of reliability you would bet a decision on. It becomes a tool people use for drafts, orientation, and reassurance, while continuing to escalate the important questions to humans because trust never fully forms.

It is the corporate equivalent of a dashboard that everyone praised at launch and nobody uses to steer the business. So what does “good” actually look like?

Good is not “the model is smarter”. Good is “the system is governable”. Good is when the organization stops pretending that content is free and starts treating it like a product with ownership, lifecycle, and quality control. In practice, it means you curate what the model is allowed to retrieve. You do not point it at the entire digital landfill and hope ranking will save you. You build a smaller, more reliable corpus where the content has explicit owners, update cadence, and deprecation rules. You accept that less content, better maintained, beats more content, poorly trusted.

Good also means permissions are not bolted on. They are part of the architecture. Retrieval respects identity, attributes, and context so that the model never even “sees” what the user should not see. This is where many programs get stuck, because doing it properly forces you to confront identity maturity, repository discipline, and the reality of your access model across countries and functions.
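Here is a minimal sketch of what “not bolted on” means in code, assuming an attribute-based access model with illustrative fields (regions, roles): the access check runs before similarity ranking, so filtered-out content never enters the model’s context at all.

```python
# A sketch of identity-aware retrieval: access filtering happens
# BEFORE similarity ranking, so content the user may not see never
# reaches the model's context window. The attribute names here
# (regions, roles) are illustrative, not a standard schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    text: str
    regions: frozenset  # geographies allowed to see this document
    roles: frozenset    # roles allowed to see this document

def visible(doc: Doc, user_region: str, user_roles: set) -> bool:
    return user_region in doc.regions and bool(user_roles & doc.roles)

def retrieve(query: str, index: list[Doc],
             user_region: str, user_roles: set, k: int = 3) -> list[Doc]:
    candidates = [d for d in index if visible(d, user_region, user_roles)]
    # Rank only what passed the access check. Word overlap stands in
    # for a real semantic similarity score.
    score = lambda d: len(set(query.lower().split()) & set(d.text.lower().split()))
    return sorted(candidates, key=score, reverse=True)[:k]

index = [
    Doc("DE works council agreement on monitoring.",
        frozenset({"DE"}), frozenset({"hr"})),
    Doc("Global travel policy, 2024 edition.",
        frozenset({"DE", "FR", "US"}), frozenset({"hr", "staff"})),
]
print(retrieve("what is the travel policy", index, "FR", {"staff"}))
```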

And good means you measure truth, not delight. If you cannot evaluate answers with real business questions, you will ship something that feels impressive and behaves unpredictably. Mature RAG programs build evaluation sets, track citation quality, measure answer stability over time, and observe where users lose confidence. They treat hallucinations, mis-citations, and outdated retrieval as operational defects, not as amusing quirks of “AI being AI”.
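A sketch of what that evaluation loop can look like, assuming your pipeline returns an answer plus the document ids it cited (answer_question and must_cite are illustrative names, not a framework API):

```python
# A fixed set of real business questions with the document each
# answer must cite, scored on every release. answer_question() is a
# placeholder for your RAG pipeline; it is assumed to return
# (answer_text, list_of_cited_doc_ids).
eval_set = [
    {"question": "Who approves travel expenses?", "must_cite": "policy/travel-2024"},
    {"question": "What is the notice period in Germany?", "must_cite": "hr/de-contracts"},
]

def run_eval(answer_question) -> dict:
    hits = 0
    for case in eval_set:
        answer, cited = answer_question(case["question"])
        if case["must_cite"] in cited:
            hits += 1
        else:
            # Treat a mis-citation as an operational defect, not a quirk.
            print(f"DEFECT: {case['question']!r} cited {cited}")
    return {"citation_hit_rate": hits / len(eval_set)}

# Example: a stubbed pipeline that always cites the travel policy.
stub = lambda q: ("...", ["policy/travel-2024"])
print(run_eval(stub))  # hit rate 0.5: one mis-citation logged as a defect
```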

There is also a cultural dimension that technology teams often underestimate. A good RAG system changes how people work with knowledge. It forces a new social contract: if you publish something that drives decisions, you own its accuracy and its lifecycle. If you want the AI to be reliable, you must give it reliable inputs. That is not a prompt problem. It is leadership.

This is why the spreadsheet analogy is useful. Spreadsheets scaled because they were easy, and because they gave autonomy. They also created uncontrolled complexity because governance arrived late, and because incentives never aligned with maintainability.

RAG will scale the same way. The opportunity is to do better this time. To build a knowledge layer that is smaller, cleaner, permissioned properly, and evaluated like a production service. To accept that “enterprise GenAI” is not about deploying a chat interface, but about industrializing information trust.

If you want a contrarian conclusion, here it is. The winners will not be the companies with the most GenAI features. They will be the companies with the least ambiguous truth. The ones that can answer the boring questions reliably, across countries, across functions, across time.

That is not glamorous. It is, however, the difference between a demo and an operating advantage.

What is RAG in enterprise GenAI, in plain terms?

RAG (Retrieval-Augmented Generation) is the pattern where a model answers a user’s question by first retrieving relevant internal documents, then generating a response grounded in that content. It’s the bridge between “a smart model” and “your company’s actual knowledge”.

Why is “RAG is the new spreadsheet” a fair analogy?

Because it spreads fast, feels empowering, and is everywhere before governance catches up. Like spreadsheets, RAG can unlock real productivity, but it can also industrialize confusion if the underlying content is messy or uncontrolled.

Why will most enterprise GenAI with RAG be mediocre?

Because retrieval isn’t the hard part anymore. The hard part is the information supply chain: data quality, content ownership, versioning, access rights, and the incentives that produce duplicated, outdated, contradictory “truth” across teams and countries.

If retrieval works, why does RAG still give wrong answers?

Because “relevant” is not the same as “correct for this decision.” The system can retrieve a policy that is outdated, a document written for a different context, or a half-true deck that survived ten reorganizations. The model then synthesizes confidently from imperfect inputs.

What is the #1 hidden failure mode of RAG in production?

Stale truth. People underestimate how quickly content ages, how rarely it gets retired, and how often “the latest version” lives in someone’s private folder. RAG doesn’t fix that. It makes it searchable.

Why are permissions and identity the real battlefield for enterprise RAG?

Because enterprises don’t have simple access rules. They have layered entitlements by role, geography, project, sensitivity, and regulation. Good RAG enforces access at retrieval time so the model never sees what the user shouldn’t see. If you bolt this on later, you either kill usability or create risk you can’t defend.

Should we index everything (SharePoint, Teams, drives) to “maximize coverage”?

Usually no. Indexing everything maximizes ambiguity. A smaller curated corpus with owners, refresh cadence, and retirement rules often beats a massive digital landfill. Trust scales adoption. File count doesn’t.
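One hedged sketch of what “curated beats comprehensive” means in practice, with illustrative metadata fields: a document enters the index only if it has an owner, is inside its refresh window, and has not been retired.

```python
# Curation before indexing: a document is admitted to the retrieval
# corpus only if it has an accountable owner, is within its refresh
# window, and has not been retired. Field names are illustrative.
from datetime import date, timedelta

def admit(doc: dict, today: date = date.today()) -> bool:
    if doc.get("retired") or not doc.get("owner"):
        return False
    max_age = timedelta(days=doc.get("refresh_days", 180))
    return today - doc["last_reviewed"] <= max_age

docs = [
    {"id": "policy/travel-2024", "owner": "finance-ops",
     "last_reviewed": date(2025, 1, 10), "refresh_days": 365, "retired": False},
    {"id": "final_v7_really_final", "owner": None,
     "last_reviewed": date(2019, 3, 2), "retired": False},
]
corpus = [d for d in docs if admit(d)]  # only the owned, current doc survives
```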

What does “knowledge as a product” mean in a RAG program?

It means content has an accountable owner, quality standards, a refresh rhythm, and a deprecation process. If a document drives decisions, someone must be responsible for keeping it true. Without that, you’re asking GenAI to produce reliability from organizational entropy.

How do we measure whether RAG is actually good?

Not by “user delight” alone. You measure correctness on real business questions, citation quality, stability over time, and the rate of escalations to humans. You treat mis-citations, outdated retrieval, and inconsistent answers as operational defects, not “AI quirks.”

How do we reduce hallucinations and confident nonsense in RAG?

Start with the inputs: deduplicate, version, curate, and enforce permissions properly. Then constrain behavior: require citations, limit answers to retrieved context for certain categories, and route high-impact queries into controlled workflows or approvals.
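A sketch of the “constrain behavior” half, with a placeholder generate() standing in for the real model call (not an actual API): abstain on weak retrieval, force citations, and treat a missing citation as a defect rather than shipping the draft.

```python
def generate(prompt: str, query: str):
    # Stub standing in for a real model call; assumed to return
    # (answer_text, list_of_cited_doc_ids).
    return ("Travel must be pre-approved [policy/travel-2024].",
            ["policy/travel-2024"])

def answer(query: str, retrieved: list[dict], min_score: float = 0.5) -> str:
    strong = [d for d in retrieved if d["score"] >= min_score]
    if not strong:
        # Abstain rather than synthesize from weak context.
        return "No sufficiently reliable source found. Escalating to a human."
    context = "\n".join(d["text"] for d in strong)
    prompt = ("Answer ONLY from the context below and cite document ids. "
              "If the context does not contain the answer, say so.\n" + context)
    draft, cited = generate(prompt, query)
    if not cited:
        return "Answer withheld: the draft produced no citation."  # a defect
    return draft

print(answer("who approves travel?",
             [{"id": "policy/travel-2024",
               "text": "Travel must be pre-approved.", "score": 0.8}]))
```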

What does “good” look like, concretely, for an enterprise RAG setup?

A curated knowledge layer with owners and lifecycle, identity-aware retrieval, observability that shows what was retrieved and why, and an evaluation loop that keeps quality from drifting. It feels less like a chatbot and more like a trustworthy information service.
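For the observability piece, a sketch with illustrative field names: every answer gets a structured record of what was retrieved, with what scores, against which corpus version, so “why did it say that?” has an auditable answer.

```python
# Every answer is logged with the query, the retrieved documents and
# their scores, and the corpus version. Field names are illustrative;
# in production the record would go to your log pipeline, not stdout.
import json, datetime

def log_retrieval(query: str, retrieved: list[dict],
                  answer: str, corpus_version: str):
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "query": query,
        "retrieved": [{"id": d["id"], "score": d["score"]} for d in retrieved],
        "answer": answer,
        "corpus_version": corpus_version,
    }
    print(json.dumps(record))

log_retrieval(
    "who approves travel?",
    [{"id": "policy/travel-2024", "score": 0.82}],
    "Your line manager approves travel [policy/travel-2024].",
    corpus_version="2025-06-01",
)
```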

When should we avoid chat-style RAG and use something else?

When the outcome must be deterministic, audit-grade, or drawn from systems of record with strict controls. In those cases you often need structured data products, verified queries, controlled workflows, or agent patterns that execute bounded actions rather than freestyle synthesis.