The first time we hooked Claude into a project workflow at Oase, it drafted a client status update. The prose came back clean, structured, the right tone.
It didn't know which version of the homepage the team had shipped. It didn't know the client had rejected the green logo two weeks earlier. It didn't know the designer had moved feedback into a new thread.
The output was technically correct and operationally useless. If you've plugged AI into your agency this year, you've probably hit something similar.
RAND Corporation pegs the AI project failure rate at over 80%. MIT's NANDA project, in its State of AI in Business 2025 report, found that 95% of enterprise GenAI pilots never reach production. Most diagnoses blame the model. We thought the same, until we built our own integration and figured out we were asking the wrong question.
The model wasn't wrong. It had nothing to work with.
What "context" actually means
When most people say "context" they mean the prompt — the instructions you type into the chat. That's a tiny slice.
The real context lives outside the prompt: who the client is, what's been decided, which assets are canonical, which got rejected, who's blocking who, what the last conversation actually said. In an agency, that information is usually distributed across half a dozen places at once.
Slack threads in three workspaces. Email chains forwarded into other email chains. A Drive folder where v3_FINAL_FINAL.psd lives next to v3_FINAL_use_this_one.psd. WhatsApp messages from the client at 11pm. A whiteboard photo someone took in the studio. The collective memory of the three people who were actually on the call.
The model sees none of this. You feed it a prompt and a polite question. It produces fluent, plausible output. The fluency hides the fact that none of the real context is in the room.
Why pilots quietly die
This is the shape of most AI failures we've watched up close.
A team prototypes with one project's data hand-loaded into the prompt. The demo lands. Leadership greenlights a rollout. Then the rollout meets reality: ten clients, twenty active projects, eighteen months of file history, four people whose job is partly to remember which version of which asset is current.
The prompt-loading trick that worked for one demo doesn't scale to actual operations. The team starts manually copy-pasting context into ChatGPT for each task. The savings disappear. Engagement drops. Six months in, somebody quietly stops renewing the seat.
That's what RAND's 80% and MIT's 95% are made of. Not models hallucinating. Pilots that worked once and couldn't survive the move from demo to daily use.
The inversion
After we hit this wall a few times, we changed the order of operations. Stop bolting AI on top of the existing tool stack. Build the substrate first. Then let the model work on top of something real.
The substrate is unsexy. It's the boring part: making sure that when a designer uploads a file, it gets described and indexed. Making sure feedback gets attached to the actual deliverable, not orphaned in a thread. Making sure every conversation, decision, and asset is queryable later by something other than human memory.
This is mostly data engineering. It doesn't demo well. You can't film a 30-second product video about it. It's the work most teams skip, and its absence is what the failure studies end up counting.
What we built into Oase
Three substrate pieces we put in before any user-facing AI shipped:
Auto-tagging at upload. Every file an agency uploads gets described and tagged by AI in the background. Not a button you press; it runs by default, on everything at rest. Six months later, when somebody asks "where's the version of the Acme logo with the darker green?", the answer is one query away instead of three Slack searches and a guess.
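Roughly, the shape of that background step. Every name here is made up for illustration, not pulled from Oase's actual code:

```typescript
// Illustrative only: FileRecord, FileMetadata, describeWithModel, and the
// index parameter are stand-ins, not a real schema or SDK.

interface FileRecord {
  id: string;
  projectId: string;   // tags stay scoped to the project the file belongs to
  filename: string;
  mimeType: string;
  url: string;
}

interface FileMetadata {
  description: string; // short summary of what the asset shows
  tags: string[];      // e.g. ["logo", "acme", "dark-green", "v3"]
}

// Placeholder for a vision-capable model call that returns a structured
// description of the uploaded file.
async function describeWithModel(file: FileRecord): Promise<FileMetadata> {
  return { description: "placeholder", tags: [] };
}

// Runs in the background after every upload; nobody presses a button.
export async function onFileUploaded(
  file: FileRecord,
  index: { save: (fileId: string, meta: FileMetadata) => Promise<void> },
): Promise<void> {
  const meta = await describeWithModel(file);
  await index.save(file.id, meta); // queryable six months later
}
```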
Project-scoped vector embeddings. Files, messages, decisions, project notes — all embedded so they're searchable by meaning, not just by filename. Scoped to the project, not cross-tenant, because clients shouldn't see each other's work and neither should an AI reading on their behalf.
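A minimal sketch of what project-scoped search can look like, using an in-memory store and a placeholder embed() call. The point is the hard filter on projectId; everything else is illustrative:

```typescript
// Sketch only: embed() stands in for whatever embedding model is used.
// The filter on projectId is the important part: queries never cross
// project (or client) boundaries.

interface EmbeddedItem {
  id: string;
  projectId: string;
  kind: "file" | "message" | "decision" | "note";
  text: string;
  vector: number[];
}

// Placeholder embedding call; a real implementation returns the model's vector.
async function embed(text: string): Promise<number[]> {
  return [];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

export async function searchProject(
  store: EmbeddedItem[],
  projectId: string,
  query: string,
  limit = 5,
): Promise<EmbeddedItem[]> {
  const qv = await embed(query);
  return store
    .filter((item) => item.projectId === projectId) // hard scope, not a ranking hint
    .map((item) => ({ item, score: cosine(qv, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.item);
}
```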
Annotations anchored to the asset, not the chat. When a client clicks a pin on the homepage hero and says "this feels cold," the comment lives on the file. The next person opening that asset sees the annotation in place. The AI summary of feedback then has something coherent to read instead of pulling fragments out of Slack.
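The data model is the whole trick here. A sketch of an asset-anchored annotation, with field names invented for the example:

```typescript
// Sketch of an annotation that lives on the asset, not in a chat thread.
// Field names are illustrative, not Oase's actual schema.

interface Annotation {
  id: string;
  assetId: string;          // anchored to the deliverable itself
  assetVersion: number;     // which revision round the comment belongs to
  position: { x: number; y: number }; // the pin the client clicked, as fractions of width/height
  author: string;
  body: string;             // "this feels cold"
  createdAt: string;        // ISO timestamp
}

// Feedback summaries read from this table, not from chat fragments.
function feedbackForAsset(all: Annotation[], assetId: string): Annotation[] {
  return all
    .filter((a) => a.assetId === assetId)
    .sort((a, b) => a.createdAt.localeCompare(b.createdAt));
}
```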
That's the substrate. The AI features sit on top of it, not the other way around.
What this looks like in practice
"What did the client say about the homepage hero?"
Without substrate: the model produces something fluent, generic, possibly wrong. You go check Slack anyway.
With substrate: the system returns the exact comments tied to the exact asset — the date, who left them, which round of revisions they came from. The model didn't get smarter. The system around it got smarter, and the model finally had something real to read.
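In code terms, the "with substrate" path is a short pipeline. This sketch leans on hypothetical project-scoped search and asset-anchored feedback lookups, in the spirit of the sketches above, passed in as dependencies:

```typescript
// Sketch of the "with substrate" path. Both dependencies are illustrative
// stand-ins for the substrate built earlier, not a real API.

async function feedbackForQuestion(
  projectId: string,
  question: string,
  deps: {
    searchProject: (projectId: string, query: string, limit: number) => Promise<{ id: string }[]>;
    feedbackForAsset: (assetId: string) => Promise<
      { createdAt: string; author: string; assetVersion: number; body: string }[]
    >;
  },
): Promise<string> {
  // 1. Resolve the question to a concrete asset, inside this project only.
  const hits = await deps.searchProject(projectId, question, 1);
  if (hits.length === 0) return "No matching asset found in this project.";

  // 2. Pull the comments anchored to that asset, in order.
  const comments = await deps.feedbackForAsset(hits[0].id);

  // 3. Hand the model attributed, dated feedback instead of a bare prompt.
  return comments
    .map((c) => `${c.createdAt} | ${c.author} (round ${c.assetVersion}): ${c.body}`)
    .join("\n");
}
```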
The difference is invisible in a demo. It's most of the game in production.
Why teams skip this
Honest reason: the substrate work doesn't fit the AI narrative.
It's not a feature you ship in a sprint and tweet about. It's not something a generic chatbot wrapper sells. It's operational glue: exactly the work the existing tool vendors refused to do, because their incentive is to sell the integration surface and let your team handle the rest.
Your team "handling the rest" is the part that breaks. Then the AI pilot fails. Then RAND adds you to the 80%.
What actually changes
If you're staring down the same numbers — 80% failure, 95% never shipping — the path out probably isn't a better model. It's the substrate the model is reading from.
Build the system that knows what was decided, what was uploaded, what was rejected, what's canonical. Make the AI a thin layer on top of that. The pilot stops being a demo and starts being an operations tool. The failure stats stop being your story.
Sources: RAND, The Root Causes of Failure for Artificial Intelligence Projects; MIT NANDA, State of AI in Business 2025, via Fortune.
