Case Study: The Enterprise AI Knowledge Platform Their Own Team Couldn't Ship

Before we showed up

One of the world's largest beer brands. Digital marketing operations across markets in Europe and Asia, including Southeast Asia. More than 5,000 people in the global digital team. They found us through a Netherlands-based agency partner.

The agency's first call was direct: this is a Fortune 500 enterprise client and they needed someone who could build at that level. They had tried to find the right person before. They were not sure we were it.

The gap

The brief was open: "Help us understand what AI can do for us."

Two calls in, the real problem became clear. Over years of global campaigns, the team had produced an enormous amount of valuable material. Campaign retrospectives, media plans, performance reports, brand documentation, creative effectiveness studies, market-specific playbooks. None of it was obsolete, and almost none of it was findable.

They did not need an AI feature. They needed their own knowledge made usable again.

We did not write a discovery memo or propose a long-running plan. We asked one question instead: what does someone on your team do when they need to find something? The answer told us everything.

What was on fire

More than 10,000 documents scattered across SharePoint, internal drives, PDFs, PowerPoint decks, Excel files, and four specialized platforms. Finding the right document required knowing it existed. Understanding whether it applied to Netherlands, Vietnam, or Malaysia required reading the whole thing.

The regional problem was the quietly dangerous one. Documents from different markets lived side by side with no separation. A team in one market could pull insights that sounded credible but applied to the wrong place. A wrong answer that looks right is more dangerous than a missing answer, and decisions were being made on information that appeared valid but did not apply.

The business intelligence workflow was fragmented across four tools. Answering one business question meant logging into all four, exporting data, reconciling numbers, and rewriting context. Across the team, 15 to 20 hours per day were lost to searching and reconciling instead of thinking.

They had already tried to solve this internally. Their in-house AI project never reached production.

The hard part

The biggest risk was not that the system would fail. It was that it would appear to work while being quietly wrong.

We spent the first phase deliberately breaking things. Phase one was a discovery exercise, not a build, with the goal of finding every failure mode before optimizing anything.

Standard RAG broke in every direction. Too much retrieved context spiked costs and hit rate limits; too little and answers became incomplete or wrong. Similarity-based retrieval ignored regional boundaries and returned results that sounded credible but applied to the wrong market. Hallucinations appeared most often on broad, multi-document questions, exactly the questions that mattered most.

There was a formal review call midway through. A real decision gate. The client could have ended the engagement. Instead, it became the moment trust was built. We had not solved everything. We had proven the problem was tractable and shown exactly where the edges were. Their head of data science reviewed the approach and said he would have done it the same way.

That moment did not come from showing a polished demo. It came from being honest about what had broken and why, and showing we understood the problem better than anyone who had tried it before.

Most AI projects optimize for speed. We optimized for trust first.

Building the platform

We built a custom enterprise knowledge and business intelligence platform.

The core is a staged retrieval pipeline we built from scratch. Off-the-shelf patterns were not sufficient. We chose Weaviate over simpler vector stores because the business data had structured relational properties that similarity search alone could not capture cleanly.

Every question is classified by intent: factual, analytical, or comparative. Region and permission boundaries are enforced before any retrieval begins. Context is narrowed hierarchically before expanding, which keeps both relevance and cost manageable. Low-signal content is filtered before the model sees it. Multi-document synthesis is handled explicitly rather than left to the model. The result was 99.1% accuracy, scored on correctness, completeness, and hallucination avoidance, including on broad questions spanning multiple documents and years of campaigns.
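The ordering of those stages is the part that matters most, and a small sketch makes it concrete. This is a minimal illustration under stated assumptions, not the platform's code: the `Doc` type, the keyword lists, the `overlap_score` function (a stand-in for vector similarity), and the thresholds are all hypothetical. What it shows is that region filtering happens before any scoring, so a wrong-market document can never win on similarity.

```python
from dataclasses import dataclass

# Every name here is a hypothetical illustration, not the platform's real API.

@dataclass(frozen=True)
class Doc:
    doc_id: str
    region: str  # market the document applies to
    text: str

COMPARATIVE_CUES = ("compare", "versus", " vs ", "difference between")
ANALYTICAL_CUES = ("why", "trend", "how did", "impact")

def classify_intent(question: str) -> str:
    """Toy keyword classifier; a production system would likely use a model."""
    q = f" {question.lower()} "
    if any(cue in q for cue in COMPARATIVE_CUES):
        return "comparative"
    if any(cue in q for cue in ANALYTICAL_CUES):
        return "analytical"
    return "factual"

def overlap_score(question: str, text: str) -> float:
    """Stand-in for vector similarity: share of question words found in text."""
    q_words = set(question.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0

def retrieve(question, docs, user_regions, min_score=0.2, max_docs=3):
    intent = classify_intent(question)
    # Stage 1: hard region filter, applied BEFORE any similarity scoring.
    allowed = [d for d in docs if d.region in user_regions]
    # Stage 2: score only the documents the user is permitted to see.
    scored = [(overlap_score(question, d.text), d) for d in allowed]
    # Stage 3: drop low-signal content, then cap the context size.
    kept = [(s, d) for s, d in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return intent, [d for _, d in kept[:max_docs]]
```

With one Netherlands and one Vietnam document that both match a question equally well, a user scoped to `{"NL"}` only ever sees the Netherlands one. The Vietnam document is removed before scoring rather than outranked after it.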

On top: a business intelligence agent that takes natural language questions, translates them into structured queries, runs aggregations, and generates charts automatically. A campaign debrief agent that turns retrospectives from hours of manual synthesis into structured review sessions.
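One way to picture the BI agent's middle step is a structured query spec that is validated against an allowlist before it becomes an aggregation. The sketch below is an assumption-heavy illustration: the `campaigns` table, its columns, the spec format, and the use of in-memory SQLite (instead of the PostgreSQL named in the stack) are all hypothetical, and the spec is written by hand where a model would presumably produce it.

```python
import sqlite3

# Hypothetical schema and allowlists, for illustration only.
ALLOWED_METRICS = {"spend", "impressions"}
ALLOWED_DIMENSIONS = {"region", "channel"}
ALLOWED_AGGS = {"SUM", "AVG"}

def build_sql(spec: dict) -> str:
    """Turn a structured query spec into SQL, rejecting unknown fields."""
    metric = spec["metric"]
    dimension = spec["group_by"]
    agg = spec["agg"].upper()
    if (metric not in ALLOWED_METRICS
            or dimension not in ALLOWED_DIMENSIONS
            or agg not in ALLOWED_AGGS):
        raise ValueError("query spec uses a field outside the allowlist")
    # Safe to interpolate: every identifier was checked against an allowlist.
    return (f"SELECT {dimension}, {agg}({metric}) FROM campaigns "
            f"GROUP BY {dimension} ORDER BY {dimension}")

def run_query(spec: dict, conn: sqlite3.Connection) -> list:
    return conn.execute(build_sql(spec)).fetchall()

def make_demo_db() -> sqlite3.Connection:
    """Small in-memory dataset so the sketch runs on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE campaigns "
        "(region TEXT, channel TEXT, spend REAL, impressions INTEGER)"
    )
    conn.executemany(
        "INSERT INTO campaigns VALUES (?, ?, ?, ?)",
        [("NL", "social", 100.0, 1000),
         ("NL", "tv", 200.0, 5000),
         ("VN", "social", 50.0, 700)],
    )
    return conn
```

A question like "total spend by region" would map to `{"metric": "spend", "group_by": "region", "agg": "sum"}`. The design choice being illustrated is that the model never writes raw SQL, only a spec that can be checked first.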

The platform sits as an orchestration layer over four existing platforms. It detects intent, routes to the right system, merges results, and attributes sources. Nothing was replaced. Everything was connected.
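The detect, route, merge, attribute loop can be sketched in a few lines. This is a simplified illustration, not the platform's code: the two backend functions stand in for the four existing platforms, and the backend names, the keyword-based `detect_intent`, and the result shape are all hypothetical.

```python
from typing import Callable

# Hypothetical backends; each returns results tagged with where they came from.
def media_backend(question: str) -> list[dict]:
    return [{"answer": "media performance figures", "source": "media-platform"}]

def docs_backend(question: str) -> list[dict]:
    return [{"answer": "matching playbook excerpt", "source": "docs-platform"}]

# Routing table: which systems are consulted for which kind of question.
ROUTES: dict[str, list[Callable[[str], list[dict]]]] = {
    "factual": [docs_backend],
    "analytical": [media_backend, docs_backend],
}

def detect_intent(question: str) -> str:
    """Toy keyword check standing in for a real intent classifier."""
    cues = ("spend", "trend", "performance")
    return "analytical" if any(c in question.lower() for c in cues) else "factual"

def answer(question: str) -> dict:
    intent = detect_intent(question)
    merged: list[dict] = []
    for backend in ROUTES[intent]:
        merged.extend(backend(question))
    # Results without a source are dropped rather than shown unattributed.
    attributed = [r for r in merged if r.get("source")]
    return {
        "intent": intent,
        "results": attributed,
        "sources": sorted({r["source"] for r in attributed}),
    }
```

Because the underlying systems are only called, never modified, this shape is consistent with the point that nothing was replaced.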

Weeks 1-3: Feasibility phase. Deliberately broke every naive approach. Mapped the failure modes.
Week 4: Formal review call. Decision gate passed. Architecture validated.
Weeks 5-24: Custom retrieval pipeline, BI agent, debrief agent, integrations, security layers.
Weeks 25-31: Security hardening, cross-market testing, refinements across all region configurations.
Week 32: Team workshops. Training on when to trust each agent and when to push back.
Week 33: Platform handed over, ready for global rollout.

Stack: FastAPI, Python 3.11, PostgreSQL, SQLAlchemy, Weaviate, Supabase, OpenAI GPT-4 and o3, LangChain, Langfuse, NLTK, spaCy, Next.js 15, React 19, Material-UI, Zustand, TanStack React Query, Docker, Redis, Caddy.

What shipped

The platform is live and in daily use across the global digital team. More than 5,000 users. More than 10,000 documents ingested and queryable. Four platforms connected.

Before: four tools with manual exports, costing 15 to 20 hours per day across the team. After: one interface that takes natural language questions and returns results in seconds with full source attribution.

By week two of the build, the retrieval pipeline was returning accurate answers on test documents. By week five, the BI agent was handling real analytical queries. Every milestone was hit on time across eight months, with weekly demos throughout. Nothing went dark between check-ins.

Retrospectives that required hours of manual synthesis are now structured review sessions. BI reports that meant jumping between four platforms are a single query.

The in-house project that could not reach production has been replaced by a system running globally.

"The model was never the problem. We spent eight months proving that the retrieval was."

In their words

"You blew them out of the water. Their head of data science said he would have done it the same. That's a big compliment."

Netherlands-based agency partner

What we'd change

We built regional access controls and security requirements in parallel with the retrieval architecture. We would start there now. Lock the security model in week one and build everything around it from the beginning rather than alongside it. At enterprise scale, security is not a layer you add. It is the foundation you design from.
