Procurement is Guarding the Wrong 10%
Eighty to ninety percent of enterprise data is unstructured. Procurement maintains an obsession with the other ten.
Sit through enough procurement reviews and you hear the same article of faith. Get the data clean first. Normalise the spend cube, reconcile the supplier master, scrub the categories, and then we can do something intelligent with it. Garbage in, garbage out. Everyone nods. Eighteen months later the project is still running and the insight never showed up. What’s troubling about this is that the maths points the other way, and has done for a while.
Gartner has said for years that 80 to 90% of new enterprise data is unstructured. IDC’s Data Age research has tracked the same shift, leaving the structured tables we’ve built our entire analytical apparatus around as the shrinking exception, not the rule. So procurement pours its energy into perfecting that 10%: the PO lines, the contract headers, the scorecard fields. The other 90% gets treated as exhaust. The emails and the call notes. The supplier comms where a deal quietly drifts from what was signed. That isn’t noise. It’s where the relationship actually lives, and the relationship is what moves the margin.
McKinsey has been pointing at this for a while too. Operational data is the next real play for AI, not the cleaned-up reporting layer we already know how to query. That is completely logical when you remember what these models are. LLMs learn by recognising patterns in messy, contextual, unstructured text. The 90% we’ve been ignoring is the exact data type they’re built to read.
So why do we keep guarding the 10%?
The instinct wasn’t stupid. It’s just out of date.
Let’s be fair to the clean-data crowd, because the instinct came from somewhere real. For thirty years the tools genuinely couldn’t cope with anything else. Rules engines, BI, SQL: every one of them needs normalised rows or it falls over. “Garbage in, garbage out” was a mechanical fact, not a philosophy. So we learned to flatten everything into columns before we’d trust it, because columns were the only thing the machine could read.
That’s the bit worth sitting with. We never cleaned data because clean data was the goal. We did it because the only tools we had couldn’t read anything that hadn’t been flattened first. Large language models tear that constraint up. They’re pattern extractors trained on precisely the contextual mess procurement has spent a career trying to throw away. Unstructured text isn’t a problem to be scrubbed before the real work starts. It is the real work. It’s the native input.
So why are we still flattening?
It’s self-protection, not ignorance
The easy answer is that procurement hasn’t caught up. We don’t quite buy it, because it lets us off the hook, and the real reason is more interesting.
Procurement is a governance function carrying a long-standing credibility problem at the top table. A function unsure of its standing reaches for the defensible number, not the true one. A signed PO line survives an audit and a CFO challenge. “The model read margin erosion in the tone and cadence of the supplier correspondence” does not, even when it’s the sharper call. We hold onto the clean 10% because it’s the bit we can defend in a boardroom.
There’s a more basic problem underneath that. Most of the 90% doesn’t even sit in our systems. It’s in the business, in operations, in supplier-facing inboxes procurement has never controlled. Hard to be precious about data you don’t own.
One caveat, because the argument doesn’t need overselling. Unstructured doesn’t mean low quality. A model pulls real signal from material that’s messy but coherent, but it will also hallucinate with total confidence when the input is genuinely thin or contradictory. So the data-quality instinct isn’t pure cargo cult. The failure mode just moved. It used to be “flatten it into columns.” Now it’s “make sure the whole picture is actually in there.” The worry hasn’t changed. The shape of it has.
This is the same mistake throttling AI’s EBIT case
Look at McKinsey’s State of AI 2025. Nearly 2,000 organisations. 88% now use AI in at least one function, yet only 39% report any enterprise-wide EBIT impact at all, and most of those put it at under 5% of EBIT. Somewhere around 6% are seeing real value at scale. Boards are right to keep asking where the money went.
The reflex explanation is dirty data. We think that’s the wrong diagnosis. Most enterprises aimed AI at the clean, structured, low-value 10%, automating tidy workflows nobody was bleeding on, and left the messy 90%, where the margin actually erodes, untouched. In McKinsey’s own read, the single thing most strongly linked to EBIT impact is fundamental workflow redesign. Put plainly: the firms that saw the money went where the mess was.
So “AI doesn’t move the bottom line” and “we have to clean the data first” turn out to be the same mistake seen from two sides. One of them sounds like caution. It quietly produces the other.
We’ve seen this film before
We named the post-signature value gap Dynamic Margin Erosion. The idea took years to land, because there was no language for value bleeding out of a deal after the ink dried. Everyone could see the symptom. Nobody had named the mechanism. It’s going mainstream now under a handful of other labels, which is fine by us. The idea was always the point.
This is the same shape, with one difference worth naming. DME’s lag was conceptual; we were missing the words. The clean-data lag is technology arriving faster than procurement’s willingness to trust it, tangled up with institutional self-interest. So the shift will move quickly, but procurement will adopt it last, not first. The early movers won’t be the category managers waiting for permission to trust an inference. They’ll be the P&L owners who never needed permission in the first place.
That’s the uncomfortable part. The 90% has been sitting there the whole time, full of the context that tells you exactly why a supplier relationship is about to cost you. We couldn’t read it, so we called it mess and looked the other way. The thing that can read it is already here.
The only real question is whether procurement picks it up before the people it answers to do.
We know which way we’d bet.
Sources
Gartner, on unstructured data (80–90% of new enterprise data)
https://benelux.nttdata.com/insights/blog/what-is-unstructured-data
IDC / Seagate, Data Age 2025
McKinsey, The State of AI 2025
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai