Sanity Bytes: The CTO's AI Playbook – Part 1: You Can't Run a Formula 1 Car on a Country Lane

You can't run a Formula 1 car on a country lane. But that's essentially what a lot of organizations are trying to do with AI right now. The model is state-of-the-art. The ambition is genuine. The vendor demo was spectacular. And the network it all has to run on was built a decade ago, for a completely different world.

I keep coming back to a finding from Colt's 2024 Digital Infrastructure Report: “66% of CTOs openly admit their current network infrastructure cannot fully support GenAI workloads.” Two thirds. Not a fringe group, the majority. And yet the pressure to deploy AI is only accelerating.

There's a word for what happens when you push something beyond the limits it was designed for. It breaks. Sometimes slowly. Sometimes expensively. Almost always at the worst possible moment.

The same report found that 76% of CTOs believe that rushing GenAI integration will cause serious long-term damage to their infrastructure planning. They know the risk. They can see it coming. And many of them are being asked to press ahead anyway.

This is the paradox sitting at the heart of most enterprise AI strategies right now. The urgency to move fast is completely real: competitive pressure, board expectations, the fear of falling behind. But the infrastructure these organizations are building on wasn't designed for what AI actually demands.

So what does AI actually demand? Let me be specific, because I think this is often talked about in vague terms.

Compute

Running large language models, even for inference only rather than training, requires GPU resources at a scale most enterprise IT teams have never had to provision before. Cloud GPU availability has been constrained, costs have risen sharply, and the gap between what an AI team needs and what the infrastructure team can actually deliver is a genuine operational headache.

Network bandwidth

Real-time AI applications are brutally sensitive to latency. If your users are distributed across offices, regions or hybrid environments, and your wide-area network wasn't built for this kind of traffic profile, you'll feel it. The AI works in the lab. It stutters in production.

Storage architecture

Traditional storage solutions were built for structured data: rows, columns, databases. AI consumes unstructured data: documents, conversations, emails, PDFs, recordings. Vector databases, object storage at scale and real-time retrieval systems all require architectural decisions that most organizations are making reactively, mid-project, under pressure. That's never when you want to be making infrastructure decisions.

There's also a cost dimension that I think gets underestimated in a lot of boardroom conversations. Gartner estimates that enterprise GenAI deployments typically cost between $5 million and $20 million to implement properly. Those numbers assume the infrastructure is ready. If it isn't, you're adding remediation costs on top of deployment costs, and that bill usually arrives after the initial budget has already been approved.

I worked with a financial services firm last year whose AI program had been running for eighteen months. They'd invested heavily in licenses, models and a talented team. What they hadn't invested in was the network capacity to run inference at scale across their distributed branch infrastructure. By the time they discovered the gap, they had a production-ready model and no reliable way to serve it to the people who needed it. Another nine months and several hundred thousand pounds in network remediation later, they got there. The CTO described it as "paying twice for the same journey because we forgot to check the fuel before we left."

Here's the counter-intuitive bit: the organizations that invest most heavily in AI tools are often the ones who discover infrastructure gaps latest, because they've been so focused on the capability layer that the plumbing only becomes visible when it fails under load.

The CTOs I most respect in this space share one habit: they run an infrastructure readiness assessment before committing to use cases, not during them. It sounds obvious. It happens far less often than it should.

They also tend to treat cloud strategy and AI strategy as the same conversation: choosing the right blend of public cloud (for GPU access and scale), private cloud (for sensitive data), and edge (for latency-sensitive workloads) is a deliberate AI decision, not an infrastructure afterthought.

And critically, they build for the scale they intend to reach, not the pilot they're starting with. Provisioning for a proof of concept and then being shocked when it can't support production is one of the most expensive mistakes in enterprise AI. It's also one of the most common.
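To make the pilot-versus-production gap concrete, here is a rough back-of-envelope sketch. Every figure in it is a hypothetical assumption chosen for illustration (request rates, tokens per request, per-GPU throughput, headroom), and the function name `gpus_needed` is my own; real sizing depends on the model, the hardware and the traffic profile.

```python
import math

def gpus_needed(peak_requests_per_sec, tokens_per_request,
                tokens_per_sec_per_gpu, headroom=0.3):
    """Rough GPU count to serve inference at peak load.

    All inputs are illustrative assumptions, not benchmarks.
    `headroom` is the fraction of each GPU's throughput held in reserve.
    """
    required_throughput = peak_requests_per_sec * tokens_per_request
    usable_per_gpu = tokens_per_sec_per_gpu * (1 - headroom)
    return math.ceil(required_throughput / usable_per_gpu)

# Hypothetical pilot: a handful of users in one office.
pilot = gpus_needed(peak_requests_per_sec=2, tokens_per_request=500,
                    tokens_per_sec_per_gpu=2000)

# Hypothetical production: the same workload rolled out across the business.
production = gpus_needed(peak_requests_per_sec=150, tokens_per_request=500,
                         tokens_per_sec_per_gpu=2000)

print(f"Pilot: {pilot} GPU(s); production: {production} GPU(s)")
```

With these made-up inputs the pilot fits on a single GPU while production needs dozens. The point is not the specific numbers but that the gap is visible on paper long before it shows up under load.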

If there's one message I'd want to land with any board considering AI investment, it is this: “This is not a software purchase. It is an infrastructure transformation.” The organizations treating it that way are making steady, sustainable progress.

So here's a practical challenge: before you approve the next AI use case or sign the next model license, ask your team one question: "Does our current infrastructure actually support this at the scale we intend to reach?" If the answer isn't a confident yes, that conversation needs to happen before the budget is committed, not after.

Next in the series: Part 2 – Your data is a mess. And AI will make it worse.

Sanity Bytes

Thursday, June 4, 2026
