I’ve spent my career swimming in data. As the former Chief Data Officer at Kaiser Permanente, UnitedHealthcare, and Optum, at one point I had oversight of almost 70% of all of America’s healthcare claims. So when I tell you the problem with enterprise AI isn’t the model architecture but the data the models are being fed, believe me: I’ve seen it firsthand.
LLMs are already peaking
The cracks are already showing in LLMs. Take GPT-5. Its launch was plagued with complaints: it failed basic math, missed context that earlier versions handled with ease, and left paying customers calling it “bland” and “generic.” OpenAI even had to restore an older model after users rejected its colder, checklist-driven tone. After two years of delays, many started asking whether OpenAI had lost its edge, or whether the entire LLM approach was simply hitting a wall.
Meta’s LLaMA 4 tells a similar story. In long-context tests, the kind of work enterprises actually need, Maverick showed no improvement over LLaMA 3, and Scout performed “downright atrociously.” Meta claimed these models could handle millions of tokens; in reality, they struggled with just 128,000. Meanwhile, Google’s Gemini sailed past 90% accuracy at the same scale.
The data problem nobody wants to admit
Instead of confronting the limits we’re already seeing with LLMs, the industry keeps scaling up, pouring more compute and electricity into these models. And yet, despite all that power, the results aren’t getting any smarter.
The reason is simple: the internet data these models are built on has already been scraped, cleaned, and retrained on over and over to death. That’s why new releases feel flat; there’s little new to learn. Each cycle just recycles the same patterns back into the model. They’ve already eaten the internet. Now they’re starving on themselves.
Meanwhile, the real gold mine of intelligence, private enterprise data, sits locked away. LLMs aren’t failing for lack of data; they’re failing because they don’t use the right data. Think about what’s needed in healthcare: claims, medical records, clinical notes, billing, invoices, prior authorization requests, call center transcripts, the information that actually reflects how businesses and industries are run.
Until models can train on that kind of data, they’ll always run out of fuel. You can stack parameters, add GPUs, and pour electricity into bigger and bigger models, but it won’t make them smarter.
Small language models are the future
The way forward isn’t bigger models. It’s smaller, smarter ones. Small language models (SLMs) are designed to do what LLMs can’t: learn from enterprise data and tackle specific problems.
Here’s why they work.
First, they’re efficient. SLMs have fewer parameters, which means lower compute costs and faster response times. You don’t need a data center full of GPUs just to get them running.
Second, they’re domain-specific. Instead of trying to answer every question on the internet, they’re trained to do one thing well, such as HCC risk coding, prior authorizations, or medical coding. That’s why they deliver accuracy in places where generic LLMs stumble.
Third, they fit enterprise workflows. They don’t sit on the outside as a shiny demo. They integrate with the data that actually drives your business (billing records, invoices, claims, clinical notes), and they do it with governance and compliance in mind.
The future isn’t bigger; it’s smaller
I’ve seen this movie before: big investments, endless hype, and then the realization that scale alone doesn’t solve the problem.
The way forward is to fix the data problem and build smaller, smarter models that learn from the information enterprises already own. That’s how you make AI useful, not by chasing size for its own sake. And I’m not the only one saying it. Even NVIDIA’s own researchers now say the future of agentic AI belongs to small language models.
The industry can keep throwing GPUs at ever-larger models, or it can build better ones that actually work. The choice is obvious.
Photo: J Studios, Getty Images
Fawad Butt is the co-founder and CEO of Penguin Ai. He previously served as the Chief Data Officer at Kaiser Permanente, UnitedHealth Group, and Optum, leading the industry’s largest team of data and analytics professionals and managing a multi-hundred-million dollar P&L.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.