- February 23, 2026
- Posted by: EBAN Team
- Category: News, Uncategorized
By Guido Jouret, Member, Sophia Business Angels.
LLMs can reach the places traditional IT can’t
LLMs excel at taking in fuzzy inputs (documents, natural language, images, audio, and other types of unstructured information). They can generate fuzzy outputs (other text, images, or audio). They’re also increasingly good at producing structured output: for example, writing longer and longer computer code. The great thing about code is that it’s verifiable: it either runs or it doesn’t. If we have known good outputs, we can even check that it runs correctly (at least in those use cases). But even in the case of fuzzy output, LLMs can perform very useful work: ask an LLM to ingest your goals and savings and produce a recommended investment portfolio, and it’ll do at least as well as any financial advisor. There’s no ‘best’ portfolio: it depends on your risk tolerance, the future is unknown, and other investment choices can provide a similar return over time. The output is fuzzy, but still incredibly useful.
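The “code is verifiable” point can be made concrete. Here is a minimal sketch of a verification harness: `generated_src` stands in for code an LLM might return, and the `add_vat` function and its test cases are invented for illustration.

```python
# Sketch: checking LLM-generated code against known good outputs.
# `generated_src` stands in for code returned by an LLM; the function
# name and test cases are hypothetical examples a harness would hold.

generated_src = """
def add_vat(net: float, rate: float = 0.20) -> float:
    return round(net * (1 + rate), 2)
"""

def passes_checks(src: str, cases: list[tuple[tuple, float]]) -> bool:
    namespace: dict = {}
    try:
        exec(src, namespace)           # first check: does it even run?
    except Exception:
        return False
    fn = namespace.get("add_vat")
    if fn is None:
        return False
    # Second check: does it run *correctly* on the known good outputs?
    return all(fn(*args) == expected for args, expected in cases)

cases = [((100.0,), 120.0), ((50.0, 0.1), 55.0)]
print(passes_checks(generated_src, cases))  # True for this snippet
```

The same two-step check (does it run, does it match reference outputs) generalizes to any use case with known good answers.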
Right now, LLMs inform decisions. Companies want to move beyond this to where LLMs actually do the work. Generative AI = task; Agentic AI = job. To turn the LLM into an AI agent, we need to enable it to remember, plan, and act.
The LLM is primed with an initial prompt that describes its role. This lists the actions it can perform, the methods it should use, and possibly examples of inputs and outputs. The agent is triggered by a task (e.g. an inbound email, chat message, or problem ticket). The agent solves the task, taking into account the constraints and methods described in the role. The output of the agent is a set of decisions that translate into actions on systems outside the agent. These actions are implemented via API calls and can include routing the task to someone (or to some other AI agent), editing/creating/deleting data, issuing payments, etc.
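The role-prompt/decision/action loop above can be sketched in a few lines. This is a toy illustration, not a production harness: `llm_decide` is a stub standing in for a real model call, and the action names, ticket format, and API paths are invented.

```python
# Toy sketch of the agent loop: role prompt + task in, decision out,
# decision dispatched as an API call. `llm_decide` is a stub standing
# in for a real LLM call; all names here are hypothetical.

ROLE_PROMPT = """You are a customer-service agent.
Allowed actions: route, refund, reply."""

def llm_decide(role: str, task: str) -> dict:
    # Stub: a real agent would send role + task to an LLM here and
    # parse its structured decision.
    if "refund" in task.lower():
        return {"action": "refund", "amount": 25.0}
    return {"action": "route", "to": "human-support"}

def execute(decision: dict) -> str:
    # Decisions translate into actions on systems outside the agent.
    handlers = {
        "refund": lambda d: f"POST /payments/refund amount={d['amount']}",
        "route":  lambda d: f"POST /tickets/route to={d['to']}",
        "reply":  lambda d: f"POST /tickets/reply body={d['body']}",
    }
    return handlers[decision["action"]](decision)

ticket = "Customer requests a refund for order #1234"
print(execute(llm_decide(ROLE_PROMPT, ticket)))
# -> POST /payments/refund amount=25.0
```

The key design point is the separation: the LLM only produces a structured decision; deterministic harness code performs the side effects.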
Using an LLM to plan your next ski vacation is very different from deploying an AI agent that is assigned to solve millions of customer service issues. The superpower that LLMs have to understand fuzzy inputs has a dark side: it makes them stochastic (see my previous article: https://www.linkedin.com/posts/gjouret_much-is-made-about-ai-killing-off-enterprise-activity-7425919667874996224-f6JT). Producing a good agent runs into what I call the 80/20 rule of developing solutions using LLMs:
The first prompt gets you 80% of the way there; the next 80% of effort gets you the last 20%.
The claim that “the LLM solved this in one shot” may be true for simple problems, but for anything complex, it’s an urban myth. The issues developers see include:
- Whack-a-mole: fixing one problem creates new ones in areas that were working.
- Context-rot: LLMs using more than 60% of the context window see sharp declines in performance.
- Leaky-bucket: longer sessions result in contexts getting compressed and reloaded. After compaction, the new context loses details of the prior interactions; it’s like carrying water in a leaky bucket.
- Limits of prompt refinement: better prompts can produce better output, but can never eliminate stochasticity entirely.
- Resource hunger: solving complex problems consumes lots of tokens, and running lots of LLM instances makes it worse.
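The context-rot problem above implies agents should budget their context explicitly. A rough sketch, assuming the common approximation of about 4 characters per token (a real agent should use the model’s own tokenizer, and the 60% threshold is the figure cited above, not a universal constant):

```python
# Rough context-budget guard motivated by the context-rot issue.
# The 4-characters-per-token ratio is a crude approximation; real
# agents should count tokens with the model's actual tokenizer.

def context_usage(messages: list[str], window_tokens: int = 200_000) -> float:
    est_tokens = sum(len(m) for m in messages) / 4
    return est_tokens / window_tokens

def should_compact(messages: list[str], window_tokens: int = 200_000) -> bool:
    # Compact (summarize) well before the 60% degradation zone.
    return context_usage(messages, window_tokens) > 0.60

history = ["hello"] * 10
print(should_compact(history, window_tokens=100))  # False: ~12 of 100 tokens
```

Compacting early trades the leaky-bucket loss of detail against the sharper decline seen near a full window.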
But here’s the real problem: building an AI agent with a general-purpose LLM (only) is the wrong approach. The LLM is a jack-of-all-trades, not a specialist in the specific role you need it to play as an agent. It also uses a 100% non-deterministic approach to solve problems that may have components which can be solved much more accurately and efficiently by deterministic code.
We need to apply Ashby’s Law of Requisite Variety:
A control system must match or exceed the range of states (variety) in the system or disturbances it regulates to maintain stability.
In other words, the AI agent should only be as fuzzy as required; anything more just adds unpredictability or cost. An interesting example of this is a Claude Skill. Anthropic created this feature to enable the LLM to repeat tasks more readily. A skill is just a markdown file containing prose that follows a certain structure: purpose, workflow, input formats, how to perform certain actions, and examples. Sprinkled into the file, however, are snippets of Python code that can perform specific data-processing tasks. These are the deterministic (crunchy) bits in the stochastic (soft and gooey) cookie.
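A hypothetical example of such a “crunchy bit”: the kind of small, deterministic Python snippet a skill file might embed so the LLM never improvises arithmetic. The function and data format here are invented for illustration.

```python
# Hypothetical deterministic snippet of the kind a skill embeds:
# summing a CSV column is done in code, never left to the LLM.

import csv
import io

def monthly_totals(csv_text: str) -> dict[str, float]:
    """Sum an 'amount' column per 'month', in code rather than by the LLM."""
    totals: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["month"]] = totals.get(row["month"], 0.0) + float(row["amount"])
    return totals

data = "month,amount\n2026-01,10.5\n2026-01,4.5\n2026-02,7.0\n"
print(monthly_totals(data))  # {'2026-01': 15.0, '2026-02': 7.0}
```

The LLM handles the fuzzy part (deciding which file to total, interpreting the user’s request); the snippet guarantees the numbers are right.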
This, then, is how we should design AI agents that are “just right.”
The “goldilocks” approach to building AI agents: 4 phases
1. Tune the prompt
Iterate until the role and task prompts give you the best result you can manage with the LLM. At this stage, we’re asking the LLM’s neural network to solve the entire problem on its own.
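Prompt tuning works best as a measured loop rather than eyeballing outputs. A minimal sketch, assuming you hold a small set of reference inputs and expected outputs; `fake_llm`, the candidate prompts, and the reference pairs are all invented stand-ins.

```python
# Sketch of prompt iteration as an eval loop: score each candidate
# role prompt against reference outputs and keep the best. `fake_llm`
# is a stub standing in for a real model call.

def fake_llm(prompt: str, task: str) -> str:
    # Stub: pretend the terser instruction yields the terse answer we want.
    return "ACK" if "answer in one word" in prompt else "Acknowledged, thanks!"

CANDIDATES = [
    "You are a support agent.",
    "You are a support agent; answer in one word.",
]
REFERENCE = [("ping", "ACK"), ("status?", "ACK")]

def score(prompt: str) -> float:
    hits = sum(fake_llm(prompt, task) == want for task, want in REFERENCE)
    return hits / len(REFERENCE)

best = max(CANDIDATES, key=score)
print(best, score(best))
```

With a real model the scoring would be noisier (run each case several times), but the structure is the same: the reference set is what tells you when the prompt is good enough, and it protects against whack-a-mole regressions.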
2. Prompt reification
This involves doing what Claude does in building skills: making some fuzzy parts of the solution concrete. We still apply the LLM to the fuzzy inputs, but transform data and perform calculations with code.
The code can now be moved outside the LLM and accessed via Model Context Protocol (MCP): the LLM will invoke these components via an API. We can maintain and evolve these external code bits independently of the rest of the AI agent.
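The pattern behind that externalization can be sketched without the real MCP SDK: deterministic functions are registered under names, and the LLM invokes them by emitting a structured tool call that the harness dispatches. Everything below is a simplified stand-in for MCP, and the `net_present_value` tool is a hypothetical example.

```python
# Minimal stand-in for an MCP-style tool server: deterministic
# functions are registered under names; the LLM invokes one by
# emitting a structured call. This sketches the pattern only; it is
# not the real Model Context Protocol SDK.

TOOLS: dict = {}

def tool(fn):
    """Register a deterministic function as an invokable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def net_present_value(rate: float, cashflows: list[float]) -> float:
    # A crunchy calculation the LLM should delegate rather than improvise.
    return round(sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows)), 2)

def invoke(call: dict) -> float:
    # The LLM emits {"tool": ..., "args": ...}; the harness dispatches it.
    return TOOLS[call["tool"]](**call["args"])

call = {"tool": "net_present_value",
        "args": {"rate": 0.1, "cashflows": [-100.0, 60.0, 60.0]}}
print(invoke(call))
```

Because the tool lives behind a name and an argument schema, it can be versioned, tested, and swapped out without touching the prompt or the model.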
3. LLM specialization with reinforcement learning
We’ve now created a partially deterministic agent. Those parts will run reliably and efficiently. If our starting point was a general-purpose LLM, we should now swap it out for an open-source LLM that we can specialize to the role we want the agent to play. Reinforcement learning (RL) can turn our mule into a racehorse. According to the company Adaptive ML, such specialized LLMs can cost 50-90% less than generalist LLMs. Even better, they become more accurate on the chosen tasks.
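To give a feel for the feedback loop behind RL specialization, here is a deliberately tiny sketch: a two-option “policy” (standing in for a model choosing between response strategies) is nudged by a reward signal, REINFORCE-style. Real RL fine-tuning operates on billions of model weights with far more machinery (e.g. PPO, a reward model); this toy only illustrates how reward shifts behavior.

```python
# Toy REINFORCE-style loop: reward gradually shifts a two-option
# policy toward the better-rewarded strategy. This illustrates the
# feedback mechanism only, not actual LLM fine-tuning.

import math
import random

random.seed(0)
logits = [0.0, 0.0]          # preference for strategy A vs strategy B
reward = [0.2, 1.0]          # the task "prefers" strategy B

def probs():
    z = [math.exp(l) for l in logits]
    s = sum(z)
    return [x / s for x in z]

lr = 0.1
for _ in range(500):
    p = probs()
    a = 0 if random.random() < p[0] else 1   # sample a strategy
    r = reward[a]                            # observe its reward
    # Policy-gradient update: raise the logit of rewarded actions.
    for i in range(2):
        grad = (1 - p[i]) if i == a else -p[i]
        logits[i] += lr * r * grad

print(probs())  # strategy B's probability now dominates
```

The same principle, applied to a model’s weights against task-specific rewards, is what specializes the generalist LLM to its role.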
4. Add domain-expertise to the LLM via RL
Now that we have a lean and mean LLM without any extra bits we don’t need, we can use RL to further train the model on proprietary data. This makes the AI agent even more effective. More importantly, we have now created a defensible “moat” of competitive differentiation. When you use a generalist open-source or commercial LLM, anyone else can duplicate your agent. By the time you’ve used RL to specialize and further train your agent, your LLM is no longer the same: it embodies unique IP that conveys a competitive advantage. I believe that this, much more than lower cost, is why open-source LLMs will ultimately win in the enterprise.
Conclusions
Gartner predicts that 40% of enterprise apps will feature AI agents by 2026. Initially, agents will most likely require “humans in the loop” to validate decisions. Then we’ll move to “humans on the loop,” where agents inform humans of the actions they’ve taken. Building these agents and giving them more autonomy will require more than just dumping a generalist LLM into an agent harness. By following the 4 phases outlined here, AI agents can be built more reliably and at lower cost.
About the Contributor
Dr. Guido Jouret is a technology executive with over 30 years of leadership experience across software, networking, cloud, and hardware sectors. A member of Sophia Business Angels and advisor to Silicon Valley startups, he serves on the boards of US Signal and Farcast and previously sat on the board of Poly, which was acquired by HP in 2022.
He most recently worked as Chief Development Officer at Plume and has held senior roles including Chief Digital Officer at ABB, CTO at Nokia Technologies—where he led the launch of its digital healthcare division—and President of Digital Platforms at Envision Energy. Earlier, he spent two decades at Cisco, helping build new businesses such as TelePresence and leading the Internet of Things Business Unit. He holds an Electrical Engineering degree from Worcester Polytechnic Institute and a PhD in Computing from Imperial College London.

