The internet is full of prompt tricks. Management researchers solved the real problem fifty years ago.
A new study from researchers at Stanford found that AI coding agents are inefficient: less than half of agent-produced code survives into final commits, and sessions where humans write very little of the code introduce more problems than those where humans write more. The paper’s conclusion, broadly absorbed in the developer community, is that heavy AI delegation is risky.
The finding has something real to it, but the framing has a flaw. It measures what the human produced, not how well they managed the agent producing everything else.
Think about the best manager you’ve ever worked with. How much did they personally write, build, or code? Probably very little. That was the point. A good manager’s job is to help their team produce a lot, very well. Their value shows up in everyone else’s output.
We’ve known this about human management for fifty years. We’ve somehow forgotten it when talking about AI.
The principal-agent problem is your problem
In 1976, economists Michael Jensen and William Meckling published “Theory of the Firm”, one of the most cited papers in economics. Their central insight: any time you delegate work to someone else, you create an agency problem. The agent doing the work has different information than you do, and they will optimize for the signals you give them. This isn’t a character flaw; it’s structural.
AI has the same problem. A model optimizes for the task as you’ve specified it. If your specification is vague, the model fills in the gaps with its best guess at what you probably meant, which may not match what you needed.
The fix is a real brief: one that specifies audience, argument, constraints, and what success looks like. That’s what you’d give a capable person, and what you should give the model.
What the Stanford study measured as “risky delegation” may be better understood as bad management: unclear briefs, no oversight, no feedback. The humans who got poor results may not have been using AI too much; more likely, they were managing it badly.
Calibrate oversight to the task
Paul Hersey and Ken Blanchard’s Situational Leadership framework made a simple but powerful argument: effective managers don’t apply the same level of direction and support to every situation. They adjust based on the task-specific competence and reliability of the person doing the work.
High confidence on a familiar task? Delegate and stay out of the way. Novel task, uncertain territory? Stay close, review, give direction. Most managers, and most AI users, apply a fixed level of oversight regardless of what’s being asked.
AI is good at certain things and unreliably good at others. The prompt-engineering instinct is to find the right magic words for any task. The management instinct is to ask what level of oversight this specific task requires, and to make that calibration a habit in how you work.
For anything a customer will see, any decision that matters, anything in unfamiliar territory: stay close. Review it the way you’d review work from a capable person who’s new to your business. For research, drafts, and well-scoped work in familiar domains, delegate more freely. The calibration matters more than the phrasing.
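If it helps to see that habit written down, here is a minimal sketch of oversight calibration as a routing rule. The task attributes and the three levels are illustrative, not a prescription; the point is only that the oversight level is chosen per task rather than fixed.

```python
from enum import Enum

class Oversight(Enum):
    REVIEW_EVERYTHING = "review every output closely before it ships"
    SPOT_CHECK = "spot-check the output and the reasoning behind it"
    DELEGATE = "accept the output after a quick skim for obvious problems"

def oversight_for(task: dict) -> Oversight:
    """Pick an oversight level from task attributes (all fields illustrative)."""
    if task.get("customer_facing") or task.get("irreversible"):
        return Oversight.REVIEW_EVERYTHING   # anything a customer sees, any decision that matters
    if task.get("unfamiliar_domain"):
        return Oversight.REVIEW_EVERYTHING   # novel territory: stay close
    if task.get("well_scoped") and task.get("familiar_domain"):
        return Oversight.DELEGATE            # research, drafts, routine work in known ground
    return Oversight.SPOT_CHECK              # everything else: somewhere in between

# A draft summary of an internal doc vs. copy that goes straight to a customer
print(oversight_for({"well_scoped": True, "familiar_domain": True}).value)
print(oversight_for({"customer_facing": True}).value)
```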
A brief is not a prompt
Edwin Locke and Gary Latham spent decades studying what makes goals work. Their Goal Setting Theory, one of the most replicated findings in organizational psychology, showed that specific, challenging goals consistently outperform vague or “do your best” goals. The mechanism: specificity reduces interpretation, which reduces variance in the output.
A typical AI interaction: “Summarize the key points from this document.”
A brief: “Summarize the key points from this document for our operations lead, who needs to decide whether to move forward with the vendor. She’s time-poor and skeptical. Three to five points maximum. Flag any red flags explicitly; don’t bury them. Skip the background on things she already knows.”
Same task. Completely different target and output. The second one is an actual brief: it specifies who it’s for, what they’ll do with it, what matters, and what to avoid. This was a simple example; the principle becomes critical when the agent needs to do something complex, follow a specific methodology, or produce a defined output.
Herbert Simon’s concept of bounded rationality, from “Administrative Behavior” first published in 1947, explains why this matters: agents don’t search all possible solutions. They find a good enough answer given the information in front of them. The quality of the context you provide is the ceiling on what the model can produce.
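If you route work to a model programmatically, a brief can be a small amount of structure you fill in before anything is sent. A minimal sketch, not a definitive template: the field names are illustrative, and the model call itself is left to whatever tool or agent you use.

```python
def build_brief(task, audience, decision, constraints, avoid):
    """Assemble a brief: who it's for, what they'll do with it, what matters, what to avoid."""
    lines = [
        f"Task: {task}",
        f"Audience: {audience}",
        f"They will use this to: {decision}",
        "Constraints:",
        *[f"- {c}" for c in constraints],
        "Avoid:",
        *[f"- {a}" for a in avoid],
    ]
    return "\n".join(lines)

brief = build_brief(
    task="Summarize the key points from the attached vendor document.",
    audience="Our operations lead: time-poor, skeptical.",
    decision="Decide whether to move forward with the vendor.",
    constraints=[
        "Three to five points maximum",
        "Flag any red flags explicitly; don't bury them",
    ],
    avoid=["Background she already knows"],
)
print(brief)
# Send `brief` to whatever model or agent you use; the structure above is the part that matters.
```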
Close the loop
J. Richard Hackman and Greg Oldham’s Job Characteristics Model identified feedback (specific, timely information about how the work is going) as one of the core drivers of sustained performance.
Most people skip this with AI. You get an output, you use it or you don’t, and you move on. The interaction doesn’t compound.
The people who get the most from AI work differently. They treat the first output as a draft. They give specific feedback: not “make it better” but “the opening buries the argument, the third section is the real point, the conclusion is too soft. Try again with those constraints.” They close the feedback loop with enough specificity that the next attempt is different from the first.
The habit underneath: treat AI interaction as a managed working relationship.
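Written as code, the loop is short. This is a sketch of the habit, not a prescription: call_model() is a hypothetical stand-in for whatever model or agent you use, and the feedback strings are the part you supply after actually reading each draft.

```python
def call_model(prompt: str) -> str:
    """Stand-in for whatever model or agent you use (hypothetical)."""
    return f"[model output for: {prompt[:50]}...]"

def revise(brief: str, feedback_per_round: list[str]) -> str:
    """Treat the first output as a draft, then feed back specific, named problems each round."""
    draft = call_model(brief)
    for feedback in feedback_per_round:   # each item is your judgment after reading the draft
        draft = call_model(
            f"{brief}\n\nPrevious draft:\n{draft}\n\n"
            f"Revise with this feedback:\n{feedback}"
        )
    return draft                          # you still own the acceptance call on the final draft

final = revise(
    "Summarize the vendor document for the operations lead.",
    [
        "The opening buries the argument; the third section is the real point. Lead with it.",
        "The conclusion is too soft; state the recommendation plainly.",
    ],
)
print(final)
```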
What you own
The management analogy has a limit. A good manager of a human team has people who bring their own judgment, initiative, and expertise to the work. Your AI doesn’t bring initiative; it brings capability you have to direct.
The things that belong to you are non-negotiable. You own the architecture of what you’re building. You own the decisions that matter. You own the standards the work has to meet. You own the acceptance criteria: whether this output is good enough, or whether it goes back for another pass.
That’s the job. A manager who abdicates those things is doing it wrong.
The BCG and McKinsey research on AI adoption keeps pointing at the same thing. BCG found that 70% of the effort to scale AI to value should go to people and process issues. McKinsey’s 2025 State of AI survey found that defined human-in-the-loop oversight was one of the clearest practices distinguishing AI high performers from the rest.
The bottleneck has never been the model. It’s always been management. We just haven’t been calling it that.
We can help
If you’ve read this and want to put these ideas into practice, we run workshops for executives and teams learning to manage AI well: building briefs, calibrating oversight, and closing feedback loops on real work. Book a 20-minute intro →
Josh is co-founder of Kynetyk, where he writes about AI, builds products at the intersection of AI and human experience, and helps companies design AI strategies that actually work. Reach out at josh@kynetyk.ai.