
The Agent Yield Framework

Why "Digital Labor" Is the Wrong Way to Measure AI Agents

Klarna saved $60M replacing 700 people with AI agents. Then customer satisfaction tanked, the CEO admitted they went too far, and they started rehiring humans. The $60M in "savings" cost them something they couldn't measure on a spreadsheet. This is what happens when you measure agents as labor instead of leverage.

Edward Chenard
CAIO • CDO • VP Product • Deployed AI Agents in Production Since 2023

I first saw AI agents described in a research paper in the spring of 2023. Most people I talked to thought they were a curiosity — interesting technically, maybe useful in a few years. I saw something different. I saw a tool that could fundamentally change how fast organizations get to revenue.

Since then, I've deployed agents in production at Hire Humans, for advisory clients, and across internal and external-facing workflows. I started with CrewAI back in 2024, when the tooling was still rough and the frameworks barely worked. Today, with agentic IDEs like Antigravity, building agents is dramatically easier. But the technology was never the hard part.

The hard part is that most companies are measuring agent economics completely wrong.

They're asking: "How many people can this agent replace?"

They should be asking: "How much faster does this agent get us to revenue?"

That single question — whether you measure agents as labor or as yield — is the difference between Klarna's expensive reversal and my client's 40% improvement in speed-to-decision. It's the difference between a cost-cutting exercise and a revenue strategy.

Let me show you the math.

The Digital Labor Trap

Salesforce calls AI agents "digital labor." Their entire Agentforce positioning is built around it — digital workers that can be hired, deployed, and measured by the task. Klarna measured it in "FTE equivalents." The consulting firms and analyst decks measure it in "headcount savings." The entire agentic AI industry has converged on a single metric: how many humans does this replace?

This framing is wrong. And it leads to predictable failure.

Here's why. AI thinks in terms of math, not language the way we humans do. That mathematical thinking carries a real cost, and the cost is usually accuracy, because language is not exact. When a customer says "this isn't working right," that sentence carries frustration, context, history, expectation, and sometimes a complaint that has nothing to do with the product. An AI agent does math on those words. A human does empathy. These are not the same thing.

Replacing people with agents is not a good idea. Using agents to help people work faster and smarter makes a lot more sense.

Let me show you how the "digital labor" framing has already failed, at scale, with real money.

Klarna: The $60 Million Lesson

In 2024, Klarna rolled out an AI chatbot built on OpenAI and replaced approximately 700 customer service agents. The numbers were impressive on paper. The AI handled two-thirds of all customer inquiries. Response times improved 82%. Repeat issues dropped 25%. CEO Sebastian Siemiatkowski said the AI was doing the equivalent work of 700 full-time agents. By Q3 2025, the company claimed $60 million in savings.

Then reality showed up.

Customer satisfaction dropped. Complaints about generic, repetitive, insufficiently nuanced responses piled up. The AI couldn't handle anything that required empathy, judgment, or context beyond the FAQ. Siemiatkowski himself eventually admitted: "We focused too much on efficiency and cost. The result was lower quality, and that's not sustainable."

By mid-2025, Klarna was rehiring human agents — what Siemiatkowski called an "Uber-type" customer service workforce. The company that was the poster child for AI replacing humans became the poster child for why that doesn't work.

But here's the number nobody talks about.

THE KLARNA NUMBERS NOBODY DISCUSSES

Cost per transaction, Q1 2023: $0.32
Cost per transaction, Q1 2025: $0.19
Total CS costs, year-over-year: up 19% anyway

Cost per transaction dropped 40%. But total costs rose because volume grew faster than efficiency gains. Q3 2025: $50M in CS costs, up from $42M the year before. The "$60M in savings" was theoretical. The cost increase was real.
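You can reconstruct the mechanics from the figures above. A quick sketch follows; note that the source figures mix reporting periods, so the implied volumes are illustrative rather than Klarna's actual books.

```python
# Why a 40% drop in unit cost still produced a 19% rise in total cost:
# total cost = cost per transaction x volume. The volumes below are
# implied from the article's figures and mix reporting periods, so treat
# this as an illustration, not Klarna's actual books.

cost_before, cost_after = 0.32, 0.19       # cost per transaction
total_before, total_after = 42e6, 50e6     # total customer service costs

volume_before = total_before / cost_before   # ~131M transactions implied
volume_after = total_after / cost_after      # ~263M transactions implied

print(f"Unit cost:  {(cost_after / cost_before - 1):+.0%}")       # -41%
print(f"Volume:     {(volume_after / volume_before - 1):+.0%}")   # +101%
print(f"Total cost: {(total_after / total_before - 1):+.0%}")     # +19%
# Volume roughly doubled while unit cost fell 40%. The efficiency gains
# were real, but growth swamped them. The "savings" never hit the P&L.
```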

The lesson isn't "AI agents don't work." They do. I deploy them. The lesson is that Klarna measured agents like labor — bodies in, bodies out — when they should have measured agents like business tools. They looked at FTE equivalence. They should have been looking at yield.

Salesforce: Three Pricing Models in 18 Months

Klarna isn't the only one struggling with agent economics. The entire industry can't figure out what agents should cost — because they keep pricing them like labor.

Salesforce launched Agentforce in October 2024 with a simple model: $2 per conversation. Within months, enterprise customers revolted. A support team of five agents handling 70 conversations each per day would spend over $20,000 a month. For mid-sized companies, that was a non-starter.

By May 2025, Salesforce completely overhauled the model to "Flex Credits" — $0.10 per action, 20 credits per action, sold in packs of 100,000 for $500. Then they added per-user licensing at $125-$650/month. Then hybrid "Flex Agreements" that let you swap between the two. The pricing page now reads like a tax code.
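To see why the per-conversation model collapsed, run the numbers from the example above. This is a rough sketch: the prices come from the paragraphs above, while the 30-day month, the actions per conversation, and the seat count for per-user licensing are my own assumptions.

```python
# Back-of-envelope comparison of the three Agentforce pricing models,
# using the figures cited above. The 30-day month, actions per
# conversation, and seat count are illustrative assumptions.

seats = 5                      # the five-agent support team from the example
conversations_per_seat = 70    # conversations per day
days = 30

conversations = seats * conversations_per_seat * days   # 10,500/month

# Model 1: per-conversation ($2 per conversation)
per_conversation = conversations * 2.00                  # $21,000/month

# Model 2: Flex Credits ($500 per 100,000 credits, 20 credits per action)
cost_per_action = 20 * (500 / 100_000)                   # $0.10 per action
actions_per_conversation = 4                             # assumed
per_action = conversations * actions_per_conversation * cost_per_action

print(f"Per-conversation: ${per_conversation:,.0f}/month")
print(f"Per-action:       ${per_action:,.0f}/month at {actions_per_conversation} actions each")
print(f"Per-user:         ${seats * 125:,.0f}-${seats * 650:,.0f}/month for {seats} seats")
# Three models, the same workload, and monthly bills ranging from roughly
# $600 to $21,000. None of them is anchored to the value of the outcomes.
```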

This isn't a Salesforce problem. It's a structural problem with the "digital labor" framing. If you price agents as labor, you need a stable cost-per-unit — like a salary. But AI agents don't have stable costs. An agent resolving a password reset costs pennies. An agent analyzing a complex customer dispute might require dozens of LLM calls, multiple database lookups, and several rounds of reasoning. The cost variance between those two tasks can be 100x.

You can't budget for agents the way you budget for headcount. So stop trying.

What Agents Actually Are

I've been deploying agents since 2023. In that time, not once has the value come from replacing a human. Every time — every single time — the value came from making a human faster, smarter, or more capable than they were without the agent.

That's not a feel-good statement. It's a description of where the yield actually lives.

An agent is a tool. A powerful one, but a tool. A hammer isn't a "digital carpenter." A calculator isn't a "digital accountant." An AI agent isn't a "digital employee." It's software that can reason, plan, and execute sequences of tasks — which makes it extraordinarily useful as an assistant to people who know what they're doing. But it doesn't know what it's doing. Not in the way a person does.

I've seen this firsthand. At Hire Humans, our agents were incredible at research — scanning markets, building candidate pools, analyzing job fit. But the insight that changed a client's entire hiring strategy? That came from a human looking at what the agent surfaced and understanding what it meant. The agent found the data. The human found the strategy.

Agents help companies get to revenue generation faster. They can help cut costs too, but not human costs. Those are still critical. Human as the loop — not just in the loop — will become more important as we understand AI going forward.

So if agents aren't labor, what's the right way to measure them?

The Agent Yield Framework

Here's the shift. Instead of asking "how many FTEs did we eliminate," ask: "What is the yield on our agent investment?"

Yield is a financial concept everyone in business already understands. Bond yield. Dividend yield. Crop yield. It's the return you get on something you invest in. It's forward-looking, not backward-counting.

Agent Yield = Revenue Acceleration ÷ Total Agent Spend
Where Revenue Acceleration = the measurable improvement in speed-to-revenue, speed-to-decision, or speed-to-market attributable to the agent.
Where Total Agent Spend = development + inference + monitoring + governance + human oversight (15-25% of direct AI spend).
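To make the formula concrete, here's a minimal sketch in Python. Every figure is a hypothetical placeholder and the function name is mine; what matters is the structure of the cost side: all five layers, not just inference.

```python
# Minimal Agent Yield calculation. Every figure is a hypothetical placeholder.

def agent_yield(revenue_acceleration: float, total_agent_spend: float) -> float:
    """Agent Yield = Revenue Acceleration / Total Agent Spend."""
    return revenue_acceleration / total_agent_spend

# Total Agent Spend: every cost layer, not just inference.
development = 80_000       # building and integrating the agent
inference = 30_000         # LLM/API costs for the period
monitoring = 10_000        # observability and evaluation tooling
governance = 10_000        # compliance, audit, access control
direct_spend = development + inference + monitoring + governance
human_oversight = 0.20 * direct_spend   # 15-25% of direct AI spend; 20% here
total_spend = direct_spend + human_oversight

# Revenue Acceleration: revenue attributable to getting there faster,
# e.g. deals closed earlier, launches pulled forward, decisions made sooner.
revenue_acceleration = 450_000

print(f"Total agent spend: ${total_spend:,.0f}")   # $156,000
print(f"Agent Yield: {agent_yield(revenue_acceleration, total_spend):.2f}")
# A yield above 1.0 means the agent accelerated more revenue than it cost.
```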

When you measure agents by yield instead of FTE replacement, you get a completely different picture. You see agents as business investments with returns, not cost-cutting tools with savings. And that changes every decision you make — which agents to build, where to deploy them, and how much to spend.

Through deploying agents across multiple contexts, I've found they deliver yield in three distinct ways. Each one has different cost structures, different yield profiles, and different implications for your team.

The Three Agent Archetypes

🔍 TYPE 1: Intelligence Agents

What they do: Analyze, synthesize, and surface patterns. They take large volumes of information and extract what matters.

Where the yield lives: Faster decisions. When your team gets actionable intelligence today instead of next week, the revenue impact compounds. Every day of delayed decision-making is a day of lost revenue.

FROM MY DEPLOYMENT

For one client, I deployed an agent system to analyze their business reports. The agents didn't write the reports — the analysts still did that. The agents accelerated the analysis phase, surfacing patterns and anomalies that humans could act on immediately. Speed-to-decision improved by 40%. Same team, same analysts, better outcomes in less time. The agent yield was clear within the first month.

TYPE 2: Acceleration Agents

What they do: Prototype, build, and mock up. They compress weeks of development into hours by helping people create working versions of ideas.

Where the yield lives: Time-to-market. When your PM can walk into a stakeholder meeting with a working demo instead of a slide deck, the sales cycle collapses. Decks don't close deals. Demos do.

FROM MY DEPLOYMENT

At a client engagement, product managers used agents to build proof-of-concept demos. Previously, they'd write specs, wait for engineering bandwidth, and get a working demo in 6-8 weeks. With agents, the PM built the demo themselves in hours. Engineering wasn't eliminated — they still built the production version. But the PM could now test ideas and validate demand before engineering ever started. The go-to-market acceleration was massive.

🧭 TYPE 3: Discovery Agents

What they do: Research, explore, find connections humans would miss. They cover ground — markets, datasets, competitive landscapes — at a speed and breadth no team of humans can match.

Where the yield lives: Strategic decisions. When an agent reveals something your team didn't know — a market gap, a constraint, a hidden pattern — the yield is the value of the strategy that changes because of it.

FROM MY DEPLOYMENT

At Hire Humans, a client wanted to hire for a specialized role in a rural location. Our agents analyzed the entire market and determined that only 40 people in the country had the required skills, and none of them were actively looking. That insight — delivered in hours instead of weeks of recruiter research — changed the client's entire approach. They restructured their offer package to attract passive candidates. No human was replaced. The humans made a better decision because the agent gave them information they didn't have.

Notice the pattern across all three archetypes. In every case, the yield came from the human acting on what the agent produced. The agent found the patterns. The human found the strategy. The agent built the demo. The human closed the deal. The agent mapped the market. The human changed the approach.

Agent yield is generated at the junction of machine speed and human judgment. Remove either one, and the yield collapses.

The Four Metrics That Actually Matter

If you're measuring agent ROI by headcount reduction, you're running a cost-cutting exercise, not a revenue strategy. Here are the four metrics that belong in an Agent Yield assessment:

1. Revenue Acceleration Rate

How much faster do we reach revenue with this agent versus without?

A PM building POCs in hours instead of 6-8 weeks isn't a cost savings — it's a 40-60x acceleration to market. The revenue implication of getting to market months earlier dwarfs any headcount savings. Measure the time delta, then calculate the revenue impact of that delta.

2. Decision Velocity

How much faster are we making good decisions?

40% faster speed-to-decision doesn't just save time — it compounds. The organization that decides in 2 days what a competitor decides in 5 doesn't just win once. It wins every cycle, on every decision, across every team that's agent-assisted. Decision velocity is a multiplier, not an additive.

3. Cost Per Agent-Assisted Outcome

What's the total cost of the outcome the agent helped produce?

Not "cost per inference" or "cost per conversation." The full cost of the outcome: development, inference, monitoring, governance, and human review. An agent-assisted outcome that costs $500 but prevents a $50,000 mistake, or accelerates a $500,000 deal by two weeks, has excellent yield. Measure the outcome, not the token.

4. Human Leverage Ratio

How much more can each person accomplish with agent support?

Not "how many humans did we eliminate" — "how much more output does each human generate?" A team of 5 with agent assistance producing the output of 15 is a 3x leverage ratio. That's a revenue multiplier, not a cost cutter. The humans are still there. They're just doing work that matters more.

Compare these to the standard "digital labor" metrics — headcount equivalent, FTE replacement, cost per automated conversation — and the difference is clear. The standard metrics measure what you subtracted. The yield metrics measure what you generated.
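For teams that want to operationalize the four metrics above, here's a minimal combined sketch. The class and field names are my own illustrative choices, and every input is a hypothetical placeholder; the point is what gets divided by what.

```python
# The four Agent Yield metrics as simple calculations. Every input is a
# hypothetical placeholder; the class and field names are illustrative.
from dataclasses import dataclass

@dataclass
class AgentDeployment:
    baseline_days: float      # time to outcome without the agent
    assisted_days: float      # time to outcome with the agent
    total_agent_spend: float  # dev + inference + monitoring + governance + oversight
    outcomes_delivered: int   # agent-assisted outcomes in the period
    baseline_output: float    # team output without agents (any consistent unit)
    assisted_output: float    # team output with agents (same unit)

    def revenue_acceleration_rate(self) -> float:
        """1. How many times faster do we reach revenue?"""
        return self.baseline_days / self.assisted_days

    def decision_velocity_gain(self) -> float:
        """2. Fraction of decision time removed, which compounds every cycle."""
        return 1 - self.assisted_days / self.baseline_days

    def cost_per_outcome(self) -> float:
        """3. Full cost of each agent-assisted outcome, not cost per token."""
        return self.total_agent_spend / self.outcomes_delivered

    def human_leverage_ratio(self) -> float:
        """4. Output per person with agents vs. without."""
        return self.assisted_output / self.baseline_output

d = AgentDeployment(baseline_days=5, assisted_days=3, total_agent_spend=156_000,
                    outcomes_delivered=120, baseline_output=5, assisted_output=15)
print(f"Acceleration: {d.revenue_acceleration_rate():.1f}x | "
      f"Velocity gain: {d.decision_velocity_gain():.0%} | "
      f"Cost/outcome: ${d.cost_per_outcome():,.0f} | "
      f"Leverage: {d.human_leverage_ratio():.0f}x")
# Acceleration: 1.7x | Velocity gain: 40% | Cost/outcome: $1,300 | Leverage: 3x
```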

Digital Labor vs. Agent Yield: Side by Side

| Dimension | The "Digital Labor" Model | The Agent Yield Model |
| --- | --- | --- |
| Core question | "How many people can we replace?" | "How much faster can we get to revenue?" |
| Primary metric | Headcount reduction / FTE equivalence | Revenue acceleration rate / decision velocity |
| Agent role | Employee replacement | Human leverage tool |
| ROI calculation | Salary savings minus agent costs | Revenue generated minus total agent spend (including humans) |
| Human oversight | Overhead that reduces savings | Cost-of-accuracy investment that protects yield |
| Pricing model fit | Per-conversation / per-FTE (keeps breaking) | Per-outcome / per-yield (aligns cost to value) |
| What it optimizes | The cost line | The P&L |
| Case study | Klarna: $60M "saved," then rehired | 40% faster decisions, hours-not-weeks POCs, better strategic hires |

The Three Mistakes That Kill Agent Yield

I see the same three mistakes in nearly every agent deployment that fails to produce yield. Each one stems from the "digital labor" mindset.

1. Measuring Savings Instead of Yield

This is the Klarna mistake. You deploy agents, measure how many human hours they "save," claim victory on a cost line, and miss the fact that your total costs went up because volume grew. Or worse — quality drops, customers leave, and the "savings" evaporate. Cost savings are not yield. Yield is what the agent helps you generate that you couldn't generate before, or couldn't generate as fast.

The fix: For every agent deployment, define the yield metric before you write the first line of code. "This agent will reduce speed-to-decision from 5 days to 3 days on market reports" is a yield statement. "This agent will replace 2 analysts" is a cost statement. Build the first one.

2. Skipping Human Oversight Architecture

AI thinks in math. Business runs on language. And language is not exact; in any business context, ambiguity is the norm, not the exception. The gap between mathematical precision and linguistic ambiguity is where agent failures live. Klarna's AI gave "generic, repetitive, and insufficiently nuanced replies" because the agent was doing math on language problems and nobody was checking the output. Gartner estimates that enterprises should budget 15-25% of direct AI expenditure for human oversight. That's not overhead — that's the cost of accuracy.

The fix: Design human oversight into the system from day one. Budget for it as a line item, not an afterthought. Human as the loop. If your agent deployment doesn't include a line item for human review, you haven't budgeted — you've guessed.

3. Deploying to Replace Instead of to Accelerate

"Let's replace the team that does X" sounds efficient. It's not. It's a bet that the agent can handle 100% of what the humans handle — including the edge cases, the judgment calls, the situations nobody anticipated. That bet almost always loses. The better bet is: "Let's make the team that does X three times faster." Now you're measuring leverage, not replacement. You're building yield, not cutting costs.

The fix: Reframe every agent project from "who does this replace?" to "who does this make better?" The yield comes from the human using the tool, not from the tool replacing the human.

Why Every Pricing Model Keeps Breaking

This brings us back to the pricing chaos. Salesforce revamped Agentforce pricing three times in 18 months because they keep pricing agents using labor economics when they should be using tool economics.

Per-conversation pricing ($2/chat) failed because conversations aren't units of work — some take 30 seconds, some take 30 minutes of compute. Per-action pricing ($0.10/action) is better but still measures inputs, not outcomes. Per-user licensing ($125-$650/month) brings predictability but disconnects cost from value entirely.

The Agent Yield framework suggests a different approach: price agents by the outcome they help produce, not by the labor they simulate.

This is already where the market is heading. Usage-based pricing hit 61% adoption in SaaS by 2022. Outcome-based pricing is expected to reach 30% adoption by 2025-2026. The companies that get this right early — aligning agent cost to agent yield — will have a significant competitive advantage in how they invest, scale, and measure their AI deployments.

The ones still measuring FTE equivalents will be making Klarna's mistake at scale.

The Agent Yield Diagnostic

Before you deploy your next agent — or evaluate the ones you've already deployed — score them against these five questions. This is how you know if you're building yield or just cutting costs.

AGENT YIELD DIAGNOSTIC — SCORE EACH DEPLOYMENT
1.
Can you name the revenue this agent accelerates?
Not "savings" — revenue. Faster deals, faster launches, faster decisions that lead to revenue. If the only number you can point to is a cost reduction, you don't have yield. You have a budget cut.
2.
Does your cost calculation include human oversight?
Development + inference + monitoring + governance + 15-25% for human review. If you're not budgeting for humans in the loop, you're not budgeting. You're pretending.
3.
What is the Human Leverage Ratio?
How much more can each person accomplish with this agent? If the answer is "the same, but with fewer people," you're optimizing for the wrong thing. If the answer is "3x more output per person," that's yield.
4.
What happens when the agent is wrong?
Every agent will be wrong sometimes. What's the cost of an error? Who catches it? How fast? If you haven't designed for failure, you haven't designed. The cost of uncaught agent errors belongs on the cost side of your yield equation.
5.
Which archetype is this agent?
Intelligence (faster analysis), Acceleration (faster building), or Discovery (better strategic insight)? Each archetype has different yield profiles and different cost structures. If you can't classify it, you probably can't measure it.

Scoring: If you answered "yes" or had a clear answer for 4-5 questions — you're measuring yield. If you answered "yes" for 2-3 — you're in the gray zone, probably still measuring labor metrics dressed up as yield. If you answered 0-1 — you're running a science project with a budget. Stop and reframe before spending another dollar.
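If it helps to make the scoring rule explicit, here's a toy version. The question wording is condensed from the diagnostic above, and the sample answers are hypothetical.

```python
# The Agent Yield Diagnostic as a scoring function. The questions are
# condensed from the diagnostic above; the sample answers are hypothetical.

QUESTIONS = [
    "Can you name the revenue this agent accelerates?",
    "Does your cost calculation include human oversight?",
    "Do you know the Human Leverage Ratio?",
    "Have you designed for what happens when the agent is wrong?",
    "Can you classify the agent's archetype?",
]

def diagnose(answers: list[bool]) -> str:
    score = sum(answers)
    if score >= 4:
        return f"{score}/5: measuring yield."
    if score >= 2:
        return f"{score}/5: gray zone, likely labor metrics dressed up as yield."
    return f"{score}/5: a science project with a budget. Stop and reframe."

# Hypothetical deployment: clear revenue story and an oversight budget, but
# no leverage ratio, no failure design, no archetype classification.
print(diagnose([True, True, False, False, False]))   # 2/5: gray zone...
```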

How This Connects

The Agent Yield Framework doesn't exist in isolation. It's the measurement layer that sits between two of my other frameworks:

The Velocity Gap explains why organizations are slow — the 8 friction defaults that block AI-native work. The Agent Yield Framework gives you a tool to measure whether your agent deployments are actually making you faster. And the Profit Center Framework shows how to position the whole operation — humans and agents together — as a revenue engine rather than a cost center.

If your Velocity Gap score is high (lots of friction), don't expect agents to fix it. They'll inherit the friction. Fix the org first, then deploy agents into an environment where yield can actually compound.

The Bottom Line

The agentic AI market is projected to reach $200 billion. Gartner says 40% of enterprise applications will include agents by the end of 2026. Salesforce already has 9,500+ paid Agentforce deals. This isn't a future trend. The money is being spent now.

A lot of it is going to be wasted. Not because agents don't work, but because companies are measuring the wrong thing. "How many people can we replace?" leads to Klarna — theoretical savings, real quality problems, and an expensive reversal.

"What's our agent yield?" leads to what I've seen work in practice: 40% faster decisions, POCs in hours instead of months, and hiring insights that change entire strategies.

"If the only number you can point to is a cost reduction, you don't have yield. You have a budget cut with a chatbot."

Agents are tools. Powerful ones. But the yield comes from the human using the tool — not from the tool replacing the human. Measure accordingly.

THE AGENT YIELD FRAMEWORK — SUMMARY

1. "Digital labor" is the wrong framing. Agents are tools that accelerate human output, not replacements for human workers. Klarna proved this at $60M scale.

2. Measure yield, not savings. Agent Yield = Revenue Acceleration ÷ Total Agent Spend. If you can only point to cost reductions, you don't have yield.

3. Three archetypes generate yield differently. Intelligence Agents accelerate decisions. Acceleration Agents compress time-to-market. Discovery Agents improve strategic choices. Know which you're building.

4. Four metrics matter. Revenue Acceleration Rate, Decision Velocity, Cost Per Agent-Assisted Outcome, and Human Leverage Ratio. Not headcount reduction.

5. Human oversight is not overhead. It's a cost-of-accuracy investment (15-25% of direct AI spend). Human as the loop, not just in the loop.

6. The question is not "who does this replace?" It's "who does this make better?" The yield comes from the human using the tool, not from the tool replacing the human.

📘 GO DEEPER — IMPLEMENTATION GUIDE

The AI Agent P&L Guide — $49

This article gives you the Agent Yield formula. The guide gives you the full financial model to get agent investments funded.

Complete Agent P&L statement template with revenue lines and cost layers
Five TCO layers most companies underestimate
Pilot-to-production model with decision gates
4 worksheets including the Agent Yield Calculator
Get the Guide → Instant download • Worksheets included

Get frameworks like this in your inbox

AI strategy insights for leaders who measure in revenue, not hype. No fluff — just the hard-won lessons from 20 years of enterprise AI.

Edward Chenard
AI Revenue Strategist

I spent 20 years building AI and data products at Best Buy, Target, C.H. Robinson, and Olo. I've launched 100+ products, built teams from 2 to 300+, and contributed to over $2.5B in AI-driven revenue — including the data architecture for Olo's $3.6B IPO. I've deployed AI agents in production since 2023. Now I publish the frameworks so other leaders can skip the expensive mistakes.