Goldman Found No Economy-Wide AI Productivity Gain. Two Use Cases Hit 30%.

AI productivity at the economy level is flat, yet Goldman found a 30% gain in two narrow use cases. What that means for ticket-heavy operations.

AUTHOR

Ralf Klein

Goldman Sachs went looking for artificial intelligence in the productivity numbers and could not find it. After analyzing fourth-quarter 2025 earnings calls across the S&P 500, senior economist Ronnie Walker concluded there is no meaningful relationship between AI adoption and productivity at the economy-wide level. The same analysis found something most headlines skipped: a median productivity gain of around 30 percent in exactly two places.

That contradiction is the whole story. Aggregate AI productivity is statistically invisible. Concentrated AI productivity is enormous. The difference between those two facts is not the technology. It is where companies point it.

The Null Result Is Real, and So Is the 30 Percent

Walker's team read what management said and then checked it against what the data did. The talking was loud. According to Goldman's analysis of S&P 500 earnings calls, 70 percent of management teams discussed AI. Only 10 percent quantified its impact on a specific use case. Just 1 percent could put a number on the earnings effect. Most companies are narrating a transformation they cannot measure.

Yet among the firms that did implement AI successfully, the gains were not modest. The two use cases that delivered a roughly 30 percent median productivity boost were customer support and software development. Not "AI across the business." Two specific, bounded functions where the work repeats and the output is countable.

Why the Average Hides the Win

An economy-wide average is a blunt instrument. It blends the company that rebuilt its support queue around an AI agent with the hundred companies that bought a chatbot license, ran a pilot, and quietly shelved it. The few winners are diluted by the many that stalled, and the headline reads zero.

The MIT NANDA project found the same shape from a different angle. Its State of AI in Business 2025 report tracked 30 to 40 billion dollars of enterprise GenAI investment and concluded that 95 percent of pilots delivered no measurable return. The 5 percent that worked were not lucky. They were focused. MIT also found that more than half of GenAI budgets went to sales and marketing tools, while the biggest ROI showed up in back-office automation: removing outsourced process work, cutting agency spend, and streamlining operations. Companies were spending where AI looked exciting and earning where AI was boring.

Put the two studies together and the lesson is uncomfortable. The aggregate null is not evidence that AI fails. It is evidence that most deployments are spread too thin to register. Spread AI across everything and you get zero. Concentrate it on one repeatable, measurable job and you get 30 percent.
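A few lines of arithmetic make the dilution concrete. The sketch below uses invented figures (5 focused firms gaining 30 percent, 95 stalled firms gaining nothing), not Goldman's or MIT's data, and every variable name is hypothetical.

```python
from statistics import mean

# Hypothetical illustration: all figures are invented, not taken from either study.
# 5 firms concentrate AI on one measurable flow and gain 30 percent;
# 95 firms spread it thinly and see no measurable change.
focused_gains = [0.30] * 5
stalled_gains = [0.0] * 95

all_gains = focused_gains + stalled_gains

aggregate_gain = mean(all_gains)    # what an economy-wide average sees
focused_gain = mean(focused_gains)  # what the focused firms see

print(f"Aggregate average gain: {aggregate_gain:.1%}")
print(f"Gain among focused firms: {focused_gain:.1%}")
```

Under these assumed numbers the aggregate comes out at about 1.5 percent, plausibly small enough to be lost in ordinary measurement noise, while the focused firms still see the full 30 percent.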

Customer Support Is Not a Coincidence

Look at what the two winning use cases have in common. Customer support and software development are both high volume, repetitive, and already documented. The work follows known patterns. The output can be scored: tickets resolved, response time, code shipped, defects caught. When a task is repeatable and measurable, an AI system has clear examples to learn from and a clear number to be held against. When a task is fuzzy, AI produces fuzzy output that nobody can prove worked.

For anyone running a ticket-heavy operation, customer support landing on that short list is the signal worth acting on. A support queue is the cleanest version of the conditions that produce the 30 percent: high volume, repeating shapes, written-down decision rules, and a cost per ticket you can already calculate. It is the part of the business where gains are most likely to compound, and it is usually the part still being run on headcount because that felt safer than betting on a pilot.

What This Means If You Run a Ticket-Heavy Operation

The strategic move is the opposite of the instinct. The instinct, fed by every vendor deck, is to deploy AI broadly so no team feels left out. The data says broad deployment is how you join the 95 percent. Pick one ticket flow instead. Make it the one with the highest volume and the decision rules already written down, usually intake and triage in support, maintenance, or service operations.

Then insist on a number before you start. Not "completion rate," which is a vanity metric, but a hard business figure: cost per resolved ticket, median response time, percentage of cases closed without a human touching them. In Goldman's data, only 1 percent of management teams could quantify the earnings impact. Being able to measure the outcome is not a reporting nicety. It is the thing that separates the 30 percent from the noise.
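As a rough sketch of what "a number before you start" could look like, the snippet below computes the three figures named above from a list of ticket records. The `Ticket` fields, the `support_metrics` helper, and the sample data are all hypothetical; a real helpdesk export will have its own schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Ticket:
    # Hypothetical record shape; field names are assumptions for illustration.
    response_minutes: float  # time until the first response
    resolved: bool           # whether the ticket was closed
    human_touched: bool      # whether a person handled it at any point


def support_metrics(tickets: list[Ticket], total_cost: float) -> dict[str, float]:
    """Compute three baseline figures for one ticket flow over one period."""
    resolved = [t for t in tickets if t.resolved]
    if not resolved:
        raise ValueError("no resolved tickets in this period")
    no_touch = [t for t in resolved if not t.human_touched]
    return {
        "cost_per_resolved_ticket": total_cost / len(resolved),
        "median_response_minutes": median(t.response_minutes for t in tickets),
        "no_touch_share": len(no_touch) / len(resolved),
    }


# Invented sample data for a single period.
sample = [
    Ticket(response_minutes=3, resolved=True, human_touched=False),
    Ticket(response_minutes=45, resolved=True, human_touched=True),
    Ticket(response_minutes=10, resolved=True, human_touched=False),
    Ticket(response_minutes=120, resolved=False, human_touched=True),
]
metrics = support_metrics(sample, total_cost=60.0)
print(metrics)
```

Running the same function on the period before and the period after a deployment gives a like-for-like comparison, which is the point of fixing the metric first.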

One caution sits inside the same Goldman dataset. Firms that discussed AI alongside workforce decisions cut job openings by 12 percent, steeper than the 8 percent across all firms. AI framed as a headcount story tends to shrink the organization's capacity to absorb the next wave of work. AI framed as a capacity story, where an agent resolves the routine tickets so people handle the exceptions, does the opposite. The same tool produces a very different operation depending on which story you build it around.

What Concentration Looks Like in Practice

Picture two operations with identical support volumes, say four thousand tickets a month. The first buys an AI assistant and rolls it out as an optional helper across every team: support, sales, HR, finance. Adoption is patchy, nobody owns the outcome, and at quarter end the cost per ticket has not moved. This operation becomes a line in the 95 percent.

The second points the same budget at one flow. It maps the top ten ticket types that make up most of its volume, writes the decision rules its best agents already follow, and puts an AI system in charge of intake, triage, and resolution for those types, escalating only the exceptions. Within a quarter, a measurable share of tickets close without a human, response time drops, and the cost per resolved ticket falls. Same technology, same spend, opposite result. The only variable that changed was focus.
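The mapping step can be sketched in a few lines. Assuming a ticket log where each entry carries a type label (the labels, the `top_ticket_types` helper, and the sample counts below are invented for illustration), this counts the most common types and the share of total volume they cover.

```python
from collections import Counter


def top_ticket_types(
    ticket_types: list[str], n: int = 10
) -> tuple[list[tuple[str, int]], float]:
    """Return the n most common ticket types and the share of volume they cover."""
    counts = Counter(ticket_types)
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    share = covered / len(ticket_types) if ticket_types else 0.0
    return top, share


# Invented log: one type label per ticket.
log = (
    ["password_reset"] * 50
    + ["billing_question"] * 30
    + ["shipping_delay"] * 15
    + ["other"] * 5
)
top, share = top_ticket_types(log, n=2)
print(top)
print(f"Top types cover {share:.0%} of volume")
```

If the top handful of types covers most of the volume, that is a hint the flow is concentrated enough to hand to an agent; if the share is low, the queue may be too varied to be the right starting point.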

The software development half of Goldman's finding works the same way and reinforces the point. Coding assistants compound because the task is bounded, the feedback is immediate, and quality is testable. The lesson is not "AI is good at code." It is that bounded, testable, high-frequency work is where AI converts effort into measurable output. Every ticket-heavy operation has at least one process that fits that description, and it is almost never the one getting the most attention in the AI strategy deck.

The Reframe

The Goldman null result will be quoted for the next year as proof that the AI boom is oversold. Read one layer down and it says the opposite. The technology works exactly where the conditions are right, and it works at a scale, 30 percent, that almost no other operational lever can match. The companies seeing nothing are not using bad AI. They are using good AI everywhere and measuring it nowhere. The ones seeing 30 percent found the one or two flows where the work repeats, pointed an agent at it, and counted the result. The dream to stop buying is the broad one. The flow to start with is the one already generating tickets every hour.
