(Or: a brief history of a six-month KPI.)
TL;DR
Sometime in early 2025, a handful of large tech companies decided that the way to measure AI productivity was to count the tokens each employee burned through. Internal leaderboards appeared. Performance reviews started referencing token volumes. NVIDIA's CEO suggested engineers should have annual token budgets equivalent to half their salary. And then, almost on schedule, the metric did what every measure does when it becomes a target. Engineers gamed it. Costs exploded. Meta quietly dismantled its leaderboard. The story now has a name (tokenmaxxing), a body count (Uber's $3.4 billion annual AI budget, gone in four months), and a lesson fifty years old that nobody listened to. Hi, Goodhart.
A short history of a bad idea
In 2025, every executive started asking the same question. Are my people actually using AI? The honest answer was hard to come by. So the proxy of choice became tokens.
Tokens are easy to count. Every prompt, every reply, every agent call gets logged. Dashboards lit up. Internal leaderboards appeared at Meta, OpenAI, and Shopify, listing engineers by daily, weekly, and monthly token volume, with the leaders quietly celebrated and the laggards politely nudged. Performance reviews started including token-volume numbers. CNBC reports that almost every Fortune 500 company is now tracking overall AI usage in some form.
At NVIDIA GTC 2026, Jensen Huang publicly suggested that engineers should have annual token budgets equivalent to roughly 50 percent of their salary, or around $250,000 in tokens a year. That is, depending on your perspective, either an extraordinary investment in productivity or an extraordinary thing to put in a slide deck.
The phenomenon now has a name. Tokenmaxxing.
The bills, and the gaming, arrive
The numbers got large quickly. Uber exhausted its entire annual AI budget, $3.4 billion, in four months. Per-engineer monthly AI costs at top tech firms now sit between $500 and $2,000. At Uber specifically, 95 percent of developers adopted AI tools and 70 percent of committed code became AI-generated, which sounds impressive until you ask whether the same proportion of value was generated.
It wasn't. Meta dismantled its token leaderboard after discovering engineers were "burning millions of tokens for literally zero productivity." Engineers, it turned out, had become creative in the usual ways. Longer prompts. More agents. More context windows left open. Parallel sessions running for the sake of having parallel sessions running. The same instinct that used to pad lines of code now pads token counts.
This is, of course, the oldest story in management. Charles Goodhart, in 1975, observed that the moment a measure becomes a target, it stops being a good measure. The field rediscovers Goodhart every couple of years and gives him a new outfit. Tokenmaxxing is the most expensive outfit so far.
What the smart companies are doing instead
The pivot, where it's happening, is toward measuring output rather than input.
Salesforce has introduced Agentic Work Units, or AWUs, explicitly designed to capture output (work completed, features shipped, problems solved) rather than consumption (tokens burned). The category is still emerging, but the principle is right. If you want to know whether AI is being used well, count what gets done, not what gets typed.
The smarter version of this conversation is happening at the level of unit economics. What is the cost-per-feature-shipped, with AI versus without? What is the bug rate? What is the cycle time? These are harder to measure than raw token counts, which is exactly why they're worth measuring.
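To make those three questions concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the TeamQuarter record, its field names, and every figure are hypothetical and come from no company mentioned here. Cost per feature is taken as payroll plus token spend divided by features shipped, which is one reasonable definition among several.

```python
from dataclasses import dataclass


@dataclass
class TeamQuarter:
    """One team's quarter. All fields and figures are hypothetical."""
    label: str
    features_shipped: int
    bugs_reported: int
    total_cycle_days: float  # sum of start-to-ship days across all features
    payroll_cost: float      # dollars
    token_cost: float        # dollars spent on AI tokens


def unit_economics(q: TeamQuarter) -> dict[str, float]:
    """Output-side metrics: every figure is divided by features shipped."""
    if q.features_shipped == 0:
        raise ValueError("no features shipped; unit metrics are undefined")
    return {
        "cost_per_feature": (q.payroll_cost + q.token_cost) / q.features_shipped,
        "bugs_per_feature": q.bugs_reported / q.features_shipped,
        "avg_cycle_days": q.total_cycle_days / q.features_shipped,
    }


# Made-up numbers: same payroll, with and without an AI token bill.
without_ai = TeamQuarter("without AI", 10, 5, 200.0, 500_000.0, 0.0)
with_ai = TeamQuarter("with AI", 16, 12, 240.0, 500_000.0, 60_000.0)

for q in (without_ai, with_ai):
    m = unit_economics(q)
    print(f"{q.label}: ${m['cost_per_feature']:,.0f} per feature, "
          f"{m['bugs_per_feature']:.2f} bugs per feature, "
          f"{m['avg_cycle_days']:.1f} days per feature")
```

In this invented example the AI quarter ships features for less money and in fewer days each, but with more bugs per feature. A token leaderboard would have reported only the $60,000 and none of the trade-off.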
The fifty-year-old lesson
Tokenmaxxing, in its current form, will not survive, for the same reason no metric of its type ever has. You cannot count an input and expect to measure an output. You can count lines of code; you'll get longer code. You can count meetings attended; you'll get more meetings. You can count tokens consumed; you'll get more tokens consumed. None of this is the same as work.
The token is to AI what the line of code was to engineering, and what the billable hour is to professional services. A useful internal accounting unit. A terrible measure of value.
The companies that win the next decade will figure this out quickly. They'll measure what AI lets them ship, not what AI lets them spend. The companies that don't will keep running expensive leaderboards full of engineers cheerfully gaming the wrong number.
Be careful what you measure. They'll give it to you.