All hail the great and mighty token! (Part 1)
You may have caught the news. Meta pulled the plug on an employee-built internal dashboard nicknamed “Claudeonomics” - a gamified leaderboard ranking the top 250 AI token users in the company, handing out badges like “Model Connoisseur,” “Cache Wizard,” and “Session Immortal.” (Ok… let’s be honest… I’d totally put those monikers on my resume.)
The numbers were absurd.
In a 30-day window, Meta employees collectively burned 60 trillion tokens. The top-ranked individual averaged 281 billion. On the cheapest frontier model, that one person alone could have cost the company north of $1.4 million. In a month.
Meta’s leaderboard came down two days after it hit the press - causation or correlation? Either way, the takedown hasn’t changed the trend.
OpenAI runs a leaderboard. Shopify has one too. One OpenAI power user reportedly logged 210 billion tokens in a week - about 33 times the entire text of Wikipedia. One Anthropic developer is burning $150K a month on Claude Code alone.
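A quick back-of-envelope check on those figures. The per-token price here is an assumption - a blended $5 per million tokens as a stand-in for a cheap frontier model; actual pricing varies by model and by input/output mix.

```python
# Assumed blended rate; real rates vary by model and input/output mix.
PRICE_PER_MILLION = 5.00  # $ per 1M tokens

# Top-ranked Meta user: 281 billion tokens over 30 days.
top_user_tokens = 281e9
cost = top_user_tokens / 1e6 * PRICE_PER_MILLION
print(f"Top user, 30 days: ${cost:,.0f}")  # $1,405,000 - "north of $1.4M"

# The "33x Wikipedia" comparison implies Wikipedia's text is roughly
# 210e9 / 33 = ~6.4 billion tokens.
wiki_tokens = 210e9 / 33
print(f"Implied Wikipedia size: {wiki_tokens / 1e9:.1f}B tokens")
```

The math holds together: at a few dollars per million tokens, individual users in the hundreds of billions of tokens really do translate to seven-figure monthly bills.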
And then there’s Jensen Huang, who told his GTC audience recently he’d pay engineers the equivalent of half their base salary, on top of that salary - in tokens.
“It is now one of the recruiting tools in Silicon Valley,” he said. “How many tokens come along with my job?”
Tokens as the fourth pillar of compensation. Tokens as a status symbol. Tokens as the scoreboard for who’s winning the AI race.
I get the impulse. But we’re measuring the wrong thing.
Building a Field of Dreams: LLM played by Kevin Costner (Part 2)
Here’s what I think is actually going on - and it deserves more charity than it usually gets.
The leaders behind these leaderboards aren’t AI scientists. They’re business operators. They’ve spent careers running playbooks on growth and efficiency, and they know - better than anyone - that those outcomes are delivered by people.
Not tools. People using tools.
And now someone has handed them the shiniest, most unknowable tool of their career and said: go make magic. No playbook. No precedent. No operating model. Just a category-defining technology, a board breathing down their neck, and a workforce that ranges from “I built my own agent this weekend” to “what’s a prompt.” And let’s not forget a whole lot of folks being told that AI is about to eat their job (or the cause of their best friend’s recent unemployment).
So they did what smart operators do when they don’t know the answer. They reached for the simplest possible version of it:
If I can just get people to use it, the magic will come.
That’s not a dumb instinct. It’s a deeply human one. It’s Field of Dreams. Build it and they will come. Put token usage on a leaderboard and people will latch onto the KPI as their north star, engaging with AI to chase it. And in the early innings of any technology shift, getting people to touch the thing at all is a real win.
But it’s a field, not a path. It’s hope dressed up as strategy.
What comes next - the part where tool usage turns into workflow redesign, which turns into actual transformation - doesn’t happen because usage went up. It happens because behavior changed in a structured, measurable, repeatable way.
And that’s the inflection. You don’t get structured behavior change from a leaderboard. You get it from the right incentives pointed at the right outcomes.
The missing link in driving AI transformation (Part 3)
Before we go there, let’s be precise about why tokenmaxxing breaks down. The logic behind it runs like this:
- More usage → more fluency
- More fluency → more productivity
- More productivity → more outcomes
Step 2 is where it falls apart. Usage does not necessarily equate to productive use.
You can max your tokens without doing better work. You can max your tokens without learning a thing. And at scale, that’s exactly what happens - because the moment a metric becomes a target, people optimize the metric.
I’ve heard the stories (ok, ok… I know more than a couple folks working in companies with token leaderboards). Agents left running overnight to pad numbers. Recursive queries that accomplish nothing but burn compute. Knowledge workers pasting entire codebases into a prompt “to use more tokens,” producing the same deliverable they would have produced in Google Docs.
HubSpot’s CEO Yamini Rangan distilled the rebuttal in a single line: outcome maxxing beats token maxxing. Appian’s Matt Calkins compared tokenmaxxing to the Soviet practice of judging chandeliers by their weight. Jellyfish’s Andrew Lau warned you can tokenmaxx all day and ship nothing but chaos.
The most dangerous employee in your company right now is someone who’s bad at their job and is tokenmaxxing.
Like a moth to the flame - how to pick the right KPI for transformation (Part 4)
Here’s the underlying mechanic that token leaderboards got right, even if they executed it poorly: people move toward incentives. Always have.
Incentives are the most powerful lever leaders have for driving behavior change, which is why they’re the first thing any real transformation playbook reaches for. But incentives are indifferent. They’ll drive whatever behavior you attach them to. If you attach them to the wrong thing, you’ll get the wrong thing, loudly and at scale. So the job isn’t whether to use incentives. The job is making sure they point at behavior you actually want.
And incentives are doled out against a measurement. A KPI. Which means the single most critical question in a transformation effort is: how do you pick the right KPI?
Three filters - the critical questions to ask when identifying the right KPI for your AI transformation initiative.
1. Does it measure outcome, or activity?
Activity metrics (tokens consumed, prompts sent, tools enabled) are easy to collect and trivial to game. Outcome metrics (time to resolution, cycle-time reduction, revenue per employee) are harder to collect and much harder to game - because they’re tied to things the business actually cares about.
Want to measure activity tied to an outcome? You can do that, you know. “Time to resolution when using an AI tool.”
Rule of thumb: if the metric can go up while your business stays flat, it’s activity. Use it for diagnostics, not for incentives.
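Here’s a minimal sketch of what the “activity tied to an outcome” version looks like in practice: time to resolution, split by whether an AI tool was used. The field names and data shape are hypothetical - plug in whatever your ticketing system actually exports.

```python
from statistics import mean

# Hypothetical ticket export: was AI used, and how long did resolution take?
tickets = [
    {"used_ai": True,  "hours_to_resolution": 3.0},
    {"used_ai": True,  "hours_to_resolution": 5.0},
    {"used_ai": False, "hours_to_resolution": 9.0},
    {"used_ai": False, "hours_to_resolution": 7.0},
]

def avg_resolution(tickets, used_ai):
    """Average time to resolution, filtered by AI usage."""
    times = [t["hours_to_resolution"] for t in tickets if t["used_ai"] == used_ai]
    return mean(times) if times else None

with_ai = avg_resolution(tickets, used_ai=True)      # 4.0 hours
without_ai = avg_resolution(tickets, used_ai=False)  # 8.0 hours
print(f"With AI: {with_ai}h / Without AI: {without_ai}h")
```

Notice the metric can’t move unless tickets actually get resolved faster - that’s the property the rule of thumb is testing for.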
2. If someone games it, is the gamed version still valuable?
Gaming tokens produces nothing. Gaming “tickets resolved under SLA with CSAT above 4” requires actually resolving tickets well. Pick the second kind.
3. Is it a leading or lagging indicator?
Lagging indicators (revenue, margin, NPS) tell you whether transformation worked six months ago. Essential for validation, useless for steering. Leading indicators (workflow redesign coverage, capability density, outcome-linked usage) tell you whether you’re on track now. Good systems pair both - leading to steer, lagging to validate.
Three filters. Clear all three and you have a KPI worth building incentives on. Fail any one, and you’ve just built the next token leaderboard.
This isn’t Lord of the Rings (Part 5)
There isn’t one ring to rule them all. There isn’t one KPI either.
Great AI use by a researcher looks nothing like great AI use by an engineer, which looks nothing like great AI use by a salesperson. Pretending otherwise is how you end up with a single enterprise-wide metric that flatters no one and steers nothing.
The right approach is backwards from how most companies do it: build KPIs up from personas first, then identify the commonalities across them. Start with the work. Work your way to the metric. Not the other way around.
Let me walk you through two examples.
The researcher.
What great AI use looks like: faster hypothesis cycles, broader literature coverage in less time, more experiments framed and rejected before committing real resources. Candidate leading KPI: time from question to testable hypothesis, paired with peer-validated quality. Gaming it requires actually getting to better hypotheses faster. Good.
The engineer.
What great AI use looks like: quicker feature release cycles with fewer defects, faster cycles from ticket to merged PR, more of the boring stuff automated so the hard stuff gets brain time. Candidate leading KPI: AI-assisted PRs merged per sprint, weighted by review pass rate. Gaming it requires the PRs to actually be good. Also good.
Notice what’s happening in both: usage shows up, but always as a numerator over an outcome denominator. The denominator is what keeps the numerator honest.
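To make the numerator-over-denominator idea concrete, here’s a sketch of the engineer KPI: AI-assisted PRs merged per sprint, weighted by review pass rate. The data shape and field names are hypothetical stand-ins for whatever your source-control analytics provide.

```python
# Hypothetical sprint export: per-PR flags for AI assistance, merge
# status, and whether the PR passed first review.
prs = [
    {"ai_assisted": True,  "merged": True,  "passed_first_review": True},
    {"ai_assisted": True,  "merged": True,  "passed_first_review": False},
    {"ai_assisted": True,  "merged": False, "passed_first_review": False},
    {"ai_assisted": False, "merged": True,  "passed_first_review": True},
]

def engineer_kpi(prs):
    """Merged AI-assisted PRs (usage), weighted by review pass rate (quality)."""
    merged_ai = [p for p in prs if p["ai_assisted"] and p["merged"]]
    if not merged_ai:
        return 0.0
    pass_rate = sum(p["passed_first_review"] for p in merged_ai) / len(merged_ai)
    # Usage is the numerator; quality is the weight that keeps it honest.
    return len(merged_ai) * pass_rate

print(engineer_kpi(prs))  # 2 merged * 0.5 pass rate = 1.0
```

Padding the numerator with junk PRs drags down the pass rate, so the only way to raise the score is to ship more PRs that are actually good.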
I’d walk you through more personas, but - and I say this with love - I actively try to monetize my skills at this kind of work.
Don’t hate the token, hate the game (Part 6)
I have enormous respect for the companies and leaders at the cutting edge of this token conversation. Whatever you think of the execution, they’ve embraced large-scale transformation in the age of AI and planted a flag to drive it.
That’s not nothing. That’s more than most.
And let’s be honest with each other - there’s no way these titans of industry actually sat in a boardroom and said, “if only we put up a token leaderboard, people will magically start using AI and transform their work lives.” That’s not the question they were asking.
The question they were asking is the right one: even if our approach is imperfect, what can we do to catalyze behavior change?
How do we make learning, reshaping a workflow, rethinking a day-in-the-life-of, something people actually embrace? Tokenmaxxing and leaderboards were a pretty great start. Flawed, expensive, gameable - but a start.
It’s also played out. The companies that lit the fuse have already pulled the learnings and moved on to what’s next. So if you’re starting your company’s transformation journey today or tomorrow - tokenmaxxing isn’t the silver bullet. It’s time to think (or rethink) what is.
And boy do I have ideas. Let me know if you need help.