I never took tokenmaxxing to be about improving productivity directly; mundane feature work that comes out of it is just a side effect. I always saw it as a race between these big tech companies to get a generational advantage by being the one to discover the way of the future, with respect to harnessing AI to actually and truly automate software development.
EDIT: whoa, I used "way of the future" as a reference to Howard Hughes in "The Aviator", not this Way of the Future religious organization thing I just stumbled on; no intended reference there.
Agreed, and I have wondered behind closed doors if this is not the mental model. You need to spend money to see what is working; this was simply a way to see that.
Eh, I also saw it as a rather blatant attempt to undermine the bargaining power of the Software Engineer (which had grown to insane proportions over the years) - both in working conditions and raw cash money.
In many cases it really didn’t/doesn’t matter if the AI automation actually works, just that people think it could - and hence leave money on the table.
Anthropic's annualized run rate is >$40b according to outside reporting. AWS hit that by Q4 2019. There were still debates on public cloud vs on prem at that time, but by late 2019 public cloud had facilitated the creation or adoption of entire categories of software within SaaS and PaaS, not to mention consumer internet businesses like Uber and Airbnb. The net impact of AI coding tools is far more ambiguous in comparison.
The profitability comparison is fraught, but it's worth noting that by then AWS was already extremely profitable.
Now feels like a very good time to be a small team of experienced developers who can largely work on stuff by themselves and not a corporation of hundreds of developers of varying abilities all now trying to show how much code they can generate and how many tokens they can burn.
What has been the end result of all the tokens companies are burning?
Where does it show up in quarterly results?
I can’t see how it’s sustainable just based on “this feels more productive”
A friend of mine added some pretty extensive iOS UI tests to a keystone feature hit by millions every month. They'd been kicking the can down the road for years, trying to fit it in their roadmap, and with Claude running overnight they were able to bang out the whole suite in a week.
I'm not sure how it would show up in quarterly results.
I see these kinds of stories here a lot, and I'm curious whether they reflect a steady stream of need for AI coding, or whether a lot of companies have a burst of AI-appropriate coding work now that the technology is available and then will have a smaller need going forward.
Is it like the stereotypical dad who rents a power washer, powerwashes every exposed surface on his property, and then doesn't need to do any powerwashing for a few years; his neighbor who gets an Instant Pot and uses it for every meal for a month, then sees it gathering dust when the family gets tired of pressure-cooked stews; or like their neighbor who gets a microwave oven and uses it multiple times a day for decades?
I guess only time will tell.
> or whether a lot of companies have a burst of AI-appropriate coding work now that the technology is available and then will have a smaller need going forward
For the product my friend works on, it's definitely the latter. I definitely don't expect this party to last forever.
So far where I work it's the Instant Pot, at least for the non-devs. We rolled out Claude & Cowork to the masses after a brief pilot. It was about a solid month and a half of heavy usage and then suddenly usage fell off a cliff. Once it stopped being a cool new toy, people just didn't find a use for it.
A few mundane things got automated, but these were just back office admin type work. Nothing that's going to show on the P&L. Yeah those people now have a little more time for other things, but those other things are also not revenue generating. No FTE got replaced by it so in the end they just paid for a bunch of administrative positions to be a little less busy. Great for the workers who are now less stressed, but almost no impact on the business financials except there's now yet another subscription.
That’s been my theory - there’s some low-hanging fruit in every environment where AI knocks it out of the park. Then complex brownfield reality (coupled with non-technical factors) rears its head and the stunning productivity gains are nowhere to be seen.
That explains how you can have both the anecdotes of amazing AI productivity and rigorous studies showing anything from actual loss of productivity to single-digit gains.
In my limited experience with using agents to create tests, they tend to code the tests to the existing code instead of verifying correctness against a spec. Great for regression testing but still limited in effectiveness for catching existing issues.
It wouldn't, at least not directly. That's why it wasn't done pre-AI.
Even if Uber really did double developer productivity, would it translate to quarterly results?
Ultimately they make money selling rides, not selling software. The Uber app is mature and adding new features is unlikely to significantly increase sales.
Writing 2x more code doesn't translate to 2x more revenue unless it results in 2x more rides.
Lowering costs to run the infra would show up as increased profits without any change in rides.
Do you really need AI for that? Seems like the thing any existing engineering team could do if it was prioritized.
In the big red number shown after revenue where profits used to be.
Probably shows up in OpenAI and Anthropic quarterly reports. I have to wonder if that was the point.
Advancement in AI research seems to be the only thing at this point.
> Where does it show up in quarterly results?
Standard answer is "companies that are not seeing significant gains from AI just aren't AI-ing hard enough, trust me bro".
Dupe: https://news.ycombinator.com/item?id=48268871
I am not sure how Uber is operating internally around the use of tokens, but if they actually shipped features faster than before then it is still a win. If they learn that users don't want these features, or want a different version of them, they have learned this faster than they would have if they had manually coded those features, which means in principle you should be able to iterate faster. But this will collide with the creative ceiling that humans exhibit in a span of time, and on top of that Uber is prioritizing spending money on tokens over humans, which seems like a mistake. You need humans for creativity.
I used Uber for the first time in like 8 years recently, and near as I can tell it's the exact same thing it was. What features are they even adding at all much less that anyone cares about? You ask for a ride to a place, a driver shows up, money is exchanged - the end?
Sometimes things are actually just finished. They don't need to treadmill.
They have to make it easier for ubereats orders to have parts of their bill attached to a corporate card and half attached to someone's personal card and be able to split the invoices!
hot take: token spend can be used as a honeypot, especially when compared to what you deliver. spend accordingly!
I do find it to be true that with coding agents the famous quote from Jurassic Park goes through my head multiple times a day:
"our scientists were so preoccupied with whether they could, they never stopped to ask if they should."
I've now come to the realization that if I'm having an LLM work constantly all day writing code for me, I'm probably doing something wrong, as I'm no longer focusing on the core issue itself.
I may be in a minority here in that I write code to augment myself and not to ship to others, so I can tell very quickly if I'm just gold-plating something or if I'm actually delivering real value to my trading or risk management.
I still get picked up by an Uber the same way. As an end user, nothing has changed for me.
So I wonder what the heck all those billions of AI tokens were burnt on, such that the budget was exhausted just 4 months into the year?
This argument is funny because you could have said the same thing 4 years ago: Uber still picks you up just as it did years before that, so what did all those millions spent on developer salaries get them?
Uber’s business is relentlessly confusing for people who think it’s a simple app to send an alert to a nearby driver to pick you up.
Uber operates at a scale where there are no trivial problems because even small changes can impact hundreds of thousands of customers. They can also justify spending time and money on new features that only 0.1% of customers might use because 0.1% of their customers is a very large number.
Apparently:
* In App Hotel bookings in partnership with Expedia.
* Travel Mode with suggestions on where to eat and visit when travelling.
* Eats for the way - your driver picks up a takeaway for you to eat while they drive you to your destination.
* Voice bookings using AI and speech to text.
How did we ever live without them!
> Eats for the way - your driver picks up a takeaway for you to eat while they drive you to your destination.
This seems like the kind of terrible idea that an LLM might have come up with. I'm pretty sure most drivers do not want people eating (especially a whole meal) in their car, and I can't imagine a lot of instances where you're calling an Uber and don't have time to get yourself food, but don't mind waiting an extra 10 minutes for the driver to detour, find parking, and wait for your food.
> I can't imagine a lot of instances where you're calling an Uber and don't have time to get yourself food
Recently I got a car to take me to the train station and picked up food on the way. Seems pretty common to me. Of course, I didn't need or want it charged as a premium feature in the app.
Not to mention what anyone who's worked in an office with a shared kitchen can tell you - the smell of getting into a car where an indeterminate number of people have eaten different meals. Like climbing into a food court dumpster.
Holy fuck, aside from the voice bookings, that's some useless shit to spend money building as far as both tokens and salaries go.
Are they profitable yet lol
This seems like the doom of all tech companies that hit a single kernel of a good idea, hire a big development team to build it, and then, once it's running well and making money, leadership looks around and sees this big body of developers, product managers, project managers, QA, and management tree, looking around for something else to do. Then, instead of saying, "Let's find the next big thing to do," they say, "Cram dozens more things into the thing that already works. Anything you can think of, spin up a team of 10 to bolt it onto the main product. Move things around to make everything fit. Run experiments on users to see if this new crap moves the metrics. A/B test to see what we should keep and what we should silently remove next update. Attach this other company's product that we just bought."
In a few years, what do you end up with? The modern version of every single fucking app we use today.
Well, travel booking is one of those things every company wants to get involved with because it's just straight referral fees. I get advertisements to book travel through my phone company (T-Mobile US) and a slew of financial services companies.
If it's easy enough to add to the app and sticks around for a while, it may well be profitable even if only a small percentage of customers use it or even realize it's available.
they are very profitable now!
For context, this is an interview where Uber CEO discussed these ideas:
https://www.theverge.com/podcast/922909/dara-khosrowshahi-ub...
Can't say I am convinced.
There are probably tons of backend projects going on: expanding into new countries, payments, complying with regulations, efficiency and reliability projects. They also do food delivery. There's a whole engineering team to support.
Goodhart's law strikes again. Stop giving your engineers token-burning quotas or they'll burn tokens.
I really don’t understand on the customer side of B2B why so many companies actively encouraged AI tooling costs.
I can understand it from the side of the companies selling tokens and AI hardware. I don’t understand the race to spend more on internal tools.
I’ve been sitting around waiting for my company to buy a number of necessary bits of tools. They cheap out on every solution imaginable. Datadog is too expensive, let’s buy a cheap solution that costs us months of setup time. Configuration management is too expensive, let’s use the free version with no audit trail or dashboard.
But everyone…in the entire company…gets multiple AI tool subscriptions.
I don’t remember investors being this stupid at any other point. I don’t recall investors pressuring my company to use blockchain or NFTs.
The logic is quite simple. Management thinks that AI can improve productivity, but knows that there will be some resistance and some learning curve. So they force people to use it so that people can 1. develop their skills and workflow, 2. find out where it is useful, and 3. find out what needs to be improved to make it useful.
As a more obvious example, imagine that cars were just invented and post office management thinks they could improve the performance of letter carriers. But right now cars are slow, break down a lot, and there isn't much infrastructure for them. Lots of letter carriers will (rightly) think that it is a waste of time because they need to get in, stop, and park between every house, the cars break down so often it isn't worth it, and half of their route is unsuitable for a car anyway. But if cars are forced for a while, they will find out which routes work well for cars and which don't, improve the cars and related infrastructure to make cars more effective, and find other improvements to unlock more productivity.
So yes, right now management is wasting money on cars and gas for no increased productivity. And yes, measuring how much gas each employee uses and encouraging them to use more is obviously stupid in isolation. But the idea is to force adoption to iron out the kinks and find out where it can improve productivity. It is basically funding a research project.
IMO, the root of all of this is the almost total inability of most managers and most eng orgs to measure individual engineer output in any useful way, and in particular in a way that lets you reliably compare engineers to each other.
Despite decades of the industry telling itself that we "pay for performance" or whatever, that has never been the case because we can't really measure performance very well. Where I have seen it done ok (not great, just ok), it was massively labor intensive and did not last, and was only done fully when considering promotion.
So, as you observe, now we have some new technique that managers are sure will increase performance by 50+%, if only people would use it. They can't just raise their expectations of performance by 50%, because they can't measure performance to within 50%! So, they measure the thing they can: token consumption.
How will companies like Uber continue to fund the “research” when the budget ends up burning 3x faster than predicted without being able to observe measurable gains?
Nothing that C-execs and management advocate for has made any sense for a long time now. If this is the first you're starting to question it all, I must ask what rock you're sleeping under because I desperately need a really good nap...
I knew they were stupid, I just didn’t know it got way down to this level
Strategy - just doing what your friends on the golf course are doing.
The number of times I have been told "oh I talked to so and so and they are having SUCH a good time using X" and then three years later "oh I talked to so and so and they got rid of X as soon as they could, we should switch!"
Or the management all read the same article in PC Magazine and lo! the next day did orders come down to implement said article, regardless. Waterworld, rowing the Valdez. Some years of this usually results in some number of half-baked or half-implemented systems scattered about production, and who knows which if any are actually used, or how much stink there will be to shut down something unpatchable. Like why are there two wiki engines, SharePoint, three different database servers, …
I surely remember everyone does SOA, everyone does NoSQL, everyone does Hadoop, everyone does microservices, everyone does Kubernetes, ...
Not with the same pressure as now, where everyone in the company (literally everyone, regardless of job role) has to burn AI tokens and attend forced AI workshops, but still, it is always running after the next new shiny.
Nobody wanted to admit that they had no idea how "AI" was going to help but nobody wanted to get left off the hype train...so they tasked their engineers to figure something out...by just asking them to spend as much as possible (As I explain this it just sounds stupider and stupider). Of course, spending willy-nilly is not a good way to find a profitable (or smart) idea, but that's a problem for future company bottom line.
Affordable inference will be around longer if more big tech companies cap their AI spending.
It feels like maybe the wheels are starting to fall off the AI hype train. I expect complete collapse once people start figuring out that the numbers on all this don’t make sense. I’m looking for investment portfolios that will weather that storm. If you are reading this and have a similar curiosity, this is a great place to start.
https://portfoliocharts.com/2021/12/16/three-secret-ingredie...
I've been thinking that for years about various sources and the bubble stubbornly refuses to pop on a convenient timeline so I'm falling back on the adage "time in the market beats trying to time the market". Index funds and chill is much more relaxed than trying to determine who's actually going to survive the AI bubble popping.