5 years ago: ML-auto-complete → You had to learn coding in depth
Last Year: AI-generated suggestions → You had to be an expert to ask the right questions
Now: AI-generated code → You should learn how to be a PM
Future: AI-generated companies → You must learn how to be a CEO
Meta-future: AI-generated conglomerates → ?
Recently I realized that instead of just learning technical skills, I need to learn management skills. Specifically, project management, time management, writing specifications, setting expectations, writing tests, and in general, handling and orchestrating an entire workflow.
The limiting factor at work isn't writing code anymore. It's deciding what to build and catching when things go sideways.
We've been running agent workflows for a while now. The pattern that works: treat agents like junior team members. Clear scope, explicit success criteria, checkpoints to review output. The skills that matter are the same ones that make someone a good manager of people.
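For concreteness, here's roughly what that delegation looks like written down - a hypothetical task brief with scope, success criteria, and checkpoints made explicit (the structure, field names, and values are invented for illustration, not any particular tool's format):

    from dataclasses import dataclass

    @dataclass
    class TaskBrief:
        """Hypothetical delegation brief for a coding agent."""
        scope: str                   # what to touch, and what to leave alone
        success_criteria: list[str]  # how the output will be judged
        checkpoints: list[str]       # where a human reviews before the agent continues

    brief = TaskBrief(
        scope="Add pagination to /api/orders; do not change the DB schema.",
        success_criteria=[
            "Existing tests still pass",
            "Endpoint returns a 'next' cursor",
            "No new dependencies",
        ],
        checkpoints=[
            "Proposed plan, before any code is written",
            "Diff review, before tests are run",
        ],
    )

The point isn't the format; it's that the same brief would make sense handed to a junior engineer.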
pglevy is right that many managers aren't good at this. But that's always been true. The difference now is that the feedback loop is faster. Bad delegation to an agent fails in minutes, not weeks. You learn quickly whether your instructions were clear.
The uncomfortable part: if your value was being the person who could grind through tedious work, that's no longer a moat. Orchestration and judgment are what's left.
I've been saying it this whole time; it's not the engineers who need to be concerned with being replaced - it's anyone involved in the busywork cycle. This includes those who do busywork (grinding through tedium) and those who create it (MBAs, without apologies to the author).
Here's the thing - that feedback loop isn't a magic lamp. Actually understanding why an agent is failing (when it does) takes knowledge of the problem space. Actually guiding that feedback loop so it optimally handles tasks - segmenting work and composing agentic cores to focus on the right things with the right priority of decision making - that's something you need to be curious about the internals for. Engineering, basically.
One thing I've seen in using these models to create code is that they're myopic and shortsighted - they do whatever it takes to fix the problem right in front of them when asked. This causes a cascading failure mode where the code becomes a patchwork of one-off fixes and hardcoded solutions for problems that not only recur but compound and get exponentially worse. You'd only catch this if you can spot it when the model says something like "I see the problem, this server configuration is blocking port 80 and that's blocking my test probes. Let me open that port in the firewall".
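One cheap way to catch that class of "fix the symptom" move is to gate the agent's shell commands behind a human check. A minimal sketch, assuming you can intercept proposed commands before they run (the pattern list is illustrative, not any particular tool's API):

    import re

    # Commands that usually mean the agent is "fixing" the environment
    # instead of the code; illustrative, not exhaustive.
    RISKY_PATTERNS = [
        r"\bufw\s+allow\b",           # opening firewall ports
        r"\biptables\b",              # rewriting firewall rules
        r"\bchmod\s+777\b",           # blanket permission changes
        r"\bgit\s+push\s+--force\b",  # rewriting shared history
        r"--no-verify\b",             # skipping hooks
    ]

    def needs_human_review(command: str) -> bool:
        """Flag proposed shell commands that look like symptom-level workarounds."""
        return any(re.search(p, command) for p in RISKY_PATTERNS)

    print(needs_human_review("sudo ufw allow 80/tcp"))  # True: stop and ask why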
This is verifiably false.
You still need to do most of the grunt work, verifying and organizing the code; it's just that you're not editing the code directly. Speed of typing out code is hardly the bottleneck.
The bottleneck is visualizing it and then coming up with a way to figure out bugs or add features.
I've tried a bunch of agents; none of them can reasonably carry out a good architectural change in a medium-sized codebase.
> The skills that matter are the same ones that make someone a good manager of people.
I disagree. Part of being a good manager of (junior) people is teaching them soft skills in addition to technical skills -- how to ask for help and do their own research, and how to build their own skills autonomously, how to think about requirements creatively, etc.
Clear specifications and validating output are only part of good people management, but they are 100% of good agent management.
It’s teaching them in the first place. You can’t teach an LLM. Writing a heap of AGENTS.md is not teaching. LLMs take it as information, but they don’t learn from it in any non-superficial sense.
With https://code.claude.com/docs/en/skills you kinda can teach new things. And also, I have little doubt Anthropic reads these and future AIs might get trained on the most popular recommendations.
Yes, it's a crutch. But maybe the whole "NNs that can code and we don't really know why" thing is too.
>You can’t teach an LLM.
Actually you can: training data, then the way you describe the task, goals, checkpoints, etc.
> The uncomfortable part: if your value was being the person who could grind through tedious work, that's no longer a moat. Orchestration and judgment are what's left.
What kind of work do you think people who deal with LLMs every day are doing? LLMs could maybe take something 60% of the way there. The remaining 40% is horrible tedious work that someone needs to grind through.
Automating part of the grind means that the remaining grind is more fun. You get more payoffs for less work.
> The limiting factor at work isn't writing code anymore
Where are y'all working that "writing code" was ever the slow part of the process?
"Writing code" was never the limiting factor, and if it was, you shouldn't be a developer.
>> The limiting factor at work isn't writing code anymore. It's deciding what to build and catching when things go sideways.
Actually I disagree. I've been experimenting with AI a lot, and the limiting factor is marketing. You can build things as fast as you want, but without a reliable and repeatable (and at least somewhat automated) marketing system, you won't get far. This is especially true because all marketing channels are flooded with user-generated content (UGC) that is actually generated by AI.
Recently, I came across Erich Fromm's distinction between "being mode" and "having mode" (AI really explained it the best; I would paste it here but it's somewhat long). In contrast with the parent post, you're looking at it from the "having mode" - how to sell the "product" to someone.
But you can also think about what you would want to build (for yourself or someone you know) that would otherwise take a team of people. Coding what used to be a professional app can now be a short hobby project.
I played with Claude Code Pro only a short while, but I already believe the mode of producing software will change to be more accessible to individuals (pro or amateur). It will be similar to the death of music labels.
> deciding what to build and catching when things go sideways
I feel like this was always true. Business still moves at the speed of high-level decisions.
> The uncomfortable part: if your value was being the person who could grind through tedious work, that's no longer a moat.
Even when junior devs were copy-pasting from stackoverflow over a decade ago they still had to be accountable for what they did. AI is ultimately a search tool, not a solution builder. We will continue to need junior devs. All devs regardless of experience level still have to push back when requirements are missing or poorly defined. How is picking up this slack and needing to constantly follow up and hold people's hands not "grinding through tedious work"?
AI didn't change anything other than how you find code. I guess it's nice that less technical people can now find it using their plain-English ramblings instead of needing to know better keywords? AI has arguably made these search results worse and the need for good docs and examples even more important, and we've all seen how vibecoding goes off the rails.
The best code is still the least you can get away with. The skill devs get paid for has always been making the best choices for the use case, and that's way harder than just "writing code".
Translation: Assimilate or die.
Patently shocked to find this on their profile:
> I lead AI & Engineering at Boon AI (Startup building AI for Construction).
> And there was something else: most early startups need to pivot, changing direction as they learn more about what the market wants and what is technically possible. By lowering the costs of pivoting, it was much easier to explore the possibilities without being locked in or even explore multiple startups at once: you just tell the AI what you want.
In my experience so far, AI prototyping has been a powerful force for breaking analysis paralysis.
In the last 10 years of my career, the slow execution speed at different companies wasn't due to slow code writing. It was due to management excesses trying to drive consensus and de-risk ideas before the developers were even allowed to write the code. Let's circle back and drive consensus in a weekly meeting with the stakeholders to get alignment on the KPIs for the design doc that goes through the approval and sign off process first.
Developers would then read the ream and realize that perfection was expected from their output, too, so development processes grew to be long and careful to avoid accidents. I landed on a couple teams where even small changes required meetings to discuss it, multiple rounds of review, and a lot of grandstanding before we were allowed to proceed.
Then AI comes along and makes it cheap to prototype something. If it breaks or it's the wrong thing, nobody feels like they're in trouble because we all agree it was a prototype and the AI wrote it. We can cycle through prototypes faster because it's happening outside of this messy human reputation-review-grandstanding loop that has become the norm.
Instead of months of meetings, we can have an LLM generate a UI and a backend with fake data and say "This is what I want to build, and this is what it will do". It's a hundred times more efficient than trying to describe it to a dozen people in 1-hour timeslots in between all of their other meetings for 12 weeks in a row.
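For a sense of what that prototype actually is under the hood, it's usually something like this - a stub backend serving hardcoded data, with a generated UI pointed at it (hypothetical endpoint and fields; FastAPI chosen only as an example):

    # Throwaway prototype: fake data, no persistence, no auth,
    # not meant to survive past the demo.
    from fastapi import FastAPI

    app = FastAPI()

    FAKE_ORDERS = [
        {"id": 1, "customer": "Acme", "status": "shipped"},
        {"id": 2, "customer": "Globex", "status": "pending"},
    ]

    @app.get("/orders")
    def list_orders():
        """Return canned data so the UI has something to render."""
        return FAKE_ORDERS

    # run with: uvicorn prototype:app --reload  (if saved as prototype.py)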
The dark side of this same coin is when teams try to rely on the AI to write the real code, too, and then blame the AI when something goes wrong. You have to draw a very clear line between AI-driven prototyping and developer-driven code that developers must own. I think this article misses the mark on that by framing everything as a decision to DIY or delegate to AI. The real AI-assisted successes I see have developers driving with AI as an assistant on the side, not the other way around. I could see how an MBA class could come to believe that AI is going to do the jobs instead of developers, though, as it's easy to look at these rapid LLM prototypes and think that production ready code is just a few prompts away.
> The dark side of this same coin is when teams try to rely on the AI to write the real code, too, and then blame the AI when something goes wrong. You have to draw a very clear line between AI-driven prototyping and developer-driven code that developers must own. I think this article misses the mark on that by framing everything as a decision to DIY or delegate to AI. The real AI-assisted successes I see have developers driving with AI as an assistant on the side, not the other way around. I could see how an MBA class could come to believe that AI is going to do the jobs instead of developers, though, as it's easy to look at these rapid LLM prototypes and think that production ready code is just a few prompts away.
This is what's missing in most teams. There's a bright line between throwaway, almost fully vibe-coded, cursorily architected features on a product and designing and building a scalable, production-ready product. I don't need a mental model to build a prototype; I absolutely need one for something I'm putting in production that is expected to scale, and where failures are acceptable but failure modes need to be known.
Almost everyone misses this, going whole hog on AI or whole hog on no AI.
Once I build a good mental model of how my service should work and design it properly, all the scaffolding is much easier to outsource. That's a speed-up, but I still own the code, because I know what everything does and my changes to the product are well thought out. For throwaway prototypes it's 5x this output, because the hard part - actually thinking the problem through - doesn't really matter; it's just about getting everyone to agree on one direction.
Most places I've worked, the "slow execution speed" wasn't because it took a long time to physically write the code; it took a long time to get through those other Analysis Paralysis things you mentioned: consensus among multiple ImportantPeople who all were expected to demonstrate "impact", agonizing over risks (perceived and real), begging VPs/leadership for their "buy-in", informing and receiving feedback from other vague "stakeholders", and so on. The software writing itself was never the bottleneck, and could be prototyped in 1/10th to 1/100th of the time it took to actually make the decision to write it.
> In my experience so far, AI prototyping has been a powerful force for breaking analysis paralysis.
So is an 8-ball.
An 8-ball doesn't ship code... which might be why you like the comparison, kind sir.
Neither does AI prototyping.
@layer8... yeah, nobody claimed prototypes ship, but I guess they at least prevent teams like yours from shipping nothing.
... of cocaine.
I haven't had the analysis paralysis problem, because I've always been quite decent at restructuring environments to avoid bureaucracy (which can be one of the most dangerous things for a project). But one thing I've observed is that if operations are not ZeroOps, then whoever is stuck maintaining systems will suffer by not being able to deliver the "value adding cool features that drive careers".
Since shipping prototypes doesn't actually create value unless they're in some form of production environment to effect change, then either they work and are ZeroOps or they break and someone needs to operate on them and is accountable for them.
This means that at some point, your thesis that "the dark side of this same coin is when teams try to rely on the AI to write the real code, too, and then blame the AI when something goes wrong" won't really work that way: whoever is accountable will get the blame and the operations.
The same principles for building software that we've always had apply more than ever to AI-related things.
Easy to change, reusable, composable, testable.
Prototypes need to be thrown away. Otherwise they're tracer bullets, and you don't want tech debt in your tracer bullets unless your approach is to throw it to someone else and make it their problem.
-----
Creating a startup or any code from scratch, in a way where you never have to maintain it and face the consequences of unsustainable approaches (tech debt, bad design, excessive cost), is easy. You hide the hardest part. It's easy to do things that look good on the surface if you can't see how they will break.
The blog post is interesting but, unless I've missed something, it does gloss over the accountability aspect. If you can delegate accountability, you don't have to worry about evals-first design, and you can push harder on dates because you're not working backwards from the actual building and design and their blockers.
Evals (think promptfoo) for evals-first design will be key for any builder who is accountable for the decisions of their agents (automation).
I need to turn it into a small blog post, but the points of the talk https://alexhans.github.io/talks/airflow-summit/toward-a-sha... :
- We can't compare what we can't measure
- Can I trust this to run on its own?
are crucial for a live system that makes critical decisions. If you don't have this, you're just using the --yolo flag.
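To make "evals-first" concrete, a minimal sketch of the idea - fixed cases, explicit assertions, a pass rate you track on every change (run_agent is a placeholder to wire up to your own model; the cases are invented; tools like promptfoo wrap the same pattern in config):

    # Hypothetical cases; the point is that pass/fail is defined up front.
    CASES = [
        {"prompt": "Summarize: the invoice is overdue by 30 days.",
         "must_contain": ["overdue", "30"]},
        {"prompt": "Extract the port from: 'listen on 8080'",
         "must_contain": ["8080"]},
    ]

    def run_agent(prompt: str) -> str:
        """Placeholder: call your model or agent here."""
        raise NotImplementedError

    def run_evals() -> float:
        """Run every case and return the pass rate, tracked over time."""
        passed = sum(
            all(term in run_agent(case["prompt"]) for term in case["must_contain"])
            for case in CASES
        )
        return passed / len(CASES)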
I thought I had a great startup idea. It was niche, but a solid global market. It was unique. There was a genuine pain point being solved. My MVP solved it. The pricing worked, the tiers were sound.
At least ChatGPT, Gemini and Claude told me it was. I did so many rounds of each one evaluating the other, trying to poke holes etc. Reviewing the idea and the "research", the reasoning. Plugging the gaps.
Then I started talking to real people about their problems in this space to see if this was one of them. Nope, not really. It kinda was, but not often enough to pay for a dedicated service, and not enough of a pain to move on from free workarounds.
Beware of AI reviewing AI. Always talk to real people to validate.
It’s weird when the failure modes of AI are similar.
I once solved a Leetcode problem in a kind of unorthodox way, and both ChatGPT and Gemini said it was wrong in the same way. Then I asked both of them to give me a counterexample, and only Gemini was able to realize the counterexample would have actually worked.
When thinking about automation people overindex on their current class biases. For 20 years we heard that robots were going to take over the “burger flipper” jobs. Why was it so easy to think that robots could replace fast food workers? Because they were the lowest rung on the career ladder, so it felt natural that they would be the first ones to get replaced.
Similarly, it’s easy to think that the lowly peons in the engineering world are going to get replaced and we’ll all be doing the job of directors and CEOs in the future, but that doesn’t really make sense to me.
Being able to whip your army of AI employees 3% better than your competitor doesn’t (usually) give any lasting advantage.
What does give an advantage is: specialized deep knowledge, building relationships and trust with users and customers, and having a good sense of design/ux/etc.
Like maybe that’s some of the job of a manager/director/CEO, but not anyone that I’ve worked with.
The "management as superpower" framing assumes people thoughtfully evaluate AI output. In practice, most users either review everything (slow, defeats the speed benefit) or review almost nothing (fast, but you're trusting the AI entirely). The MBAs who did well probably had domain expertise to spot wrong answers quickly, that's the actual superpower, not generic "management skill
> I think many people have the skills they need, or can learn them, in order to work with AI agents - they are management 101 skills.
I like his thinking but many professional managers are not good at management. So I'm not sure about the assumption that "many people" can easily pick this up.
And I think this will only shift to the higher levels of the management hierarchy in the future. For example, in the future we will have AI models that can one-shot an entire platform like Twitter. Then the question is less about how to handle a database and more about how to handle several AI generated companies!
While we're at the project manager level now, in the future we'll be at the CEO level. It's an interesting thing to think about.
>While we're at the project manager level now, in the future we'll be at the CEO level.
This is the kind of half-baked thought that seems profound to a certain kind of tech-brained poster on HN but, upon further consideration, makes absolutely zero sense.
thanks for your intellectual contribution to the HN community.
I think calling out ill-thought-out comments is a public service, especially because many people who read these comment sections are not engineers.
I was being sarcastic; your comment was a low blow. You didn't say why you disagreed with it. Might wanna read HN guidelines before leaving comments here.
@dang
I'm not sure you can tag dang like that, but I don't think it's against the rules either.
I've never understood this train of thought. When working in teams and for clients, people always have questions about what we have created. "Why did you choose to implement it like this?" "How does this work?" "Is X possible to do within our timeframe/budget?"
If you become just a manager, you don't have answers to these questions. You can just ask the AI agent for the answer, but at that point, what value are you actually providing to the whole process?
And what happens when, inevitably, the agent responds to your question with "You're absolutely right, I didn't consider that possibility! Let's redo the entire project to account for this"? How do you communicate that to your peers or clients?
If AI gets to be this sophisticated, what value would you bring to the table in these scenarios?
EVERY developer will own their own hyper-niche SaaS?
> what value would you bring to the table in these scenarios?
I bring the table, AI brings the value.
So... nothing. Glad we're in agreement here. If AI can do all the things people hope/dream it can, there won't be any value in doing it on behalf of folks. I would argue that even some "AI provider" (if that could even be a thing given a sophisticated enough agent) would see diminishing returns as the tech inevitably distills into everyone having bespoke agents running locally and handling/organizing/managing everything (of whatever needs managing, who knows).
Basically I don't see how you can be an AI maximalist and a capitalist at the same time. They're contradictory, IMO.
what value do you bring to the table or to this discussion?
Oops, sorry I already brought my own!
"The value of Juicero is more than a glass of cold-pressed juice. Much more."
The moment we have true AGD (artificial general developer), we’ll also have AGI that can equally well serve as a CEO. Where humans sit then won’t be a question of intellectual skill differentiation among humans anymore.
I'd advise caution with this approach. One of the things I'm seeing a lot of people get wrong about AI is that they expect it to mean they no longer need to understand the tools they're working with - "I can just focus on the business end". This seems true, but it's not. It's actually more important to have a deep understanding of how the machine works, because if the AI is doing things that you don't understand, you run a severe risk of putting yourself in a very bad situation: insecure applications or servers, code whose failure modes are catastrophic edge cases you won't catch until they're a problem, data loss or leakage.
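A concrete example of the kind of thing that slips through when nobody understands what the machine wrote: both versions below behave identically in a happy-path demo, but the first is injectable. (Hypothetical table and helper names, standard-library sqlite3 only.)

    import sqlite3

    def find_user_unsafe(conn: sqlite3.Connection, name: str):
        # Passes the demo, but name = "x' OR '1'='1" returns every row.
        return conn.execute(
            f"SELECT * FROM users WHERE name = '{name}'"
        ).fetchall()

    def find_user_safe(conn: sqlite3.Connection, name: str):
        # Parameterized query: the driver handles escaping.
        return conn.execute(
            "SELECT * FROM users WHERE name = ?", (name,)
        ).fetchall()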
If anything, managing the project, writing the spec, setting expectations, and writing tests are things LLMs are incredibly well suited for. Getting their work "correct", rather than "functional enough that you don't know the difference", is where they struggle.
one-shot doesn't mean what you think it means.
One-shot means you provide one full question/answer example (from the same distribution) in the context given to the LLM.
> more about how to handle several AI generated companies!
The cost of a model capable of running an entire company will be multiples of the market cap of the company it is capable of running.
"AI-generated company" as in the AI writes the A-Z of the code required to have a working platform like Twitter. Currently it can build some of the frontend or some of the backend, but not all. It's conceivable that in the future AI can handle the entire chain.
Also, you're forgetting the decreasing cost of AI, as well as the fact that you can buy a $10k Mac Studio NOW and have it run 24/7 with some of the best models out there. The only costs would be the initial fixed cost and electricity (250W at peak GPU usage).
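(For rough scale, assuming electricity at about $0.15/kWh, which varies a lot by region: 250 W running 24/7 is 6 kWh/day, i.e. roughly $0.90/day or about $27/month on top of the hardware.)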
>Also you're forgetting the decreasing cost of AI
AI is still being heavily subsidized. None of the major players have turned a profit, and they are all having to do 4D Chess levels of financing to afford the capex.
Even if AI subsidies go away, the Mac Studio scenario still holds.
No, no companies and no CEOs. Just a user. It's like the Star Trek replicator: food replication. No, you are not a chef, not a restaurant manager, not an agrifarm CEO, but just a user who orders a meal. So yes, you will need "skills" to specify the type of meal, but nothing beyond that.
It’s hard to take this author seriously given there’s no way they reviewed the work their students did.
It's vibes all the way down now.
> I find it interesting to watch as some of the most well-known software developers at the major AI labs note how their jobs are changing from mostly programming to mostly management of AI agents.
"AI labs"
Can we stop with this misleading language? They're doing product development. It's not a "laboratory" doing scientific research; there's no attempt at the scientific method. It's a software firm, and these are software developers/project managers.
Which brings me to point 2. These guys are selling AI tooling. Obviously there's a huge desire to dogfood the tooling. Plus, by joining the company, you are buying into the hype and the vision. It would be more surprising if they weren't using their own tools the whole time. If you can't even sell to yourself...
Commercially driven R&D labs have been around for a long time. Much of research and development has never followed the "scientific method". There's nothing wrong with calling the current set of AI companies "labs" when referring to their research efforts. And their researchers are putting out plenty of academic papers, sharing plenty of research results, so it's not like there's any level of rigor that is lacking.
I don't know why you're trying to suggest some kind of restriction on the word "lab", or based on what. But calling them "labs" is perfectly normal, conventional, and justified terminology.
AI works best when you’re selling it (author fits in this category as well).
Reminds me of blockchains and blockchain advocates.
But of course - AI is exactly where all the crypto bros moved since the hype around crypto died down.
... and perhaps AI sucks when you’re fearful it collapses years of “experience” into a prompt
If only there were no conflict of interest - since you're directly invested in AI hype.
Invested lol