> and by the time you've stacked five or six the field of view is small enough that the model has to thread a needle through whatever assumptions you packed in along the way. Any time one of those assumptions is wrong, and at least one usually is, the output looks like a failure even though the model was doing exactly what you asked.
I disagree. The core issue doesn’t comes from the prompt author assumptions, it’s rather all the assumptions the agent is injecting, and they are very often wrong given it only has a narrow understanding of the problem you’re tackling. Models aren’t neutral at all, they have lots of biases from their training set and reinforcement loop. They do not try to find the correct design that fits all your criteria, instead some keywords and group of tokens will drive most of the output, and whatever is associated with those tokens in the model will find its way into the artifact.
A non programming example to illustrate, that I’ve seen recently. Let’s say you’re asking for an analysis of the ETF „Amundi MSCI Eu ESG Brd Transition UCITS ETF EUR A“, here the agent sees ESG and will inject a whole set of assumptions that are completely irrelevant. You end up with half the report being about ESG investment instead of the ETF itself. It’s exactly the same for software development (with the additional issue that if you „let it cook“ the agent will cheat its tests and take shortcuts that are completely against whatever you asked).
That being said I agree that first loading the project content into the context with minimal influence from the prompt is very helpful. Though there are other issues, such as the fact that it will very often do a very shallow reading of the project files, will derive conclusions but not validate them, will only target some arbitrary files and not others, etc
Whenever possible, I ask leading questions rather than telling it explicitly what to do. I give it a problem and ask it what changes would be needed and if there are any complications. If I don't like the plan it comes up with, then I point out things and ask other questions. Make the agent do its own "thinking."
What exactly is the hard part they’re describing? Taste? Picking a stack? Noticing drift?
Seems like the “harder and vaguer thing” would be explaining to someone why that’s even necessary, or why your particular taste or stack of choice are any better than what the AI would choose itself.
Kinda silly when the whole point of the piece is “don’t invent fake things to justify your work”.
very often I find people who complain AI is not good yet for their coding projects
mostly because they try it looking for a flaw to discard it in a confirmation bias/change resistance kind of way as soon as possible
had to put our team to use it obligatorily for some weeks, after that each one shared a lot of skills, workflows, plugins and really got into it as an accelerator to complete their ideas
That's super uncharitable. I have never once seen someone decide in advance it isn't good and look for excuses to ignore it. People are genuinely trying it, and finding out it actually isn't good.
> and by the time you've stacked five or six the field of view is small enough that the model has to thread a needle through whatever assumptions you packed in along the way. Any time one of those assumptions is wrong, and at least one usually is, the output looks like a failure even though the model was doing exactly what you asked.
I disagree. The core issue doesn’t comes from the prompt author assumptions, it’s rather all the assumptions the agent is injecting, and they are very often wrong given it only has a narrow understanding of the problem you’re tackling. Models aren’t neutral at all, they have lots of biases from their training set and reinforcement loop. They do not try to find the correct design that fits all your criteria, instead some keywords and group of tokens will drive most of the output, and whatever is associated with those tokens in the model will find its way into the artifact.
A non programming example to illustrate, that I’ve seen recently. Let’s say you’re asking for an analysis of the ETF „Amundi MSCI Eu ESG Brd Transition UCITS ETF EUR A“, here the agent sees ESG and will inject a whole set of assumptions that are completely irrelevant. You end up with half the report being about ESG investment instead of the ETF itself. It’s exactly the same for software development (with the additional issue that if you „let it cook“ the agent will cheat its tests and take shortcuts that are completely against whatever you asked).
That being said I agree that first loading the project content into the context with minimal influence from the prompt is very helpful. Though there are other issues, such as the fact that it will very often do a very shallow reading of the project files, will derive conclusions but not validate them, will only target some arbitrary files and not others, etc
Whenever possible, I ask leading questions rather than telling it explicitly what to do. I give it a problem and ask it what changes would be needed and if there are any complications. If I don't like the plan it comes up with, then I point out things and ask other questions. Make the agent do its own "thinking."
What exactly is the hard part they’re describing? Taste? Picking a stack? Noticing drift?
Seems like the “harder and vaguer thing” would be explaining to someone why that’s even necessary, or why your particular taste or stack of choice are any better than what the AI would choose itself.
Kinda silly when the whole point of the piece is “don’t invent fake things to justify your work”.
very often I find people who complain AI is not good yet for their coding projects
mostly because they try it looking for a flaw to discard it in a confirmation bias/change resistance kind of way as soon as possible
had to put our team to use it obligatorily for some weeks, after that each one shared a lot of skills, workflows, plugins and really got into it as an accelerator to complete their ideas
This is not even grammaring.
Indians will do everything with AI except fix their English
That's super uncharitable. I have never once seen someone decide in advance it isn't good and look for excuses to ignore it. People are genuinely trying it, and finding out it actually isn't good.
I have, so who's right?