If you have an emotional response to anything an agent or LLM does, then you should lay off the sauce for a while and take a walk or something. This stuff is just dumb tech, no matter the appearances, and it does not warrant you getting emotionally invested in your interactions with it. It's a tool. There is no point in getting upset at a hammer or a chainsaw. You are in control; you are the user.
The fact that it's supposed to act like a tool, and that you come to treat it as one, makes it more frustrating. What if you bought a very expensive knife that went dull every ten minutes? Of course you would mutter some curses at the knife; that doesn't mean you believe it to be sentient.
Can you see the flip side of this?
The way the LLMs are configured by default is to be chatty and pretend to be human.
They're not talking like Data from Star Trek, nor HAL.
They're trying to be Samantha from Her.
Go watch that movie and see the visceral human response that evokes.
To tell people to treat it as a tool, while it's deliberately trying to pass itself off as human, is like telling us to piss into the wind.
It's bad advice and not how humans work.
I think better advice might be to set it to a different conversational tone.
Yes, I see the flip side just fine, but we should all keep that flip side in mind all the time and not let these tools get under our skin. That's why I keep any AI interaction off my main computer: no risk of contaminating my professional output, and no way for it to become a habit.
Setting it to a different conversational tone is a patch, not a solution; it will just postpone the same outcome, because the interaction model is purposely designed to resemble interaction with a human. They could easily fix that, but somehow none of the AI peddlers want to.
This is part of the whole alignment problem and one of the reasons I have a hard time seeing this development as a net positive, even if it lets me do some things more easily than before. The companies behind this stuff are lacking in the ethics department, and they are all eyeing your wallet. The more involved you are, the easier it will be to empty it. So don't let that happen.
This is indicative of too much context. Remember: these systems don't "think", they predict. If you think of the context as an insanely large map with shifting and duplicate keys and queries, the hallucinating and seeming loss of context makes sense. Find ways to reduce the context for better results: reduce sample sizes, exclude unrelated repositories and code. Remember that more context also means more cost, and when the AI investment money dries up, that will be untenable for developers.
If you can't reduce the context, it suggests the scope of your prompt is too large. The system doesn't "think" about the best solution to a prompt; it predicts which outputs you'll accept. So if you prompt for an online casino website with user accounts and logins, games, bank-card processing, analytics, advertising networks, etc., the agent will require far more context than a prompt for just the login page.
So, to answer the question: if my agent loses context, I feel like I've messed up.
This is the first project where I've really let AI do more than work on a single file at a time. The trouble is, there's no way for it to be useful without a fairly large context. When it runs out, it starts doing things that are actively destructive, yet subtle and easy to miss. Mainly, it forgets the architecture. A couple of days ago, it had a good handle on a database table that I was writing side by side with an API that ran queries and did calculations on the data. I read the code it wrote for a particular API call and didn't notice that it had started flipping the sign of one of the columns in a query, because it had misinterpreted the column name. A few minutes before that, it had written another query correctly, but from that point on it kept flipping the sign on that column. I only noticed after having it write several other queries, when it oddly mentioned in its "thinking" that X was Y-Z. Reading the thinking has been the main clue as to when it loses track, but if I hadn't known exactly why X was Y+Z, the code built on that API would have given subtly inconsistent results that would have been very hard to trace.
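A toy sketch of how that kind of sign drift plays out. The column name and the math here are hypothetical, not from the actual project; the point is only that the drifted version still looks plausible in a code review:

```python
# Hypothetical illustration of the sign-flip failure described above.
# Suppose a ledger table stores `credit_cents`, where refunds are negative.

rows = [
    {"credit_cents": 500},   # a sale
    {"credit_cents": -200},  # a refund
]

# Query logic written early in the session (correct): X = Y + Z.
def net_balance(rows):
    return sum(r["credit_cents"] for r in rows)

# Logic written after context loss (buggy): the model misreads the
# column as a debit amount and silently flips the sign: X = Y - Z.
def net_balance_drifted(rows):
    return sum(-r["credit_cents"] for r in rows)

print(net_balance(rows))          # 300
print(net_balance_drifted(rows))  # -300: wrong, but plausible-looking
```

Both functions are syntactically fine and individually reasonable, which is exactly why the inconsistency only surfaces when results from before and after the drift are compared.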
Context management is a core skill when using an LLM. So if it loses key context (e.g. tasks, instructions, or constraints), I screwed up and need to up my game.
Just throwing stuff into an LLM and expecting it to remember everything you want, without any involvement on your part, isn't how the technology works (or could ever work).
An LLM is a tool, not a person, so I don't have an emotional response to hitting its innate limitations. If you get "deeply frustrated" or feel "helpless anger" instead of just working the problem, that seems like an unconstructive reaction, to say the least.
LLMs are a limited tool; just learn what they can and cannot do, and how to get the best out of them, and leave your emotions at the door. Getting upset at a tool won't accomplish anything.
Managing context is half the job scope now.
I can totally feel the shift, the rot, or whatever, when it happens. With Opus 1M it seems to happen more often in my recent experience, even though my approach hasn't changed a bit.
So I teach myself not to have an emotional response while working with LLMs. The actual response should be starting a new session or diving into the code myself.
I just posted this on HN this morning and was looking through "new" but I'm trying to solve this exact problem:
https://annealit.ai
That's interesting. I mean, I've got an openclaw setup with Claude that is merging and storing chats from WhatsApp and the web client once a day, and has a ton of context accessible... but there's something about being right in the middle of solving a hard technical problem, deep in the weeds about which columns should represent which data, when suddenly it's like: what were we talking about? Oh, I should try reading the database structure again from scratch. I don't think that's a problem that any clever arrangement of memory or personality files can actually solve.
But I think when you actually structure memory in the right form based on the "workload" (i.e. Google Spreadsheet, Microsoft Word XML, a coding language's ASTs/DAGs), then additive "unforgetting" truly becomes possible.
Edit:
I truly believe this is solvable, just like we're doing for natural language, but with code/schema/etc.: relational, document, graph, vector!
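A minimal sketch of what workload-shaped memory could look like in practice: persisting schema facts and invariants as structured data the agent re-reads each session, rather than hoping they survive in chat context. All names here are hypothetical, not any tool's actual format:

```python
import json
from pathlib import Path

# Hypothetical structured memory: schema facts and invariants stored
# as data instead of free-form chat history.
schema_memory = {
    "tables": {
        "ledger": {
            "columns": {
                "credit_cents": "signed integer; refunds are negative",
                "created_at": "UTC timestamp",
            },
            "invariants": ["X = Y + Z, never Y - Z"],
        }
    }
}

path = Path("schema_memory.json")
path.write_text(json.dumps(schema_memory, indent=2))

# A later session reloads the facts instead of re-deriving them.
facts = json.loads(path.read_text())
print(facts["tables"]["ledger"]["invariants"][0])
```

The design idea is that facts like column semantics become lookups against a durable store, so they can't quietly drift the way in-context memory does.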
Terrible, honestly. Betrayed, gaslit. Don't tell me it's just a tool and it's my problem...hell no, fix your stupid tool. The whole point is to immerse yourself so you don't feel any different than having an energetic and resourceful junior or some perhaps limited but useful companion at your beck and call. If that illusion drops, it's on the tool.
Agents and assistants are like buying insurance: you need to pay per token.
Ya, I just feel like "ah crap" for a moment, but then I give it more guidance and it's all good.
Now your coding assistant is suffering from dementia too. How sad. I ask it to save important stuff to a file.
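That save-it-to-a-file habit can be made systematic. A hedged sketch of an append-only notes file the assistant is told to re-read at the start of every session (the file name and convention are made up, not any tool's built-in feature):

```python
from pathlib import Path

# Hypothetical external memory: an append-only notes file that survives
# context loss between sessions.
NOTES = Path("PROJECT_NOTES.md")

def remember(fact: str) -> None:
    """Append one important fact as a list item."""
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")

remember("X = Y + Z; never flip the sign on credit_cents")
remember("API totals are in cents, not dollars")
print(NOTES.read_text(encoding="utf-8"))
```

Append-only keeps the mechanism dumb and auditable: the assistant can't "helpfully" rewrite earlier facts, only add new ones.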