On the one hand this is exactly the right solution to prevent lethal trifecta exfiltration attacks.
The existence of lockdown mode does however imply that ChatGPT, in its default settings, does not provide robust protection against sufficiently determined data exfiltration attacks!
Probably influenced by Apple's feature with the same name: https://support.apple.com/en-us/105120
I imagine that enterprise companies will be quite interested in this.
https://x.com/sama/status/1891533802779910471
i can definitely feel the agi now
Congratulations, you are a high taste tester!
So we still don't have a reliable way to separate instructions from data when talking to an LLM, a problem that humans learned how to solve decades ago in areas like SQL and memory safety. But hey, we have these hopefully-not-leaky containers, which are probably implemented with just more system prompts.
How long until somebody figures out how to trick Codex into disabling Lockdown Mode for you?
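For reference, here is the SQL half of that analogy. A parameterized query sends the query text (instructions) and the user-supplied value (data) through separate channels, so the value is bound as a literal and is never parsed as SQL. Below is a minimal sketch using Python's stdlib `sqlite3`. The table, rows, and hostile input are made up for illustration. The commenter's point is that LLM prompts have no comparable second channel.

```python
import sqlite3

# Hypothetical in-memory table and input, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

hostile = "nobody' OR '1'='1"

# Unsafe: the data is spliced into the instruction stream, so the quote
# closes the string literal and the rest is parsed as SQL (matches every row).
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

# Safe: the query text and the value travel separately; the value is bound
# as a plain string and compared literally (matches nothing).
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```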
> So we still don't have a reliable way to separate instructions from data when talking to an LLM
Humans also do not know how to do this reliably, which is why phishing is still a thing and always will be.
We can separate them, but the $ value of an agent that does is much lower than one that doesn't.
As a pre-LLM analogy, imagine working at a bank with a whitelist firewall. You need to install a package, but that requires an IT ticket. Safer, but slooooower.
Now, I'm not saying what the answer is here, but that is the issue.
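Here is the whitelist-firewall analogy in miniature: default-deny egress, where any outbound request to a host not on an explicit allowlist is refused. It is a minimal sketch with hypothetical hostnames and a hypothetical `egress_allowed` helper. It illustrates the general technique only and says nothing about how Lockdown Mode is actually implemented.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; in the bank analogy, adding an entry takes an IT ticket.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

def egress_allowed(url: str) -> bool:
    """Default-deny: permit a request only if its host is explicitly allowlisted."""
    host = urlparse(url).hostname  # urllib returns the hostname lowercased, or None
    return host is not None and host in ALLOWED_HOSTS

# An exfiltration attempt that smuggles data in a query string is refused
# because its host is not on the list, whatever the model was told to do.
print(egress_allowed("https://pypi.org/simple/requests/"))            # True
print(egress_allowed("https://attacker.example/log?secret=hunter2"))  # False
print(egress_allowed("https://pypi.org.attacker.example/"))           # False
```

The cost the comment describes is visible here: every legitimate new destination is blocked until someone edits `ALLOWED_HOSTS`.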
The answer may look more like industries that get safer through lessons (like aviation) rather than going for 100% safety out of the gate, because both fast travel and AI agents are insanely useful.
What? Aviation safety is not designed to get safer through lessons? They literally try to ensure it is 100% safe out of the gate. The accidents that do happen are usually statistical outliers and lead to loss of life.
That's what it means when they say aviation regulations are written in blood. Not that they just fling planes into the sky and be like "boy i hope we learn some new regulations from this". The number of airplane crashes would be astronomically larger if the 100% safety part was not embedded into the design process.
I think we agree? Unless my reading comp is off today.
The help doc explicitly carves out Codex: "Lockdown Mode does not affect network access in Codex." The mode limits outbound requests in chat to block prompt injection exfiltration, but Codex network access is a separate setting. An enterprise team that turns on Lockdown Mode while using Codex against internal repos still has an open outbound path this mode doesn't cover.