Astro - Hacker News

4 comments

buppermint 2 minutes ago

The paper title is a bit misleading. The detectors and models tested here are small and rather dated (Llama 3.1 8B and Gemini 2.0 Flash; these are roughly on the level of a modern 1B model), and the paper itself says this only shows vulnerability in small-model systems.
simonw 27 minutes ago

It concerns me that anyone with anything important to protect might trust what this paper calls "Injection detectors deployed to protect LLM agents" - Llama Guard and the like.
There are unlimited combinations of tokens that can be used to attack an LLM system. The idea that some kind of "detector" can catch them all just feels inherently absurd to me.
- swatcoder 17 minutes ago
  
  Contemporary tech culture has successfully trained influential people to be beyond credulous.
  If you have somebody promising a feature and somebody saying that the feature is impossible or a time bomb for catastrophe, the default for most executives and many developers these days is to believe the person promising the feature. And then, to boot, you can trust that same executive or developer to shirk responsibility when things fail later with a "How could I have known?! [Now defunct company] said it would work!"
BarryMilo 28 minutes ago

This is an "uh oh" moment, isn't it?