If I understand correctly, the threat model here seems to be protecting against accidental issues that would impact performance, but it doesn't cover a malicious actor.
For example, Sketchy Provider tells you they're running the latest and greatest, but is actually knowingly running some cheaper (and worse) model and pocketing the difference. These tests wouldn't help, since Sketchy Provider could detect when they're being tested and do the right thing (like the Volkswagen emissions scandal). Right?
Yes and no.
For a truly malicious actor, you're right. But it shifts it from "well we aren't obviously committing fraud by quantizing this model and not telling people" to "we're deliberately committing fraud by verifying our deployment with one model and then serving customer requests with another".
I suspect there's a lot of semi-malicious actors who are only happy to do the former.
Seems like a great challenge for all these systems; see frontier labs serving quants when under heavy load.
Good to see this exist. Inference providers quietly swap quant levels. Most users never check. A standard verifier from the model maker is the right move; would love to see other labs ship the same.
A test that runs for 15 hours on a high-powered rig is going to be hard to reproduce or scale. But I think this addresses a widespread concern, one which affects all kinds of cloud services: what you ping is not necessarily what you get.
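The kind of spot check being discussed can be sketched as a statistical comparison: gather next-token distributions (via logprobs, at temperature 0) from a trusted reference run of the model and from the provider's endpoint, then flag large divergence, since heavy quantization or a swapped model tends to shift these distributions. Everything below is a hypothetical illustration with synthetic data, not any vendor's actual verifier; the threshold and the choice of total variation distance are assumptions.

```python
# Hypothetical sketch: flag a deployment whose next-token distributions
# diverge from a trusted reference run (e.g. full-precision weights).
# The data format and 0.05 threshold are illustrative assumptions.

def total_variation(p, q):
    """Total variation distance between two discrete distributions,
    given as {token: probability} dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def verify(reference, provider, threshold=0.05):
    """Return (passed, worst_distance) over a batch of prompts.

    reference/provider: lists of {token: prob} dicts, one per prompt,
    collected from the same prompts with logprobs enabled.
    """
    worst = max(total_variation(r, s) for r, s in zip(reference, provider))
    return worst <= threshold, worst

# Synthetic example: "good" matches the reference closely on both
# prompts; "bad" drifts noticeably on the second one.
ref  = [{"a": 0.9, "b": 0.1},   {"x": 0.7, "y": 0.3}]
good = [{"a": 0.89, "b": 0.11}, {"x": 0.69, "y": 0.31}]
bad  = [{"a": 0.9, "b": 0.1},   {"x": 0.5, "y": 0.5}]

print(verify(ref, good))  # passes: worst distance 0.01
print(verify(ref, bad))   # fails: worst distance 0.20
```

Note this only raises the bar, per the thread above: a provider that routes detected test prompts to the real model still beats it.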
After Anthropic, Moonshot is another model provider that restricts tweaking of sampling parameters. I do like the idea of the vendor verifier, though.