I don't think I'm ready to trust very security sensitive functions to pure vibe-coded software, and that's what this seems to be? Certainly the README is authored by an LLM, and there's a gazillion empty commits and other weirdness that indicates no human is in the loop. It looks like a loop engineered this software.
Models have gotten good, but c'mon. Good idea, maybe even a good implementation, but I don't have confidence in it, and you've got to have confidence in a project that claims to provide security.
Also, even the best models still regularly write C security bugs. It doesn't make sense to have a model write C code when having it write in a memory safe language is only slightly more effort/cost.
How you type is a poor proxy for code quality. Code quality is a good proxy for code quality. Inspect the code, build a verification pipeline for it, use agents to explore the code and the architecture, see if you can unearth anything foul.
I'm not judging based on how they type. I can't see how they type, they vibed the README.
And, it's not my monkey. You can inspect the code, build a verification pipeline for it, use agents to explore the architecture and see if you can unearth anything foul.
My heuristic is to dismiss purely vibe-coded apps from people I don't know, particularly for security sensitive stuff. If the README is written by a human and is coherent and exhibits some kind of desire and competence to make good software on the part of the author, I'm more likely to trust they drove their agents with care.
Here's the thing: you can make good software with agents, if you exhibit good judgement and put yourself in the path as a gate on quality. Too many clues point at this being loop engineering. And, C for this task, given 100% agent authorship, gives me the ick. Seems like bad judgement or opting out of making judgement calls.
The size of a video file is a poor proxy for the quality of the encode; quality itself is a good proxy. The problem is that measuring the actual quality of a video file takes a hell of a lot more work and resources than using a proxy to decide whether that measurement is worth doing at all. See if you can go the extra mile you described for a few hours/dollars tonight and let us know what you find, it would be appreciated!
It's not "how you type", it's "whether any human so much as laid eyes on the code". I wouldn't automatically discard code from an LLM, but let's put the goalposts where they actually are.
Who the F* runs a minimizer on friggin C sources? And it's inconsistent too.
Security-related code should be readable and auditable.
Setting aside that this seems to be pure slop, what’s with all the empty commits?
The seccomp-BPF rules seem almost unusably strict. What is this even designed to be used to run?
It says on their GitHub profile that they are building some kind of nowhere detection product. Maybe in that context, a very strict syscall allowlist is useful or good?
> It is designed for CI pipelines, CTF jail challenges, and lightweight code evaluation
Looking at the list, it seems pretty good for that. What does a CI runner that just needs to run GCC or whatever really need?
Edit: no open does seem restrictive. Not that it's bad security (not my area of expertise), but how many useful programs use open that are just off limits here?
Allowing individual syscalls is the standard way to sandbox on the BSDs today, and opt-in on Linux. The project has some issues, but being too restrictive is not one of them.
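For anyone unsure what a default-deny allowlist without open means in practice, here is a rough toy model in plain Python. It is not real seccomp and it does not touch the kernel; it only mimics the per-syscall allow-or-kill decision a seccomp-BPF filter makes. The allowlist contents and the names `ALLOWED`, `decide`, and `run_trace` are made up for illustration and are not taken from the project.

```python
# Toy model of a default-deny syscall allowlist. NOT real seccomp: it only
# mimics the allow/kill decision a seccomp-BPF filter makes per syscall.
# The allowlist below is hypothetical, not the project's actual list.
ALLOWED = frozenset({"read", "write", "exit_group", "brk", "mmap"})


def decide(syscall, allowed=ALLOWED):
    """Return the action for one syscall name: 'ALLOW' or 'KILL'."""
    return "ALLOW" if syscall in allowed else "KILL"


def run_trace(trace, allowed=ALLOWED):
    """Walk a sequence of syscall names and stop at the first denied one.

    Returns (syscalls_completed, denied_syscall_or_None).
    """
    done = []
    for name in trace:
        if decide(name, allowed) == "KILL":
            return done, name
        done.append(name)
    return done, None


if __name__ == "__main__":
    # A program that only reads stdin and writes stdout runs to completion.
    print(run_trace(["brk", "read", "write", "exit_group"]))
    # A program that tries to open a file is stopped at that call.
    print(run_trace(["brk", "open", "read", "exit_group"]))
```

The point of the sketch is just that anything not named in the list ends the process at the first such call, so whether the list is "too strict" depends entirely on what the sandboxed program is expected to do.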