This approach seems useful for validating certain kinds of skills, but I worry that it provides a false sense of security. It is a bit like antivirus software. It might be better than nothing, but it is hard to know how much better.
Skills are ultimately just prompts, and agents execute code based on what is in them. If agents running skills can write code, execute commands, and reach the internet, it is virtually impossible to prove they are trustworthy.
When we download programs, we trust that the companies who wrote them did not add malicious code. We do have some ways of detecting malicious code, but software distribution is still mostly a trust-based system.
My recommendation is not to run skills from any source you would not download and execute code from.
It also seems redundant. If the two checks were complementary, like an LLM reviewing code or code verifying an LLM, that would be defense in depth.
But an LLM reviewing an LLM? I think if the reviewing LLM catches the problem, the executing LLM would have refused to run it anyway, and if the prompt fools the executing LLM, it will probably fool the reviewing LLM too.
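To put rough numbers on that intuition, here is a minimal sketch. Everything in it is an assumption for illustration: the function name, the miss rates, and the `overlap` parameter are hypothetical, and the linear blend between "independent" and "fully correlated" is a simplification, not a measured model of how LLMs fail.

```python
def combined_miss_rate(p_reviewer_miss: float,
                       p_executor_miss: float,
                       overlap: float) -> float:
    """Chance that a malicious skill gets past BOTH checks.

    overlap = 0.0: the two checks fail independently (complementary layers).
    overlap = 1.0: whatever fools one check also fools the other.
    Values in between are a linear blend, which is an illustrative
    simplification rather than a real failure model.
    """
    # Independent layers: both have to miss, so the miss rates multiply.
    independent = p_reviewer_miss * p_executor_miss
    # Fully correlated layers: the second check adds nothing beyond
    # the better of the two.
    fully_correlated = min(p_reviewer_miss, p_executor_miss)
    return (1.0 - overlap) * independent + overlap * fully_correlated


# Hypothetical miss rates of 10% for each check.
independent_layers = combined_miss_rate(0.10, 0.10, overlap=0.0)
same_blind_spots = combined_miss_rate(0.10, 0.10, overlap=1.0)
print(f"independent checks: {independent_layers:.3f}")
print(f"same blind spots:   {same_blind_spots:.3f}")
```

With made-up 10% miss rates, independent layers cut the combined miss rate by a factor of ten, while two checks that share the same blind spots leave it where it started. The real overlap between a reviewing LLM and an executing LLM is unknown, which is the point: you cannot assume it is low.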
Also, it looks very silly. I know that sounds like a joke, but optics matter. Imagine you are being paid a real salary to feed your family: would you really want to get caught with this anywhere in the chain? Regardless of whether it contributed to the vuln or just failed to catch it, would you want to defend your role at the company with this? Unless you are deep into the "AI is a god/gold mine" mindset, it sounds like buffoonery.