I'm curious who the ideal customer of this should be. If we're a startup with our own harness, are we a good fit? What would qualify us or disqualify us from being a good user?
I think startups are a great fit. Getting a really good agent out of the box lets you scale and give your customers value fast. All you need to think about is the business logic: system prompts, tools to give the agent, skills, etc. You won't need to spend time on building the infra layer, orchestration loops, memory, implementing automations, etc.
Yeah, I know some of my team members have invested a lot of time in this. Could definitely be worth chatting with them about what improvements could be made here. We're starting to deprioritize our consumer-facing agent harness in favor of the infrastructure-level improvements we're making.
Developers with customer-facing chat products are the ideal customer.
If a startup has a specific flow they want the agent to take and their traffic is bursty, then I'd recommend using a framework like Mastra and deploying onto a sandbox.
For long-running, always-on agents where it's important to learn the user's preferences over time, our approach has the highest ROI.
Interesting. We definitely have long-running agents where certain preferences are key. However, some of the preferences are likely going to be shared universally across our customers. Is there some way of triaging this feedback into permanent improvements in agent performance?
But isn’t that the same as using the Claude Agent SDK, minus maybe the memory features? What I mean to say is that you could pick the latest one and switch when a better one rolls out?
We’re using the Claude Agent SDK right now to roll out an internal agent factory. We haven’t hit the memory issue yet, but I do use Hermes as a personal agent and can see where it fits you.
Good question. There are a few differences between our approach and shipping an agent with the Claude Agent SDK.
1. Our approach has cron-based and trigger-based automations built in. Building automations with the Claude Agent SDK requires setting up separate infrastructure.
2. Our approach has self-learning built in. Building a feature like "dreaming" https://docs.openclaw.ai/concepts/dreaming with the Claude Agent SDK also requires setting up separate infrastructure.
3. Our approach decouples the harness from the compute, which lets developers enforce a stricter security boundary, while the Claude Agent SDK ships with the harness, shell, and filesystem in one process https://platform.claude.com/cookbook/claude-agent-sdk-07-hos....
4. Our approach does not lock developers into a single vendor.
You could pick the latest harness and then switch when another better one rolls out. Our bet is that a developer's time is better spent speaking to their customers than switching harnesses.
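To make point 1 a bit more concrete, here is a rough sketch of the difference between a cron-based and a trigger-based automation. This is only an illustration, not our actual implementation: `CronAutomation`, `TriggerAutomation`, `due_prompts`, and the event name `ticket.created` are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CronAutomation:
    # Hypothetical: fires once a day at a fixed hour and minute.
    name: str
    hour: int
    minute: int
    prompt: str


@dataclass(frozen=True)
class TriggerAutomation:
    # Hypothetical: fires when a named event has been observed.
    name: str
    event: str
    prompt: str


def due_prompts(automations, now, events):
    """Return the prompts that should be sent to the agent right now."""
    prompts = []
    for automation in automations:
        if isinstance(automation, CronAutomation):
            # Cron-based: compare the schedule against the clock.
            if (automation.hour, automation.minute) == (now.hour, now.minute):
                prompts.append(automation.prompt)
        elif isinstance(automation, TriggerAutomation):
            # Trigger-based: compare against the set of incoming events.
            if automation.event in events:
                prompts.append(automation.prompt)
    return prompts


automations = [
    CronAutomation("daily-digest", 9, 0, "Summarize yesterday's activity."),
    TriggerAutomation("new-ticket", "ticket.created", "Triage the new ticket."),
]

print(due_prompts(automations, datetime(2025, 1, 6, 9, 0), {"ticket.created"}))
```

The "separate infrastructure" is whatever keeps calling something like `due_prompts` on a timer and feeds it events. With an SDK alone, that scheduler and event intake are yours to build and run.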
If you re-use the Hermes agent, what are the cost and security implications? One Docker container per-customer sounds like it would be really expensive. Are they started on-demand, or run 24/7? What keeps users from using the agents for general purpose tasks, protects against prompt-injection, etc?
> what are the cost and security implications?
Cost is the token usage and container uptime.
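As a back-of-the-envelope sketch of that sum (token usage plus container uptime), something like the following works. Every number below is a made-up placeholder rather than a real vendor rate, and `Pricing` and `monthly_cost_usd` are hypothetical names for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # All prices are hypothetical placeholders, not real vendor rates.
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float
    usd_per_container_hour: float


def monthly_cost_usd(pricing, input_tokens, output_tokens, uptime_hours):
    """Cost = token usage + container uptime, for one customer's agent."""
    token_cost = (
        input_tokens / 1_000_000 * pricing.usd_per_million_input_tokens
        + output_tokens / 1_000_000 * pricing.usd_per_million_output_tokens
    )
    container_cost = uptime_hours * pricing.usd_per_container_hour
    return token_cost + container_cost


pricing = Pricing(
    usd_per_million_input_tokens=3.0,
    usd_per_million_output_tokens=15.0,
    usd_per_container_hour=0.01,
)

# One always-on container for a 30-day month, with assumed token volumes.
always_on = monthly_cost_usd(pricing, 2_000_000, 500_000, uptime_hours=24 * 30)
print(round(always_on, 2))
```

The token term scales with how much the customer actually talks to the agent, while the uptime term is fixed for a 24/7 container, so the uptime rate is the number to watch for mostly idle customers.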
> One Docker container per-customer sounds like it would be really expensive.
The advantage is per-user memory and self-learning. For context, Claude Managed Agents uses one sandbox per session: https://platform.claude.com/docs/en/managed-agents/environme....
> Are they started on-demand, or run 24/7?
24/7 (best for customer-facing chat products).
> What keeps users from using the agents for general purpose tasks, protects against prompt-injection, etc?
Users define their agent with a system prompt, tool definitions, and skills (which is what separates a media generation agent from a people search agent). We use OpenRouter, which has a prompt injection detection feature: https://openrouter.ai/docs/guides/features/guardrails/prompt....
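For illustration, here is roughly what "an agent is a system prompt plus tools plus skills" could look like as data. This is a sketch under my own assumptions, not our real schema: `AgentDefinition`, `ToolDefinition`, `allows_tool`, and the tool and skill names are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolDefinition:
    # Hypothetical shape: a name and a description the model can read.
    name: str
    description: str


@dataclass(frozen=True)
class AgentDefinition:
    # Hypothetical shape: the three things a developer supplies.
    system_prompt: str
    tools: tuple = ()
    skills: tuple = ()

    def allows_tool(self, tool_name):
        """True only if the tool was declared in this agent's definition."""
        return any(tool.name == tool_name for tool in self.tools)


media_agent = AgentDefinition(
    system_prompt="You generate images and short videos from briefs.",
    tools=(ToolDefinition("generate_image", "Render an image from a text brief."),),
    skills=("storyboarding",),
)

people_agent = AgentDefinition(
    system_prompt="You find and summarize public professional profiles.",
    tools=(ToolDefinition("search_people", "Look up people by name and company."),),
    skills=("profile-summaries",),
)

# The definition is plain data, so it serializes to JSON directly.
print(json.dumps(asdict(media_agent), indent=2))

# The media agent never declared a people-search tool.
print(media_agent.allows_tool("search_people"))
```

The same harness runs both agents; only the definition differs. Limiting the declared tools narrows what an agent can do, but it is not a defense against prompt injection on its own, which is why the detection layer sits separately.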
Interesting idea! Question:
> It is highly unlikely that an AI agent startup becomes wealthy by creating the best harness for a particular use case.
If it's not the harness, what do you think is the thing that will differentiate AI agent startups? Is it mainly data, or something else?
The two most valuable things an AI agent startup can gather are access to its customers' proprietary data and knowledge of its customers' preferences (memory + self-learning).
Even as the cost of writing code goes to zero, those two pieces of information are non-commodities.
I thought the entire industry is moving toward harness engineering? I read this twice and didn't fully understand what it was telling me.
Thanks for the feedback. The main idea is that today, to build a best-in-class agent, developers have to build the agent loop, session management, tools, memory, skills, automations (cron + trigger-based), sandboxed deployment, and self-learning.
By providing Hermes with a system prompt, custom tools, and skills, developers get the agent loop, session management, automations, sandboxed deployment, and self-learning for free.
But effectively they’re deferring harness engineering onto another developer?? I don’t understand how this is different than any other library, ever
Yes, by using us, developers are deferring the harness engineering onto us, and they can spend their time writing code for their business logic.
We are closer to infrastructure than a library or framework; we give developers a live agent they can chat with in a single API call.