Astro - Hacker News

16 comments

naultic an hour ago

I'm working on something a little similar, but mine's more of a dev tool than process automation. I love where yours is headed, though. The biggest issue I've run into is handling retries with agents. My current solution is to have them set checkpoints so they can revert easily: when they can't make an edit or can't get a test passing, they just restart from an earlier state. The problem is that this uses up lots of tokens on retries. How did you handle this issue in your app?
- jawiggins an hour ago
  
  Generally I've found agents are capable of self-correcting as long as they can bash up against a guardrail and see the errors. So in optio the agent is resumed and told to fix any CI failures or address review feedback.
denysvitali 2 hours ago

FWIW, a "cheaper" version of this is triggering Claude via GitHub Actions and `@claude`ing your agents like that. If you run your CI on Kubernetes (ARC), it sounds pretty much the same.
MrDarcy 4 hours ago

Looks cool, congrats on the launch. Is there any sandbox isolation from the k8s platform layer? Wondering if this is suitable for multiple tenants or customers.
- jawiggins 4 hours ago
  
  Oh good question, I haven't thought deeply about this.
  Right now nothing special happens, so claude/codex can access their normal tools and make web calls. I suppose that also means they could figure out they're running in a k8s pod and do service discovery and start calling things.
  What kind of features would you be interested in seeing around this? Maybe a toggle to disable internet connections or other connections outside of the container?
abybaddi009 41 minutes ago

Does this support skills and MCP?
antihero 4 hours ago

And what stops it making total garbage that wrecks your codebase?
- jawiggins 4 hours ago
  
  There are a few things:
  a) you can create CI/build checks that run in GitHub, and the agents will make sure they pass before merging anything
  b) you can configure a review agent with any prompt you'd like to make sure any specific rules you have are followed
  c) you can disable all the auto-merge settings and review all the agent code yourself if you'd like.
  - kristjansson 3 hours ago
    
    > to make sure
    you've really got to be careful with absolute language like this in reference to LLMs. A review agent provides no guarantees whatsoever, just shifts the distribution of acceptable responses, hopefully in a direction the user prefers.
    
    jawiggins 3 hours ago
    
    Fair, it's something like semantic enforcement rather than hard enforcement. I think current AI agents are good enough that if you tell one, "Review this PR and request changes anytime a user uses a variable name that is a color", it will do a pretty good job. But for complex things I can still see them falling short.
    
    SR2Z 13 minutes ago
    
    I mean, having unit tests and not allowing PRs in unless they all pass is pretty easy (or requiring human review to remove a test!).
    A software engineer takes a spec which "shifts the distribution of acceptable responses" for their output. If they're 100% accurate (snort), how good does an LLM have to be for you to accept its review as reasonable?
- upupupandaway 4 hours ago
  
  Ticket -> PR -> Deployment -> Incident
conception 3 hours ago

What’s the most complicated, finished project you’ve done with this?
- jawiggins 3 hours ago
  
  Recently I used it to finish up my re-implementation of curl/libcurl in Rust (https://news.ycombinator.com/item?id=47490735). At first I started by trying to have a single Claude Code session run in an iterative loop, but eventually I found it was way too slow.
  I started tasking subagents for each remaining chunk of work, and then found I was really just repeating the need for a normal sprint tasking cycle but where subagents completed the tasks with the unit tests as exit criteria. So optio came to my mind, where I asked an agent to run the test suite, see what was failing, and make tickets for each group of remaining failures. Then I use optio to manage instances of agents working on and closing out each ticket.
hmokiguess 3 hours ago

The misaligned columns in the Claude-made ASCII diagrams on the README really throw me off. Why not fix them?
| | | |
- jawiggins 3 hours ago
  
  Should be fixed now :)