I guess they really do eat their own dogfood and vibe code their way through it without care for technical debt? In a way, it’s a good challenge, but it’s fairly painful to watch the current state of the project (which is about a year old now, so it should be in prime shape).
> is about a year old now, so it should be in prime shape
A 1yo project may be in good shape if written by just one dev, maybe a few. But if you have many devs, I can guarantee it will be messy and buggy. If anything, at 1yo it is probably still full of bugs because not enough time has elapsed for people to run into them.
It's only 510k LoC, at ~100 lines of code a day for a year, this code base would take 23 engineers a year to write. That's for 220 working days in somewhere civilized.
And I'm sure we all know that when working on a greenfield project you can produce a lot more LoC per day than maintaining a legacy one.
Given that vibe code is significantly more verbose, you're probably talking about ~15 engineers worth of code?
I know that's all silly numbers, but this is just attempting to give people some context here, this isn't a massive code base. I've not read a lot of it, so maybe it's better than the verbose code I see Claude put out sometimes.
Yes but my point was that they seem to explicitly not care about code quality and/or the insane amount of bloat, and seem to just want the LLM to be able to deal with it.
I've heard somewhere that they have roughly 100% code churn every few months, so yes, they unfortunately don't care about code quality. It's a shame, because it's still the best coding agent, in my experience.
Which makes for an interesting thought / discussion; code is written to be read by humans first, executed by computers second. What would code look like if it was written to be read by LLMs? The way they work now (or, how they're trained) is on human language and code, but there might be a style that's better for LLMs. Whatever metric of "better" you may use.
Just a thought experiment, I very much doubt I'm the first one to think of it. It's probably in the same line of "why doesn't an LLM just write assembly directly"
LLMs read and write human-code because humans have been reading and writing human-code. The sample size of assembly problems is, in my estimate, too small for LLMs to efficiently read and write it for common use cases.
I liken it to the problem of applying machine learning to hard video games (e.g. Starcraft). When trained to mimic human strategies, it can be extremely effective, but machine learning will not discover broadly effective strategies on a reasonable timescale.
If you convert "human strategies" to "human theory, programming languages, and design patterns", perhaps the point will be clear.
But: could the ouroboric cycle of LLM use decay the common strategies and design patterns we use into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, etc?
> It's probably in the same line of "why doesn't an LLM just write assembly directly"
My suspicion is that the "language" part of LLMs means they tend to prefer languages which are closer to human languages than assembly and benefit from much of the same abstractions and tooling (hence the recent acquisition of bun and astral).
Kairos and auto-dream are more interesting than anything in the agent loop section. Memory consolidation between sessions is the actual unsolved problem. The rest is just plumbing tbh
There's this weird thing about AI generated content where it has the perfect presentation but conveys very little.
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
That's fair. The site isn't meant to be a deep technical dive, it's more of a visual high-level guide of what I've curated while exploring the codebase while assisted by AI, 500k loc codebase is just too much to sift through in a short amount of time.
I doubt there is anything special about the transformer code the frontier labs use. The only thing proprietary in it are probably the infrastructure-specific optimizations for very large scale distributed training and some GPU kernel tricks. The real moat is the training data, especially the RLHF/finetuning data and verifiable reward environments, and the GPU clusters of course.
The open source models are quite close, and they'd probably be just as good with the equivalent amount of compute/data the frontier labs have access to.
However, I assume that usage data could be increasingly valuable as well. That will likely help the big commercial cloud models to maintain a head start for general use.
I mean, I get it: vibe-coded software deserves vibe-coded coverage. But I would at least appreciate it if the main part of it, the animation, went at a speed that at least makes it possible to follow along and didn't glitch out with elements randomly disappearing in Firefox...
It's on the front page because it looks really cool. You can complain about it being vibe coded, but it still looks good. If you ask Claude to allow the user to slow down the animation, it can do that quite easily, that's just not a problem caused by vibe coding. And I'm on FF and didn't notice anything glitching out.
Really nice visualisation of this, makes understanding the flow at a high levle pretty clear. Also the tool system and command catalog, particularly the gated ones are super interesting.
I think it's good that it's out there, and I wonder why Anthropic have been keeping it closed source; clearly they can't possibly think that the CC source code is a competitive advantage...?
Agents in general are easy to make, and trivial to make for yourself especially, and the result will be much better than what any of the big providers can make for you.
`pi` with whatever commands/extensions you want to make for yourself is better than CC if you really don't want to go through the trouble of making your own thing.
I feel the same way. Given it's AI-written, looking at the code isn't even worth it to me. I would rather read a blog post about how they develop it day to day.
Thanks, I'll use this for teaching next week (on what not to do). BashTool.ts :D But, in general, I guess it just shows yet again that the emperor has no clothes.
If it was 2020, it would be hard to imagine that after some hours/days you getting a visual representation of the leak with such detailed stats lol
How was this generated ? I'm quite sure "with ai/claude code" but what are the actual steps ?
Feel free to add this to Awesome Claude code. https://github.com/rosaboyle/awesome-cc-oss
I guess they really do eat their own dogfood and vibe code their way through it without care for technical debt? In a way, it’s a good challenge, but it’s fairly painful to watch the current state of the project (which is about a year old now, so it should be in prime shape).
> is about a year old now, so it should be in prime shape
A 1yo project may be in good shape if written by just one dev, maybe a few. But if you have many devs, I can guarantee it will be messy and buggy. If anything, at 1yo it is probably still full of bugs because not enough time has elapsed for people to run into them.
It's only 510k LoC, at ~100 lines of code a day for a year, this code base would take 23 engineers a year to write. That's for 220 working days in somewhere civilized.
And I'm sure we all know that when working on a greenfield project you can produce a lot more LoC per day than maintaining a legacy one.
Given that vibe code is significantly more verbose, you're probably talking about ~15 engineers worth of code?
I know that's all silly numbers, but this is just attempting to give people some context here, this isn't a massive code base. I've not read a lot of it, so maybe it's better than the verbose code I see Claude put out sometimes.
Boris Cherny, the creator of Claude Code said he uses CC to build CC.
Yes but my point was that they seem to explicitly not care about code quality and/or the insane amount of bloat, and seem to just want the LLM to be able to deal with it.
I've heard somewhere that they have roughly 100% code churn every few months, so yes, they unfortunately don't care about code quality. It's a shame, because it's still the best coding agent, in my experience.
Which makes for an interesting thought / discussion; code is written to be read by humans first, executed by computers second. What would code look like if it was written to be read by LLMs? The way they work now (or, how they're trained) is on human language and code, but there might be a style that's better for LLMs. Whatever metric of "better" you may use.
Just a thought experiment, I very much doubt I'm the first one to think of it. It's probably in the same line of "why doesn't an LLM just write assembly directly"
LLMs read and write human-code because humans have been reading and writing human-code. The sample size of assembly problems is, in my estimate, too small for LLMs to efficiently read and write it for common use cases.
I liken it to the problem of applying machine learning to hard video games (e.g. Starcraft). When trained to mimic human strategies, it can be extremely effective, but machine learning will not discover broadly effective strategies on a reasonable timescale.
If you convert "human strategies" to "human theory, programming languages, and design patterns", perhaps the point will be clear.
But: could the ouroboric cycle of LLM use decay the common strategies and design patterns we use into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, etc?
> It's probably in the same line of "why doesn't an LLM just write assembly directly"
My suspicion is that the "language" part of LLMs means they tend to prefer languages which are closer to human languages than assembly and benefit from much of the same abstractions and tooling (hence the recent acquisition of bun and astral).
> also related: https://www.ccleaks.com
This deployment is temporarily paused
Same!
Okay those "hidden features" are amazing, especially the cross-session referencing. I hope we can look forward to that in the future
Also I definitely want a Claude Code spirit animal
It's live! If you're on the latest cc you can use /buddy now.
It's a ridiculous folly. I've already lost a well-constructed question because I accidentally tabbed into my pointless 'buddy'.
(Yes, I know I can turn it off. I have.)
I find Claude Code features fall into 2 categories, "hmmmm that could be actually useful" vs "there is more kool aid where that came from"
Ok! First prompt, obviously:
“Complete thyself.”
And I want an octopus. Who orchestrates octopuses.
Kairos and auto-dream are more interesting than anything in the agent loop section. Memory consolidation between sessions is the actual unsolved problem. The rest is just plumbing tbh
Projects like Beads help with memory consolidation by making it somewhat moot, since it stays "offline" and can be recollected at any moment.
There's this weird thing about AI generated content where it has the perfect presentation but conveys very little.
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
Also it's just randomly incorrect in places. For instance, it lists "fox" as one of the "Buddy" species, but that's not in the code.
That's been corrected, I did another fact checking pass!
When you're picking most likely tokens, you get least surprising tokens, ones with least entropy and least information per token.
That's fair. The site isn't meant to be a deep technical dive, it's more of a visual high-level guide of what I've curated while exploring the codebase while assisted by AI, 500k loc codebase is just too much to sift through in a short amount of time.
Really Weird but then it's so easy spot AI text by this pattern
would be nice if the transformers code for one of these frontier LLM models got leaked, HN will have a field day with a reveal like that
I doubt there is anything special about the transformer code the frontier labs use. The only thing proprietary in it are probably the infrastructure-specific optimizations for very large scale distributed training and some GPU kernel tricks. The real moat is the training data, especially the RLHF/finetuning data and verifiable reward environments, and the GPU clusters of course.
The open source models are quite close, and they'd probably be just as good with the equivalent amount of compute/data the frontier labs have access to.
That’s what I‘m thinking as well.
However, I assume that usage data could be increasingly valuable as well. That will likely help the big commercial cloud models to maintain a head start for general use.
I mean, I get it: vibe-coded software deserves vibe-coded coverage. But I would at least appreciate it if the main part of it, the animation, went at a speed that at least makes it possible to follow along and didn't glitch out with elements randomly disappearing in Firefox...
How is this on the front page?
It's on the front page because it looks really cool. You can complain about it being vibe coded, but it still looks good. If you ask Claude to allow the user to slow down the animation, it can do that quite easily, that's just not a problem caused by vibe coding. And I'm on FF and didn't notice anything glitching out.
Really nice visualisation of this, makes understanding the flow at a high levle pretty clear. Also the tool system and command catalog, particularly the gated ones are super interesting.
So it does use ripgrep and not unix grep. [0] I knew it from some other commenters here on HN, but it's nice to see it in the source as well.
0 - https://github.com/zackautocracy/claude-code/blob/main/src/u...
I just stumbled on a fascinating replacement candidate while clicking around on embed models on hugging face: https://github.com/lightonai/next-plaid/tree/main/colgrep
it looks really interesting.
I hope /Buddy is ported across to OpenCode.
Nice site. I might suggest moving SendMessage to the Hidden Features as they don't appear to have implemented a ReadMessage or ListMessages tools.
Is it just me or do I not find the Claude Code application that fascinating?
I use it all day and love it. Don't get me wrong. But it's a terminal-based app that talks to an LLM and calls local functions. Ooookay…
I think it's good that it's out there, and I wonder why Anthropic have been keeping it closed source; clearly they can't possibly think that the CC source code is a competitive advantage...?
Agents in general are easy to make, and trivial to make for yourself especially, and the result will be much better than what any of the big providers can make for you.
`pi` with whatever commands/extensions you want to make for yourself is better than CC if you really don't want to go through the trouble of making your own thing.
why do you think agents you make yourself will be better for you? integration with tooling that you prefer? your local dev setup built in?
curious as i haven't gotten around to writing my own agent yet
That’s what every agent does. They are fundamentally simple.
But you can do a lot of interesting things on top of this. I highly recommend writing an agent and hooking it up to a local model.
I feel the same way. Given it's AI-written, looking at the code isn't even worth it to me. I would rather read a blog post about how they develop it day to day.
Clever architecture often can still beat clever programming.
I expect dozens more "research articles" that
- find nothing - still manage to fill entire lages - somehow have a similar structure - are boring as fuck
At least this one is 3/4, the previous one had BINGO.
Ccleaks is down?
cool Archaeologization Collection Output
How the hell is it 500k lines?
It is vibe coded.
Thanks, I'll use this for teaching next week (on what not to do). BashTool.ts :D But, in general, I guess it just shows yet again that the emperor has no clothes.
Are you not feeling the vibes?
In all seriousness. I think you‘re supposed to run these in some kind of sandbox.
> it just shows yet again that the emperor has no clothes
Which emperor, specifically?