Astro - Hacker News

38 comments

eithed 3 minutes ago

What I find fascinating is that there is so little substance in this article about the quality of the produced code and the medium. Is the code documented and tested? Is it understandable and extendable? Is it secure? What language, framework, and database were used? The author mentions judgement and taste; well, is the code tasteful? Will the model re-architect the entire thing if I ask it to add new functionality, spending another 9.5 hours in tokens?
JumpCrisscross 18 minutes ago

Anecdote: I fed Fable some models I’ve been hand verifying (basically, I sketch out a scenario for Opus to model, it builds it, I ask it to show me the math, I correct it, we iterate like this, then I double check its code to make sure the math matches the model logic). Fable found almost every error I found, and then had some interesting suggestions for additional variables.
It also burned through my usage quota like a late-90s Hummer.
- cyanydeez 6 minutes ago
  
  Now for the best question: what's your ROI here?
selfawareMammal 11 minutes ago

What are people working on that they see such a substantial difference between Mythos and Opus? I'd say I'm working on advanced stuff, and more often than not DeepSeek is more than enough. Why is everybody a genius in here?
- mervz 5 minutes ago
  
  We see the same thing when new laptops are announced and every employee suddenly needs to upgrade, despite the fact that 90% of people could make do with a MacBook Neo.
thepasch 7 minutes ago

What it feels like to work with Fable:
> Switched to Opus 4.8: Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback or learn more.
gopalv 2 hours ago

> It worked for nine and a half hours.
> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct
That's the bit that stuck out to me: that's longer than I would expect to work on a problem in a day, and longer than I would expect to wait before going back to fix the output of something whose core reward loop is measured in hours.
My customers are currently clamoring for me to push my agent response times down from 85 seconds to below the 20-second mark.
At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.
- matneyx 2 hours ago
  
  In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.
  We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."
  - giancarlostoro 14 minutes ago
    
    This. I get told things like "you can't build all that on your own?" I've had Claude poop out full-featured web apps in under 30 minutes, to a spec. Was it perfect? No, but sometimes even in a simple setup phase you can burn 15 minutes on some obscure setup step that's failing. I cannot just code nonstop at 900 WPM or whatever ridiculous speed and poop out an entire full-featured web app with maybe a few bugs here or there. If you can, come show me; I'll gladly have you race against my Claude prompting capabilities.
    Will Claude's code be perfect in one shot? Probably not. Will it get you 80 to 90% of the way there with your chosen design patterns in under a few hours? Absolutely.
  - neogodless 2 hours ago
    
    For the rare uninitiated:
    https://xkcd.com/303/
- giancarlostoro 15 minutes ago
  
  > At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.
  At this point, pay me significantly more, and I'll do it.
- hedgehog an hour ago
  
  Work duration is also not that valuable a measure; you're usually better off defining the process yourself in code and having that delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts. On the other hand, it's easier to do your own model routing, and I haven't seen any way for the normal chatbots to maintain coherence on streams of work measured in days and weeks.
- cyanydeez 7 minutes ago
  
  I think we hit the top of the sigmoid back when the Qwen models were released. By properly structuring my project, I can point it at any extension I want and get it going for 30 minutes to extend whatever. It can't effectively do "god mode" on all the code, but as a mindful observer and code "professional", I don't need more than what fits in 128GB of VRAM.
  I'm amazed we're so far into SOTA bloat, which the Chinese will kill once they start etching these models into silicon.
- PeterStuer 2 hours ago
  
  My Opus 4.8 regularly works for 10+ minutes on a single non-trivial coding request.
  - ASalazarMX 44 minutes ago
    
    Your Opus 4.8? Is it now usual to refer to LLMs like that?
    
    giancarlostoro 5 minutes ago
    
    That's pretty tame, if you want to be disturbed check out r/MyBoyfriendIsAI
    
    wongarsu 15 minutes ago
    
    Isn't it common to refer to all software like that? "Let me look at my JIRA", "I can't find anything using my Outlook's search function", "My PowerPoint is acting up today", "My browser just crashed" are all sentences I might say during a normal work day.
    
    hypfer 9 minutes ago
    
    Depends on the demographic, I think. It also tells you a surprising amount about how the brain of the person uttering it works.
    There are people that almost feel physical pain if something is unnecessarily incorrect.
    Plus, if the mental model of something is accurate, it is actually _more_ work to say something incorrect than to just say the correct thing.
    
    calvinmorrison 3 minutes ago
    
    better than "The JIRA" , or "The Google" or "The Spotify"
    
    w4yai 19 minutes ago
    
    You don't have your Opus 4.8? I got mine yesterday!
theturtletalks 12 minutes ago

This is what he built:
https://isochronic-passage-chart.netlify.app/
Doesn’t work too well on mobile but looks interesting
recursivedoubts an hour ago

would it be possible for mythos to make the space bar scroll the pages on your website properly?
- mulr00ney 18 minutes ago
  
  It seems to be hijacked by the video of some game they generated. :(
382hi 2 hours ago

I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.
- giancarlostoro 4 minutes ago
  
  I would love to see samples of the kinds of prompts you use with both. I sometimes wonder if the specific wording is the secret sauce. I have very few issues with Opus / Claude, but when I try premier GPT models, I get output that is weird compared to what I've grown to expect from Claude.
asdK120 2 hours ago

Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.
He is a professor but sadly also an AI shill. He should switch to advertising washing powder.
- MostlyStable 2 hours ago
  
  So...no engagement with the substance? Not even to explain why it is that this is not a useful description or test of capabilities? Ok.
  - dthread3 2 hours ago
    
    I would like to see it do something useful, like converting pytorch to golang.
    
    cadamsdotcom an hour ago
    
    Why not get a plan from Anthropic and get that done yourself? Probably is going to cost you as much as a coffee.
    
    lijok an hour ago
    
    Hot damn - is that the floor of what you consider useful?
    
    fdsdfsdfzxczxc 2 hours ago
    
    This newfangled car thing is useless. It can't even properly shoe a horse.
- whyenot an hour ago
  
  Instead of attacking the author, please respond to the content of the article. That is the HN way, and it leads to more substantive and interesting discussions.
root_axis 2 hours ago

I just can't stand this type of fawning language.
the_doctah 2 hours ago

More Mythos Marketing.
- boringg 14 minutes ago
  
  The mythos of Mythos is marketing.
et-al 2 hours ago

[flagged]
- astrange an hour ago
  
  It is not a sponsored article, and he writes one of these every time a new model is released. Why would a professor at Wharton need to write sponsored Substack articles?
- 0x1ceb00da 2 hours ago
  
  "I don't care who the IRS sends I am not paying taxes!"