It's pretty clear at this point that Mythos' capability to discover and exploit zero-day vulnerabilities at scale is but an incremental improvement over existing models like ChatGPT Plus/Pro.
Anthropic tries to create marketing hype around Mythos using two psychological tricks.
1. Put large numbers in the headlines.
"Mythos discovered 271 vulnerabilities in Firefox" makes the model seem extremely capable to the uninitiated.
But it's actually meaningless as a measure of capability _improvement_.
Anthropic gave away $100mil specifically as Mythos credits to these projects and companies (that's $2.5mil per project; see the back-of-envelope sketch at the end of this comment). Spending the same exorbitant amount of compute analyzing the same codebases with an older model like ChatGPT Plus would have turned up 260 of these vulnerabilities, or might even have turned up more than 271.
No need to speculate, since this is exactly what we saw in the few codebases where we have such comparisons (like curl). Supposedly weaker models, working with a much lower budget, turned up dozens of vulnerabilities. Mythos turned up only one, which ended up as a low-severity CVE.
2. Do the whole "too dangerous to release" shtick. This is one of Dario Amodei's favorite moves. When he was vice president of research at OpenAI, he declared GPT-2 (which wasn't able to produce coherent text beyond 3-4 sentences at the time) too dangerous [1] as well.
Long story short, it's the GPT-4.5 situation again: a company trained a model that's too slow and expensive, but not much more capable than what came before. Hence the marketing stunts.
[1] https://www.itpro.com/technology/artificial-intelligence-ai/...
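A quick back-of-envelope on the numbers above, just to make the argument concrete (the even split across projects and Firefox getting the average share are assumptions in this comment, not reported figures):

    # Back-of-envelope on the figures quoted above (assumptions, not official data)
    total_credits_usd = 100_000_000      # "$100mil specifically as Mythos credits"
    credits_per_project_usd = 2_500_000  # "$2.5mil per project" implies an even split
    firefox_vulns = 271                  # the "271 vulnerabilities in Firefox" headline

    implied_projects = total_credits_usd / credits_per_project_usd
    credits_per_finding = credits_per_project_usd / firefox_vulns  # if Firefox got the average share

    print(f"Implied number of funded projects: {implied_projects:.0f}")         # ~40
    print(f"Implied credits per Firefox finding: ${credits_per_finding:,.0f}")  # ~$9,225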
I couldn't agree more. I think the recent moves to partner with xAI and Amazon are proof that they desperately need more compute and are doing everything possible to get it.
You're not really responding to the piece at all.
It's an AI-written slop article, and it's been hugged to death by HN in any case.
It claims to be an evidence-based investigation, but it basically invents the contents of the documents it supposedly investigated, such as the Anthropic Frontier Red Team writeup, from whole cloth.
I don't think deeper engagement with it would promote good discussion.
Don't bother with the slop lovers, these people are anti-human in their souls and willing to follow the most evil people on Earth to the depths of hell; for what? I have zero idea but it's sad to see.
So you say. I actually read the piece and didn't get AI vibes from it at all, except for the graphics.
There are 31 em dashes in that piece. The domain ends with .ai.
I use em dashes all the time. They're correct punctuation, as opposed to a minus sign, and they're easy to type: opt-shift-minus. If they were such a huge giveaway that humans never use, models would have been trained by now not to use them as much.
The blog is about AI. So yeah the TLD is .ai
> It's pretty clear at this point that Mythos' capability to discover and exploit zero-day vulnerabilities at scale is but an incremental improvement over existing models like ChatGPT Plus/Pro.
I'm skeptical of AI takes by someone who thinks there's a model called chatgpt plus. Spend more time working with the current systems!
When your logo is AI, your illustrations are AI, and your profile pic is AI, I'm going to assume the text is AI too and won't read it.
Even if it wasn’t, I probably still wouldn’t have read the article, so not much difference.
I read it, and it is AI
I don't think it is. Just the (somewhat lame) graphics are.
Now imagine that your work specs are generated by an AI agent that the EM is using.
Do you still care about the work?
If my manager can't be bothered to do any actual work but expects me to, no. I'm quitting. Next question.
Brother, I don't care who writes the specs as long as they sign the checks on time. And yes, I do care about my work even if upstream is slop. In a relay race, you can lower your performance to match the weakest leg, or you can be the strongest leg. And maybe I just like to run.
And the domain is .ai
> Resource Limit Is Reached The website is temporarily unable to service your request as it exceeded resource limit. Please try again later.
I guess it was too dangerous to even read the article
https://archive.is/31PFC
Mythos took it down
archived: https://nonogra.ph/too-dangerous-to-release-or-just-too-expe...
The HN hug of death
It's somehow nice to see an old-school HN hug of death once in a while; it has become a rare sight since most links are now to big platforms or to websites behind Cloudflare.
My thinking is that if it really was super duper then Anthropic could charge eye watering amounts and have willing customers and set up expectations going forward that SOTA costs a lot to use.
That they don't suggests that it really is only incrementally better than Opus 4.7, and that the market won't bear a price high enough to make it economical to serve, let alone profitable.
So the cynical me imagines execs sitting around the table and worrying that releasing it at anywhere close to break-even would risk actually hurting the brand instead of setting them up as a premium company, and this at a time just before the IPO when they can ill afford that rumour.
So they wonder what to do, and decide playing the national security card is the obvious way out. It's incrementally better enough to find bugs that the previous SOTA missed, it doesn't get used widely so it's cheap to serve, and they get the good publicity without the economic scrutiny?
Making a loss selling to a small number of users using it in a limited way is entirely affordable. Making a loss selling it at scale is correspondingly unaffordable?
They announced the pricing when they released the preview: $25/$125 per million input/output tokens. I have no doubt they're already selling it to select customers.
They are. Mythos Preview is not free.
They gave away $100M in credits specifically for Mythos.
The article does not mention the other reason: in the interview with Dwarkesh, Amodei remarked on how other organizations are copying or training off Opus for their models.
By delaying others' ability to train off Mythos, they hold their SWE-Bench Pro head start longer, so that, among other things, the USG can't help but notice Anthropic's lead when deliberating whether to further substantiate the "supply chain risk".
Good point.
Precise motives are hard to work out as a general rule. Ultimately, it often comes down to a decision that the decision makers like, or don't, for a confluence of reasons.
Conclusion: both are true, which makes sense. The same scaling (model size, KV cache) that yields the emergent power also requires the enormous capacity.
Which does sort of hint at a (power/profitability) ceiling on the LLM line of AI… That should make the industry nervous.
Does that follow at all?
High end AI is at its most useful when you use it to replace high end human labor. You can't buy 9000 cybersec specialists on demand, but you can buy more Mythos tokens.
Then we get into all the scaling curves. Such as: LLMs getting more capable per FLOP, per byte of weights, per byte of VRAM, etc. And: inference compute getting cheaper over time.
I see a lot of "should make the industry nervous", but when you try to dig into it? It's wishful thinking, every fucking time.
My posts* got to the first spot on Hacker News a couple of times. Not once did the site break down like that. And why would it? It's just a bunch of HTML and CSS files served through (free) Vercel (I don't think that matters). I wonder what people run their blogs on these days that they fail under the pressure so easily.
* https://news.ycombinator.com/from?site=yanist.com
It's WordPress, which is a great CMS but can quickly crumble under load when using cheap hosting.
With an external cache/CDN it should work perfectly fine.
There are also some caching plugins for wordpress, but most of them still hit the database on every request.
So I assume it's because it's not statically built, but requires a DB connection all the time?
Yes, with the recent waves of AI scrapers I have noticed on my own WordPress websites that the DB seems to be the weak point under load. Cache plugins can help a lot with this.
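A minimal sketch of why caching helps (illustrative Python, not an actual WordPress plugin): rendered pages are served from memory, so a burst of traffic to the same post hits the database only once per cache period.

    import time

    # Hypothetical stand-in for WordPress rendering a page via database queries.
    def render_from_db(path):
        time.sleep(0.05)  # pretend this is the expensive DB query + template render
        return f"<html>page for {path}</html>"

    _cache = {}              # path -> (rendered_html, timestamp)
    CACHE_TTL_SECONDS = 300  # serve recent cached pages instead of hitting the DB

    def serve(path):
        entry = _cache.get(path)
        if entry and time.time() - entry[1] < CACHE_TTL_SECONDS:
            return entry[0]               # cache hit: database never touched
        html = render_from_db(path)       # cache miss: pay the DB cost once
        _cache[path] = (html, time.time())
        return html

    # A scraper burst against one URL costs one DB render, not thousands.
    for _ in range(1000):
        serve("/too-dangerous-or-just-too-expensive")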
Cheap shared hosting will throttle sites which get too much traffic.
The thought of this didn't even cross my mind until yesterday. I previously figured the hype was primarily around marketing, but after watching this Primagen video, I have the same suspicion.
https://www.youtube.com/watch?v=zaGOKd4jqEk
Is it possible that curl just doesn't have any critical security vulnerabilities left?
It all sounds a bit too marketing-y to me: "we have this amazing model that is too good to release", but the goal is still AGI? OK, right.
What's incoherent about that?
The goal for Anthropic is safe AGI. A) This model is dangerous in the hands of consumers. B) They do not want China to train on these models.
"Safe" for who?
For the ruling class. Cattle classes are not people in their eyes.
Drink that Anthropic Kool-Aid up!
The missing piece is the reminder that scarcity still exists.
Whether it's actually scarcity or hype building, or a bit of column A and a bit of column B, is TBD. Then again, the new models seem more expensive, they slashed the tokens thrown around in thinking, and put up limit speedbumps, so it's probably not all gaslighting about compute bottlenecks.
I found this an illuminating piece, though I don't think percentages needed to be assigned between "is it about cost" and "is it about security".
It's obvious that this is a campaign to pump their pending IPO. It may be too expensive, but it's all about the IPO in my opinion.
(I work at Anthropic) We have publicly stated[1] that our goal is to deploy Mythos-class models at scale when we have the requisite safeguards for offensive cyber risks in place. Mythos is a general frontier model, not a cyber-specific model, so there are many reasons why we think our users will benefit from access (with the aforementioned safeguards in place) in due course. Compute has also not factored into our decision[2] to roll out the model in a limited fashion to defenders. We'll be sharing more soon on the first month or so of the project and rollout.
[1] https://www.anthropic.com/glasswing#:~:text=deploy%20Mythos%...
[2] https://x.com/logangraham/status/2054613618168082935
Multiple people who have already used Mythos or been given its reports on their software have publicly stated that it's all hype and that it is not really finding any new critical bugs that other models can't.
Are there any publicly verifiable sources showing that Mythos is that much more intelligent than Opus, such that it should be considered much more dangerous (as it is presented in the public discourse by Anthropic)?
It doesn't have to be _much more intelligent_ than Opus to be a risk. It doesn't even need to be _more intelligent_. It just needs to be _better at finding security problems_. Which could happen from just minor improvements in training data, or the harness, etc. Even a small improvement could shift it from finding very few new security holes, to reliably finding many at scale.
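A toy illustration of that point (the probabilities and target count are invented, not anything measured about Mythos): a small bump in per-target success rate makes a large difference once it is run across thousands of codebases.

    # Invented numbers to illustrate "small improvement, big effect at scale".
    targets = 5_000                  # codebases or attack surfaces scanned
    scenarios = {
        "baseline model": 0.0005,    # chance of a real finding per target
        "slightly better model": 0.002,
    }

    for label, p in scenarios.items():
        expected_findings = targets * p
        p_at_least_one = 1 - (1 - p) ** targets
        print(f"{label}: ~{expected_findings:.1f} expected findings, "
              f"P(at least one) = {p_at_least_one:.3f}")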
Weird take to claim a "generally intelligent frontier" (whatever that means) and restrict availability based on "offensive" cybersecurity alone (how this can be handled at all, as opposed to fixing the software, also remains to be seen), all while competitors, but more importantly software maintainers (e.g. curl), estimate that its capability in finding cybersecurity bugs is similar to what other modern models produce, and this has just risen significantly in the last months for everybody.
It's probably a little of both: dangerous and expensive. This article makes a good case that the cost is at least part of the reason.
I wish the article could have been a lot tighter and shorter. This is not earth shattering information that requires a New Yorker length piece of investigative journalism.
This was nothing like investigative journalism; it's just LLM spew. It could have been written in a handful of paragraphs.
Also, dangerous is expensive. If you cause damage you sometimes need to pay for it.
Opus Fast Mode is $30/$150 per million input/output tokens. Mythos's pricing (from the model card) is $25/$125.
Based on this I doubt that Mythos pro is too dangerous to release or provides significantly more value.
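Plugging the per-token prices quoted in this thread into a made-up workload shows how small the gap is (the token counts are arbitrary assumptions):

    # Prices per million tokens as quoted in this thread; the workload is hypothetical.
    prices = {
        "Opus Fast Mode": {"input": 30.0, "output": 150.0},
        "Mythos":         {"input": 25.0, "output": 125.0},
    }

    input_tokens, output_tokens = 2_000_000, 400_000  # one imagined code-audit run

    for model, p in prices.items():
        cost = (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]
        print(f"{model}: ${cost:.2f}")
    # Opus Fast Mode: $120.00 vs Mythos: $100.00 -- Mythos is the cheaper of the two here.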
As far as my understanding goes, it is not a breakthrough model itself but a fine-tuned model with the right tools and skills. Fairly similar to today's coding agents, with the difference that those are made for software engineering, not cybersecurity.
Mythos is the next point on the scaling curve.
It has considerably more parameters than most frontier models of today. Which gives it a lot more oomph per token.
Is it a "breakthrough" as in "something novel and unexpected"? No. Is it a "breakthrough" as in "something we know works, but made to work on a greater scale"? Very much so.
AI has always been dangerous, but not existentially dangerous.
Mythos is dangerous but it's not going to Skynet us.
Just the same as the military drone using some sort of OpenCV library and target prioritisation loop isn't going to turn evil on us.
Yeah we have literally no examples of more intelligent beings accidentally or purposefully wiping out less intelligent beings. Any time such a scenario could have conceivably happened, the less intelligent beings were able to foresee the methods, mechanisms, and motivations of the more intelligent beings and were able to counteract it.
I get the sarcasm, but what about Neanderthals versus Homo Sapiens?
You have a lot of faith in the chatbots.
If we look at our human history, there are millions of examples where less intelligent beings destroyed highly advanced civilizations.
It was never about intelligence, but about willingness to destroy (willingness to defend is not enough). Babylon, Egypt, Persia, Greece, Rome, China, ... I won't mention current examples ...
For marketing purposes it is always "too dangerous". That's not to say it is safe.
The real Mythos was the friends we made along the way.
This lengthy article by a self-described "AI enthusiast" muddies the waters. Yes, Anthropic has capacity constraints, which is why they rented Colossus from Musk despite the danger of being distilled.
The real reason is that the hype around Mythos has already gone quiet because it does not find more than other models. That is, nothing at all in most open source projects. If you hide the model, embarrassing statistics will not be posted.
You don't have to look much further than marketing...
Mythos had to silence you apparently
Silenced immediately.
I'd be tempted to offer this as a consultant service were I at Anthropic.
It feels like an AI tool that needs professionals to interface with it. Get some of those professionals, have them work with clients in a targeted way. It helps reduce the exposure the tool has to bad actors, and reduces the amount of resource usage that it will incur, because it's being used only by trained individuals.
Use what you learn from the experience to further refine its operation and make it less expensive to operate.
It's probably not much more dangerous than all the AI security patching being done without it; the CVE rate is climbing in nearly a straight line up.
My guess is they are still in the "fake it till you make it" phase. There's no Mythos, it's just a hype machine fueled by hot air.
It's on Bedrock and in use by companies.
"Something" is in use.
The "too dangerous to release" line was definitely a marketing stunt.
OpenAI already used the same playbook with GPT-2 in 2019, and some of the same people involved back then are now doing it again at Anthropic with Mythos.
Same safety-branding DNA, different company, and people are falling for it again.
Same people, actually. It’s a Dario move.
ChatGPT literally tells people to kill themselves but apparently that’s not too dangerous and this is.
It’s bad enough that it’s a marketing stunt, totally agree with you. But in the face of what we have seen and how they act like it’s no big deal, it’s just gross.
It's pretty obvious they just don't have the compute for it.
... and the safety argument is a great way of saying "no" disguised as a "yes, if ..." to your prospects.