So, it's a webpage with 3 paragraphs and a simple chart. It has: 1) a terrible color scheme – fine, I switch to reader mode; 2) shitloads of JS – fine, NoScript works, but then the page breaks; 3) a fancy "design" with a simple graph but unreadable X-axis labels – fine, I can use screen zoom for that; 4) ... "LingoLingo - Learn languages with YouTube!"

"Dead Internet" it is ...
Interpreting these metrics is quite interesting.
One thing is for sure: while Claude is currently taking the #1 spot in mentions, it carries a lot of negative sentiment due to API pricing policies and frequent server downtime. On the other hand, the runner-up, GPT-5.5, actually seems to have more positive feedback.
Personally, my experience with Codex wasn't as good as with Claude Code (Codex freezes on Windows more often than you'd expect), so this is a bit surprising.

That said, the more defensive GPT is definitely better in terms of sheer code-writing capability. However, GPT actually has quite a few issues with text corruption when generating in Korean or Chinese, something English-speaking users probably don't notice.

In terms of model capabilities, when given the same agent.md (CLAUDE.md) file, I think GPT is better at writing code, while Claude is better at writing text during code reviews.
Looking at the bottom right of the chart, Qwen and DeepSeek are open-source, so they are largely mentioned in the context of guarding against vendor lock-in, which drives positive sentiment. Considering that Hacker News occasionally shows negative sentiment toward China, the fact that they are viewed this positively, unlike US models, shows that being open-source is a massive advantage in itself.
Anyway, one thing is for sure: Gemini is pretty much unusable.
It'd be interesting to also graph this over time to see how sentiment changes from when a model is released to today.
Please fix your graph so the names of the models are readable.
Also, the stacked graph only lets you quickly see total mentions; it's really hard to compare negative or positive sentiment across models at a glance.
Yep, a toggle to scale all columns to the same height could solve this. I'll look into it when I do the custom graph.
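For illustration, that toggle would amount to a 100%-stacked view: divide each model's sentiment counts by its row total so every column ends up the same height. A minimal pandas/matplotlib sketch with made-up placeholder numbers, not the site's actual data or code:

    # Minimal sketch of the "same height" toggle, not the site's actual code.
    # The column names and numbers below are made-up placeholders.
    import pandas as pd
    import matplotlib.pyplot as plt

    counts = pd.DataFrame(
        {"positive": [120, 90, 30], "neutral": [60, 40, 20], "negative": [80, 25, 50]},
        index=["Claude", "GPT-5.5", "Gemini"],
    )

    # Divide each row by its total so every column sums to 100%.
    share = counts.div(counts.sum(axis=1), axis=0) * 100

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    counts.plot(kind="bar", stacked=True, ax=axes[0], title="Total mentions")
    share.plot(kind="bar", stacked=True, ax=axes[1], title="Sentiment share (%)")
    plt.tight_layout()
    plt.show()

The left panel keeps the current total-mentions view, while the right panel makes the positive/negative split comparable across models regardless of how often each one is mentioned.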
Came here to offer this feedback. If I can't see the name of the model, nothing else in the chart really matters to me. I even tried going to the Google Sheet.
It's way too important a piece of information not to have it visible.
"Prompts an LLM" -> which LLM?
I saw you're using Gemini for the sentiment rating (which I guess you picked because it's not often mentioned and thus "neutral"? lol)
But it would be interesting to get more details overall.
It's actually ChatGPT at the moment for the first filtering step, for no other reason than having a code snippet ready that I could point Cursor at (I know, so 2025). The Gemini call is using batch processing, so it's handled differently.
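For illustration, a first-pass relevance filter along those lines might look like the sketch below; the model name, prompt, and mentions_a_model helper are assumptions rather than the author's actual pipeline, and the separate Gemini batch call for sentiment isn't shown:

    # Rough sketch of a first-pass relevance filter; the model name, prompt,
    # and mentions_a_model helper are assumptions, not the author's pipeline.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def mentions_a_model(comment_text: str) -> bool:
        """Ask the LLM whether a comment discusses a specific AI model."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "Answer YES or NO: does this comment discuss a specific AI coding model?"},
                {"role": "user", "content": comment_text},
            ],
        )
        return response.choices[0].message.content.strip().upper().startswith("YES")

Only comments that pass this cheap yes/no check would then be sent on for the more expensive per-model sentiment rating.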
Just FYI, this article seems to define "state of the art" as "popular", as measured by "total mentions and user sentiment", without regard to the technical abilities or actual usage of the models.
Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are getting increasingly gamed and don't necessarily reflect the actual performance of a model (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.
That's pretty much exactly what the title says.
The technical abilities and usage are derived from the commenters' reflections on their own usage.