K2 0905 and K2 Thinking shortly after that have done impressively well in my personal use cases and was severely slept on. Faster, more accurate, less expensive, more flexible in terms of hosting and available months before Gemini 3 Flash, I really struggle to understand why Flash got such positive attention at launch.
Interested in the dedicated Agent and Agent Swarm releases, especially in how that could affect third party hosting of the models.
Our only modification part is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services that have
more than 100 million monthly active users, or more than 20 million US dollars
(or equivalent in other currencies) in monthly revenue, you shall prominently
display "Kimi K2.5" on the user interface of such product or service.
> or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
Why not just say "you shall pay us 1 million dollars"?
Hey have they open sourced all Kimi k2.5 (thinking,instruct,agent,agent swarm [beta])?
Because I feel like they mentioned that agent swarm is available their api and that made me feel as if it wasn't open (weights)*? Please let me know if all are open source or not?
I've read several people say that Kimi K2 has a better "emotional intelligence" than other models. I'll be interested to see whether K2.5 continues or even improves on that.
I parsed "reasonable" as in having reasonable speed to actually use this as intended (in agentic setups). In that case, it's a minimum of 70-100k for hardware (8x 6000 PRO + all the other pieces to make it work). The model comes with native INT4 quant, so ~600GB for the weights alone. An 8x 96GB setup would give you ~160GB for kv caching.
You can of course "run" this on cheaper hardware, but the speeds will not be suitable for actual use (i.e. minutes for a simple prompt, tens of minutes for high context sessions per turn).
As your local vision nut, their claims about "SOTA" vision are absolutely BS in my tests.
Sure it's SOTA at standard vision benchmarks. But on tasks that require proper image understanding, see for example BabyVision[0] it appears very much lacking compared to Gemini 3 Pro.
The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.
But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.
I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.
K2 0905 and K2 Thinking shortly after that have done impressively well in my personal use cases and was severely slept on. Faster, more accurate, less expensive, more flexible in terms of hosting and available months before Gemini 3 Flash, I really struggle to understand why Flash got such positive attention at launch.
Interested in the dedicated Agent and Agent Swarm releases, especially in how that could affect third party hosting of the models.
Huggingface Link: https://huggingface.co/moonshotai/Kimi-K2.5
1T parameters, 32b active parameters.
License: MIT with the following modification:
Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
> or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
Why not just say "you shall pay us 1 million dollars"?
I assume this allows them to sue for different amounts. And not discourage too many people from using it.
Hey have they open sourced all Kimi k2.5 (thinking,instruct,agent,agent swarm [beta])?
Because I feel like they mentioned that agent swarm is available their api and that made me feel as if it wasn't open (weights)*? Please let me know if all are open source or not?
> For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls.
> K2.5 Agent Swarm improves performance on complex tasks through parallel, specialized execution [..] leads to an 80% reduction in end-to-end runtime
Not just RL on tool calling, but RL on agent orchestration, neat!
I've read several people say that Kimi K2 has a better "emotional intelligence" than other models. I'll be interested to see whether K2.5 continues or even improves on that.
yes, though this is highly subjective - it 'feels' like that to me as well (comapred to Gemini 3, GPT 5.2, Opus 4.5).
Curious what would be the most minimal reasonable hardware one would need to deploy this locally?
I parsed "reasonable" as in having reasonable speed to actually use this as intended (in agentic setups). In that case, it's a minimum of 70-100k for hardware (8x 6000 PRO + all the other pieces to make it work). The model comes with native INT4 quant, so ~600GB for the weights alone. An 8x 96GB setup would give you ~160GB for kv caching.
You can of course "run" this on cheaper hardware, but the speeds will not be suitable for actual use (i.e. minutes for a simple prompt, tens of minutes for high context sessions per turn).
https://archive.is/P98JR
As your local vision nut, their claims about "SOTA" vision are absolutely BS in my tests.
Sure it's SOTA at standard vision benchmarks. But on tasks that require proper image understanding, see for example BabyVision[0] it appears very much lacking compared to Gemini 3 Pro.
[0] https://arxiv.org/html/2601.06521v1
The chefs at Moonshot have cooked once again.
There are so many models, is there any website with list of all of them and comparison of performance on different tasks?
The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.
But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.
I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.
There is https://artificialanalysis.ai
Thank you! Exactly what I was looking for
Kimi was already one of the best writing models. Excited to try this one out
Those are some impressive benchmark results. I wonder how well it does in real life.
Maybe we can get away with something cheaper than Claude for coding.
I'm curious about the "cheaper" claim -- I checked Kimi pricing, and it's a $200/mo subscription too?
On openrouter 2.5 is at 0.60/3$ per Mtok. That's haiku pricing.
They also have a $20 and $40 tier.
Cool
Actually open source, or yet another public model, which is the equivalent of a binary?
URL is down so cannot tell.
It's open weights, not open source.
they cooked