> Developed from design to production in nine months, accelerated by OpenAI’s models
> the use of OpenAI models to accelerate parts of the design and optimization process.
I wish there was more about this. As is I kind of have to assume that this is just meaningless marketing, like saying development was accelerated by Microsoft Office or their 5k LG Ultrafine 40-inch monitors.
Like, if this was as big a deal as it kind of vaguely implies, they would be making a bigger deal of it, right?
Chip CEO here. It really depends on what "design" or "production" means. Does "design" mean that the design was complete? Does "production" mean the beginning of production, i.e. tapeout? If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip. If measuring from concept (no RTL at all, block diagram of architecture) to tapeout, this is an amazing timeline. The truth is probably somewhere in between. A more concrete statement would use actual technical milestones and gates.
Not a chip CEO, but I read this article and thought that they're working on some kind of application specific chip only for serving models. Similar to how an FPGA can optimize certain tasks.
Given constant weights / biases of a Transformer / DNN you could use pipelining to feed forward calculations through the array one layer at a time. For DNN's with thousands of layers you might see 1:1 speed up per layer channel.
I doubt they would undergo this process for marginal gains.
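For what it's worth, here is a rough sketch of the pipelining idea in the comment above. It assumes one hardware stage per layer and an identical time per stage, both simplifications; the function names are made up for illustration. With many inputs in flight, the speedup over running one input at a time approaches the number of layers.

```python
def sequential_time(num_inputs: int, num_layers: int, stage_time: float = 1.0) -> float:
    """Each input passes through every layer before the next input starts."""
    return num_inputs * num_layers * stage_time


def pipelined_time(num_inputs: int, num_layers: int, stage_time: float = 1.0) -> float:
    """One stage per layer: fill the pipeline once, then one result per stage_time."""
    if num_inputs == 0:
        return 0.0
    return (num_layers + num_inputs - 1) * stage_time


def speedup(num_inputs: int, num_layers: int, stage_time: float = 1.0) -> float:
    """Ratio of sequential to pipelined time for the same workload."""
    return (sequential_time(num_inputs, num_layers, stage_time)
            / pipelined_time(num_inputs, num_layers, stage_time))


if __name__ == "__main__":
    # A single input gains nothing; a long stream approaches num_layers.
    for n in (1, 100, 1_000_000):
        print(n, round(speedup(n, num_layers=1000), 1))
```

The catch is that a single request still pays the full fill latency, so the gain only shows up when there is a steady stream of inputs.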
>If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip.
The hardware description languages (HDL) used in chip development are like programming languages. The existing models understand them and can do a lot with them. You don’t need to have separate, specialty models designed for this work to use LLMs in chip design workflows.
Design verification also involves a lot of traditional programming which benefits from LLMs.
So it’s not meaningless at all. You could download some of the open source chip design software today and the LLMs could even help you get started on your own tiny chip if you are so interested.
Most HDL code is locked up behind corporate firewalls and not available as training data. While LLMs can handle it to an extent there's a lot of room for improvement. I'll bet that OpenAI and their competitors are racing to license this IP from major hardware vendors in order to compete in the chip design vertical.
I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing. The project was a big red arcade button that plays the "ah-my-groin.mp3" when pushed (from Simpsons). It did cool work on saving battery life, and the 3d enclosure was awesome, but yeah, I'm convinced I'd have to do another version or two of the custom chip until it came back right. I used a Blender MCP for the 3d modeling. I used a KiCAD MCP server for the chip design/validation.
I think we're not there yet. I've been meaning to look at this flux.ai to see if it has the prompts/workflow worked out better than what I was able to cobble together in a few hours. Maybe Alteryx's MCP server would have been better. I'll try that this weekend for another board I've got.
> I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing.
PCB design and 3D CAD design are different topics.
Hardware Description Languages are closer to programming languages than CAD. Look at some Verilog to get an idea - https://en.wikipedia.org/wiki/Verilog
Meta: can we not downvote people who are clarifying what they're saying and asking questions, even if they're wrong about something, if the content isn't otherwise objectionable?
One (KiCad) makes the board, the other (Blender) makes the casing for it. Both are “hardware”, but one is electronics and the other is mechanical. On the electronics one AI can do a good job; I can’t wait for it to fully build the whole circuit for you based on your specs.
> The existing models understand them and can do a lot with them.
In my experience they are not especially good at SystemVerilog. There's a lot of knowledge about it that is locked behind paywalls and it's very niche.
My guess is the "from scratch" here is quite the exaggeration. Otherwise why did they need Broadcom?
Right. There are two possible meanings and shades in-between:
1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)
2) OpenAI designed test/verification models and kernels that could be run on the simulated hardware to test its performance
As you and others have said, it's hard to trust when they are happy to write something that could easily only mean the latter but sounds like the former.
At the hardware company I work at, people are now using Claude Code and developing skills for it to do basic stuff like triage or do initial debug on failing tests, search for potential causes in RTL, generate skeleton documentation for designs, etc.
Browsing OpenAI's job postings in the past few months is enough to confirm that it's more than this. They are for sure making serious efforts at building AI for chip design.
From time to time? Lol you must realize, frontier lab eng are using Codex/Claude-Code 99% in loops, on models the public doesn't have access to. Why? Because it works. Just a matter of time before humans are out of the loop and what comes next is a black hole
"The future is here, it's just not evenly distributed"
Or OpenAI accelerated the design and optimization process by summarizing emails exchanged during the design and optimization process, or made it possible to ask an AI questions about meeting notes
Yes, obviously. But do we think LLMs without access to proprietary information do a better job with them than Broadcom's human experts or existing proprietary tools at this level of operations?
It is still a bold claim and it still needs evidence.
If the evidence were more useful for the upcoming IPO than this rather open-ended, reinterpretable phrasing, we would obviously be shown a bit more of it.
I've used GPT-5.5 and Opus both for FPGA design with good results. We built a lot of tooling around it to help the models, but even without that they're definitely capable of designing digital logic.
> OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)
Why is that a bold and unlikely claim?
Are you saying that AI, which has been proven to cure diseases, solve our hardest math problems, write complex computer code and generate entire worlds and HD video from a simple prompt would somehow be like, my bad, I guess I can't design chips?
Because then they'd likely have stfu and outperformed Intel, Nvidia and AMD, or at least one of them.
They're burning more cash than pretty much anyone else and don't have anything public that looks like a matching revenue stream, so they probably need one very badly.
There is a lot of verilog out there, it's pretty feasible that they had AI assistance writing more to design their chip.
It doesn't have to be revolutionary, it could just be AI-assisted design and lined up well enough with their operations for a custom ASIC to be worth it.
Also there's so much boilerplate around everything. Writing a testbench with Codex is extremely feasible. This is the kind of verifiable feedback loop the agents shine at.
VHDL and Verilog are well-documented languages, with well-built test and verification frameworks and harnesses. Even just by iteration you could get there if you have the money to pay for it.
I just read a claim on Twitter that the reason these companies (Google and Amazon as well as OpenAI) are using Broadcom isn't just for design expertise, but because Broadcom have allocation agreements in place with TSMC and the memory manufacturers.
Most design partners have allocation agreements. The thing is Broadcom is an absolute GIANT in the ASIC design space, and its closest competitor Marvell is a fraction of its size.
There are a lot of large tech companies that most of HN has never heard about that completely dominate entire segments.
Broadcom has become wealthy by being Google's TPU hardware partner, including sharing their TSMC capacity with Google, and evidently now they are doing the same thing with OpenAI. What a brilliant way to take advantage of the AI gold rush!
I wish they weren't using their piles of money to extort money out of the software industry like they are with VMWare and Bitnami.
> Broadcom has become wealthy by being Google's TPU hardware partner...
Kinda, but not exactly.
Broadcom cornered the enterprise infra and security market in the late 2010s and early 2020s after acquiring CA Technologies, BMC (EDIT: Did NOT acquire them, they were considering it back in 2018 but decided against it and KKR ended up acquiring them), Symantec (which they bought instead of BMC), and VMWare and were able to make a strong cybersecurity story during the late 2010s cybersecurity and SaaS boom.
That gave them plenty of cashflow that helped subsidize their hardware business when hardware was not viewed as hot as it is today.
Additionally, Broadcom is GCP's marquee customer and has been for a little under a decade, so they were able to make a sweetheart deal where all the software businesses at Broadcom would exclusively use GCP and in return GCP would work with Broadcom to design its silicon and source the infra needed for their DC buildouts.
Ironically, the US government blocking Broadcom's acquisition of Qualcomm in 2018 (by presidential order after a CFIUS review) was the best thing it ever could have done for Broadcom, because it gave Broadcom the dry powder to dominate enterprise SaaS and build a strong niche in the cybersecurity space.
> piles of money to extort money out of the software industry
From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Working in an industry that historically had to deal with high commodification, low margins, and long tail sales leads to leadership that can execute. Additionally, no one climbs the leadership ladder without having spent years as a line-level engineer, but that's true for software as well to an extent.
Edit: can't reply
> Did they acquire also BMC?
Nope.
Broadcom was considering acquiring them in 2018 but decided not to go through with the opportunity and KKR jumped in.
Good information, Broadcom is a playa, lots and lots of acquisitions! (a quick google search turns up a very eventful history for Broadcom)
> From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Only The Paranoid Survive is quite a name for a management book. It implies surviving in the world you are speaking about.
With the pace of AI, and with AI helping to pave the way for faster/better AI, I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI. Huge AI models can be run with less resources already through quantization and offloading, but that's just the beginning.
One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop. Think that's crazy? Look at the size of the first hard drives. The IBM 350 was a disk with 50 platters, 24 inches in diameter, that held about 3.75 MB, and was leased for today's equivalent of $35K.
Compare that to a multi-terabyte SSD. Now apply that improvement to how an LLM is architected and run now. With AI assisting, it won't be long before a leap occurs and these data centers with all their current ultra-cutting edge Nvidia cards are nearly obsolete overnight.
Very true, and all I am basing my comment on is the improvement in speed AI has demonstrated when applied to software development, and inferring it might enable a similar 10X or 100X improvement in both hardware architecture as well as LLM structure and/or interface methods. If that speed improvement applies to the performance of AI, the 70 years it took for people to improve storage technology might be compressed, and a step change in AI performance could arrive in a drastically shorter timeframe.
> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.
I think there will be specialized hardware (besides GPUs) that is custom made for LLMs. Yes, TPUs exist, but mainly for datacenters. GPUs exist, but they are adapted from mainly graphics applications. Once all the demand from data centers dries up, innovation will kick in.
True but as someone else pointed out; at that time we'd be interested in running 200T parameter model rather than 200B. Why, you might ask? Law of human laziness - a human will become as lazy as the technology allows it to. With the 200T or 20,000 T model - I'd be heavily incentivized to ask it to make the bread for me that I enjoy making now or create a movie for me (featuring myself) which will maximize the dopamine production in my brain.
I think Jevons Paradox and scaling laws will make this not the case. If bigger models are always better (which it seems they are), then we will always need high-end hardware.
It'd be cool to see more of this type of thing, but I have to imagine the ability for it to be updated to a brand-new model as new models come out is limited. If that is the case, it's going to be an extremely hard sell.
It really depends on the pricepoint at which they can get a board. If they can do a ~32B model for 1k$ and a size of an external HDD, I'd buy one now, even knowing that it won't be upgradeable / the model remains fixed. The speeds they've shown are a quality of its own, and there's plenty you can do with such a model and faster than instant responses.
Yes, but with current architectures world knowledge is baked into the weights. We might stop figuring out how to make models better, but the world keeps changing, science is going to keep making progress at understanding the world, etc. This creates a significant minimum rate of change and I'm pretty skeptical that it's worth baking weights into silicon as a result.
You don't need SOTA models for all tasks, and being able to do more routine tasks at something like 10% of the cost and 70x speed unlocks LLM use for things that are just unthinkable now (bulk classification tasks, real time speech interaction, etc)
I think the model they chose is out of date and hard to sell, but there are plenty of use cases where today's dumb small models are fine. A Qwen 3.5/3.6 or Gemma 3 model on silicon at those speeds would be genuinely world changing even if it's only 1-3B params. Such a model at those speeds will remain extremely useful even over a 5-6 year timespan, I think.
If you consider the places you could deploy it -- with no network access, and at those high speeds... very useful .. for adding vague "common sense" fuzzy thinking to all kinds of applications that right now piss consumers off with poor UX. Esp if the model can do voice-to-text and text-to-speech well (some of the smaller models can)
I wouldn't be surprised if "fast, cheap, dumb" ends up being the market for LLMs.
The state-of-the-art models aren't at "can fully replace knowledge worker" levels yet and I doubt they'll get there any time soon, so charging $2000 / month for access isn't going to happen. Right now everyone and their dog is being handed subsidized credits to play with AI, but the actual outcome is rarely good enough to be worth the money they'd need to charge for it. It might very well take another order of magnitude or two to get LLMs to be truly good (if it is even possible at all), and considering how much money is already being pumped into it I just don't see that happening.
On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry, or playing phone decision tree. There's a lot of money in making chatbot assistants actually useful, or in augmenting website search. Turning it into a glorified "language-to-API-call" translator doesn't take a lot of smarts, but as long as it's cheap you can make a killing in volume.
In a chatbot, 17k tok/s is a neat but nearly useless showcase. In a coding agent it is a meaningful improvement. In robotics, it could be an absolute revolution.
8B models aren't useful in general, but for specific use cases they can provide an enormous amount of intelligence - Nvidia's Tesla/Waymo competitor is a 7B LLM with a 2B diffusion model, and running that at those speeds could be an order of magnitude cheaper than existing solutions.
17K tok/s is approaching realtime motor cortex needs for a robot with ~12 actuators (bipedal humanoid) and an IMU. I don't know how many parameters a motor cortex would need but 8B feels like it is within 2 orders of magnitude.
This is an LLM, not a motor cortex. It will output commands as text (JSON, ...), so comparing size is not very meaningful, especially considering that neurons are highly complex and each likely requires thousands of simple artificial neurons (weight+bias) to model.
Could you give me some example how in robotics it can be an absolute revolution?
My understanding is that robotics doesn't really rely much on LLM's in the first place but rather other things.
Is the thing that you are suggesting that it would ingest all real time data and then reason through it at an incredibly fast speed and then act on it and re-iterate? I might imagine some problems with this though I am not a robotics engineer and perhaps someone who deeply understands this topic can give more information.
LLMs are very good at looking at images and reasoning about them. Much more than just object recognition/segmentation: they can explain the physics in the image, the intents, plan actions, ...
That's because of posttraining optimizing for benchmarks that test that.
They tend to collapse into nonsense and hallucinations pretty quickly if you move slightly out of the envelope of the current visual reasoning benchmaxxing.
Disclaimer: I'm a robotics noob, but I've been working on robotics for a few months now.
I'd say virtually all robots you've seen in the real world today rely on classical approaches - you build a rudimentary map, then use classical algorithms to find paths/do area coverage. The robots do not reason or understand what they're looking for, they're more like in-game units. At most there's some bounded, lightweight image classification going on.
LLMs can understand and reason about the world natively. Nvidia has a Tesla FSD/Waymo competitor which is simply their 7B reasoning LLM, but instead of outputting tokens directly, its outputs are fed to a 2B diffusion model that outputs a 1.6 second long trajectory for the car, and this is enough for an L2 system. But to make this work, they need the model to run at 10Hz, so they use super high-end hardware to do it (Jetson Thor) and the car is still "blind" for 100ms at a time (they have a parallel classical safety system).
With on-chip LLMs you could run this loop at like 100Hz on a chip that costs a few hundred bucks, rather than 10Hz on a board that costs several thousand.
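To put rough numbers on that loop: a back-of-the-envelope sketch, assuming the 17k tok/s figure quoted elsewhere in this thread is a single-stream decode rate and ignoring sensor, diffusion-model, and actuation latency. The helper names are made up for illustration.

```python
def blind_interval_ms(loop_hz: float) -> float:
    """Time between consecutive control decisions, in milliseconds."""
    return 1000.0 / loop_hz


def tokens_per_control_step(tokens_per_second: float, loop_hz: float) -> float:
    """Upper bound on tokens the model can emit within one control step."""
    return tokens_per_second / loop_hz


if __name__ == "__main__":
    rate = 17_000.0  # tok/s, figure quoted in the thread (assumed single-stream)
    for hz in (10.0, 100.0):
        print(f"{hz:.0f} Hz: blind for {blind_interval_ms(hz):.0f} ms, "
              f"budget {tokens_per_control_step(rate, hz):.0f} tokens/step")
```

At 100Hz the budget is only 170 tokens per step, so the per-step reasoning would have to be short for this to work.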
Pretty huge move. Google and their TPUs are looking infinitely more prescient as I think they are on their 7th generation, along with the offshoots it inspired like the LPU and even others, perhaps like Cerebras and their Wafer Scale Engine.
However, based off first impressions, it seems like this is meant for inference side, and not training, which is also an interesting choice.
Training is pretty much a 1x cost, and efficiency there is already on the way down with architectural improvements. Inference though is an ongoing cost which over time takes orders of magnitude more resources, so focusing on making that far more efficient means way greater gains over time.
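A toy illustration of that point, with entirely hypothetical numbers (no real cost figures are given in the thread): if training is a one-time spend and inference is a steady monthly spend, you can ask how many months it takes before cumulative inference cost passes the training cost.

```python
import math


def months_until_inference_dominates(training_cost: float,
                                     monthly_inference_cost: float) -> int:
    """Smallest whole number of months after which cumulative inference
    spend strictly exceeds the one-time training cost."""
    if monthly_inference_cost <= 0:
        raise ValueError("monthly_inference_cost must be positive")
    return math.floor(training_cost / monthly_inference_cost) + 1


if __name__ == "__main__":
    # Hypothetical: training costs 100 units, inference costs 30 units per month.
    print(months_until_inference_dominates(100, 30))
```

After that crossover, every percent shaved off inference cost matters more than the same percent shaved off training.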
> early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art
We're starting to see what really matters here, and though this is hand wavy the TPU makes similar claims.
I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like 60's to 90's IBM, DEC, Cray, Sun and the hardware race that happened then. History doesn't repeat but it often rhymes and I suspect that these efforts will follow the same trajectory.
To be clear, that is not "Google's memo". It's a memo by a guy who happened to work at Google. There is a diversity of opinions at a company that employs 180,000 people.
I had Opus 4.5 design an LLM inference engine in verilog, including firmware and automated verification a while ago: https://github.com/cpldcpu/smollm.c
It's of course far from optimal. But lowering the implementation through the abstraction levels turned out to be extremely powerful.
Can you suggest some tutorials for Verilog and FPGAs in general?
I have a spare Tang Nano 9k but I don't feel confident about blindly asking Claude to vibecode me a solution and still would like to have at-least a basic level of understanding.
Microsoft, Google, and Amazon also do this, but they also have the hyperscaler datacenter infrastructure to host the chips. Designing and taping out the chip is one thing, packaging, cooling, deploying, powering, and managing the fleet is another stack entirely. Wonder where that will come from?
I am not sure how much of the work is done by OpenAI, or whether it is basically a Broadcom chip specifically built for OpenAI models. It is a necessary step, but building a high-performance chip is not easy. Look at companies like Groq, Amazon, and Google.
Both Google and Amazon also codesign heavily with Broadcom (Amazon also with Marvell and Alchip).
Broadcom does stuff like physical design, providing IP blocks, managing the manufacturing process with TSMC, packaging and testing. Google and Amazon handle system architecture, performance targets, and requirements, with Broadcom acting as a consultant.
The new chip sounds like it's custom made to accelerate a few specific models they really need to run fast. The advantage is it's truly an ASIC, not an xPU. There are several new startups targeting EDA tooling automation. Chip Agents is the biggest one I can think of, but there are smaller players too; Silimate is one I recall. These companies are focusing on building fast AI-powered tools to speed up the tapeout cycle.
Cheap tokens are more important now than ever. Chinese open-weight models are getting pretty good. The real cost of AI adoption will come down to who (China or the US) can provide cheap tokens for consumers and companies. Microsoft considering DeepSeek for their Cowork is an example, and now OpenAI with its own AI inference chip.
Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 the performance will be the same. You can increase performance by using wider memory but that's also more expensive.
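A rough way to see why: for single-stream decoding, every generated token has to read (roughly) all the weights once, so tokens per second is capped at memory bandwidth divided by the weight size, no matter how fast the compute is. The sketch below assumes a transfer rate of 8533 MT/s and an 8B-parameter model at one byte per parameter; both are assumptions for illustration, not figures from this thread, and the function names are made up.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


def decode_tokens_per_second_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode rate if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Assumed: 256-bit bus at 8533 MT/s (hypothetical LPDDR5X-class speed).
    bw = peak_bandwidth_gb_s(256, 8.533e9)           # about 273 GB/s
    # Assumed: 8B parameters at 1 byte each, so 8 GB of weights.
    bound = decode_tokens_per_second_bound(bw, 8.0)  # about 34 tok/s
    print(f"{bw:.1f} GB/s -> at most {bound:.1f} tok/s")
```

Swapping the SoC for an ASIC changes neither input to this calculation, which is the point being made above. Batching, smaller weights, or wider/faster memory are what move the bound.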
This is just an uncut wafer - I don't think it's intended to be a wafer-scale chip.
Cerebras etch memory onto the wafer alongside the processing elements, but AFAIK OpenAI are going to be using HBM memory and a conventional chiplet design.
> May we scale smoothly, exponentially and uneventfully through A[SI]
That sentence sounds weird to me. I can't really put my finger on why, maybe the combination of adverbs, or just the fact of stating the desire to scale as a company so directly. It feels (to me) like openly claiming their selfish goals. Or maybe I am just misinterpreting and they are referring to the whole of humanity as "We" (but knowing Broadcom's and, to a lesser extent, OpenAI's doings, I am not convinced).
I'm assuming they used LLMs to (help humans) do custom circuit design. Even pre LLM there were various computer optimizations that didn't require humans like genetic algorithms. It'd be cool to see a paper on how they did it.
I mean I'd love to be able to buy something like the 17k tps Taalas chip as a PCIe or M.2 card.
Imagine when we can roar along at that speed, low power. Can just have the model reason for a while about anything and everything. It reminds me of the "race to idle" for mcus etc.
The current Taalas chip is for an 8B param model (Llama 3.1 8B). I’m hoping so much that they can get that up to the 30B range. Just imagine Gemma 4 or Qwen 3.6 at 17k tps.
It's odd to me that I haven't heard anything about this approach (baking LLMs/weights into silicon directly) since. It seems almost common-sense that we're going to end up there eventually. And it feels like that point is drawing ever closer now that model capabilities, if not quite plateauing out, are at least getting to a "good enough" point for a LOT of use cases.
I wonder if it's being worked on in secret, if there's something about it that makes it infeasible, or if companies are really too nervous to lock in one model like that because the next one down the line could be a huge improvement. Re. infeasibility, I have heard that the Taalas demonstration chip ran Llama 3.1 8B (a pretty horrible model) and that even that took a massive amount of transistors / die area. So it might just be the case that the good models are too big to fit on silicon?
CA Technologies was much worse than Broadcom in its heyday.
Three of their top execs - CEO, CFO, and head of sales - went to federal prison on securities fraud, conspiracy, and other charges. The CEO, Sanjay Kumar, who was at least partly the fall guy for co-founder Charles Wang, served 10 years.
Being acquired by Broadcom could only have been an upgrade, as strange as that may sound.
Dang, I just checked and CBRS is in free-fall since the IPO.
Sucks, I think they're a cool company.
OTOH, I was the only person back then pushing hard during my time at KAUST (back in 2019) to buy one of their systems when they were nobody, eventually resulting in a partnership between the two.
Then I joined their online discourse, very few users, I was semi-active there but they didn't care much.
Then I came to Toronto and heard they were opening an office here, tried to get noticed several times but got mostly ignored. I asked about upcoming events several times, anything to get involved, "yeah man, maybe one day". Then they made an event during Toronto Tech Week and didn't even tell me ... idk.
I don't get schadenfreude as I still think they're a cool company.
My point is they put all the eggs in one basket (AI inference) and neglected everything else. They seem to be on shaky ground now ... sad.
They don't have true competition; what they lose out on is market share with hyperscalers, since OpenAI would have no plans to share inference hardware with any other company right now. Plus, I don't know how NVIDIA's investment equation pans out long term, given OpenAI will be investing in a more purpose-built inference stack for the future.
This seems to be for inference only, not training, but inference is a recurring cost and training is a one-time cost. So to me, even if Nvidia keeps its moat on training, I don't think that could ever justify its massive valuation, because, for example, some Chinese models are actually trained on non-Nvidia hardware. That moat is incredibly thin.
(at the moment), I think that if I were Nvidia, I would be a bit terrified and I imagine the stock to not be doing super great as I can just imagine everyone online might start talking about it for better or for worse.
I am a bit impressed by OpenAI, but can this be classified as a plan for OAI to salvage itself and all the commitments it has made, nearing 1.4 trillion dollars from my memory? And this article[0] is from 2025.
But could OpenAI simply walk away from its commitments when necessary (for example to Nvidia) if this chip works out? What exactly might happen in the future as these commitments come due? It's still smart for OAI to diversify with this chip and to have deeper sources of revenue than just being a simple middleman, but I imagine that Nvidia and others have also invested in OpenAI and they must not be happy with this change.
The thing with AI deals is that they have become so complicated that it is hard for me to find the first-order impact of things, let alone second- or third-order impacts. Financial accountability seems to be impacted quite heavily because of all of it, and there is some sense that this is done intentionally.
I call BS. It’s probably a white label around existing Broadcom IP, impossible to go from zero to this kind of chip in nine months. I doubt OpenAI had any significant contribution.
9 months to production is completely impossible anyway.
9 months from design to early samples is probably impossible given that TSMC takes 3 months after tapeout to produce them. Then it’s up to the customer to qualify and revise for production. TSMC doesn’t do that.
One thing I don't like about California based companies is how cringe the names always are.
"Jalapeño" is such a bad name, having an "ñ" already makes it difficult and annoying to deal with in so many little ways. Good luck with that.
But also, there's the sort of "yes let's use Mexican related things because we're California" thought that I just really hate. I don't know, it's like corporate Memphis to me. You see a product like this, you know it's an uppity California-based firm that came up with it.
No worse, I suppose, than the obsession with Lord of the Rings that the authoritarian surveillance companies have. Palantir, Anduril. Then we have the non-defense/surveillance ones: Mithril, Valar, Narya, Erebor.
None, probably. Just saying Jalapeño is no worse than any other non-descriptive company name. Although at least Palantir and Anduril are aptly named for what they do. The VC firms less so.
Don't worry, in Europe it's the same, but for insurance/lawyer stuff. Tons of companies have names based on Latin words such as Civitas/Insalus/Legalia/Legalitas or whatever, which look tacky/rancid/old fashioned from kilometers away.
> Developed from design to production in nine months, accelerated by OpenAI’s models
> the use of OpenAI models to accelerate parts of the design and optimization process.
I wish there was more about this. As is I kind of have to assume that this is just meaningless marketing, like saying development was accelerated by Microsoft Office or their 5k LG Ultrafine 40-inch monitors.
Like, if this was as big a deal as it kind of vaguely implies, they would be making a bigger deal of it, right?
Chip CEO here. It really depends on what "design" or "production" means. Does "design" mean that the design was complete? Does "production" mean the beginning of production, i.e. tapeout? If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip. If measuring from concept (no RTL at all, block diagram of architecture) to tapeout, this is an amazing timeline. The truth is probably somewhere in between. A more concrete statement would use actual technical milestones and gates.
Not a chip CEO, but I read this article and thought that they're working on some kind of application specific chip only for serving models. Similar to how an FPGA can optimize certain tasks.
Given constant weights / biases of a Transformer / DNN you could use pipelining to feed forward calculations through the array one layer at a time. For DNN's with thousands of layers you might see 1:1 speed up per layer channel.
I doubt they would undergo this process for marginal gains.
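One rough way to read the pipelining point above: if each layer gets its own hardware stage, a new input can enter every stage time once the pipeline is full. The sketch below only illustrates that idea; the function names, the uniform per-stage time and the numbers are hypothetical, not anything from the announcement.

```python
def sequential_time(num_layers: int, num_inputs: int, stage_time: float) -> float:
    """Each input passes through every layer before the next input starts."""
    return num_layers * num_inputs * stage_time


def pipelined_time(num_layers: int, num_inputs: int, stage_time: float) -> float:
    """One layer per stage: fill the pipeline once, then one result per stage time."""
    return (num_layers + num_inputs - 1) * stage_time


if __name__ == "__main__":
    layers, inputs, t = 4, 10, 1.0  # hypothetical numbers
    print(sequential_time(layers, inputs, t))  # 40.0
    print(pipelined_time(layers, inputs, t))   # 13.0
```

The gain only shows up with many independent inputs in flight. A single autoregressive sequence needs each token before it can start the next, so it cannot keep the stages full on its own.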
I don't understand what the second paragraph is saying.
>If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip.
Even for a company’s first design?
I don't think you get the newcomer novelty buff when your valuation approaches 13 digits.
This isn't Broadcom's first design.
The hardware description languages (HDL) used in chip development are like programming languages. The existing models understand them and can do a lot with them. You don’t need to have separate, specialty models designed for this work to use LLMs in chip design workflows.
Design verification also involves a lot of traditional programming which benefits from LLMs.
So it’s not meaningless at all. You could download some of the open source chip design software today and the LLMs could even help you get started on your own tiny chip if you are so interested.
Most HDL code is locked up behind corporate firewalls and not available as training data. While LLMs can handle it to an extent there's a lot of room for improvement. I'll bet that OpenAI and their competitors are racing to license this IP from major hardware vendors in order to compete in the chip design vertical.
I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing. The project was a big red arcade button that plays the "ah-my-groin.mp3" when pushed (from Simpsons). It did cool work on saving battery life, and the 3d enclosure was awesome, but yeah, I'm convinced I'd have to do another version or two of the custom chip until it came back right. I used a Blender MCP for the 3d modeling. I used a KiCAD MCP server for the chip design/validation.
I think we're not there yet. I've been meaning to look at this flux.ai to see if it has the prompts/workflow worked out better than what I was able to cobble together in a few hours. Maybe Alteryx's MCP server would have been better. I'll try that this weekend for another board I've got.
> I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing.
PCB design and 3D CAD design are different topics.
Hardware Description Languages are closer to programming languages than CAD. Look at some Verilog to get an idea - https://en.wikipedia.org/wiki/Verilog
Right. KiCAD for PCB design. Blender for 3D CAD. Oh, are you saying I should have used something other than the KiCAD MCP server for better results?
VHDL is not a language for spatial design. It's more akin to a programming language with circuit semantics.
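To make "circuit semantics" a bit more concrete, here is a toy Python sketch (not HDL, and not how any real simulator is built). In hardware, all registers sample their inputs on the same clock edge, so two registers can swap values in one cycle. Ordinary sequential code cannot do that without a temporary. The function names are made up for illustration.

```python
def sequential_update(state: dict) -> dict:
    """Software-style: statements run one after another, so the swap is lost."""
    s = dict(state)
    s["a"] = s["b"]  # a now holds the old b
    s["b"] = s["a"]  # b reads the *new* a, so b is unchanged
    return s


def clock_edge(state: dict) -> dict:
    """Register-style: every next value is computed from the current state,
    then all of them are committed together, like registers on a clock edge."""
    return {
        "a": state["b"],  # roughly: a <= b
        "b": state["a"],  # roughly: b <= a
    }


if __name__ == "__main__":
    start = {"a": 1, "b": 0}
    print(sequential_update(start))  # {'a': 0, 'b': 0}
    print(clock_edge(start))         # {'a': 0, 'b': 1}
```

That difference is why an HDL reads like code but describes parallel hardware, and also why it is a very different thing from PCB layout or 3D modeling.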
Meta: can we not downvote people who are clarifying what they're saying and asking questions, even if they're wrong about something, if the content isn't otherwise objectionable?
One (KiCad) makes the board, the other (Blender) makes the casing for it. Both are “hardware”, but one is electronics and the other is mechanical. The electronic one AI can do a good job on; I can’t wait for it to fully build the whole circuit for you based on your specs.
You're comparing apples and oranges.
They’re saying that VHDL is an entirely different concept than physical modeling.
The question isn’t whether or not they employed a particular tool, the question is how big of an impact did it have.
> The existing models understand them and can do a lot with them.
In my experience they are not especially good at SystemVerilog. There's a lot of knowledge about it that is locked behind paywalls and it's very niche.
My guess is the "from scratch" here is quite the exaggeration. Otherwise why did they need Broadcom?
Doesn’t Broadcom bring a lot more to bear here than just Verilog? Including relationships with the actual fabricators.
Right. There are two possible meanings and shades in-between:
1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)
2) OpenAI designed test/verification models and kernels that could be run on the simulated hardware to test its performance
As you and others have said, it's hard to trust when they are happy to write something that could easily only mean the latter but sounds like the former.
3) The engineers working on the chip used ChatGPT from time to time.
At the hardware company I work at, people are now using Claude Code and developing skills for it to do basic stuff like triage or initial debug on failing tests, search for potential causes in RTL, generate skeleton documentation for designs, etc.
But isn't this rather the ordinary product of an LLM, now?
Is it worth the claim that they are making in a press release?
I'd be shocked if it was anything more than this.
Browsing OpenAI's job postings in the past few months is enough to confirm that it's more than this. They are for sure making serious efforts at building AI for chip design.
Impossible to know. Could be fake/aspirational roles to impress investors with their grand vision.
Do you have inside knowledge?
From time to time? Lol you must realize, frontier lab eng are using Codex/Claude-Code 99% in loops, on models the public doesn't have access to. Why? Because it works. Just a matter of time before humans are out of the loop and what comes next is a black hole
"The future is here, it's just not evenly distributed"
Or OpenAI accelerated the design and optimization process by summarizing emails exchanged during the design and optimization process, or made it possible to ask an AI questions about meeting notes
> 1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)
Chip design languages (HDLs like Verilog or VHDL) are well understood by LLMs. They don’t need specialty tools to use GPT-5.5 or other LLMs with them.
You could even try it yourself with open source chip design tooling if you wanted to see it.
Yes, obviously. But do we think LLMs without access to proprietary information do a better job with them than Broadcom's human experts or existing proprietary tools at this level of operations?
It is still a bold claim and it still needs evidence.
We would obviously get a bit more of the evidence if it were to be more useful for the upcoming IPO than this rather open-ended, reinterpretable phrasing.
I don't understand why you're getting downvoted.
I've used GPT-5.5 and Opus both for FPGA design with good results. We built a lot of tooling around it to help the models, but even without that they're definitely capable of designing digital logic.
https://dl.acm.org/doi/10.1145/3785362
https://developer.nvidia.com/culitho
https://www.synopsys.com/blogs/chip-design/analog-layout-syn...
https://arxiv.org/abs/2302.06415
> OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)
Why is that a bold and unlikely claim?
Are you saying that AI, which has been proven to cure diseases, solve our hardest math problems, write complex computer code and generate entire worlds and HD video from a simple prompt would somehow be like, my bad, I guess I can't design chips?
> solve our hardest math problems
We're not quite there yet :)
https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_m...
> Why is that a bold and unlikely claim?
Because they could have offered even slightly more evidence.
Because then they'd likely have stfu and outperformed Intel, Nvidia and AMD, or at least one of them.
They're burning more cash than pretty much anyone else and don't have anything public that looks like a matching revenue stream, so they probably need one very badly.
Perhaps they used gpt 5.5 mini to draft emails. Create a coffee schedule.
AlphaChip is what a chip design with AI is. I'm very suspicious that OpenAI has anything like this or they would be bragging about it.
https://deepmind.google/blog/how-alphachip-transformed-compu...
There is a lot of Verilog out there, so it's pretty feasible that they had AI assistance writing more to design their chip.
It doesn't have to be revolutionary, it could just be AI-assisted design and lined up well enough with their operations for a custom ASIC to be worth it.
Also there's so much boilerplate around everything. Writing a testbench with Codex is extremely feasible. This is the kind of verifiable feedback loop the agents shine at.
I would assume they've already made as big a deal of it as they can without outright lying too much. Read the rest of the press release.
FWIW, Google is now on their 8th generation TPU iteration, having put out the last 4 generations on a 1-year cadence.
VHDL, VLSI are well-documented languages, with well-built test and verification frameworks and harnesses. Even just by iteration you could get there if you have the money to pay for it.
NVIDIA already designs most of their chips using AI. Why would you assume it's meaningless marketing?
Perhaps because they are suggesting what they are doing is novel.
Novel to whom, the reader or the industry?
Something can be non-novel in the industry, yet novel to the reader, at which point it is useful ... for such readers.
Realistically, how hard are AI accelerators to design?
Probably obvious but still omitted in the OpenAI post: chips are being made by TSMC [1]. Wasn't sure if Intel got it.
1. https://www.investing.com/news/stock-market-news/openai-unve...
I just read a claim on Twitter that the reason these companies (Google and Amazon as well as OpenAI) are using Broadcom isn't just for design expertise, but because Broadcom have allocation agreements in place with TSMC and the memory manufacturers.
...and because most hardware sales except AI accelerators are down due to RAM prices, Broadcom probably can't otherwise use their allocation at TSMC.
Most design partners have allocation agreements. The thing is Broadcom is an absolute GIANT in the ASIC design space, and its closest competitor Marvell is a fraction of its size.
There are a lot of large tech companies that most of HN has never heard about that completely dominate entire segments.
I recently put 2+2 together.
Broadcom has become wealthy by being Google's TPU hardware partner, including sharing their TSMC capacity with Google, and evidently now they are doing the same thing with OpenAI. What a brilliant way to take advantage of the AI gold rush!
I wish they weren't using their piles of money to extort money out of the software industry like they are with VMware and Bitnami.
> Broadcom has become wealthy by being Google's TPU hardware partner...
Kinda, but not exactly.
Broadcom cornered the enterprise infra and security market in the late 2010s and early 2020s after acquiring CA Technologies, BMC (EDIT: Did NOT acquire them, they were considering it back in 2018 but decided against it and KKR ended up acquiring them), Symantec (which they bought instead of BMC), and VMware, and were able to make a strong cybersecurity story during the late 2010s cybersecurity and SaaS boom.
That gave them plenty of cashflow that helped subsidize their hardware business when hardware was not viewed as hot as it is today.
Additionally, Broadcom is GCP's marquee customer and has been for a little under a decade, so they were able to make a sweetheart deal where all the software businesses at Broadcom would exclusively use GCP, and in return GCP would work with Broadcom to design its silicon and source the infra needed for their DC buildouts.
Ironically, the DoJ blocking Broadcom's acquisition of Qualcomm was the best thing it ever could have done for Broadcom, because it gave Broadcom the dry powder to dominate the Enterprise SaaS and build a strong niche in the cybersecurity space.
> piles of money to extort money out of the software industry
From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Working in an industry that historically had to deal with high commodification, low margins, and long tail sales leads to leadership that can execute. Additionally, no one climbs the leadership ladder without having spent years as a line-level engineer, but that's true for software as well to an extent.
Edit: can't reply
> Did they acquire also BMC?
Nope.
Broadcom was considering acquiring them in 2018 but decided not to go through with the opportunity and KKR jumped in.
Did they acquire also BMC?
Good information, Broadcom is a playa, lots and lots of acquisitions! (a quick google search turns up a very eventful history for Broadcom)
> From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Only The Paranoid Survive is quite a name for a management book. It implies surviving in the world you are speaking about.
[0] https://www.goodreads.com/book/show/66863.Only_the_Paranoid_...
With the pace of AI, and with AI helping to pave the way for faster/better AI, I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI. Huge AI models can be run with fewer resources already through quantization and offloading, but that's just the beginning. One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop. Think that's crazy? Look at the size of the first hard drives. The IBM 350 was a disk drive with 50 platters, 24 inches in diameter, that held about 3.75 MB, and was leased for today's equivalent of $35K.
https://www.computerhistory.org/storageengine/first-commerci...
Compare that to a multi-terabyte ssd. Now apply that improvement to how an LLM is architected and run now. With AI assisting, it won't be long before a leap occurs and these data centers with all their current ultra-cutting edge Nvidia cards are nearly obsolete overnight.
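For a sense of scale on the quantization point, here is a back-of-the-envelope sketch of how much memory the weights alone need at different precisions. It deliberately ignores the KV cache, activations and runtime overhead, and the function name is just for illustration.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 200e9  # the 200B example from the comment above
    for bits in (16, 8, 4):
        print(bits, "bit:", weight_footprint_gb(params, bits), "GB")
    # 16 bit: 400.0 GB
    # 8 bit: 200.0 GB
    # 4 bit: 100.0 GB
```

So even at 4 bits per weight the weights alone are around 100 GB, which is the gap a breakthrough would have to close before a model that size fits in ordinary desktop RAM.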
> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.
But if you have such a breakthrough could you not also apply it and run 200T models on today's datacenters?
That assumes scaling laws still hold up. A bigger model might end up only incrementally more intelligent.
Quite true
Interesting comment, but the comparison with hard disk drives is probably unfair.
The IBM 350 was commercialized 70 years ago; it took 70 years for someone like you to be able to compare that to a multi-TB SSD.
Furthermore, nothing says that Moore's Law will necessarily apply to LLMs, for decades to come.
Very true, and all I am basing my comment on is the improvement in speed AI has demonstrated when applied to software development, and inferring it might enable a similar 10X or 100X improvement in both hardware architecture as well as LLM structure and/or interface methods. If that speed improvement applies to performance of AI, that could mean the 70 years it took for people to improve storage technology might be able to be compressed to achieve a step change in AI performance in a drastically shorter timeframe.
> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.
I think there will be specialized hardware (besides GPUs) that would be custom made for LLMs. Yes, TPUs exist, but mainly for datacenters. GPUs exist, but they are adapted from mainly graphics applications. Once all the demand from data centers dries up, innovation will kick in.
True, but as someone else pointed out, at that time we'd be interested in running a 200T-parameter model rather than 200B. Why, you might ask? Law of human laziness - a human will become as lazy as the technology allows it to. With the 200T or 20,000T model - I'd be heavily incentivized to ask it to make the bread for me that I enjoy making now or create a movie for me (featuring myself) which will maximize the dopamine production in my brain.
I think Jevons Paradox and scaling laws will make this not the case. If bigger models are always better (which it seems they are), then we will always need high-end hardware.
> I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI
It will build the expertise/infra/know-how foundation for the next generation of hardware.
I agree with you. Stepping stones are still a part of getting there, if only to be briefly useful.
Usually breakthroughs in computing lead to more usage of computing, not less.
Looking at the development of memory bandwidth, capacity and prices over the last 10 years there is little indication that’s likely.
This is very cool to see - seems like soooo much efficiency waiting to be unlocked at the chip level.
What's everyone think of Taalas?
They're actually burning the LLM model into the silicon, with some onboard memory for fine-tuning. They claim huge cost / latency wins.
Super fast demo live at: https://chatjimmy.ai/
https://taalas.com/
https://www.reddit.com/r/singularity/comments/1r9frzk/taalas...
It'd be cool to see more of this type of thing, but I have to imagine the ability for it to be updated to a brand-new model as new models come out is limited. If that is the case, it's going to be an extremely hard sell.
> extremely hard sell.
It really depends on the pricepoint at which they can get a board. If they can do a ~32B model for 1k$ and a size of an external HDD, I'd buy one now, even knowing that it won't be upgradeable / the model remains fixed. The speeds they've shown are a quality of its own, and there's plenty you can do with such a model and faster than instant responses.
A hard sell right now. The rate of change will slow down
Yes, but with current architectures world knowledge is baked into the weights. We might stop figuring out how to make models better, but the world keeps changing, science is going to keep making progress at understanding the world, etc. This creates a significant minimum rate of change and I'm pretty skeptical that it's worth baking weights into silicon as a result.
That's why we have reasoning/CoT LLMs that can use tools to get updated information.
You don't need SOTA models for all tasks, and being able to do more routine tasks at something like 10% of the cost and 70x speed unlocks LLM use for things that are just unthinkable now (bulk classification tasks, real time speech interaction, etc)
I think the model they chose is out of date and hard to sell, but there are plenty of use cases where today's dumb small models are fine. A Qwen 3.5/3.6 or Gemma 3 model on silicon at those speeds would be genuinely world changing even if it's only 1-3B params. Such a model at those speeds will remain extremely useful even over a 5-6 year timespan, I think.
If you consider the places you could deploy it -- with no network access, and at those high speeds... very useful .. for adding vague "common sense" fuzzy thinking to all kinds of applications that right now piss consumers off with poor UX. Esp if the model can do voice-to-text and text-to-speech well (some of the smaller models can)
I wouldn't be surprised if "fast, cheap, dumb" ends up being the market for LLMs.
The state-of-the-art models aren't at "can fully replace knowledge worker" levels yet and I doubt they'll get there any time soon, so charging $2000 / month for access isn't going to happen. Right now everyone and their dog is being handed subsidized credits to play with AI, but the actual outcome is rarely good enough to be worth the money they'd need to charge for it. It might very well take another order of magnitude or two to get LLMs to be truly good (if it is even possible at all), and considering how much money is already being pumped into it I just don't see that happening.
On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry, or playing phone decision tree. There's a lot of money in making chatbot assistants actually useful, or in augmenting website search. Turning it into a glorified "language-to-API-call" translator doesn't take a lot of smarts, but as long as it's cheap you can make a killing in volume.
In a chatbot, 17k tok/s is a neat but nearly useless showcase. In a coding agent it is a meaningful improvement. In robotics, it could be an absolute revolution.
8B models aren't useful in general, but for specific use cases they can provide an enormous amount of intelligence - nVidia's Tesla/Waymo competitor is a 7B LLM with a 2B diffusion model, and running that at those speeds could be an order of magnitude cheaper than existing solutions.
17K tok/s is approaching realtime motor cortex needs for a robot with ~12 actuators (bipedal humanoid) and an IMU. I don't know how many parameters a motor cortex would need but 8B feels like it is within 2 orders of magnitude.
This is an LLM, not a motor cortex. It will output commands as text (JSON, ...), so comparing size is not very meaningful, especially considering that neurons are highly complex and each likely requires thousands of simple artificial neurons (weight+bias).
Could you give me some example how in robotics it can be an absolute revolution?
My understanding is that robotics doesn't really rely much on LLMs in the first place but rather other things.
Is the thing that you are suggesting that it would ingest all real time data and then reason through it at an incredibly fast speed and then act on it and re-iterate? I might imagine some problems with this though I am not a robotics engineer and perhaps someone who deeply understands this topic can give more information.
LLMs are very good at looking at images and reasoning about them. Much more than just object recognition/segmentation: they can explain the physics in the image, the intents, plan actions, ...
That's because of posttraining optimizing for benchmarks that test that.
They tend to collapse into nonsense and hallucinations pretty quickly if you move slightly out of the envelope of the current visual reasoning benchmaxxing.
Disclaimer: I'm a robotics noob, but I've been working on robotics for a few months now.
I'd say virtually all robots you've seen in the real world today rely on classical approaches - you build a rudimentary map, then use classical algorithms to find paths/do area coverage. The robots do not reason about or understand what they're looking for, they're more like in-game units. At most there's some bounded, lightweight image classification going on.
LLMs can understand and reason about the world natively. nVidia has a Tesla FSD/Waymo competitor which is simply their 7B reasoning LLM, but instead of outputting tokens directly, its outputs are fed to a 2B diffusion model that outputs a 1.6-second-long trajectory for the car, and this is enough for an L2 system. But to make this work, they need the model to run at 10Hz, so they use super high-end hardware to do it (Jetson Thor) and the car is still "blind" for 100ms at a time (they have a parallel classical safety system).
With on-chip LLMs you could run this loop at like 100Hz on a chip that costs a few hundred bucks, rather than 10Hz on a board that costs several thousand.
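The loop-rate arithmetic here is simple enough to sketch. The 10 Hz and 100 Hz rates are from the comment above; the 17k tok/s figure is the Taalas demo number quoted elsewhere in this thread for a different model, so using it here is an assumption for illustration only. Function names are made up.

```python
def tick_period_ms(rate_hz: float) -> float:
    """Time between control updates, i.e. how long the system is 'blind'."""
    return 1000.0 / rate_hz


def tokens_per_tick(tokens_per_second: float, rate_hz: float) -> float:
    """Token budget available inside one control period."""
    return tokens_per_second / rate_hz


if __name__ == "__main__":
    for rate in (10, 100):
        print(rate, "Hz ->", tick_period_ms(rate), "ms per tick,",
              tokens_per_tick(17_000, rate), "tokens per tick")
    # 10 Hz -> 100.0 ms per tick, 1700.0 tokens per tick
    # 100 Hz -> 10.0 ms per tick, 170.0 tokens per tick
```

So going from 10 Hz to 100 Hz cuts the blind window from 100 ms to 10 ms, but it also leaves only about a tenth of the token budget per decision at the same generation speed.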
Pretty huge move. Google and their TPUs are looking infinitely more prescient as I think they are on their 7th generation, along with the offshoots it inspired like the LPU and even others, perhaps like Cerebras and their Wafer Scale Engine.
However, based on first impressions, it seems like this is meant for the inference side, and not training, which is also an interesting choice.
Training is pretty much a 1x cost, and efficiency there is already on the way down with architectural improvements. Inference though is an ongoing cost which over time takes orders of magnitude more resources, so focusing on making that far more efficient means way greater gains over time.
Inference costs are higher than training now. I think.
Nvidia is king of general purpose training chips. But inference can be specialized.
> early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art
We're starting to see what really matters here, and though this is hand wavy the TPU makes similar claims.
I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like 60's to 90's IBM, DEC, Cray, Sun and the hardware race that happened then. History doesn't repeat but it often rhymes and I suspect that these efforts will follow the same trajectory.
To be clear, that is not "Google's memo". It's a memo by a guy who happened to work at Google. There is a diversity of opinions at a company that employs 180,000 people.
>designed for initial deployment by the end of 2026 and expanding in the years ahead,
So after the IPO and will be featured heavily in the IPO sales brochure as a future promise?
I'm sceptical over any pre-IPO announcements.
Yeah, the narrative feels like pre-IPO shenanigans, and it looks like the lid on my laundry basket. I wouldn’t be surprised if this is a con.
Whose IPO? Broadcom and Google are already listed, obviously.
OpenAI's upcoming mega IPO
OpenAI, the non profit organization, is going to become a publicly traded profit maximizing corporation
> OpenAI, the non profit organization
No, the nonprofit org stays nonprofit, while the for-profit org it owns will become publicly traded.
See https://openai.com/index/evolving-our-structure/
> OpenAI was founded as a nonprofit, and is today overseen and controlled by that nonprofit.
Does anybody actually believe that?
I had Opus 4.5 design an LLM inference engine in Verilog, including firmware and automated verification a while ago: https://github.com/cpldcpu/smollm.c
It's of course far from optimal. But lowering the implementation through the abstraction levels turned out to be extremely powerful.
Can you suggest some tutorials for Verilog and FPGAs in general?
I have a spare Tang Nano 9k but I don't feel confident about blindly asking Claude to vibecode me a solution and would still like to have at least a basic level of understanding.
Microsoft, Google, and Amazon also do this, but they also have the hyperscaler datacenter infrastructure to host the chips. Designing and taping out the chip is one thing, packaging, cooling, deploying, powering, and managing the fleet is another stack entirely. Wonder where that will come from?
I am not sure how much of the work is done by OpenAI, or whether it is basically a Broadcom chip specifically built for OpenAI models. It is a necessary step, but building a high-performance chip is not easy. Look at companies like Groq, Amazon, and Google.
Both Google and Amazon also codesign heavily with Broadcom (Amazon also with Marvell and Alchip).
Broadcom does things like physical design, providing IP blocks, managing the manufacturing process with TSMC, packaging, and testing. Google and Amazon handle system architecture, performance targets, and requirements, with Broadcom acting as a consultant.
The new chip sounds like it's custom made to accelerate a few specific models they really need to run fast. The advantage is it's truly an ASIC, not an xPU. There are several new startups targeting EDA tooling automation. Chip Agents is the biggest one I can think of, but there are smaller players too; Silimate is one I recall. These companies are focusing on building fast AI-powered tools to speed up the tape-out cycle.
We’ve entered the “if you care about software, build hardware” phase of AI
I have been eyeing what Taalas is doing [1] by making pure hardware models. The speed is absurd.
[1] https://taalas.com/products/
They talk about products, but they don't sell the hardware, thus they don't really have a product, just a service.
I know, it's nick picking, but when people can just reach in and take services away, like Fable/Mythos, hardware is the only thing worth buying.
I'm sure they'll have a product for you if you have millions to invest in a partnership with them.
"Nitpicking"
Crazy product. Their test chatbot feels like a DB query.
https://chatjimmy.ai
What are the other phases? Or what are you referring to in general?
“People who are really serious about software should make their own hardware.” ― Alan Kay
Cheap tokens are more important now than ever. Chinese open-weight models are getting pretty good. The real cost of AI adoption will come down to who (China or the US) can provide cheap tokens for consumers and companies. Microsoft considering DeepSeek for their cowork is an example, and now OpenAI with its own AI inference chip.
I hope to see something like this, but in a small form factor like the NVIDIA Spark.
I want a super fast LLM that is Opus 4.6+, like, in ability.
Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 the performance will be the same. You can increase performance by using wider memory but that's also more expensive.
M3 Ultra has a 1024 bit memory bus (819 GB/s) and starts at $3,999 (96GB of RAM). It can be done....
The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill.
For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.
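The bus-width numbers in this subthread can be sanity-checked with a little arithmetic. Peak bandwidth is bus width in bytes times transfer rate. The 6400 MT/s rate below is an assumption, picked because it reproduces the 819 GB/s quoted for the 1024-bit bus. The token-rate function is a crude upper bound for batch-1 decode of a dense model, where every generated token has to stream all the weights from memory once. It says nothing about prefill, which is compute-bound. The 100 GB weight size is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_mt_s: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times mega-transfers/s."""
    return bus_width_bits / 8 * transfers_mt_s / 1000


def decode_tokens_per_s_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: one full pass over the weights per token."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    rate = 6400  # MT/s, assumed
    wide = peak_bandwidth_gb_s(1024, rate)    # about 819.2 GB/s
    narrow = peak_bandwidth_gb_s(256, rate)   # about 204.8 GB/s at the same rate
    weights_gb = 100.0                        # hypothetical model size
    print(wide, decode_tokens_per_s_upper_bound(wide, weights_gb))
    print(narrow, decode_tokens_per_s_upper_bound(narrow, weights_gb))
```

On this simple model, swapping the SoC for a faster ASIC does not move the decode ceiling at all unless the memory system changes with it, which is the point made above about the 256-bit bus.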
Unfortunately Sam Altman won't be the one to deliver us at-home hardware that can run Opus-level models
Forget about it. Datacenter class hardware is getting farther and farther from desktop use. It’s not PCIe GPUs anymore.
This seems like more competition for Cerebras? Am I understanding correctly?
This is just an uncut wafer - I don't think it's intended to be a wafer-scale chip.
Cerebras etch memory onto the wafer alongside the processing elements, but AFAIK OpenAI are going to be using HBM memory and a conventional chiplet design.
Still competition for Cerebras. Seems quite unlikely they will get an OpenAI deal anytime soon.
The only surprising thing about this is that they didn't do it three years ago.
> May we scale smoothly, exponentially and uneventfully through A[SI]
That sentence sounds weird to me. I can't really put my finger on why, maybe the combination of adverbs, or just the fact of writing the desire of scaling as a company so directly. It feels (to me) like openly claiming their selfish goals. Or maybe I am just misinterpreting and they are referring to the whole of humanity as "We" (but knowing Broadcom's and, to a lesser extent, OpenAI's doings, I am not convinced).
There is a never ending torrent of money coming, so why not make custom chips.
Whoo ... party!
I'm assuming they used LLMs to (help humans) do custom circuit design. Even pre LLM there were various computer optimizations that didn't require humans like genetic algorithms. It'd be cool to see a paper on how they did it.
I mean I'd love to be able to buy something like the 17k tps taalas chip as a pcie or m.2.
Imagine when we can roar along at that speed, low power. Can just have the model reason for a while about anything and everything. It reminds me of the "race to idle" for mcus etc.
The current Taalas chip is for an 8B param model (Llama 3.1 8B). I’m hoping so much that they can get that up to the 30B range. Just imagine Gemma 4 or Qwen 3.6 at 17k tps.
> 17k tps taalas chip
It's odd to me that I haven't heard anything about this approach (baking LLMs/weights into silicon directly) since. It seems almost common-sense that we're going to end up there eventually. And it feels like that point is drawing ever closer now that model capabilities, if not quite plateauing out, are at least getting to a "good enough" point for a LOT of use cases.
I wonder if it's being worked on in secret, if there's something about it that makes it infeasible, or if companies are really too nervous to lock in one model like that because the next one down the line could be a huge improvement. Re. infeasibility, I have heard that the Taalas demonstration chip ran Llama 3.1 8B (a pretty horrible model) and that even that took a massive amount of transistors / die area. So it might just be the case that the good models are too big to fit on silicon?
I have also been thinking about this a lot, and share your belief that this is inevitable.
Taalas has a running demo here: https://chatjimmy.ai/
It's eye opening: generated an AVX-512 optimized Mersenne Twister in C in 0.076s, 13,706 tok/s. Too fast for the tok/s to be terribly accurate.
Good models will require multiple Taalas chips but Groq and Cerebras also require a lot of chips and that hasn't stopped them.
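On the "too fast for the tok/s to be terribly accurate" point, here is a rough back-of-the-envelope sketch in Python using only the two figures quoted above. The assumption that the demo rounds its timer to the nearest millisecond is mine, not something the demo documents; with that assumption, display rounding alone shifts the rate by a bit under 1% in either direction.

```python
# Figures quoted above from the demo run.
elapsed_s = 0.076        # reported generation time in seconds
reported_tps = 13_706    # reported tokens per second

# Number of tokens implied by those two figures.
tokens = reported_tps * elapsed_s

# Assumption (not documented by the demo): the timer is displayed rounded
# to the nearest millisecond, so the true duration is within +/- 0.5 ms.
half_step_s = 0.0005
tps_low = tokens / (elapsed_s + half_step_s)
tps_high = tokens / (elapsed_s - half_step_s)

print(f"implied tokens: {tokens:.0f}")
print(f"tok/s range from rounding alone: {tps_low:.0f} to {tps_high:.0f}")
```

Anything that isn't captured by the displayed timer (when it starts and stops, for instance) would come on top of that.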
Word of Advice for OpenAI:
Never underestimate Broadcom’s ability to shaft their own customers
- VMware
- CA Technologies
- Symantec Enterprise Security
- Brocade
- LSI Corporation
CA Technologies was much worse than Broadcom in its heyday.
Three of their top execs - CEO, CFO, and head of sales - went to federal prison on securities fraud, conspiracy, and other charges. The CEO, Sanjay Kumar, who was at least partly the fall guy for co-founder Charles Wang, served 10 years.
Being acquired by Broadcom could only have been an upgrade, as strange as that may sound.
Nvidia stock is in the red now
Because of Micron, no? I don't think it's related to OpenAI's announcement
Wow, it sounds tempting to use OpenAI's newest chips
Look at the SIZE of that chip.
Cerebras stock is down nearly 20% today.
Not only is the approach overlapping, OpenAI is also Cerebras's only major customer.
If you're referring to the big circle of silicon, that's a wafer, which generally contains many chips (100s to 1000s).
The alt text of the first image describes it as the "Jalapeño inference chip".
As a non-RTFA-er, I'm assuming it's a wafer-scale chip, similar to the ones made by Cerebras.
EDIT: From TechRadar[0]: "The 300mm wafer that both CEOs are holding will generate about 50 to 60 ASICs."
[0] https://www.techradar.com/pro/broadcom-and-openai-debut-jala...
That made me chuckle but I guess if you have never seen one I could see how that assumption could be made.
If this photo is real I wonder what can be revealed about the approach they have taken by analyzing the architecture of what we can see.
For reticle-limit chips, it's on the order of 100. And less than that once you filter out bad dies.
Everybody here knows that.
What some don't know (including you) is that the industry is doing wafer-sized chips nowadays, of which Cerebras is the flagship company.
That's why the stock movement could be related, and that is why GP wrote that comment.
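For the dies-per-wafer numbers in this subthread, here is a minimal sketch in Python of the usual first-order estimate (wafer area over die area, minus an edge-loss term) plus a simple Poisson yield model. The 26 mm x 33 mm field is the standard full-reticle size; the defect density `ASSUMED_D0` is a made-up illustrative value, not a figure for any real process, and the function names are just mine.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    d, s = wafer_diameter_mm, die_area_mm2
    return int(math.pi * d**2 / (4 * s) - math.pi * d / math.sqrt(2 * s))

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson yield model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

RETICLE_AREA_MM2 = 26 * 33  # full-reticle die, 858 mm^2
ASSUMED_D0 = 0.1            # hypothetical defects per cm^2, illustration only

gross = gross_dies_per_wafer(300, RETICLE_AREA_MM2)
good = gross * poisson_yield(RETICLE_AREA_MM2, ASSUMED_D0)

print(f"gross dies per 300mm wafer: {gross}")
print(f"good dies at assumed D0:    {good:.0f}")
```

With these inputs the gross count comes out at 59 for a 300mm wafer, which lines up with the "50 to 60 ASICs" figure quoted from TechRadar. The good-die count depends entirely on the assumed defect density, and real designs that can tolerate or route around defects would do better than this naive model suggests.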
I think Cerebras stock going down could also be partly caused by the lock-up period ending today for 200k shares (page 73 of their prospectus) - https://www.sec.gov/Archives/edgar/data/2021728/000162828026...
It doesn’t seem like it? Unless I am misunderstanding these Nasdaq insider trading reports: https://www.nasdaq.com/market-activity/stocks/cbrs/insider-a...
That's just the wafer disc. Looks like it was presented to Sam Altman for ceremonial purposes.
The wafer disc is what the CPU gets "printed" on.
Dang, I just checked and CBRS is in free-fall since the IPO.
Sucks, I think they're a cool company.
OTOH, I was the only person back then pushing hard during my time at KAUST (back in 2019) to buy one of their systems when they were nobody, eventually resulting in a partnership between the two.
Then I joined their online discourse, very few users, I was semi-active there but they didn't care much.
Then I came to Toronto and heard they were opening an office here, tried to get noticed several times but got mostly ignored. I asked about upcoming events several times, anything to get involved, "yeah man, maybe one day". Then they made an event during Toronto Tech Week and didn't even tell me ... idk.
I don't get schadenfreude as I still think they're a cool company.
My point is they put all the eggs in one basket (AI inference) and neglected everything else. They seem to be on shaky ground now ... sad.
my friend briefly worked there and then got hit by layoffs, as a result, I am enjoying the schadenfreude.
aw shucks nvda has some spicy competition
Make sure you all use that fancy ñ
They don't have true competition; what they lose out on is market share with hyperscalers, since OpenAI has no plans to share inference hardware with any other company right now. Plus, I don't know how NVIDIA's investment equation pans out long term, given that OpenAI will be investing in a more purpose-built inference stack for the future.
They're still kings for training, though I've heard Anthropic is now training on a JAX+TPU setup, so it might not be a monopoly in that segment.
No surprise here. [0]
[0] https://news.ycombinator.com/item?id=45429514
If this is something that will hurt Nvidia, I'm all for it
Although this seems to be for inference only and not training, inference is a recurring cost and training is a one-time cost. So to me, even if Nvidia keeps its moat on training, I don't think that could ever justify its massive valuation; for example, some Chinese models are actually trained on non-Nvidia hardware. The moat there is incredibly thin (at the moment).
I think that if I were Nvidia, I would be a bit terrified, and I imagine the stock won't be doing super great, as everyone online might start talking about this, for better or for worse.
I am a bit impressed by OpenAI, but can this be classified as a plan for OAI to salvage itself and all the commitments it has made, which from memory are nearing 1.4 trillion dollars? And this article[0] is from 2025.
Could OpenAI simply walk away from its commitments (for example, to Nvidia) when necessary if this chip works out? And what exactly might happen in the future when these commitments come due? It's still smart for OAI to diversify with this chip and to have deeper sources of revenue than just being a simple middleman, but I imagine that Nvidia and others have also invested in OpenAI, and they must not be happy with this change.
The thing with AI deals is that they have become so complicated that it is hard for me to find the first-order impact of things, let alone second- or third-order impacts. Financial accountability seems to be affected quite heavily by all of it, and there is some sense that this is done intentionally.
[0] https://techcrunch.com/2025/11/06/sam-altman-says-openai-has...
I wonder how close OpenAI is getting to using the memory they purchased. Are they planning to stack a huge amount of HBM2 into these chips?
I assume OpenAI has been buying memory and "giving" it to Nvidia in exchange for a discount.
So this is where all the memory they bought is going to.
that's not really how it works
> significantly better performance-per-watt than current state-of-the-art alternatives
An interesting example of how the current market dynamics incentivize low cost and therefore power efficiency and therefore lowering resource use.
But Nvidia's moat is software support, isn't it?
You don't need a whole lot of software support if you just want to serve a single family of LLMs.
how much does this chip help with inference speed?
It's probably the same speed but cheaper.
I call BS. It’s probably a white label around existing Broadcom IP; it's impossible to go from zero to this kind of chip in nine months. I doubt OpenAI made any significant contribution.
That’s exactly what this is.
9 months to production is completely impossible anyway.
9 months from design to early samples is probably impossible given that TSMC takes 3 months after tape-out to produce them. Then it’s up to the customer to qualify and revise for production. TSMC doesn’t do that.
There’s no AI that makes this happen in 9 months.
lol
The similarities between the AI world and the crypto world are so much closer than any AI fanboy would ever admit.
One thing I don't like about California based companies is how cringe the names always are.
"Jalapeño" is such a bad name, having an "ñ" already makes it difficult and annoying to deal with in so many little ways. Good luck with that.
But also, there's the sort of "yes, let's use Mexican-related things because we're California" thought that I just really hate. I don't know, it's like Corporate Memphis to me. You see a product like this, and you know it's an uppity California-based firm that came up with it.
No worse, I suppose, than the obsession with Lord of the Rings that the authoritarian surveillance companies have. Palantir, Anduril. Then we have the non-defense/surveillance ones: Mithril, Valar, Narya, Erebor
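To make the "ñ is annoying in so many little ways" complaint concrete, here is a small Python sketch using the standard-library `unicodedata` module. It shows one common snag: the same visible name can be stored in two Unicode normalization forms that differ in length and compare unequal. The ASCII fallback at the end is just one common workaround, not anything OpenAI or Broadcom is known to do.

```python
import unicodedata

name = "Jalapeño"

# NFC stores ñ as one code point (U+00F1); NFD stores it as
# "n" followed by a combining tilde (U+0303).
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

print(len(nfc), len(nfd))        # 8 9
print(nfc == nfd)                # False, although both render the same
print(len(nfc.encode("utf-8")))  # 9 bytes for 8 characters

# A common workaround for identifiers and file names: decompose,
# then drop everything that is not ASCII.
ascii_name = nfd.encode("ascii", "ignore").decode("ascii")
print(ascii_name)                # Jalapeno
```

So search, deduplication, and string comparison all need a normalization step first, or the name has to be flattened to "Jalapeno" somewhere along the way.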
What kinds of names would you suggest?
None, probably. Just saying Jalapeño is no worse than any other non-descriptive company name. Although at least Palantir and Anduril are aptly named for what they do. The VC firms less so.
Strawberry was too complicated as a codename.
Too many Rs.
Too many? But there are only two Rs in strawberry, how can that be too many?
Don't worry, in Europe it's the same, but for insurance/legal stuff. Tons of companies have names based on Latin words such as Civitas/Insalus/Legalia/Legalitas or whatever, which look tacky/rancid/old-fashioned from kilometers away.
Jalapeño
Jalapeño
Jalapeño
Really has a… ring to it