Back when ChatGPT came out, I was so shocked by how _good_ it was for an “AI” product that I simply had to know how it worked. Over the next month I ended up drawing a block diagram on a whiteboard I have in my office, with the math involved written next to each step. I’d puzzle over each step along the way, and completing the drawing brought both a sense of triumph and a sense of deep understanding. I kept that drawing up for many months afterward, and would often gaze at it in wonder during meetings and idle moments.
This is to say: the autoregressive decoder-only transformer LLM architecture pioneered by OpenAI is wildly simple for how revolutionary its results are. At the time I was reading about classical, non-learned SLAM systems (which use video plus handcrafted math to build 3D maps of physical spaces while also locating the camera within them), and comparatively speaking I’d say the math is about as complicated as ONE of the components in those formulations. The only reason it needs 6-figure computers to run is that they made the middle bit of those models (the stack of attention and feed-forward layers) REALLY BIG, dimensionally speaking.
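For a sense of how small the core math is, here is a minimal sketch of one causal self-attention head, the key operation inside each layer of a decoder-only transformer, written in plain Python. It is not OpenAI's code: the weight matrices in the usage lines are hypothetical toy values, and a real model adds multiple heads, feed-forward layers, residual connections and normalization, then stacks that block many times at a much larger width.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x, w_q, w_k, w_v):
    """One attention head over a sequence x (seq_len x d_model).

    Position i may only attend to positions 0..i: this causal mask is what
    makes the model decoder-only and autoregressive.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    out = []
    for i in range(len(x)):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors seen so far.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out


# Toy usage: three 2-dimensional "token vectors" and identity weights
# (hypothetical values chosen only so the arithmetic is easy to follow).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = causal_self_attention(x, identity, identity, identity)
print(out[0])  # the first token can only attend to itself: [1.0, 0.0]
```

Scaling up mostly means making the vectors much wider and stacking many more of these layers; the operations themselves stay this simple.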
This was OpenAI’s entire breakthrough: making this particular model architecture larger leads to emergent capabilities, like picking the best ending to a story or set of instructions, or answering questions that draw on broad factual knowledge. Meanwhile, I’ve been watching these AI companies attempt, successfully, to sell this capability as some sort of robot consciousness hand-crafted by supergeniuses. The fact that they are getting away with it is almost as shocking to me as the discovery itself.
I learned TCP/IP by watching and reading raw packets over packet radio at 1200 baud.
I've noticed the same thing is possible if you watch the output of a slow LLM. Eventually you start to see the machinery: input tokens go in, output tokens come out, and it's all math. I can't exactly predict the tokens generated, but I can see how they are formed. It's a lot like chess: you can't see every possible move, but the mechanism is understandable.
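The machinery being described is a loop, and it can be sketched in a few lines. This is a toy that assumes a bigram count table as a hypothetical stand-in for the trained network, plus greedy decoding (always take the most likely next token); real chatbots typically sample from a probability distribution predicted by a transformer that looks at the whole context. The outer loop, predicting one token, appending it and feeding everything back in, is the same idea.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which: a toy stand-in for a trained network."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token(counts, context):
    """Greedy choice: the most frequent follower of the last token, or None."""
    followers = counts.get(context[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens):
    """Autoregressive loop: predict one token, append it, feed everything back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(counts, tokens)
        if tok is None:  # the toy model has never seen this token followed by anything
            break
        tokens.append(tok)
    return tokens


corpus = "the cat sat on the cat".split()
model = train_bigram(corpus)
print(generate(model, ["the"], 5))  # ['the', 'cat', 'sat', 'on', 'the', 'cat']
```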
I couldn't load the article directly due to an SSL issue, so here's the archive link:
https://archive.ph/aWtFG
I find it difficult to engage with AI-generated text. What am I getting here that I couldn't get from a chatbot?
Is this text generated by AI? I couldn't tell but I'd believe it if it was.
I imagine that if resources were already spent writing this text, one benefit of reading it is not spending more resources, or causing more pollution, by asking a chatbot.
Next do "why LLMs work"
Considering they work with any architecture/configuration given enough compute, just more or less efficiently, maybe it's fundamental, in the same sense as why electricity works...
Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.
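Of those ingredients, gradient descent is the easiest to see in isolation. Here is a minimal sketch that fits a straight line by repeatedly stepping downhill on the mean squared error; training an LLM applies the same idea (with refinements such as the Adam optimizer and minibatches) to billions of parameters instead of two. The data, learning rate and step count are arbitrary toy choices.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1.
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(w, 3), round(b, 3))  # close to the true slope 2 and intercept 1
```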
Why does linear regression work? Why do computers work? Because it's about math and the encoding of information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very good at finding that relation, even when it's noisy.
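Both halves of that idea fit in a short sketch: words become integer IDs that index a table of vectors, and each position gets its own vector added on top, so the same word at different positions produces different inputs. The sinusoidal position formula is the one from the original Transformer paper; GPT-style models often learn their position vectors instead, so treat this as one illustrative choice. The embedding table is a hypothetical all-zeros placeholder so the position signal is easy to see; in a real model it is learned.

```python
import math


def build_vocab(words):
    """Assign each distinct word an integer ID in order of first appearance."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def positional_encoding(pos, d_model):
    """Sinusoidal position vector: sin on even dimensions, cos on odd ones.

    Each sin/cos pair uses a geometrically smaller frequency, so every
    position gets a distinct, smoothly varying vector.
    """
    vec = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def encode(words, embeddings, vocab):
    """Word -> ID -> embedding vector, plus a vector encoding its position."""
    out = []
    for pos, w in enumerate(words):
        emb = embeddings[vocab[w]]
        pe = positional_encoding(pos, len(emb))
        out.append([e + p for e, p in zip(emb, pe)])
    return out


words = "the cat saw the dog".split()
vocab = build_vocab(words)
embeddings = [[0.0] * 4 for _ in vocab]  # hypothetical placeholder; learned in practice
vectors = encode(words, embeddings, vocab)
print(vectors[0])                # [0.0, 1.0, 0.0, 1.0]
print(vectors[0] == vectors[3])  # False: same word, different position
```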