I have no real quibble with the blog post itself, but I take issue with the title that calls it a "vintage model".
The blog post defines a "vintage model" as one that is trained only on data before a particular cutoff point:
> Vintage LMs are contamination-free by construction, enabling unique generalization experiments [...] The most important objective when training vintage language models is that no data leaks into the training corpus from after the intended knowledge cutoff
But as they acknowledge later, there are multiple major data leakage issues in their training pipeline, and their model does in fact have quite a bit of anachronistic knowledge. So it fails at what they call the most important objective. It's fair to say that they are working toward something that meets their definition of "vintage", but they're not there yet.
Yeah, the blog distinguishes between "contamination," which it describes as polluting the training data with answers to benchmarking questions, with "temporal leakage," which is polluting the training data with writing after the target date, but those seem to be nearly the same problem.
I was reading Nate Silver's book "On The Edge" and there is an interesting part where he takes predictions on the usage of nuclear weapons taken from just after World War 2 and compares them to what the Bayesian prediction would be given what actually happened.
Post World War 2, some people had the odds per year at 10%. Some of that is probably a mix of recency bias + not understanding how to use new weapons etc etc but as Silver points out, the odds were much lower.
I mention this only b/c the "could something trained on LLMs of the time predict the future" always makes me think of it.
Re: the Nate Silver nuclear weapons example, that's pretty weak - eg: given (say) I've just seen three heads in a row (exactly once) .. does that alter anything about "the odds".
Having seen nuclear weapons not used post WWII ... does that inform us about "the odds" or the several times their use was almost certain (eg: Cuban missile crisis) save for out of band behaviour by individuals that averted use and escalation?
> A language model trained from scratch exclusively on data from certain places and time periods to reduce modern bias and emulate the voice, vocabulary, and worldview of the era.
>Have you ever daydreamed about talking to someone from the past?
Fun facts, LLM was once envisioned by Steve Jobs in one of his interviews [1].
Essentially one of his main wish in life is to meet and interract with Aristotle, in which according to him at the time, computer in the future can make it possible.
[1] In 1985 Steve Jobs described a machine that would help people get answers from Aristotle–modern LLM [video]:
The idea of talking to a machine that has all of humanities knowledge and gives answers is older than electronic computing. It certainly wasn't a novel idea when Jobs gave that speech. At that time, the field of artificial intelligence was old enough to become US president.
Except... not at all? The vast majority of the training data required to create an artificial Aristotle has been lost forever. Smash your coffee cup on the ground. Now reassemble it and put the coffee back in. Once you can repeatably do that I'll begin to believe you can train an artificial Aristotle.
>Have you ever daydreamed about talking to someone from the past?
It's going to be more like corresponding with someone from the past. We don't have much in the way of recorded speech from that area, so this will be built from written records. Much more than now, the written records are going to be formal and edited, reflecting a different pattern than casual speech or writing.
Having said that, this is cool. I recently had to OCR a two-hundred year old book with the usual garish fonts from that era. It was remarkably easy to do, and accurate.
20GB isn't enough for a 13B parameter model? I thought the 29-31B models could run on a 24GB GTX x090 card?
I'm currently shopping for a local LLM setup and between something like the Framework Desktop with 64-128GB of shared RAM or just adding a 3090 or 4090 to my homelab so I'm very curious what hardware is working well for others.
> 20GB isn't enough for a 13B parameter model? I thought the 29-31B models could run on a 24GB GTX x090 card?
Parameters are like Hertz - they don't really tell you much until you know the rest anyways. In this case, a parameter is a bfloat16 (2 bytes). I'm sure someone will bother to makes quants at some point.
> I'm currently shopping for a local LLM setup and between something like the Framework Desktop with 64-128GB of shared RAM or just adding a 3090 or 4090 to my homelab so I'm very curious what hardware is working well for others.
I grabbed a 395 laptop w/ 128 GB to be a personal travel workstation. Great for that purpose. Not exactly a speed demon with LLMs but it can load large ones (which run even slower as a result) and that wasn't really my intent. I've found GPUs make more usable local LLMs, particularly in the speed department, but I suppose that depends more on how you really use them and how much you're willing to pay to have enough total VRAM.
It's next to impossible to make your money back on local (regardless what you buy) so I'd just say "go for whatever amount of best you're willing to put money down for" and enjoy it.
It is cool. I find the idea of trying to understand whether these types of models can come up with things like General relativity, or maybe some results really interesting.
> Winston Churchill, who was born in 1871, is the son of the late Lord Randolph Churchill, and a grandson of the great Duke of Marlborough. He was educated at Harrow and at Sandhurst, and entered the army in 1890. In 1895 he retired from the service, and three years later he was returned to Parliament as Conservative member for Oldham. He has represented that constituency ever since. Mr. Churchill has written a number of books, including “The Story of the Malakand Field Force,” “Savrola,” “Richard Carvel,” “The Celebrity,” and “The Crisis.” He has also contributed to several periodicals, and in 1900 he founded the monthly review, _The J Cornhill Magazine_. Mr. Churchill is an ardent sportsman, and has shot big game in Africa. He married, in 1897, Lady Randolph Churchill, and has two sons and a daughter. Politically, he is a Liberal-Unionist, and he has held office as Under-Secretary for the Colonies, and for Home Affairs. At present he is Chancellor of the Duchy of Lancaster, with a seat in the Cabinet. Mr. Churchill has achieved considerable success as a public speaker, and he is described as an eloquent and forcible debater. His residence is at 42, Grosvener Place, London, S.W.
The colonialism is... wow.. Tell me about the likelihood of independence of India:
> The chances are undoubtedly in favour of the establishment of an independent Indian state in the not very distant future. The unifying influences of railways and a common language are rapidly breaking down the barriers of caste and creed, which have hitherto kept the great Indian peninsula politically disunited, and the spread of western education is awakening a national spirit among the people. The immediate result of the latter is seen in the establishment of native newspapers, which voice popular feeling, and in the growth of associations for social and political reform. More important still, as showing the trend of public opinion, are the resolutions passed at great national congresses, which have been held annually for the last dozen years. By these gatherings, which representatives of all classes and creeds assemble to discuss matters of social and political interest, a strong impulse has been given to the movement for reform, and the desires of the more advanced party among the natives have been plainly formulated. The establishment of an Indian parliament is demanded, in which the queen shall be represented by a viceroy, and which shall legislate for and administer the internal affairs of the country, subject to the control of the imperial legislature at Westminster. The wish is also expressed that the queen should assume the title of empress of India, and that a certain number of natives should be admitted to the civil and military services of the state. Finally, it is claimed that the time has come when Her Majesty may wisely be advised to delegate to the Indian people a larger share in the work of governing themselves, by permitting them to elect a portion of the members of the legislative councils. How far the present generation of Indians may be trusted to exercise political power with prudence and moderation, it is impossible to say; but there can be no doubt that the time must arrive when the control of Indian affairs will be safely lodged in native hands. The process may be hastened or retarded, but come it must. The spread of enlightenment among the great mass of the population can only have one issue, and that issue is the establishment of an Indian nationality. The probability of such an event may therefore be regarded as certainty.
I think that one could also take a much larger model (35B or 122B sized) and give it a thorough system prompt to only speak in the manner of a well educated Victorian/Edwardian era gentleman, if you want an "old timey" LLM.
As we learn how to train smarter models on less data, it’ll become more and more interesting to see whether models like this can invent post-1930 math, science, etc. and make predictions.
[Edit: serves me right for not reading tfa. My points are well-covered]
I have no real quibble with the blog post itself, but I take issue with the title that calls it a "vintage model".
The blog post defines a "vintage model" as one that is trained only on data before a particular cutoff point:
> Vintage LMs are contamination-free by construction, enabling unique generalization experiments [...] The most important objective when training vintage language models is that no data leaks into the training corpus from after the intended knowledge cutoff
But as they acknowledge later, there are multiple major data leakage issues in their training pipeline, and their model does in fact have quite a bit of anachronistic knowledge. So it fails at what they call the most important objective. It's fair to say that they are working toward something that meets their definition of "vintage", but they're not there yet.
Yeah, the blog distinguishes between "contamination," which it describes as polluting the training data with answers to benchmarking questions, with "temporal leakage," which is polluting the training data with writing after the target date, but those seem to be nearly the same problem.
I was reading Nate Silver's book "On The Edge" and there is an interesting part where he takes predictions on the usage of nuclear weapons taken from just after World War 2 and compares them to what the Bayesian prediction would be given what actually happened.
Post World War 2, some people had the odds per year at 10%. Some of that is probably a mix of recency bias + not understanding how to use new weapons etc etc but as Silver points out, the odds were much lower.
I mention this only b/c the "could something trained on LLMs of the time predict the future" always makes me think of it.
Predicting the future is problematic, agreed.
Re: the Nate Silver nuclear weapons example, that's pretty weak - eg: given (say) I've just seen three heads in a row (exactly once) .. does that alter anything about "the odds".
Having seen nuclear weapons not used post WWII ... does that inform us about "the odds" or the several times their use was almost certain (eg: Cuban missile crisis) save for out of band behaviour by individuals that averted use and escalation?
> Having seen nuclear weapons not used post WWII ... does that inform us about "the odds"
This is what Bayesian prediction does
> save for out of band behaviour by individuals that averted use and escalation?
This is kind of the point being made.
We've got quite a list of history-only LLMs brewing on the Models Table.
https://lifearchitect.ai/models-table/
This one is easiest to talk to in a HF space:
https://huggingface.co/spaces/tventurella/mr_chatterbox
Related: https://github.com/haykgrigo3/TimeCapsuleLLM
> A language model trained from scratch exclusively on data from certain places and time periods to reduce modern bias and emulate the voice, vocabulary, and worldview of the era.
Discussed here: https://news.ycombinator.com/item?id=46590280
Whoa, Alec Radford is on the list of authors! He was instrumental in building the original GPT models at OpenAI.
>Have you ever daydreamed about talking to someone from the past?
Fun facts, LLM was once envisioned by Steve Jobs in one of his interviews [1].
Essentially one of his main wish in life is to meet and interract with Aristotle, in which according to him at the time, computer in the future can make it possible.
[1] In 1985 Steve Jobs described a machine that would help people get answers from Aristotle–modern LLM [video]:
https://youtu.be/yolkEfuUaGs
The idea of talking to a machine that has all of humanities knowledge and gives answers is older than electronic computing. It certainly wasn't a novel idea when Jobs gave that speech. At that time, the field of artificial intelligence was old enough to become US president.
Imagine aiming for Aristotle and landing on Siri…
Except... not at all? The vast majority of the training data required to create an artificial Aristotle has been lost forever. Smash your coffee cup on the ground. Now reassemble it and put the coffee back in. Once you can repeatably do that I'll begin to believe you can train an artificial Aristotle.
>Have you ever daydreamed about talking to someone from the past?
It's going to be more like corresponding with someone from the past. We don't have much in the way of recorded speech from that area, so this will be built from written records. Much more than now, the written records are going to be formal and edited, reflecting a different pattern than casual speech or writing.
Having said that, this is cool. I recently had to OCR a two-hundred year old book with the usual garish fonts from that era. It was remarkably easy to do, and accurate.
Darn I've only got ~20 GB of VRAM. I really need to get a stronger machine for this sort of stuff.
20GB isn't enough for a 13B parameter model? I thought the 29-31B models could run on a 24GB GTX x090 card?
I'm currently shopping for a local LLM setup and between something like the Framework Desktop with 64-128GB of shared RAM or just adding a 3090 or 4090 to my homelab so I'm very curious what hardware is working well for others.
> 20GB isn't enough for a 13B parameter model? I thought the 29-31B models could run on a 24GB GTX x090 card?
Parameters are like Hertz - they don't really tell you much until you know the rest anyways. In this case, a parameter is a bfloat16 (2 bytes). I'm sure someone will bother to makes quants at some point.
> I'm currently shopping for a local LLM setup and between something like the Framework Desktop with 64-128GB of shared RAM or just adding a 3090 or 4090 to my homelab so I'm very curious what hardware is working well for others.
I grabbed a 395 laptop w/ 128 GB to be a personal travel workstation. Great for that purpose. Not exactly a speed demon with LLMs but it can load large ones (which run even slower as a result) and that wasn't really my intent. I've found GPUs make more usable local LLMs, particularly in the speed department, but I suppose that depends more on how you really use them and how much you're willing to pay to have enough total VRAM.
It's next to impossible to make your money back on local (regardless what you buy) so I'd just say "go for whatever amount of best you're willing to put money down for" and enjoy it.
How much system memory do you have? Llama.cpp can split layers across cpu and gpu. Speeds will be slower of course but it's not unusable at all.
It is cool. I find the idea of trying to understand whether these types of models can come up with things like General relativity, or maybe some results really interesting.
This is cool. Is it possible to easily install with ollama?
The Python example is fascinating, and a good rejoinder to anyone still dismissing LLM’s as stochastic parrots.
Vintage is a funny thing to call this. Is it running on vacuum tube hardware?
So interesting!
Tell me about Winston Churchill:
> Winston Churchill, who was born in 1871, is the son of the late Lord Randolph Churchill, and a grandson of the great Duke of Marlborough. He was educated at Harrow and at Sandhurst, and entered the army in 1890. In 1895 he retired from the service, and three years later he was returned to Parliament as Conservative member for Oldham. He has represented that constituency ever since. Mr. Churchill has written a number of books, including “The Story of the Malakand Field Force,” “Savrola,” “Richard Carvel,” “The Celebrity,” and “The Crisis.” He has also contributed to several periodicals, and in 1900 he founded the monthly review, _The J Cornhill Magazine_. Mr. Churchill is an ardent sportsman, and has shot big game in Africa. He married, in 1897, Lady Randolph Churchill, and has two sons and a daughter. Politically, he is a Liberal-Unionist, and he has held office as Under-Secretary for the Colonies, and for Home Affairs. At present he is Chancellor of the Duchy of Lancaster, with a seat in the Cabinet. Mr. Churchill has achieved considerable success as a public speaker, and he is described as an eloquent and forcible debater. His residence is at 42, Grosvener Place, London, S.W.
The colonialism is... wow.. Tell me about the likelihood of independence of India:
> The chances are undoubtedly in favour of the establishment of an independent Indian state in the not very distant future. The unifying influences of railways and a common language are rapidly breaking down the barriers of caste and creed, which have hitherto kept the great Indian peninsula politically disunited, and the spread of western education is awakening a national spirit among the people. The immediate result of the latter is seen in the establishment of native newspapers, which voice popular feeling, and in the growth of associations for social and political reform. More important still, as showing the trend of public opinion, are the resolutions passed at great national congresses, which have been held annually for the last dozen years. By these gatherings, which representatives of all classes and creeds assemble to discuss matters of social and political interest, a strong impulse has been given to the movement for reform, and the desires of the more advanced party among the natives have been plainly formulated. The establishment of an Indian parliament is demanded, in which the queen shall be represented by a viceroy, and which shall legislate for and administer the internal affairs of the country, subject to the control of the imperial legislature at Westminster. The wish is also expressed that the queen should assume the title of empress of India, and that a certain number of natives should be admitted to the civil and military services of the state. Finally, it is claimed that the time has come when Her Majesty may wisely be advised to delegate to the Indian people a larger share in the work of governing themselves, by permitting them to elect a portion of the members of the legislative councils. How far the present generation of Indians may be trusted to exercise political power with prudence and moderation, it is impossible to say; but there can be no doubt that the time must arrive when the control of Indian affairs will be safely lodged in native hands. The process may be hastened or retarded, but come it must. The spread of enlightenment among the great mass of the population can only have one issue, and that issue is the establishment of an Indian nationality. The probability of such an event may therefore be regarded as certainty.
> The establishment of an Indian parliament is demanded, in which the queen shall be represented by a viceroy,
Britain’s monarch was a king, not a queen, from about 1900-1950. Obviously there is some big “temporal leakage” from the training
I think that one could also take a much larger model (35B or 122B sized) and give it a thorough system prompt to only speak in the manner of a well educated Victorian/Edwardian era gentleman, if you want an "old timey" LLM.
As we learn how to train smarter models on less data, it’ll become more and more interesting to see whether models like this can invent post-1930 math, science, etc. and make predictions.
[Edit: serves me right for not reading tfa. My points are well-covered]