After a quick browse of the content, my understanding is that it works roughly like this: a very compressed diff vector is applied to a multi-billion-parameter model, and the model can then be 'retrained' to reason (score) better on a specific topic; e.g., math was used in the paper.
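For what it's worth, my reading of the "compressed diff vector" is a low-rank weight delta added onto the frozen base weights. A minimal sketch under that assumption (every name and size below is made up for illustration, not taken from the paper):

    import torch

    def apply_diff(base_weight, A, B, alpha=1.0):
        # base_weight: (out, in) frozen pretrained matrix.
        # A: (r, in), B: (out, r) with r << min(out, in), so the
        # "diff" costs r*(in+out) numbers instead of out*in.
        return base_weight + alpha * (B @ A)

    d_out, d_in, r = 4096, 4096, 8    # hypothetical layer sizes
    W = torch.randn(d_out, d_in)      # stands in for one model layer
    A = torch.randn(r, d_in) * 0.01   # the compressed diff, trained on e.g. math
    B = torch.zeros(d_out, r)         # zero-init: no change until trained
    W_math = apply_diff(W, A, B)      # equals W here; diverges once B is trained

Only A and B would need to be shipped, which is why the diff can be tiny relative to the multi-billion-parameter base model.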
I agree; I don't think gradient descent is going to work in the long run for the kind of luxurious & automated communist utopia the technocrats are promising everyone.
The quality of custom models trained on proper reasoning datasets[0] is incredible now, even at small parameter counts (3-7B is the sweet spot)
[0]: cartesien.io or Salesforce's WebscaleRL
What are you basing that on? Personal experience or some benchmarks?
With four parameters I can fit an elephant, and with five I can make him wiggle his trunk, so there is still room for improvement.
Except learning to reason is a far cry from curve fitting. Our brains have more than five parameters.
Speak for yourself!
Reasoning capability might just be some specific combination of mirror neurons.
Even some advanced math usually involves applying patterns found elsewhere to new topics.