WDYM? I don't want to train a model, only run inference. From what I know it must be much cheaper to buy "normal" RAM + a decent CPU vs a GPU with a similar amount of VRAM.
The bottleneck for inference is fitting a good enough model into memory. An 80B-param model at 8-bit quantization works out to roughly 90GB of RAM, so 2x64GB DDR4 sticks is probably the most price-efficient solution. The question is: is there any model capable enough to consistently handle an agentic workload?
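The ~90GB figure is just weights-dominated back-of-envelope math: params × bytes-per-weight, plus some headroom for KV cache and runtime overhead. A rough sketch (the 1.1 overhead factor is my own assumption, not a measured value):

```python
def inference_ram_gb(params_billion: float, bits_per_weight: int,
                     overhead: float = 1.1) -> float:
    """Rough RAM estimate for running inference on a quantized model.

    overhead is a fudge factor for KV cache / runtime buffers (assumed).
    Returns decimal gigabytes.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# 80B params at 8-bit: 80 GB of weights, ~88 GB with overhead
print(inference_ram_gb(80, 8))  # 88.0
# Dropping to 4-bit roughly halves it
print(inference_ram_gb(80, 4))  # 44.0
```

At 4-bit you'd fit the same model in half the RAM, which is why most CPU inference setups quantize aggressively.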
CPU? Good luck.