22 points | by snikolaev 3 hours ago
2 comments
We really need a replacement for all-MiniLM-L12-v2 that can create more robust embeddings with the same compute.
You can technically do Q4 quantization for larger embedding models but I am not sure if that plays nice with ONNX.
ONNX is my first suggestion to people looking for speed gains on CPU.
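For anyone wondering what Q4 quantization actually does to a model's weights, here is a minimal sketch. It assumes a generic symmetric, block-wise 4-bit scheme: each block of 32 weights gets one float scale, and every weight is stored as a signed 4-bit integer in [-8, 7]. This is only an illustration. ONNX Runtime, llama.cpp and other runtimes each use their own Q4 layouts, and the function names below are hypothetical.

```python
import math
from typing import List, Tuple

Block = Tuple[float, List[int]]  # (scale, 4-bit integer codes)


def quantize_q4_block(weights: List[float]) -> Block:
    """Symmetric 4-bit quantization of one block (hypothetical helper)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude in the block to 7, the top of the signed 4-bit range.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to [-8, 7].
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, codes


def quantize_q4(weights: List[float], block_size: int = 32) -> List[Block]:
    """Split a flat weight vector into blocks, each with its own scale."""
    return [
        quantize_q4_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize_q4(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate float weights: code * scale for every entry."""
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(c * scale for c in codes)
    return out


if __name__ == "__main__":
    # Toy "weights"; a real embedding model has millions of these.
    weights = [math.sin(i) * 0.1 for i in range(64)]
    restored = dequantize_q4(quantize_q4(weights))
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The per-weight error is bounded by half of its block's scale, so smaller blocks trade a bit more storage (one scale each) for better accuracy. Storage works out to roughly 4 bits per weight plus one scale per block, versus 32 bits per weight in fp32.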