Astro - Hacker News

2 comments

minimaxir 4 minutes ago

We really need a replacement for all-MiniLM-L12-v2 that can create more robust embeddings with the same compute.
You can technically do Q4 quantization for larger embedding models, but I am not sure if that plays nice with ONNX.

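Q4 quantization stores each weight as a 4-bit integer plus a scale shared by a block of weights. The sketch below is a minimal plain-Python illustration, assuming a simple symmetric scheme and a hypothetical default block size of 32. It is not the exact format used by ONNX Runtime, GGUF, or any other specific tool. The names `quantize_q4` and `dequantize_q4` are made up for this example. Real formats also pack two 4-bit codes into each byte, and this sketch skips that step.

```python
def quantize_q4(weights, block_size=32):
    """Symmetric block-wise 4-bit quantization (illustrative sketch only).

    Each block of `block_size` weights shares one float scale, and each
    weight is stored as an integer code in the signed 4-bit range [-8, 7].
    """
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to code 7; guard all-zero blocks.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        codes = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q4(blocks):
    """Reconstruct approximate float weights from (scale, codes) blocks."""
    return [scale * c for scale, codes in blocks for c in codes]


# Hypothetical toy weights, quantized in blocks of 4 to keep the example small.
weights = [0.12, -0.5, 0.33, 0.0, 0.7, -0.7, 0.05, 0.21]
blocks = quantize_q4(weights, block_size=4)
restored = dequantize_q4(blocks)
for original, approx in zip(weights, restored):
    print(f"{original:+.3f} -> {approx:+.3f}")
```

Each restored weight is off by at most half of its block's scale, and that rounding error is the accuracy cost paid for the memory savings. With 32 weights per block and a float32 scale, one block takes 32 × 4 + 32 = 160 bits instead of 32 × 32 = 1024 bits in plain float32.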
electroglyph 41 minutes ago

ONNX is my first suggestion for people looking for speed gains on CPU.