KVarN: Native vLLM backend for KV-cache quantization by Huawei

(github.com)

49 points | by theanonymousone | 2 hours ago

6 comments