Show HN: Tiny-vLLM – high-performance LLM inference engine in C++ and CUDA

(github.com)

45 points | by yu3zhou4 | 3 hours ago

5 comments