CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL

(github.com)

126 points | by dzign a day ago

15 comments