The Performance Wins from Fusing Kernels
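Every CUDA kernel launch carries fixed overhead, and each separate elementwise kernel also reads its input from global memory and writes its result back. Fusing three such operations into one kernel removes two launches and the intermediate memory round trips, which for small or memory-bound workloads can approach a 3x speedup. Here's where fusion helps and how to get it.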
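To make the idea concrete, here is a minimal sketch of fusing three elementwise operations. The operations chosen (add a bias, apply ReLU, scale) and every name in it (`add_bias`, `relu`, `scale`, `fused_bias_relu_scale`, `max_abs_diff_fused_vs_unfused`) are assumptions picked for illustration, not taken from the post. The unfused path launches three kernels and sends the intermediate results through global memory; the fused path does the same arithmetic in a single launch.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Unfused version: three launches. Each kernel reads n floats from global
// memory and writes n floats back.
__global__ void add_bias(const float* in, float* out, float bias, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + bias;
}

__global__ void relu(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(in[i], 0.0f);
}

__global__ void scale(const float* in, float* out, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

// Fused version: one launch, one read and one write per element. The
// intermediate values never leave registers.
__global__ void fused_bias_relu_scale(const float* in, float* out,
                                      float bias, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(in[i] + bias, 0.0f) * factor;
}

// Hypothetical helper: runs both paths on the same input and returns the
// largest absolute difference between their outputs, or -1 on a CUDA error.
float max_abs_diff_fused_vs_unfused(int n) {
    assert(n > 0);
    const float bias = 0.25f, factor = 3.0f;  // arbitrary illustrative values
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    std::vector<float> h_in(n), h_unfused(n), h_fused(n);
    for (int i = 0; i < n; ++i)
        h_in[i] = 0.01f * static_cast<float>(i % 200) - 1.0f;  // mix of signs

    float *d_in = nullptr, *d_tmp = nullptr, *d_unfused = nullptr, *d_fused = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_tmp, bytes) == cudaSuccess &&
              cudaMalloc(&d_unfused, bytes) == cudaSuccess &&
              cudaMalloc(&d_fused, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;

        // Three launches, with intermediates stored in d_tmp.
        add_bias<<<blocks, threads>>>(d_in, d_tmp, bias, n);
        relu<<<blocks, threads>>>(d_tmp, d_tmp, n);  // elementwise, safe in place
        scale<<<blocks, threads>>>(d_tmp, d_unfused, factor, n);

        // One launch doing the same work.
        fused_bias_relu_scale<<<blocks, threads>>>(d_in, d_fused, bias, factor, n);

        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(h_unfused.data(), d_unfused, bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(h_fused.data(), d_fused, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_in); cudaFree(d_tmp); cudaFree(d_unfused); cudaFree(d_fused);
    if (!ok) return -1.0f;

    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = std::fmax(max_diff, std::fabs(h_unfused[i] - h_fused[i]));
    return max_diff;
}
```

Calling `max_abs_diff_fused_vs_unfused(1 << 20)` from a `main` and compiling with `nvcc` checks that the two paths agree. This sketch only verifies correctness. To see how much fusion actually buys on a given GPU and problem size, time each path with `cudaEventRecord` and `cudaEventElapsedTime` or a profiler, since the gain depends on how much of the runtime is launch overhead and memory traffic.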