paper | year | baseline | workload analysis | GPU platform | before/after BW and core utilization
Libra | 2025 | cuSPARSE, DTC-SpMM, FlashSparse, Acc-SpMM, Voltrix-SpMM | | NVIDIA H100 PCIe 80GB |
Voltrix | 2025 | | | |
Acc-SpMM | 2025 | | | RTX4090, A800, H100, TCU |
DTC-SpMM | 2024 | cuSPARSE, TC-GCN | latency-bound | |

Background

CSR
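A minimal Python sketch of the CSR layout, assuming 0-based indexing and a dense input given as a list of rows. `row_ptr` has one entry per row plus one, and row `i`'s nonzeros occupy positions `row_ptr[i]` through `row_ptr[i+1]-1` of `col_idx` and `vals`. The function name `dense_to_csr` is illustrative, not taken from any library.

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays (row_ptr, col_idx, vals)."""
    row_ptr = [0]  # row_ptr[i] = position in vals where row i starts
    col_idx = []   # column index of each stored nonzero
    vals = []      # value of each stored nonzero
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))  # end of this row = start of the next row
    return row_ptr, col_idx, vals


A = [[5, 0, 0, 2],
     [0, 0, 3, 0],
     [0, 0, 0, 0],
     [1, 4, 0, 0]]
row_ptr, col_idx, vals = dense_to_csr(A)
print(row_ptr)  # [0, 2, 3, 3, 5]
print(col_idx)  # [0, 3, 2, 0, 1]
print(vals)     # [5, 2, 3, 1, 4]
```

An empty row (row 2 here) simply repeats the previous `row_ptr` value, so it costs one integer and no entries in `col_idx` or `vals`.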

SpMV in CSR
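A reference sketch of SpMV (y = A x) over CSR arrays, written sequentially in Python for clarity. Each row is an independent dot product, which is why a common (though not the only) GPU mapping assigns one thread or one warp per row. The access `x[col_idx[k]]` is an indirect gather whose addresses depend on the sparsity pattern, which is the main source of irregular memory traffic. The function name `csr_spmv` is hypothetical.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x with A stored in CSR form (one dot product per row)."""
    n_rows = len(row_ptr) - 1
    y = [0] * n_rows
    for i in range(n_rows):
        acc = 0
        # Walk the nonzeros of row i; x is read indirectly through col_idx.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


# CSR arrays for A = [[5,0,0,2],[0,0,3,0],[0,0,0,0],[1,4,0,0]]
row_ptr = [0, 2, 3, 3, 5]
col_idx = [0, 3, 2, 0, 1]
vals = [5, 2, 3, 1, 4]
x = [1, 2, 3, 4]
print(csr_spmv(row_ptr, col_idx, vals, x))  # [13, 9, 0, 9]
```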

SpMM in CSR
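A sketch of SpMM (C = A B, sparse A in CSR, dense B) using the row-wise formulation: each nonzero A[i, k] scales row k of B and accumulates it into row i of C. Compared with SpMV, every loaded value and column index of A is reused across all N columns of B, which is the reuse that SpMM kernels try to exploit. This is a sequential Python illustration under that assumption, not the kernel of any paper listed below; `csr_spmm` is a hypothetical name.

```python
def csr_spmm(row_ptr, col_idx, vals, B):
    """Compute C = A @ B with sparse A in CSR form and dense B (list of rows)."""
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0])  # N: number of columns of the dense operand
    C = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]
            b_row = B[col_idx[k]]  # one nonzero of A selects a whole row of B
            # Scale the selected row of B and accumulate it into row i of C.
            for j in range(n_cols):
                C[i][j] += a * b_row[j]
    return C


# CSR arrays for A = [[5,0,0,2],[0,0,3,0],[0,0,0,0],[1,4,0,0]]
row_ptr = [0, 2, 3, 3, 5]
col_idx = [0, 3, 2, 0, 1]
vals = [5, 2, 3, 1, 4]
B = [[1, 0],
     [0, 1],
     [2, 3],
     [1, 1]]
print(csr_spmm(row_ptr, col_idx, vals, B))  # [[7, 2], [6, 9], [0, 0], [1, 4]]
```

With N = 1 this reduces exactly to the SpMV loop, one column of B at a time.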

Paper List

DTC-SpMM

GE-SpMM

Acc-SpMM

Core Concept

BW and Latency


1) stalled waiting for data with long access latency while DRAM BW is not yet saturated (opportunity for prefetching)
2) stalled waiting for data with long access latency while saturating all DRAM BW (arithmetic intensity must be increased)
3) stalled waiting for data that saturates all DRAM or cache BW
4) no stall
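To make these regimes concrete, the sketch below uses two standard back-of-the-envelope models; both are assumptions rather than anything taken from the papers above. Little's law (bytes in flight = bandwidth x latency) gives the concurrency a kernel needs before DRAM BW can saturate, which is what prefetching in case 1) tries to reach. A roofline check compares SpMM's arithmetic intensity (FLOPs per DRAM byte) against the machine balance point, which is the wall hit in cases 2) and 3). The roofline part does not model latency at all. The function names, the ideal-reuse byte count (A, B and C each cross DRAM exactly once), and the hardware numbers (2000 GB/s, 600 ns, 50 TFLOP/s) are hypothetical and illustrative, not measurements of any specific GPU.

```python
def bytes_in_flight_needed(bandwidth_gb_s, latency_ns):
    """Little's law: bytes that must be outstanding = bandwidth x latency."""
    # GB/s * ns = (1e9 B/s) * (1e-9 s) = bytes, so the units cancel directly.
    return bandwidth_gb_s * latency_ns


def spmm_arithmetic_intensity(m, k, nnz, n, val_bytes=4, idx_bytes=4):
    """FLOPs per DRAM byte for CSR SpMM C[m x n] = A[m x k] @ B[k x n].

    Assumes ideal reuse: A, B and C each cross DRAM exactly once.
    """
    flops = 2 * nnz * n  # one multiply + one add per (nonzero, output column)
    a_bytes = nnz * (val_bytes + idx_bytes) + (m + 1) * idx_bytes  # vals, col_idx, row_ptr
    b_bytes = k * n * val_bytes  # dense B read once
    c_bytes = m * n * val_bytes  # dense C written once
    return flops / (a_bytes + b_bytes + c_bytes)


def roofline_bound(intensity, peak_tflops, peak_bw_gb_s):
    """Classify as memory- or compute-bound against the machine balance point."""
    balance = peak_tflops * 1e12 / (peak_bw_gb_s * 1e9)  # FLOPs per byte
    return "memory-bound" if intensity < balance else "compute-bound"


need = bytes_in_flight_needed(2000, 600)  # hypothetical 2000 GB/s, 600 ns
ai = spmm_arithmetic_intensity(100_000, 100_000, 1_000_000, 32)
print(need, round(ai, 2), roofline_bound(ai, 50, 2000))  # 1200000 1.88 memory-bound
```

Under these illustrative numbers, about 1.2 MB must be outstanding GPU-wide to keep DRAM busy; a kernel with far fewer outstanding loads sits in case 1). The SpMM example reaches roughly 1.9 FLOPs/byte against a balance point of 25 FLOPs/byte, so it is memory-bound. In this model the intensity rises with n but is capped by the density of A (here it approaches 2.5 FLOPs/byte), which is why raising arithmetic intensity usually means better reuse rather than just wider B.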