| work | year | baseline | workload analysis | GPU platform | before/after BW, core util |
|---|---|---|---|---|---|
| Libra | 2025 | cuSPARSE, DTC-SpMM, FlashSparse, Acc-SpMM, Voltrix-SpMM | | NVIDIA H100 PCIe 80GB | |
| Voltrix | 2025 | | | | |
| Acc-SpMM | 2025 | | | RTX4090, A800, H100, TCU | |
| DTC-SpMM | 2024 | cuSPARSE, TC-GCN | latency-bound | | |

1) : stalled waiting for data with long access latency (an opportunity for prefetching)
2) : stalled waiting for data with long access latency while DRAM BW is already saturated (arithmetic intensity has to increase)
3) : stalled waiting for data whose traffic saturates DRAM or cache BW
4) : no stall
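
Case 2 says the only way out is higher arithmetic intensity. A rough way to sanity-check that for a given matrix is the standard roofline bound, min(peak FLOP/s, AI × DRAM BW). The sketch below is my own illustration, not taken from any of the papers in the table: the function names are hypothetical, A is assumed to be CSR with 4-byte values and indices, and every operand is assumed to cross DRAM exactly once (perfect reuse of the dense matrix), so the result is an optimistic upper bound on AI.

```python
def spmm_arithmetic_intensity(nnz, n_rows, n_cols, n_dense_cols,
                              val_bytes=4, idx_bytes=4):
    """FLOPs per DRAM byte for C = A @ B, A sparse CSR, B and C dense.

    Assumption: A, B, and C each move to/from DRAM exactly once,
    so this is a best-case (upper-bound) intensity.
    """
    flops = 2 * nnz * n_dense_cols  # one multiply + one add per nonzero per dense column
    bytes_a = nnz * (val_bytes + idx_bytes) + (n_rows + 1) * idx_bytes  # values, col idx, row ptr
    bytes_b = n_cols * n_dense_cols * val_bytes
    bytes_c = n_rows * n_dense_cols * val_bytes
    return flops / (bytes_a + bytes_b + bytes_c)


def roofline_bound(ai, peak_flops, dram_bw):
    """Attainable FLOP/s under the roofline model."""
    return min(peak_flops, ai * dram_bw)


def is_dram_bw_bound(ai, peak_flops, dram_bw):
    """True when AI is below the ridge point peak_flops / dram_bw."""
    return ai < peak_flops / dram_bw


if __name__ == "__main__":
    # Placeholder matrix shape and hardware numbers, not measurements.
    ai = spmm_arithmetic_intensity(nnz=1000, n_rows=100, n_cols=100, n_dense_cols=32)
    peak, bw = 100.0, 10.0  # hypothetical units: FLOP/s and byte/s
    print(ai, roofline_bound(ai, peak, bw), is_dram_bw_bound(ai, peak, bw))
```

If `is_dram_bw_bound` returns True even under this best-case traffic assumption, prefetching alone (case 1) cannot help; the kernel would need to move fewer bytes per FLOP, for example through a more compact sparse format or more reuse of the dense operand.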