Summary

To move beyond SpMV-based GNN acceleration, the authors propose a general-purpose SpMM kernel with a customized CSR-based design.

Naively expanding the SpMV-based scheme to SpMM creates two pain points: an uncoalesced access pattern across threads, and weak data reuse (i.e., redundant data loading).
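A minimal Python sketch (CPU-side pseudocode of my own, not the paper's CUDA kernel; the function and counter names are assumptions) of the naive expansion: each (row, column) output element independently walks the sparse row, the way one thread per output element would. The `loads` counter shows that every nonzero is re-read once per output column, which is the weak-reuse / redundant-loading problem.

```python
def naive_csr_spmm(indptr, indices, data, B, N):
    """Compute C = A @ B with A in CSR form, one 'thread' per C[r][n].

    Hypothetical illustration of the naive SpMV-style expansion: the
    sparse row (indptr/indices/data) is traversed again for every
    output column n, so each nonzero is loaded N times.
    """
    M = len(indptr) - 1
    C = [[0.0] * N for _ in range(M)]
    loads = 0  # count of sparse-matrix element reads (proxy for traffic)
    for r in range(M):
        for n in range(N):  # conceptually, one independent thread each
            acc = 0.0
            for p in range(indptr[r], indptr[r + 1]):
                acc += data[p] * B[indices[p]][n]
                loads += 1  # same nonzero reloaded for every n
            C[r][n] = acc
    return C, loads
```

For A = [[1, 0], [2, 3]] (3 nonzeros) and N = 2, `loads` comes out as nnz × N = 6: the redundancy grows linearly with the feature dimension, which is exactly what a reuse mechanism would remove.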

To address these problems, they introduce two schemes.

Target

GNN aggregation

If the reduction is sum: standard SpMM

If the reduction is max-pooling: SpMM-like (this paper's target; cuSPARSE does not support it)
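A small Python sketch (my own illustration, not the paper's kernel) of what "SpMM-like" means here: the sparsity traversal is identical to SpMM, but the row-wise reduction is a max over the products rather than a sum, matching max-pooling aggregation in GNNs.

```python
def csr_spmm_max(indptr, indices, data, B, N):
    """SpMM-like kernel: same CSR traversal as SpMM, but the per-row
    reduction is max(data[p] * B[col][n]) instead of a sum.

    Hypothetical sketch of max-pooling aggregation; rows with no
    nonzeros are left at -inf.
    """
    M = len(indptr) - 1
    C = [[float("-inf")] * N for _ in range(M)]
    for r in range(M):
        for p in range(indptr[r], indptr[r + 1]):
            col, val = indices[p], data[p]
            for n in range(N):
                prod = val * B[col][n]
                if prod > C[r][n]:  # max-reduce instead of accumulate
                    C[r][n] = prod
    return C
```

Swapping the `+=` of SpMM for this comparison is the only change, which is why the paper can treat both under one kernel structure while cuSPARSE, which hard-codes the sum reduction, cannot.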

Workload Analysis


They argue that memory bandwidth saturates at high N (the column dimension of the output matrix, i.e., the feature dimension), yet much of that traffic is redundant loading, so the bandwidth is not being used efficiently. Hence, they conclude that a data-reuse mechanism is necessary.

They also claim that SpMV is bounded by low bandwidth utilization. However, these results were measured on an RTX 2080, which is a fairly old machine.

Proposed Work