Summary

Previous work: current Tensor Cores support only weight sparsity (cf. NVIDIA's 2:4 pruning, which is applied offline).
This work's approach: supports activation sparsity on the Tensor Core (first in a GPU).
Challenges:
  1. Activation sparsity is unpredictable, so it must be handled online (we prefer to do it online).
  2. Cost of im2col: dominated by shared-memory accesses (we don't want to use the register file).
Solution scheme: bitmap-based SpGEMM — an outer-product-based operation (dense multiplication plus gather/scatter-based accumulation).
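To make the offline/online contrast concrete, here is a minimal sketch of NVIDIA-style 2:4 structured weight pruning (keep the 2 largest-magnitude weights in every group of 4). This magnitude-sorting step is cheap offline for weights, but activations only appear at run time, which is why this note's scheme must handle sparsity online. The function name and list-based representation are illustrative, not from the paper.

```python
def prune_2to4(weights):
    """Offline 2:4 structured pruning sketch: in every group of 4
    consecutive weights, keep the 2 with the largest magnitude and
    zero out the other 2."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

print(prune_2to4([0.1, -0.9, 0.05, 0.4]))  # → [0.0, -0.9, 0.0, 0.4]
```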

Background

SpCONV

cuDNN applies im2col for CONV on Tensor Cores. cuDNN uses *implicit* im2col so as not to expand the memory footprint. (Implicit im2col: the original feature map stays in global memory, and elements are gathered into on-chip memory via address calculation.)

im2col is mainly applied to activations (guess: weights can be processed offline). So far, works that exploit weight sparsity still perform im2col on the dense format.
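For reference, a minimal *explicit* im2col for a single-channel 2-D feature map (stride 1, no padding). cuDNN's implicit variant never materializes this matrix; it computes the same gather addresses on the fly. This toy version is my own sketch, not cuDNN's implementation.

```python
def im2col(fmap, kh, kw):
    """Explicit im2col sketch: each output row is one flattened kh*kw
    receptive field, so convolution becomes a dense matrix multiply
    with the flattened kernel. Stride 1, no padding."""
    h, w = len(fmap), len(fmap[0])
    cols = []
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            # gather one patch into a flat row (implicit im2col would
            # compute these (y+dy, x+dx) addresses on the fly instead)
            cols.append([fmap[y + dy][x + dx]
                         for dy in range(kh) for dx in range(kw)])
    return cols

# 3x3 feature map, 2x2 kernel → four 2x2 patches as rows
print(im2col([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2, 2))
# → [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
```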

Proposed Work

BITMAP-based SpGEMM

(figure: Fig. 3)

Under the current Tensor Core design, sparsity in B (the activation matrix) causes under-utilization of the Tensor Core ("Not used" in Fig. 3(c)), which damages the parallelism of the dot products.

Prior ASIC papers proposed several methods, but the cost of their peripheral hardware is a considerable overhead for Tensor Cores (proportional to the large TC die size).

(figure: Fig. 4)

If the sparse vectors are condensed, the vector-vector outer products become dense (full utilization, Fig. 4(c)), and the unnecessary computations on zeros are skipped.
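The condense-then-scatter idea can be sketched as follows: each column of A and row of B is stored as a bitmap of nonzero positions plus a packed value array; the multiplication is dense on the condensed values (full utilization), and the bitmaps drive the gather/scatter accumulation back into dense C. This is my own simplified reconstruction of the scheme, not the paper's kernel.

```python
def to_bitmap(vec):
    """Condense a sparse vector: bitmap of nonzero positions + packed values."""
    bitmap = [v != 0 for v in vec]
    values = [v for v in vec if v != 0]
    return bitmap, values

def spgemm_outer(A_cols, B_rows, m, n):
    """Bitmap-based outer-product SpGEMM sketch: C += a_k ⊗ b_k over k,
    where a_k (k-th column of A, length m) and b_k (k-th row of B,
    length n) are condensed via their bitmaps before multiplying."""
    C = [[0.0] * n for _ in range(m)]
    for a_col, b_row in zip(A_cols, B_rows):
        a_bm, a_val = to_bitmap(a_col)
        b_bm, b_val = to_bitmap(b_row)
        a_idx = [i for i, bit in enumerate(a_bm) if bit]
        b_idx = [j for j, bit in enumerate(b_bm) if bit]
        # dense outer product on the condensed values only,
        # scattered into C at the positions the bitmaps encode
        for ci, i in enumerate(a_idx):
            for cj, j in enumerate(b_idx):
                C[i][j] += a_val[ci] * b_val[cj]
    return C

# A (3x2) given as columns, B (2x3) given as rows; C = A @ B
print(spgemm_outer([[1, 0, 0], [0, 2, 0]], [[3, 0, 4], [0, 5, 0]], 3, 3))
# → [[3.0, 0.0, 4.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
```

Note how zero entries of a_k and b_k never enter the inner loops, which is exactly the "unnecessary computations skipped" point above.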