Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression
DECA: A Near-Core LLM Decompression Accelerator Grounded on a 3D Roofline Model