Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression

DECA: A Near-Core LLM Decompression Accelerator Grounded on a 3D Roofline Model