Inference Acceleration

Taotian Technology Team

A Survey and Hands-On Summary of Inference Acceleration Approaches for SD in diffusers

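The mainstream acceleration approaches today include operator optimization, model compilation, model caching, and model distillation.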

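If your SD pipeline does not make complex modifications to the UNet's submodules, OneFlow is still worth trying. Otherwise, making sure PyTorch is on the latest stable release and using DeepCache in moderation is likely the more hassle-free and effective choice.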
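To make the "model caching" idea concrete, here is a toy sketch of the feature-reuse pattern that DeepCache-style methods rely on: the expensive deep branch is recomputed only every few denoising steps, and its cached output is reused in between. This is not the DeepCache library or its API. The class `CachedDenoiser`, the function `run`, the parameter name `cache_interval` as used here, and the scalar stand-ins for the UNet branches are all assumptions made for illustration.

```python
from typing import Callable, Optional


class CachedDenoiser:
    """Toy two-branch denoiser that reuses the deep-branch output across steps."""

    def __init__(self, shallow: Callable[[float], float],
                 deep: Callable[[float], float], cache_interval: int = 3):
        if cache_interval < 1:
            raise ValueError("cache_interval must be >= 1")
        self.shallow = shallow
        self.deep = deep
        self.cache_interval = cache_interval
        self.deep_calls = 0  # how many times the expensive branch actually ran
        self._cached: Optional[float] = None

    def step(self, x: float, step_index: int) -> float:
        # Refresh the cache on the first step and on every cache_interval-th step.
        if self._cached is None or step_index % self.cache_interval == 0:
            self._cached = self.deep(x)
            self.deep_calls += 1
        # The cheap branch always runs; the deep features may be a few steps stale.
        return self.shallow(x) + self._cached


def run(num_steps: int, cache_interval: int) -> CachedDenoiser:
    """Run a hypothetical denoising loop and return the denoiser for inspection."""
    denoiser = CachedDenoiser(
        shallow=lambda x: 0.5 * x,  # stand-in for the shallow UNet blocks
        deep=lambda x: 0.1 * x,     # stand-in for the deep UNet blocks
        cache_interval=cache_interval,
    )
    x = 1.0
    for i in range(num_steps):
        x = denoiser.step(x, i)
    return denoiser


if __name__ == "__main__":
    for interval in (1, 3, 5):
        d = run(num_steps=10, cache_interval=interval)
        print(f"cache_interval={interval}: deep branch ran {d.deep_calls} times")
```

With 10 steps and `cache_interval=3`, the deep branch runs only on steps 0, 3, 6, and 9. A larger interval skips more work but reuses staler features, which is one way to read the advice to use caching in moderation.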

FlashAttention
- https://github.com/Dao-AILab/flash-attention
- https://courses.cs.washington.edu/courses/cse599m/23sp/notes/flashattn.pdf
oneflow
- https://github.com/Oneflow-Inc/oneflow
- https://github.com/siliconflow/onediff
stable-fast
- https://github.com/chengzeyi/stable-fast
deepcache
- https://github.com/horseee/DeepCache
lcm-lora
- https://latent-consistency-models.github.io/
pytorch 2.2
- https://pytorch.org/blog/pytorch2-2/