Introduction to PPO and Sample-Factory

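This unit takes a deeper look at PPO optimization using Sample-Factory, an asynchronous implementation of the PPO algorithm, and uses it to train an agent to play ViZDoom (an open-source version of Doom).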
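
As a reminder of what "PPO optimization" refers to, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. It is not Sample-Factory's code: the function name `ppo_clip_objective`, the list-based inputs, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled step
    advantages: advantage estimate for each sampled step
    clip_eps:   clipping range (illustrative default, not a library setting)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical values: one step with a positive advantage, one negative.
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum removes any incentive to push the ratio far from 1 in the direction that would increase the objective, which keeps each policy update close to the policy that collected the data.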

In the notebook, you will train your agent to play the Health Gathering level, where the agent must collect health packs to avoid dying.

https://github.com/a1024053774/RL_Boot/blob/master/Hugging_Face/notebooks/unit8-part2-Doom/unit8_part2.ipynb

Hugging Face Hub repository: a1024053774/rl_course_vizdoom_health_gathering_supreme