by Zahid-Ammar and Gang
World big. Agent small. Agent say:
“Me do action!”
World say:
“Ok, here is what happen now.”
This called environment.
Agent take step, world give reward and new world.
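Here small sketch of step loop in Python. Not real library, just toy. Names `TinyWorld`, `reset`, `step`, `run_episode` and reward numbers all made up for show. Assume world is line of five spots and goal is last spot.

```python
class TinyWorld:
    """Made-up world: agent stand on a line of spots, goal is the last spot."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        # Put agent back at spot 0 and tell agent where it is.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (go left) or +1 (go right); walls stop agent at the ends.
        self.position = max(0, min(self.size - 1, self.position + action))
        done = self.position == self.size - 1
        # Assumed reward: small pain for every step, big treat at goal.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(world, choose_action, max_steps=100):
    """Agent act, world answer, repeat until goal or out of steps."""
    state = world.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)
        state, reward, done = world.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


# Agent that always go right.
total = run_episode(TinyWorld(), lambda state: 1)
```

Agent that always go right reach goal in four steps. Real world much bigger, but loop same shape: action in, reward and new world out.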
Policy is big thinky-thing.
It say:
“You in place X? Do action Y.”
But brain dumb at first.
Do bad action. Get bad reward. Cry.
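Small sketch of policy as lookup table, in Python. All names here made up (`make_dumb_policy`, `act`, `reward_for`, the places, the actions), and reward rule is pretend. Fresh policy pick action by coin flip, so sometimes it get bad reward.

```python
import random

PLACES = ["X", "Z"]          # made-up places agent can be in
ACTIONS = ["left", "right"]  # made-up actions agent can do


def make_dumb_policy(rng):
    """Fresh brain: pick a random action for every place."""
    return {place: rng.choice(ACTIONS) for place in PLACES}


def act(policy, place):
    """Policy say: you in this place? Do this action."""
    return policy[place]


def reward_for(place, action):
    # Pretend rule: in X going right is good, in Z going left is good.
    good_action = {"X": "right", "Z": "left"}
    return 1.0 if good_action[place] == action else -1.0


rng = random.Random(0)  # fixed seed so the run repeats the same way
policy = make_dumb_policy(rng)
for place in PLACES:
    action = act(policy, place)
    print(place, action, reward_for(place, action))
```

Table is simplest thinky-thing. Learning mean changing table entries so bad reward happen less.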
Agent no just have one thinky-thing.
Agent have two: