by Zahid-Ammar and Gang

🌍 Agent want do thing. Live in world.

World big. Agent small. Agent say:

“Me do action!”

World say:

“Ok, here is what happen now.”

This called environment.

Agent take step, world give reward and new world.
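
Me draw small toy in Python so agent-and-world dance easier to see. This just sketch, not real library: `TinyWorld`, `step`, and `run_episode` are names me make up. Toy world is line of 5 places. Agent start at place 0 and get reward 1 when reach last place. Agent here pick action at random, because agent no have brain yet.

```python
import random


class TinyWorld:
    """Made-up toy world: a line of places, agent starts at place 0."""

    def __init__(self, size=5):
        self.size = size
        self.place = 0

    def step(self, action):
        """Agent does action (-1 = left, +1 = right).

        World answers with (new place, reward, done).
        """
        # Move, but never walk off either end of the line.
        self.place = max(0, min(self.size - 1, self.place + action))
        done = self.place == self.size - 1
        reward = 1.0 if done else 0.0
        return self.place, reward, done


def run_episode(world, max_steps=100, seed=0):
    """Agent with no brain: pick random actions, add up the reward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])
        _place, reward, done = world.step(action)
        total += reward
        if done:
            break
    return total
```

Call `run_episode(TinyWorld())` and agent stumble around until reach last place or run out of steps.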


🧠 Agent brain is policy.

Policy is big thinky-thing.

It say:

“You in place X? Do action Y.”

But brain dumb at first.

Do bad action. Get bad reward. Cry.
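
Simplest thinky-thing me can draw is table: one action for each place. Small Python sketch with made-up names (`ACTIONS`, `make_dumb_policy`, `act`) and made-up world of 5 places. Table filled at random, so brain dumb at first. Real policy often neural network. Table only for show.

```python
import random

# Hypothetical action names, only for this sketch.
ACTIONS = ["left", "right"]


def make_dumb_policy(places, seed=0):
    """Build a policy table: for each place X, one randomly chosen action Y."""
    rng = random.Random(seed)
    return {place: rng.choice(ACTIONS) for place in places}


def act(policy, place):
    """'You in place X? Do action Y.' Look up the action for this place."""
    return policy[place]


policy = make_dumb_policy(range(5))
for place in range(5):
    print(f"place {place} -> {act(policy, place)}")
```

Learning mean change table entries so agent get more reward. This sketch no do that part yet.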


🧠 Actor-Critic Method

πŸ– Brain actually two brains. Actor and Critic.

Agent no just have one thinky-thing.

Agent have two: