0. Course Logistics

Instructor: Prof. Sungbin Lim @ Department of Statistics, sungbin-lim.net
- Office Hours: Wednesday, 12:00 - 13:00, Woodang Hall (우당교양관) #518
- TA Contact: stat436.ta@gmail.com
Grading (absolute evaluation)
- Assignment: 20% / Midterm: 30% / Final: 50%
Score	[0, 30)	[30, 40)	[40, 50)	[50, 70)	[70, 80)	[80, 90)	[90, 100]
Grade	F	D+	C+	B	B+	A	A+
- If you miss either the midterm or the final → F automatically
- Open-book test (excluding electronic devices)
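The grading rule above can be sketched in a few lines of Python. This is only an illustration, assuming each component is scored on a 0-100 scale and that the three weights combine linearly; the function names `weighted_score` and `letter_grade` are hypothetical and not part of any course tooling.

```python
# Lower bounds of each grade band, taken from the score table above.
CUTOFFS = [(90, "A+"), (80, "A"), (70, "B+"), (50, "B"), (40, "C+"), (30, "D+")]


def weighted_score(assignment, midterm, final):
    """Combine the components with the 20% / 30% / 50% weights.

    Assumption: every component is scored on a 0-100 scale.
    """
    return (20 * assignment + 30 * midterm + 50 * final) / 100


def letter_grade(score, took_midterm=True, took_final=True):
    """Map a total score in [0, 100] to a letter grade."""
    # Missing either exam results in an automatic F.
    if not (took_midterm and took_final):
        return "F"
    for lower_bound, grade in CUTOFFS:
        if score >= lower_bound:
            return grade
    return "F"  # scores in [0, 30)


total = weighted_score(assignment=80, midterm=70, final=90)
print(total, letter_grade(total))
```

For example, scores of 80 / 70 / 90 on the assignment, midterm, and final give a total of 82, which falls in the [80, 90) band.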
Assignment
- In the midterm and final exams, assignments will be presented.
- If you bring your assignment solutions to office hours, I will give you individual feedback (≠ grading).
Prerequisite
- STAT221 (Introduction to Probability Theory)
- STAT232 (Mathematical Statistics)
- STAT323 (Statistical Computing Methods)

1.1 Introduction

Welcome to STAT436! This course will cover the fundamentals of reinforcement learning and its algorithms. Reinforcement learning is a rapidly growing field that intersects artificial intelligence, machine learning, and decision-making science. The coursework is designed for senior-level undergraduate students who aspire to pursue graduate studies in artificial intelligence and conduct AI research.

1. Course Motivation

What is artificial intelligence? The definition of intelligence has sparked controversy over the years. Some earlier researchers believed that computers could exhibit true intelligence if they could outperform humans in games such as chess and Go. Other researchers argued that computers could demonstrate true intelligence if they could create images, videos, articles, and music. Nowadays, we have programs that can perform all of these tasks! Can we then say that computers have intelligence like humans? What do you think?

https://youtu.be/ck4RGeoHFko

Intelligence is not just the possession of knowledge but also the ability to acquire and apply it. If we can devise a way to teach machines to extract knowledge from data (or memory), we may be able to create machines that possess intelligence, albeit limited. Therefore, AI researchers focus on developing machines that can learn from data, an approach known as machine learning.

In this course, we will focus on reinforcement learning, which is a powerful machine learning technique. It offers a framework for training intelligent agents that can learn and adapt to their environments through trial-and-error interactions.

Psychology

Reinforcement learning has a strong connection to psychology because it is inspired by the way humans and animals learn through rewards and punishments. In psychology, this is known as instrumental (operant) conditioning, which is a type of learning where behaviors are shaped by the consequences that follow them.

Pavlov’s experiment on classical conditioning (image source: link)

Most students will have heard of Ivan Pavlov’s experiments with dogs, which are an example of classical conditioning. An initially neutral stimulus (a ringing bell) becomes a conditional stimulus as the dog learns that it predicts the unconditional stimulus (food), and so the dog starts producing the conditional response (salivation) in response to the conditional stimulus. The unconditional stimulus is called a reinforcer. It has been found that conditioning is activated by the synchronization of neurons in the hippocampus, in the temporal lobe.

Activated neurons, Zhou et al., 2020

In instrumental conditioning, on the other hand, an agent is rewarded or punished depending on what it did. It therefore learns to increase its tendency to produce rewarded behavior and to decrease its tendency to produce penalized behavior.

Thorndike’s experiment on instrumental conditioning (image source: link)

In fact, many of the foundational ideas in reinforcement learning were inspired by behavioral psychology and animal learning experiments. For example, the concept of a reward signal, a critical component of reinforcement learning, traces back to behavioral psychology, notably B.F. Skinner’s experiments on operant conditioning. Reinforcement learning algorithms operate on a similar principle: an agent learns to take actions in an environment based on the feedback it receives in the form of rewards and punishments. This feedback signal is central to both reinforcement learning and psychology, as it provides a mechanism for shaping behavior.
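To make the principle concrete, here is a minimal sketch of an agent whose behavior is shaped by rewards and punishments. It is an illustration only, not an algorithm prescribed by this course: the two actions, their reward values, the function name `run_conditioning`, and the parameters `step_size` and `epsilon` are all hypothetical choices. The agent keeps a running estimate of how rewarding each action has been, usually picks the action with the highest estimate, and occasionally tries a random one.

```python
import random


def run_conditioning(n_trials=500, step_size=0.1, epsilon=0.1, seed=0):
    """Simulate trial-and-error learning over two hypothetical actions."""
    rng = random.Random(seed)
    # Hypothetical environment: one action is rewarded, the other punished.
    rewards = {"press_lever": 1.0, "ignore_lever": -1.0}
    values = {action: 0.0 for action in rewards}  # estimated reward per action
    counts = {action: 0 for action in rewards}    # how often each was chosen

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(rewards))
        else:
            # Exploit: pick the action currently believed to be best.
            action = max(values, key=values.get)
        reward = rewards[action]
        # Move the estimate a small step toward the observed reward.
        values[action] += step_size * (reward - values[action])
        counts[action] += 1
    return values, counts


values, counts = run_conditioning()
print(values)
print(counts)
```

After enough trials the rewarded action has the higher estimate and is chosen far more often, which mirrors the increase in rewarded behavior described for instrumental conditioning.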