reward distribution characteristics (stationary, non-stationary)
comparison of context and state
action selection
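The outline does not say which estimator or selection rule it has in mind. The following is a minimal sketch that assumes a k-armed bandit with Gaussian rewards. It contrasts the two standard incremental value updates: the sample average, Q ← Q + (R − Q)/n, which suits a stationary reward distribution, and a constant step size, Q ← Q + α(R − Q), which keeps tracking a non-stationary one. Both are combined with ε-greedy action selection. The class `Bandit`, the `drift` parameter, and all function names are hypothetical illustrations and do not come from the source.

```python
import random
from typing import List, Optional


class Bandit:
    """Hypothetical k-armed bandit with unit-variance Gaussian rewards.

    drift == 0: true means never change (stationary reward distribution).
    drift > 0: after every pull each true mean takes a small random-walk
    step, so the reward distribution is non-stationary.
    """

    def __init__(self, k: int, drift: float, rng: random.Random) -> None:
        self.rng = rng
        self.drift = drift
        self.means = [rng.gauss(0.0, 1.0) for _ in range(k)]

    def pull(self, arm: int) -> float:
        reward = self.rng.gauss(self.means[arm], 1.0)
        if self.drift > 0:
            self.means = [m + self.rng.gauss(0.0, self.drift) for m in self.means]
        return reward


def epsilon_greedy(q: List[float], epsilon: float, rng: random.Random) -> int:
    """Explore a uniformly random arm with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    best = max(q)
    # Break ties uniformly at random among the arms with the highest estimate.
    return rng.choice([a for a, v in enumerate(q) if v == best])


def update_sample_average(q: List[float], n: List[int], arm: int, reward: float) -> None:
    """Stationary case: q[arm] becomes the running mean of all rewards seen for arm."""
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]


def update_constant_step(q: List[float], arm: int, reward: float, alpha: float) -> None:
    """Non-stationary case: recent rewards get weight alpha, older ones decay geometrically."""
    q[arm] += alpha * (reward - q[arm])


def run(steps: int, drift: float, alpha: Optional[float] = None,
        epsilon: float = 0.1, k: int = 5, seed: int = 0) -> float:
    """Return the average reward per step; alpha=None selects sample averages."""
    rng = random.Random(seed)
    env = Bandit(k, drift, rng)
    q = [0.0] * k
    n = [0] * k
    total = 0.0
    for _ in range(steps):
        arm = epsilon_greedy(q, epsilon, rng)
        reward = env.pull(arm)
        total += reward
        if alpha is None:
            update_sample_average(q, n, arm, reward)
        else:
            update_constant_step(q, arm, reward, alpha)
    return total / steps


if __name__ == "__main__":
    print("stationary, sample average:     ", run(10_000, drift=0.0))
    print("non-stationary, sample average: ", run(10_000, drift=0.01))
    print("non-stationary, alpha = 0.1:    ", run(10_000, drift=0.01, alpha=0.1))
```

Setting `drift=0.0` gives a stationary problem, where sample averages converge to the true means. With a positive `drift` the means keep moving, and a constant `alpha` typically tracks them better because it discounts old rewards instead of weighting every reward equally.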