Trajectory: s0, a0, s1, a1, s2, ...
Before executing a, what do you know?
Prob(sj | si, a), Prob(sk | si, a), Prob(sl | si, a), ...
MDP Model of Agency
[Diagram: from state si, action a leads to one of the successor states sj, sk, sl]
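The transition probabilities Prob(sj | si, a) can be written down as a small table. The sketch below is only an illustration: the dictionary name P, the helper functions, and the numbers 0.5, 0.3, 0.2 are assumptions made up for this example and do not come from the slide.

```python
import random

# Hypothetical transition model: P[(state, action)] maps each successor
# state to Prob(successor | state, action). The numbers are invented.
P = {
    ("si", "a"): {"sj": 0.5, "sk": 0.3, "sl": 0.2},
}

def successor_distribution(state, action):
    """Return what the agent knows before executing the action:
    a probability for each possible successor state."""
    dist = P[(state, action)]
    # Probabilities over successors must sum to 1.
    assert abs(sum(dist.values()) - 1.0) < 1e-9
    return dist

def sample_successor(state, action, rng):
    """Simulate executing the action once: draw one successor at random."""
    dist = successor_distribution(state, action)
    successors = list(dist)
    weights = [dist[s] for s in successors]
    return rng.choices(successors, weights=weights, k=1)[0]

rng = random.Random(0)
next_state = sample_successor("si", "a", rng)
```

Before executing a, the agent only has the distribution; which of sj, sk, sl actually occurs is known only after the action is taken.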
Agent consults policy to determine what to do.
Objective: find a policy that maximizes the value function over a finite horizon (or discounted?)
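One standard way to find such a policy is backward induction (finite-horizon value iteration). The slide does not name an algorithm, so this is a sketch of one possible approach, not the method the slide prescribes. The two-state example, the reward table R, and all names are assumptions for illustration. Setting gamma below 1 gives the discounted variant.

```python
def finite_horizon_policy(states, actions, P, R, horizon, gamma=1.0):
    """Backward induction for a finite-horizon MDP.

    P[(s, a)] maps successor states to probabilities.
    R[(s, a)] is the immediate reward (an assumed reward model).
    Returns the value function for `horizon` steps to go and a list
    of per-step policies, where policy[0] is used at the first step.
    """
    V = {s: 0.0 for s in states}  # value with 0 steps to go
    policy = []
    for _ in range(horizon):
        # Expected return of each action given the values one step later.
        Q = {
            s: {
                a: R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
            }
            for s in states
        }
        step_policy = {s: max(actions, key=lambda a: Q[s][a]) for s in states}
        V = {s: Q[s][step_policy[s]] for s in states}
        policy.append(step_policy)
    policy.reverse()  # built from the last step backward
    return V, policy

# Hypothetical two-state example: staying in s1 earns reward 1 per step.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s1": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
R = {("s0", "stay"): 0.0, ("s0", "go"): 0.0, ("s1", "stay"): 1.0, ("s1", "go"): 0.0}

V, policy = finite_horizon_policy(states, actions, P, R, horizon=2)
```

Here the agent "consults the policy" by looking up policy[t][s] for the current step t and state s. With a finite horizon the best action can depend on how many steps remain, which is why the sketch returns one policy per step.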