
University of Washington Department of Computer Science & Engineering

 CSE 573 – Artificial Intelligence - Autumn 2003



Final Problem Set / Take-Home Exam

Distributed: Friday 12/12/03 2:30pm

Due: Monday 12/15/03 2:30pm – hand in to Alicen Smith, room 546.

Instructions: Open book, but please no collaboration or discussion of the problems except via email to Dan (who will likely answer to everyone). If any problem is underspecified, feel free to make assumptions. Just state and justify your assumptions.

1.      In our discussion of Graphplan, we talked about its relation to constraint satisfaction techniques. This problem explores another such connection: the connection between mutual-exclusion reasoning and k-consistency preprocessing. Consider the problem of determining whether there exists an n-step parallel plan for a planning problem (i.e. a plan which achieves its goal by executing zero, one, or more actions per time step, for n time steps). This can be expressed as a CSP in several ways, one of which has Boolean action and proposition variables. If there are |A| actions, then one uses n|A| action variables, where the i-jth such variable denotes whether the ith action is executed at the jth time step. (Note that no-op actions must be considered members of the set of actions, A.) One requires (n+1)|P| proposition variables, which similarly denote whether proposition i is true or false at a given time step. Constraints specify that the initial conditions are true at time step 0, that the goals hold at time n, that an action implies all of its preconditions, that an action implies all of its effects, that if a proposition is true at time step t then some action at the previous level makes that proposition true (this could be a no-op, which is an action having P as its sole precondition and effect), and that two action variables may not both be true at a time step if one action deletes the other's preconditions.
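To make the variable layout concrete, here is a minimal sketch of the indexing scheme described above, applied to a hypothetical toy problem (the action and proposition names, and the helper `make_variables`, are my own illustrations, not part of the problem):

```python
from itertools import product

def make_variables(actions, propositions, n):
    """Assign a distinct variable id to each (action, step) and (prop, step)
    pair: n|A| action variables and (n+1)|P| proposition variables."""
    var_id = 0
    action_vars = {}
    for step, a in product(range(n), actions):
        action_vars[(a, step)] = var_id
        var_id += 1
    prop_vars = {}
    for step, p in product(range(n + 1), propositions):
        prop_vars[(p, step)] = var_id
        var_id += 1
    return action_vars, prop_vars

# Toy problem: one real action plus a no-op per proposition,
# two propositions, and a horizon of n = 2 steps.
actions = ["move", "noop_at_home", "noop_done"]
propositions = ["at_home", "done"]
avars, pvars = make_variables(actions, propositions, 2)
print(len(avars))  # n|A|     = 2 * 3 = 6 action variables
print(len(pvars))  # (n+1)|P| = 3 * 2 = 6 proposition variables
```

The constraints listed in the problem would then be posted over these Boolean variables; the sketch only shows the variable bookkeeping, not the constraint posting or the search.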

a.       [2 points] Given this model, do you need to explicitly record constraints saying that two action variables may not both be true at a time step if one deletes a proposition that the other makes true?  Explain in one clear and concise English sentence.

b.      [6 points] Once we write the complete set of primitive constraints (i.e. including those from the previous question if necessary, and perhaps some others which Dan has forgotten), one could solve the planning problem by using any constraint satisfaction algorithm: e.g. conflict-directed backjumping, etc. However, it might be possible to preprocess the CSP, simplifying it in a way (e.g. by eliminating inconsistent values from the domains of certain variables) that would speed the subsequent search. For example, one might use arc consistency, or path consistency, or k-consistency for some k.  Then again, one might view traditional Graphplan mutual exclusion reasoning as a form of preprocessing. That is, if we used a planning graph to generate a subset of the action and proposition variables above, one can think of that process as eliminating the “true” value from the domain of variables which aren’t in the graph. Fred Foosball has gone on record as stating “There exists a constant, m, such that mutual exclusion reasoning prunes exactly the values that m-consistency would”. Do you think Fred is right, partly right, or dead wrong?  Explain as clearly as possible.

2.      Consider an undiscounted MDP having three states X, Y, Z with rewards -1, -2, and 0 respectively. State Z is terminal, while two actions (a and b) are available in X and Y. Executing a in X takes the agent to Y with 0.8 probability, otherwise it stays put. Executing a in Y takes the agent to X with 0.8 probability, otherwise it stays put. Executing b in either X or Y takes the agent to Z with 0.9 probability, otherwise it stays put.
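As a sanity check on the problem setup (not a solution), the transition and reward structure above can be encoded as a small table; the names `T`, `R`, and `sample_step` are my own, not part of the problem:

```python
import random

# Transition table: (state, action) -> list of (next_state, probability).
T = {
    ("X", "a"): [("Y", 0.8), ("X", 0.2)],
    ("Y", "a"): [("X", 0.8), ("Y", 0.2)],
    ("X", "b"): [("Z", 0.9), ("X", 0.1)],
    ("Y", "b"): [("Z", 0.9), ("Y", 0.1)],
}
R = {"X": -1, "Y": -2, "Z": 0}  # Z is terminal, so it has no outgoing actions

def sample_step(state, action, rng=random.random):
    """Sample a successor state according to the transition table."""
    r, cum = rng(), 0.0
    for nxt, p in T[(state, action)]:
        cum += p
        if r < cum:
            return nxt
    return nxt  # guard against floating-point slack
```

Working the policy iteration by hand on this table is the exercise; the sketch only fixes the model so the numbers are unambiguous.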

a.       [5 points] Assuming an initial policy of always executing action b, perform policy iteration, showing each step in full, until you arrive at the optimal policy. What is the value of each state?  You’ll be graded on how clearly you work through the policy iteration algorithm, so be neat.

b.      [2 points] What happens to policy iteration if the initial policy has a in both states? (Answer with one clear and concise English sentence).

c.       [2 points] Does discounting change things? How? (Answer with one clear and concise English sentence).

d.      [2 points] Does the numeric value of the discount factor affect the final policy? (If so, explain with one clear and concise English sentence).

3.      The textbook defines the Bellman equation (eq 17.5) for a discounted MDP assuming that the reward is a function of a state, but what if the reward is a function of the source state, action executed and the outcome state: R(s, a, s’)?

a.       [2 points] Write the Bellman equation for this case.

b.      [5 points] Show how an MDP with reward function R(s, a, s’) can be mechanically transformed into an equivalent MDP whose reward function is R(s), such that optimal policies in the new MDP correspond to optimal policies in the original MDP. Hint: this is a little bit like the conversion from a POMDP into an MDP over belief space. You’ll need to define a new state space for the new MDP.

4.      Consider playing tic-tac-toe against an opponent who plays randomly. Specifically, suppose that the opponent chooses any open space with uniform probability, unless the move is forced (in which case it makes the obvious correct move).
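A minimal sketch of the opponent model described above (the board representation, `LINES`, and the helper names are my own assumptions): the opponent completes or blocks a line when such a "forced" square exists, and otherwise picks a uniformly random open square.

```python
import random

# Board: a list of 9 cells holding 'X', 'O', or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def forced_move(board, player):
    """Return a square that completes a line for `player` (a win) or,
    failing that, for the opponent (a block); None if no move is forced."""
    other = "X" if player == "O" else "O"
    for mark in (player, other):
        for a, b, c in LINES:
            cells = [board[a], board[b], board[c]]
            if cells.count(mark) == 2 and None in cells:
                return (a, b, c)[cells.index(None)]
    return None

def opponent_move(board, player="O"):
    """The random opponent: forced move if one exists, else uniform random."""
    move = forced_move(board, player)
    if move is not None:
        return move
    return random.choice([i for i, v in enumerate(board) if v is None])
```

The Q-learning formulation asked for in part (a) would treat this opponent as part of the environment's (stochastic) transition function.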

a.       [3 points] Formulate the problem of learning an optimal tic-tac-toe strategy in this case as a Q-learning task.

b.      [1 point] Will your program succeed if the opponent plays optimally rather than randomly?

c.       [1 point] What if your opponent changed its strategy after playing for a while? Under what condition would your agent adapt to the new strategy?

d.      [1 point] Is there a way to exploit the fact that the tic-tac-toe board is symmetric? (Answer with one clear and concise English sentence describing what you might do, and another sentence explaining the benefit(s).)

e.       [1 point] Could symmetry analysis hurt the performance of the learner? (Explain with one clear and concise English sentence).

5.      [2 points] Russell & Norvig problem 21.8.