Project Option 3
CSE 415: Introduction to Artificial Intelligence
The University of Washington, Seattle, Spring 2019
OPTION 3: Feature-Based Reinforcement Learning for the Rubik Cube.

This option involves applying reinforcement learning to a problem whose state space is very large. In this option you will explore the use of Q-Learning in solving a combinatorially challenging problem: Rubik's Cube.


Image courtesy of ruwix.com.

Because of the very large state space for straight Rubik's cube, you may consider a variety of simplifications including:

  1. Feature-based state representation. See Poole and Mackworkth, AI, 2nd ed. You should definitely take this approach, probably in combination with some of the following ideas.
  2. 180-degree moves only. This reduces the branching factor by over half, leading to a smaller state space.
  3. 2x2x2 size (similar to the extra credit option in Assignment 3). Instead of 26 visible cubies (plus an imaginary center cubie), this version has only 8 cubies, making its state space much smaller.
  4. fewer than 6 distinct colors. If there were only one color, all different rotational configurations of the Rubik's Cube would be identical, and so the state space would be reduced to a single state. Clearly with some number of colors c, where 0 < c < 6, we can reduce the size of the state space while maintaiing some of the challenge of the original puzzle.
  5. combinations of heuristics and Q-Learning, where heuristics may be based on the use of "pattern databases". By marking some cube faces as "don't care" (i.e., wild-card) colors, we make it easier to reach a goal state. As in Poker where, for example, a deuce could be declared "wild" and be part of flush, straight, full house, etc., a wild-card cubie face could be considered blue, and thus be one of 9 cubie faces contributing to a monochrome blue face of the whole Rubik's Cube, etc. Another way to think of a wild-card cubie face is like a blank tile in the game of Scrabble, where the blank may be interpreted by the player as any normal letter A-Z. With the pattern database approach, several cubie faces can get a special color, such as black, and we consider the puzzle solved is all the non-black cubie faces appear on separate Rubik's Cube faces.
  6. constraining allowable starting states. Note that Rubik's Cube does not have standard starting state, unlike the Towers of Hanoi. Thus there will be a different solution for each possible state of the Rubik's Cube puzzle. For an agent to get good at solving in this domain would mean learning or knowing about the entire state space. If you need to, you could limit the subset of allowable starting states to a small number, or perhaps only just 1.
For insight on the Rubik cube itself, see this paper by Richard Korf.

Note that this option is not about exploring heuristics for the Rubik Cube. (That was the topic of A3's extra credit option.) The objective here is to explore the use of feature-based Q-learning (also known as approximate Q-Learning) for the problem.