Assignment 4: Advanced Game Agents
CSE 415: Introduction to Artificial Intelligence
The University of Washington, Seattle, Winter 2025

Due Friday, February 21 at 11:59 PM. Partnerships are optional for this assignment but strongly encouraged. Complete the planning questionnaire by Tuesday night, Feb. 11 for 1 point of participation credit. (See link at the bottom of this page.)

Here is the starter code for this assignment. There is also a partial autograder available.

A. Introduction

K-in-a-Row

In this assignment, you'll create a program that implements an agent that can participate in a game of K-in-a-Row with Forbidden Squares (defined below). Then you'll demonstrate your agent, providing one or more transcripts of the agent playing against another agent. This game category was chosen by class vote in ED.
PART I: Agent Code to Play K-in-a-Row (40 points). Your program should consist of a single file with a name like yourUWNetID_KInARow.py where the first author's UWNetID is used.

Your program will consist mainly of a subclass OurAgent of KAgent, providing a collection of specific methods for playing games of "K in a Row". We define a K-in-a-Row game as a kind of generalized Tic-Tac-Toe with the following features:

  • (a) Just as in Tic-Tac-Toe, there are two players: one plays X and the other plays O;
  • (b) the board is rectangular, but is not necessarily 3 by 3; it is nRows by mColumns, where these numbers are set by the Game Master (referee) at the beginning of the game according to the current game type (e.g., Tic-Tac-Toe, Five-in-a-Row, Cassini);
  • (c) a player wins by getting K in a row, where K is not necessarily 3; K can be any integer greater than 1 and less than or equal to the maximum of nRows and mColumns;
  • (d) there can be "forbidden squares" on the board; these are chosen at the beginning of the game by the Game Master; a square on the board that is available is represented by a blank, whereas a forbidden square is represented by a dash "-" ;
  • (e) there can be "handicaps" in the initial state, meaning that some X and/or O tokens can be set up on the board by the Game Master in order either to influence the succeeding play or to change the balance of advantage and disadvantage to the players. For example, the initial state of Cassini has two Os already on the board.

The starter code consists of several files:

File you will modify and turn in:

  • yourUWNetID_KInARow.py (rename it as explained above)

File you will modify and not turn in:

  • Game_Master_Offline.py (edit the test function at the end to include your agent and choose a game)
Files you will create and turn in:

  • GameTranscript.pdf An HTML transcript file will be created and named automatically by the Game Master; it records a match between your agent and that of another team and can be viewed in a browser. Please keep the file name but convert the file to PDF for submission.
  • A4-Report.pdf This file will contain your answers to questions about your agent, as well as details on any of the extra-credit items you choose to include.

Files you should read and understand:

  • game_types.py (defines a State class for all the games, and a Game_Type class for the games)
  • agent_base.py (defines a basic agent that you'll subclass)
  • RandomPlayer.py (a sample agent that plays randomly)

Other files:

  • winTesterForKinARow.py (helper to Game_Master_Offline.py)
  • gameToHTML.py (creates an HTML page transcript of the game session)

If you do have to use additional files, put them in a folder whose name is yourUWNetID, so that they can be found automatically from your UWNetID. Keep the number of extra files to a minimum (say, at most 3).

Key Methods to Implement

Some of the functionality you need to implement in your agent is templated or given simple versions in the starter code. For example:
introduce(self)

This function should return a multi-line string to introduce your agent. This should include your agent's name (you get to make this up), your name and netid, and anything else you'd like the agent to say at the start of the game.
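For example, a minimal sketch (the agent name, persona, and author details below are placeholders, not anything from the starter code):

    def introduce(self):
        # Minimal sketch; replace the placeholder name, persona, and author info.
        return ("I'm Gridlock, an agent that takes K-in-a-Row far too seriously.\n"
                "I was written by Jane Doe (janedoe) for CSE 415, Assignment 4.\n"
                "Prepare to be blocked at every turn.")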

make_move(self, state, current_utterance, time_limit,
          autograding=False, use_alpha_beta=True,
          use_zobrist_hashing=False, max_ply=3,
          special_static_eval_fn=None)

This function determines which move to make given a game board (state). It is the main entry point into your game playing algorithm, and is called by the Game Master.

If the time_limit argument is not None, the Game Master will cut off your agent's execution once that many seconds have passed. If you have not returned a valid move by that time, you will forfeit the game. To make sure this does not happen, you can use the time_limit parameter to end your search early and return your best move found so far.

You may assume that the state provided to this function will be a valid state where there is a possible space to move and no winner has been established.

make_move should return a list of the form [[newMove, newState, stat1, ..., stat4], myUtterance], where newMove is of the form (i, j), giving the row and column positions where the player is putting an X or O. Here newState is the result of making the move; it must be a valid State object for the game being played. myUtterance is the player's current contribution to the ongoing dialog and must be a string. The string should be at least one word and is typically a phrase of about one line, or a sentence; it can be multiple lines if that is an important part of the agent's intended behavior. (See "Utterances" below.) The returned inner list that starts with newMove may be of length 2 (typical in game play), or of length 6 or more with the inclusion of "statistics" stat1 through stat4, etc. During autograding, this is how you will return information that the autograder needs in order to score your agent's features such as the alpha-beta implementation. For more information on these statistics, see the make_move stub in the starter code file agent_base.py.
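To make the expected return shape and the time-limit handling concrete, here is a rough sketch; the iterative-deepening loop and the helper search_to_depth are illustrative assumptions, not part of the starter code:

    import time

    def make_move(self, state, current_utterance, time_limit,
                  autograding=False, use_alpha_beta=True,
                  use_zobrist_hashing=False, max_ply=3,
                  special_static_eval_fn=None):
        start = time.time()
        best_move, best_state = None, None
        for depth in range(1, max_ply + 1):
            # Leave a safety margin so that a valid move is always returned in time.
            if time_limit is not None and time.time() - start > 0.8 * time_limit:
                break
            # search_to_depth is a hypothetical helper that wraps minimax.
            best_move, best_state = self.search_to_depth(state, depth)
        utterance = "Your move. I still had time to spare."
        # Typical game-play return: inner list of length 2, plus the utterance.
        return [[best_move, best_state], utterance]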

Your minimax function and alpha-beta pruning will be graded, in part, based on the number of cutoffs they make in a controlled game-tree search. In this mode (autograding=True), the autograder's "special" static evaluation function must be used by your agent. Your move generator should consider moves in a specific order when in this mode -- the same order in which they are generated by the RandomPlayer agent (not the order in which that agent selects them, which is where the randomness comes in). Also in autograder mode, it is important to respect the max_ply value that is passed in.

The parameter use_zobrist_hashing can be used by the autograder to turn this (optionally implemented) feature on or off for a comparative test of its effects. If you implement Zobrist hashing, and it works well, you should have it on by default, and, whether it is turned on or off, return valid statistics. If you do not implement Zobrist hashing, your agent can ignore the parameter and just return the default stats (-1, for example) that are specified in the agent_base starter code.
minimax(self, state, depth_remaining, time_limit,
        alpha, beta)

This function should be called by make_move and should implement the minimax algorithm. state and depth_remaining should be used for recursively searching deeper layers and stopping at the correct layer, and will be used in our testing. The remaining arguments are up to you to use how you wish, with the caveat that your code must be able to handle a value of None for any of them, which indicates that the associated feature should not be used.

alpha and beta are both present to support the implementation of your agent's alpha-beta pruning. The minimax method should be coordinated with your make_move method. For example, in make_move, the autograder might pass in a value of use_alpha_beta=False; in that case your minimax method should ignore the values of the parameters alpha and beta.

The minimax function must return a move and a float value that corresponds to the evaluation of that move.
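Here is a minimal sketch of one way to organize that recursion; the successors_of helper, the whose_move attribute on the state, and self.current_game_type are assumptions to be checked against game_types.py and your own agent, not guaranteed parts of the starter code:

    def minimax(self, state, depth_remaining, time_limit=None,
                alpha=None, beta=None):
        # Base case: no plies left, so evaluate the position statically.
        if depth_remaining == 0:
            return None, self.static_eval(state, self.current_game_type)
        maximizing = (state.whose_move == 'X')    # X maximizes, O minimizes
        best_move = None
        best_val = float('-inf') if maximizing else float('inf')
        for move, child in self.successors_of(state):      # hypothetical helper
            _, val = self.minimax(child, depth_remaining - 1,
                                  time_limit, alpha, beta)
            if (maximizing and val > best_val) or (not maximizing and val < best_val):
                best_val, best_move = val, move
            # Alpha-beta bookkeeping; skipped entirely when alpha or beta is None.
            if alpha is not None and beta is not None:
                if maximizing:
                    alpha = max(alpha, best_val)
                else:
                    beta = min(beta, best_val)
                if alpha >= beta:
                    break    # cutoff: remaining children cannot change the result
        return best_move, best_val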

  • static_eval(self, state, game_type=None):

    This is your static evaluation function. Given a game state, it should return a numeric evaluation of that state, where a higher value means that the state is better for X and a lower value means that the state is better for O. Note that this is regardless of which piece your agent is playing. For this assignment we define that X is always the maximizing player and O is always the minimizing player.

    In normal game-play, your agent will know what type of game is being played, because the game master will provide that information via the call to your agent's prepare method. However, during autograding, your static evaluation function might be called with not only a state but also a game type. Although one could sometimes guess the game type from the state, the actual value of k is not necessarily specified in the state. If your static evaluation function were given a 10 by 15 board with some Xs and Os on it, it would be difficult to evaluate without knowing, say, that k is 9 in the game. So the game_type argument provides the information about k as well as the other properties of the game.

    Your static evaluation function should be non-trivial and should return different evaluations for states that are clearly different. We will grade based on comparing your evaluations across states that are easily distinguished by a human as significantly better or worse, so don't worry too much about determining small differences between mostly even states.

    The amount of work done by your static evaluation function should depend on the type of game being played, which will be stored as a property of your agent. The game type has the value of k, which is the number of tokens in a line required to win the game. Your static evaluation function will probably need to make use of this value k. For example, if there is a line of k-1 Xs and O has not blocked the completion of k in a row, then this state should probably have a very high value.

    We provide little guidance for this function beyond the lecture slides. This will be the main opportunity for you to set your agent apart from the rest of the class for the tournament, so making sure that this function gives good results and runs quickly will go a long way toward improving your agent's performance.

    When a call to make_move has autograding=True, your code should use the special static evaluation function given in the call instead of your own function. When autograding=False, no special static evaluation function will be provided in the call.

    For our testing, you may assume that the state parameter is a valid state that matches the size and k values of the initial state passed to your agent in the prepare method call.
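    To make the idea concrete, here is a deliberately simplistic sketch that scores only horizontal windows of length k; it assumes the board is reachable as state.board (a list of lists of 'X', 'O', ' ', and '-') and that game_type.k holds the winning length, both of which should be verified against game_types.py:

        def simple_static_eval(state, game_type):
            k = game_type.k
            score = 0
            for row in state.board:
                for start in range(len(row) - k + 1):
                    window = row[start:start + k]
                    if '-' in window:
                        continue                 # a forbidden square kills this line
                    xs, os = window.count('X'), window.count('O')
                    if xs and os:
                        continue                 # blocked for both players
                    if xs:
                        score += 10 ** xs        # open lines with more Xs favor X
                    elif os:
                        score -= 10 ** os
            return score

    A real evaluator would also scan columns and both diagonal directions, and would weight a line of k-1 unblocked Xs (or Os) very heavily, as described above.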

    PART II: Agent Code for Dialog (30 points).
    Utterances:

    Each player participates not only in the game being played, but also in a conversation that takes place at the same time. In each turn, the player returns both a move (and corresponding new game state) and a text string we call the utterance. The utterance should be "in character" with your agent's persona. Ideally, it is also somewhat intelligent, in that it represents comments that are appropriate. The utterances may achieve any subset of the following behaviors: persona-specific, game-specific, game-state-specific, funny, serious, observant, didactic (teaching something about the agent technology or search techniques), or responsive to the utterances of the opponent.

    We expect each team or individual to make some effort to incorporate interesting technology or creativity into the production of the utterances. Possibilities include: (a) using an LLM with well-designed prompting to achieve both the desired persona and appropriate comments on the progress of the game; (b) designing a set of IF-THEN rules that trigger utterance patterns based on current conditions in the game or conversation; (c) instrumentation of the game search process -- numbers of states evaluated, alpha/beta cutoffs, counts of reuses of hashed state values using Zobrist hashing, etc., as the basis for insightful comments on the game. We ask you to report on how you designed your utterance-related features.
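    As a toy illustration of option (b), the rules can be as simple as a few conditionals keyed on features of the game and conversation; the thresholds, phrasings, and the eval_score and opponent_remark inputs below are placeholders:

        import random

        def choose_utterance(eval_score, opponent_remark):
            # Toy IF-THEN rules; a real agent would use many more conditions.
            if opponent_remark and '?' in opponent_remark:
                return "Good question. Let me answer it with this move."
            if eval_score > 100:
                return random.choice(["Things are looking up for me.",
                                      "I like where this board is heading."])
            if eval_score < -100:
                return "You have me on the back foot, but I never resign early."
            return "A quiet move while I study the position."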

    PART III: Reporting (30 points).
    HTML Transcript (10 points)

    Create an HTML transcript of a match between your agent and that of another person or partnership in the class. (Do not have your agent play against itself for this transcript, and do not have it play against an agent from the starter code.)

    It's easy to create the transcript. Simply check that the variable UseHTML is set to True in Game_Master_Offline.py. Then whatever agents and game it runs will result in an HTML file being created. The title will be generated in a way that identifies the agents and the game type. Please convert the .html file to .pdf for submission.

    The game type for this transcript should be Five-in-a-Row On Seven-by-Seven Board with Corners Forbidden.

    The transcript will be one of the inputs used in grading the dialog ability of your agent.

    Running a Game

    You will normally use the file Game_Master_Offline.py to run games between agents. RandomPlayer.py is a sample agent that plays randomly and is provided for you to test against. At the bottom of Game_Master_Offline.py, you'll see that the file is already set up to play your agent against our sample agent (as soon as you edit in the correct name of your agent file). You may change this to test different starting configurations as well as play against yourself or another agent. See the instructions in that file for making those changes.

    Report (20 points)

    Create a report file named A4-Report.pdf with the following contents.

    • Your name and that of your partner if you have one, with the format: Lastname1, Firstname1; Lastname2, Firstname2, in alphabetical order by last name.
    • Agent name, and a short description of its persona (about 2 to 3 lines).
    • How the agent's "twin" differs from the basic agent. The twin feature helps when having your agent play against an alternate version of itself. For a simple example of the twin feature, look at the starter-code agent RandomPlayer.py.
    • How your basic alpha-beta pruning is implemented, and how effective it seems to be. In which function are cutoffs detected? Have you done any tests to find out how many static evals are saved due to alpha-beta cutoffs on search trees of depth 3? If you use child ordering by static eval, how much does that help?
    • Describe the persona of your agent in more detail. What is the intended tone of your agent? Is it modeled after a real person, media character, etc?
    • What dialog features if any does your agent use to keep its utterance relevant to the game state or trajectory? Explain how they work.
    • What dialog features if any does your agent use to keep its utterance responsive to the opponent's remarks? Explain how they work.
    • How did you develop the dialog capabilities of your agent? If you used an iterative design process, describe the phases of this process and what you learned at the end of each phase in terms of what worked or didn't.
    • Describe each extra credit option you implemented, with a separate subheading and subsection for each one.

    Extra Credit Options

    There are two ways to get extra credit in Part I and three ways to get extra credit in Part II. There are four more ways to obtain extra credit during the tournament: making the top 50 percent of agents, making it into the top 5, being the best agent in terms of playing the game, and being recognized for dialog. Note that if there are too many tied games in any round using a particular time limit per move, then the round will be repeated with a shorter time limit, so that the stronger agents get a better chance to show their strength. For the dialog recognition, we will award 5 points of extra credit to 5 of the agents that, in the opinion of the staff, are the best in terms of dialog, with features such as relevance of comments to the game (including possible comments on technical issues during the search), clarity of persona, degree of apparent engagement in the dialog with the opponent, etc. Each item is worth 5 points, so it's conceivable that some agent could not only win the tournament but also score on all the other extra-credit items, for a whopping 45 points of extra credit.

    If you do any extra credit items, you will document them in the Extra Credit section of your report, identifying what extra credit options you completed and describing them.

    in Part I:

    • A. Implement the use of static evaluation of children to order the search, so as to increase the likelihood of alpha-beta cutoffs.

      Then figure out a way to measure the benefit of this technique over the default ordering, and report on what you find.

      Be sure to use a fixed-depth search in the comparisons, such as 4 ply.

    • B. Implement Zobrist hashing. Also instrument your code so that the hashing statistics can be reported: number of writes to the hash table, number of read attempts, and number of successful read attempts (those attempts where previously hashed data is found and returned).

      If you complete this option, provide a brief description of Zobrist hashing and the benefits of it that you measured, in your report file. Note that saved values of states can be used for ordering children, just as in A (above), to increase the likelihood of alpha-beta cutoffs.
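      The core of the technique is small; the sketch below assumes the board is a list of lists of 'X', 'O', ' ', and '-' characters, and omits the transposition table and the read/write counters you would add for the required statistics:

        import random

        def make_zobrist_table(n_rows, n_cols, pieces=('X', 'O')):
            # One random 64-bit key per (square, piece) pair.
            return {(r, c, p): random.getrandbits(64)
                    for r in range(n_rows)
                    for c in range(n_cols)
                    for p in pieces}

        def zobrist_hash(board, table):
            # XOR together the keys of all occupied squares; blanks and forbidden
            # squares contribute nothing. After a single move, the hash can be
            # updated with one extra XOR instead of rescanning the whole board.
            h = 0
            for r, row in enumerate(board):
                for c, piece in enumerate(row):
                    if piece in ('X', 'O'):
                        h ^= table[(r, c, piece)]
            return h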

    in Part II:

    • Use well thought-out prompting to get an LLM to come up with some or all of your agent's utterances, through a Python-callable API such as Google's Gemini. Note: Your agent code should, by default, not import any external APIs. However, if given the go-ahead by the Game Master through the apis_ok parameter of the call to your agent's prepare method, then your code can import external APIs that are compatible with the Game Master. (As of Jan. 28, we anticipate that a web version of the Google Gemini API will be compatible.)

      You may also use a locally hosted LLM if you have access to one. If the staff will not be able to easily use it, too, then provide one or more extra transcripts to show what your agent can do.

    • Implement a feature in your agent such that, if the opponent says "Tell me how you did that", your agent gives a thorough explanation of real statistics about its own computation of the last move, with counts of states evaluated statically, time spent, numbers of cutoffs, etc.
    • Implement a feature in your agent such that, if the opponent says "What's your take on the game so far?" then your agent will use the history of the game (past states and past utterances) to come up with a story about the game from the beginning to the current state, and a prediction of who will win.
    What to Turn In.

    Turn in both your yourUWNetID_KInARow.py file and your game-transcript file(s). Each game-transcript file should be the PDF (preferred) or HTML file that is generated by Game_Master_Offline.py.
    If you are doing any of the extra credit items, then also turn in your file ExtraCredit.txt that explains those special features of your agent and/or what opponent agents your submitted games involve.
    If you wish to enter the competition, explain that in the ExtraCredit.txt file, and submit the additional qualifying game transcript.

    For 1 point of participation credit, fill out the A4 planning questionnaire by Tuesday night (midnight) February 11. It's here.

    Updates and Corrections

    If necessary, updates and corrections will be posted here and/or mentioned in class or in ED.