Assignment 5
CSE 415: Introduction to Artificial Intelligence
The University of Washington, Seattle, Winter 2004
Working mode: In Part I, work individually. In Part II, work in a team of two.
Part I. At the end of Chapter 6 of the text, do exercises 10, 13, 14, and 15. Then do exercises 2a, b, c, and d, and 3 at the end of Chapter 5 of the text. Turn these in as hardcopy in class on Monday, Feb 23.
 
Part II.
A. (30 points) Implement an iterative-deepening depth-first search program in Lisp that searches for the first web page containing a given string as a substring of the text string representing the page. It is recommended that you use CLISP for this assignment, rather than Allegro Common Lisp, and that you make use of some web-page fetching functions by Matti J. Karki. The program should start searching the web from a URL given by the user. It should report on its progress by printing an asterisk each time it fetches a web page, and each time it begins a new level of the iterative deepening it should print the new depth limit (e.g., "Starting DFS with maxdepth 6"). If the search is successful, it should return the first URL where the string was found.
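The control structure described above can be sketched as follows. This is only an outline: FETCH-PAGE and PAGE-LINKS are assumed placeholders for the web-page fetching functions (such as those in Karki's library), not their actual names, and the sketch naively re-fetches pages it has already seen.

```lisp
;; Sketch of iterative-deepening DFS over web pages.
;; FETCH-PAGE and PAGE-LINKS are assumed placeholders, not real APIs:
;; FETCH-PAGE returns the page text (or NIL on failure), and
;; PAGE-LINKS returns a list of URLs linked from that text.

(defun web-id-dfs (start-url target &key (max-limit 10))
  "Iterative-deepening DFS from START-URL for a page containing TARGET."
  (loop for limit from 0 to max-limit do
        (format t "~%Starting DFS with maxdepth ~D~%" limit)
        (let ((found (bounded-dfs start-url target limit)))
          (when found (return found)))))

(defun bounded-dfs (url target depth)
  "Depth-limited DFS; returns the first URL whose text contains TARGET."
  (format t "*")                         ; one asterisk per page fetched
  (let ((page (fetch-page url)))         ; assumed helper
    (cond ((null page) nil)              ; fetch failed; dead end
          ((search target page) url)     ; substring found: success
          ((zerop depth) nil)            ; depth limit reached
          (t (dolist (link (page-links page) nil)   ; assumed helper
               (let ((found (bounded-dfs link target (1- depth))))
                 (when found (return found))))))))
```

In a real run you would also want to keep a table of visited URLs (e.g., a hash table keyed on URL) so that cycles among pages do not cause repeated fetches within one depth-limited pass.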
B. (20 points, but this variation is more difficult than version A). Implement an A* search version of your web-page finding program. Your evaluation function f'(s) will consist of the sum of two functions, g'(s) and h'(s). For g'(s) you should assume that for any pair of web pages (p1,p2) connected by a hyperlink, the distance between them is 1.0 units. Then, you should define a heuristic evaluation function h'(s) that somehow determines an estimate of the distance from s to the closest goal. You can base this upon how close the given string comes to matching in the current web page. For example, if the string consists of several words, you might determine how many of these words do NOT occur in the current web page; and if the string consists of only a single word, determine the length L of the shortest prefix that does not occur in the current web page and use C/(L+1) as your distance, where C is some constant. Using such a measure makes an implicit assumption that web pages using similar terminology tend to be linked together more than web pages using very different sets of terms.
Your program should print out the URLs of the first 25 web pages it searches, in the order in which these pages are processed (in A, in the order they are removed from OPEN; in B, in the order in which f'(s) is computed).
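The heuristic h'(s) suggested above might be sketched like this. The constant C and the whitespace tokenizer are assumptions of this sketch, not requirements; tune them for your own tests.

```lisp
;; Sketch of the suggested heuristic h'(s).
;; *C* is an arbitrary scale constant (an assumption of this sketch).
;; PAGE-TEXT is the text of the web page for state s.

(defparameter *c* 10.0)

(defun split-words (string)
  "Naive whitespace tokenizer (splits on single spaces only)."
  (loop with start = 0
        for pos = (position #\Space string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun h-prime (target page-text)
  "Estimate of the distance from this page to the nearest goal."
  (let ((words (split-words target)))
    (if (> (length words) 1)
        ;; Multi-word case: count the words that do NOT occur on the page.
        (count-if-not (lambda (w) (search w page-text)) words)
        ;; Single-word case: L is the length of the shortest prefix of
        ;; TARGET that does NOT occur on the page; distance is C/(L+1).
        (let ((l (loop for k from 1 to (length target)
                       unless (search (subseq target 0 k) page-text)
                         return k
                       finally (return (1+ (length target))))))
          (/ *c* (+ l 1))))))
```

Note that when the whole word already occurs on the page, no prefix fails to match; this sketch then takes L to be one more than the word's length, which gives the smallest (most promising) distance, consistent with the intent of the measure.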
Test your solutions for A and B on three examples: one to be provided by Jeff Bigham in a specially constructed web site, another one that will be part of the real web (to be given later), and one of your own choosing.

Part II is due Friday, February 27 at 11:59 PM. Use electronic turn-in over the web at this URL.

Note: Part II is intended as a partnership programming activity. Students who do their work in partnerships of two (rather than individually) will obtain a 20 percent point bonus on this part of the assignment.
Version 1.0, posted February 13.