Assignment 3: Language Processing In Python
CSE 341: Programming Languages
The University of Washington, Seattle, Winter 2012
Reading: The reading for this assignment is Let's Parse.
Purposes: The purposes of this assignment are to give you experience developing a more complicated application in Python, as well as helping you learn about one of the most popular parsing methods in computational linguistics.
Due: Monday, January 30, at 5:00 PM via Catalyst CollectIt. (The target date is Friday, January 27).
Teamwork.
Do this assignment in partnerships. You may use the same partners as in Assignment 2 or you may change partners.
What to Turn In: You should turn in the following files:
chartParser.py
ourCFG.py
Each Python file should begin with a multiline string that serves as a comment and that gives the name of the file, a program name (that is more descriptive and English-like than the filename), the authors' (your) names, followed by a brief description (2-5 lines) of what the program does. The brief description in the file chartParser.py should also mention what extra-credit options you have implemented, if any.

The file ourCFG.py should consist primarily of the starter code from CFG.py. However, you will probably want to modify the definitions for Production and Edge in that file, to support extra functionality. For example, you may add a method getLeftCorner to the Production class, and you may add a method isComplete to the Edge class. If you need to modify some of the existing methods in these classes or the Grammar class, you are welcome to do that, too.


 
The Python file chartParser.py that you turn in for this assignment should contain class definitions, function definitions and possibly variable assignments, but should not contain any calls to the print function except in test functions that you may optionally include. Thus, when your program is imported as a module, it should not cause anything to be printed. This will facilitate grading of your work by allowing grading scripts to control the execution of your functions.

Chart Parser: Create a Python program that implements a chart parser with the following features:
  1. (70 points) There is a function parse(grammarFilename, sentenceFilename) that can be called with a pair of strings that specify file names. It should read in a grammar file and process it in a manner similar to what you did in Assignment 2, Problem 6 for sentence generation. It should read in a sentence from the file given by sentenceFilename. It should construct a chart (see handout) and return the chart object. The class of which the chart object is an instance, call it Chart, should implement a method, tostr(), that returns a multiline string containing all the edges of the chart, one per line, in a particular format. That format is automatically produced by the tostr() method of the Edge class in the starter code CFG.py, given out for Assignment 2.
  2. (20 points) The edges of the chart should be generated in the following general order. First, all special word edges should be generated. Then an outer loop should be run for j3 going from 1 to n. In each iteration of the outer loop, an inner loop should run in which all possible new edges are added working from complete edges that end at j3. These edges are added in a while loop that in each iteration tries to add some pure prediction edges, and then tries to add some edged using the fundamental rule. If any edges are successfully added, the while loop keeps going. When no new edges can be added, the outer loop increments j3 and the while loop starts again. The sample output (linked below) shows one particular order. We will not require that the edges be in this order as long as the correct set of edges is output, one per line.
  3. (10 points) The tostr method of the Chart class should have a keyword parameter that allows the output to be limited to complete edges only: tostr(completeEdgesOnly=False). Thus the default should be to return the string representations of all edges.
  4. Extra credit: (10 points) Provide a method tabularChart() that returns a string representation of the chart using "character graphics" to render the chart. There should be a vertical line (drawing with characters such as "|" and "+") for each vertex, and a horizontal segment for each edge. Each edge should have a label similar to that shown in the applet, and the label should be centered along the edge.
  5. Extra credit: (10 points) Provide a method allParseTrees() that returns a list of all the valid parse trees for the sentence represented in the chart. Each parse tree should be represented as a list of the form ['S', subtree1, subtree2, ..., subtreeK], where a terminal node is represented like this: ['around']. Thus each tree or subtree is a list whose first element is a string that tells what vocabulary symbol is associated with the root of that tree or subtree. If the node is an internal node, then there are list elements following the vocabulary symbol that represent the subtrees of the node.
The assignment is worth a nominal total of 100 points, plus up to 20 more points for doing the extra-credit options. The various features should be all implemented together in one version of your program.

The following additional resources are available for this assignment: chart parsing demonstration applet, Grammar file for the simple robot command language GoToSpeak. Sample output for the GoToSpeak grammar and the sentence, "Now go to the table past the door and halt !" (here).

Last updated: 25 January 2:02 PM (j3 can go from 1 to n instead of 0 to n); 24 Jan 8:49 PM (we will accept the output edges in any order.)