Assignment 3: Language Processing In Python
|
CSE 341: Programming Languages
The University of Washington, Seattle, Winter 2012
|
|
Reading:
The reading for this assignment is
Let's Parse.
|
Purposes:
The purposes of this assignment
are to give you experience developing a more complicated
application in Python, as well as helping you learn about
one of the most popular parsing methods in computational
linguistics.
|
Due:
Monday, January 30, at 5:00 PM via Catalyst CollectIt.
(The target date is Friday, January 27).
|
Teamwork.
Do this assignment in partnerships. You may use the same
partners as in Assignment 2 or you may change partners.
|
What to Turn In:
You should turn in the following files:
chartParser.py
ourCFG.py
Each Python file should begin with a multiline string that serves as
a comment and that gives the
name of the file, a program name (that is more descriptive and English-like
than the filename), the authors' (your) names,
followed by a brief description (2-5 lines) of what the program does.
The brief description in the file chartParser.py should also mention
what extra-credit options you have implemented, if any.
The file ourCFG.py should consist primarily of the starter code from CFG.py.
However, you will probably want to modify the definitions for Production and
Edge in that file, to support extra functionality. For example, you may
add a method getLeftCorner to the Production class, and you may add a method
isComplete to the Edge class. If you need to modify some of the existing
methods in these classes or the Grammar class, you are welcome to do that, too.
The Python file chartParser.py that you turn in for this assignment should contain
class definitions, function definitions and possibly variable assignments, but
should not contain any calls to the print function except in
test functions that you may optionally include.
Thus, when your program is imported
as a module, it should not cause anything to be printed. This will
facilitate grading of your work by allowing grading scripts to control
the execution of your functions.
|
Chart Parser: Create a Python program that implements a chart parser with
the following features:
- (70 points) There is a function parse(grammarFilename, sentenceFilename)
that can be called with a pair of strings that specify file names.
It should read in a grammar file and process it in a manner similar to
what you did in Assignment 2, Problem 6 for sentence generation.
It should read in a sentence from the file given by sentenceFilename.
It should construct a chart (see handout) and return the chart object.
The class of which the chart object is an instance, call it Chart, should
implement a method, tostr(), that returns a multiline string containing
all the edges of the chart, one per line, in a particular format.
That format is automatically produced by the tostr() method of the
Edge class in the starter code CFG.py, given out for Assignment 2.
- (20 points)
The edges of the chart should be generated in the following general order.
First, all special word edges should be generated.
Then an outer loop should be run for j3 going from 1 to n.
In each iteration of the outer loop, an inner loop should run
in which all possible new edges are added working from complete
edges that end at j3. These edges are added in a while loop
that in each iteration tries to add some pure prediction edges,
and then tries to add some edged using the fundamental rule.
If any edges are successfully added, the while loop keeps going.
When no new edges can be added, the outer loop increments j3 and
the while loop starts again.
The sample output (linked below) shows one particular order.
We will not require that the edges be in this order as long as
the correct set of edges is output, one per line.
- (10 points)
The tostr method of the Chart class should have a keyword
parameter that allows the output to be limited to
complete edges only:
tostr(completeEdgesOnly=False). Thus the default should
be to return the string representations of all edges.
-
Extra credit: (10 points) Provide a method tabularChart() that
returns a string representation of the chart using "character
graphics" to render the chart. There should be a vertical line
(drawing with characters such as "|" and "+") for each
vertex, and a horizontal segment for each edge.
Each edge should have a label similar to that shown in the
applet, and the label should be centered along the edge.
-
Extra credit: (10 points) Provide a method allParseTrees() that
returns a list of all the valid parse trees for the
sentence represented in the chart. Each parse tree
should be represented as a list of the form
['S', subtree1, subtree2, ..., subtreeK], where a terminal
node is represented like this: ['around']. Thus each
tree or subtree is a list whose first element is a string that
tells what vocabulary symbol is associated with the root of
that tree or subtree. If the node is an internal node, then
there are list elements following the vocabulary symbol that
represent the subtrees of the node.
The assignment is worth a nominal total of 100 points, plus up to 20 more points
for doing the extra-credit options. The various features should be all
implemented together in one version of your program.
The following additional resources are available for this assignment:
chart parsing demonstration applet,
Grammar file for the simple robot command language GoToSpeak.
Sample output for the GoToSpeak grammar and the sentence, "Now go to the table
past the door and halt !"
(here).
|
Last updated: 25 January 2:02 PM (j3 can go from 1 to n instead of 0 to n); 24 Jan 8:49 PM (we will accept the output edges in any order.)
|