A6 Project Option C: Parsing with Probabilistic Context-Free Grammars
CSE 473: Introduction to Artificial Intelligence
The University of Washington, Seattle, Spring 2023

Overview

This project option engages you in exploring the use of grammar in natural language analysis, both without and with probability involved.

1. Reading about CFGs, Parsing, and PCFGs

To become familiar with context-free grammars (CFGs) and the CKY algorithm for finding all legal parses of a sentence, read pp1-17 in Chapter 17 of Jurafsky and Martin 3rd ed. Then read up on the modifications to the CKY algorithm that allow computing a most probable parse, using PCFGs. This is in "Chapter C" of the same book. Concentrate on pp.1-11 in this chapter.

2. Implement the CKY Algorithm

Create a Python implementation of the CKY algorithm that correctly finds all possible parses of an ambiguous sentence or phrase generated by a CFG. You are allowed to use ChatGPT to help you in this, but you must analyze the code and fix any errors. You are responsible for completely understanding any code you turn in. If you do use ChatGPT, explain in your report how you used it and what you got back from it.

3. Implement Ney's Variant of the CKY Algorithm for Parsing with PCFGs

Create a Python implementation of the variation of the CKY algorithm that correctly finds a most probable parse for a phrase or sentence in the language generated by an ambiguous Probabilistic Context-Free Grammar. Once again, you are allowed to use ChatGPT to help you in this, but you must analyze the code and fix any errors. Once again, you are responsible for completely understanding any code you turn in, and if you use ChatGPT, you agree to include in your report a discussion of how you used it and what it gave you.

4. Demonstrate the CKY Algorithm

Find an existing example CFG in Chomsky Normal Form and use it to parse an ambiguous sentence of your choice. (Choose a sentence in the language generated by the CFG, such that the sentence has at least two possible parses and maybe more.)

Have your code print out the table of parsed constituents in a style similar to that used in Jurafsky and Martin to show the example.

5. Demonstrate the Probabilistic CFG Variant of the CKY Algorithm

Eiher find an existing example PCFG in Chomsky Normal Form or construct one following the hints in Jurafsky and Martin Chapter C. Then use it to parse an ambiguous sentence of your choice. (Again, choose a sentence in the language generated by the PCFG, such that the sentence has at least two possible parses and maybe more.)

As before, have your code print out the table of parsed constituents (and associated probabilities) in a style similar to that used in Jurafsky and Martin to show the example.

6. Option C-specific Reporting

In your report, include screen shots of the constituent tables for your demonstration parses. Also, draw and include images of parse trees for (a) at least 2 alternative parses found by the CKY algorithm for your demo example, and (b) the most probable parse for your PCFG demo example, with the probabilities of its constituents written next to the corresponding tree nodes.