Link Search Menu Expand Document

Grammar Solver

Generating correct sentences with recursion and randomness.

ed Workspace

Table of contents

  1. Grammar
  2. GrammarSolver
    1. Method summary
    2. Development strategy
    3. Splitting strings
    4. Trimming whitespace
  3. Complete example
  4. Grading
  5. Reflection
  6. Submission

The main goal of this assignment is to generate random valid sentences given a set of rules.

A formal language is a set of words and symbols along with a set of rules defining how those symbols may be used together. For example, “A boy threw the ball.” is a valid sentence, but “A threw boy ball the” makes no sense, because the words were put together in an invalid way.

Grammar

A grammar is a way of describing the syntax and symbols of a formal language. Grammars consist of two types of symbols (e.g., words, phrases, sentences): terminals and non-terminals. Any single word is a terminal symbol. When constructing sentences, we want to put words together in grammatically-correct ways using sentences, noun phrases, objects, etc. A non-terminal symbol represents a specific combinations of choices of terminals, such as a sentence.

For example, consider the following simple language consisting of the following terminals and non-terminals.

Terminals
the, a, boy, girl, runs, walks
Non-terminals
  • A sentence consists of three structures: article and object and verb.
  • An article consists of either “the” or “a”.
  • An object consists of either “boy” or “girl”.
  • A verb consists of either “runs” or walks”.

"the girl runs"

This language contains the following valid sentences.

  • the boy runs
  • the boy walks
  • a boy runs
  • a boy walks
  • the girl runs
  • the girl walks
  • a girl runs
  • a girl walks

Backus-Naur Form (BNF) is a format for specifying grammars. Each line in a BNF grammar specification follows this format:

nonterminal ::= rule | rule | ... | rule

Each “rule” is some sequence of terminals or non-terminals separated by whitespace. The | (pipe) character indicates a choice in possible rules. The grammar specified above would look like the following in BNF:

sentence::=article object verb
article::=the | a
object::=boy | girl
verb::=runs | walks

Notice that sentence has multiple non-terminals put together in a sequence (requiring an article then an object then a verb), whereas the others each consist of one of multiple choices.

In this assignment, we will use BNF to describe what constitutes a valid sentence according to the following formatting rules.

  • Each line will contain exactly one occurrence of ::= which is the separator between the name of the non-terminal and the choices.
  • A pipe (|) will separate each choice for the non-terminal. If there is only one rule (like with sentence above), there will be no pipe on that line.
  • Whitespace separates tokens but doesn’t haven any special meaning. There will be at least one whitespace character between each part of a single rule. Extra whitespace should be removed.
  • Case matters when comparing symbols. For example, <S> would not be considered the same non-terminal as <s>.
  • If a symbol on the right never appears on the left of ::=, it should be considered a terminal.
  • The text before ::= will not be empty, does not contain a pipe character, and it does not contain any whitespace.
  • The text after ::= will not be empty. Duplicate rules are allowed since including more than one copy of a rule would make it more likely to be selected.

GrammarSolver

GrammarMain reads a BNF grammar input text file and passes its entire contents to GrammarSolver as a list of strings. Implement the GrammarSolver class that reads an input file with a grammar in BNF and uses recursion to randomly generate elements of the grammar. Break each string from the input list into symbols and rules so that it can output random elements of the grammar.

Method summary

public GrammarSolver(List<String> rules)
Initializes a new grammar using the given BNF grammar rules where each rule corresponds to one line of text. Use regular expressions to break apart the rules and store them into a SortedMap1 where the keys are non-terminals and the values are rules. Do not modify the list of input rules. Throw an IllegalArgumentException if the list is empty or if there are two or more entries in the grammar for the same non-terminal.
public boolean grammarContains(String symbol)
Returns true if the given symbol is a non-terminal and false otherwise. For example, for the example grammar above, grammarContains("sentence") would return true. However, grammarContains("foo") or grammarContains("boy") (“boy” is a terminal in the language) would return false.
public String getSymbols()
Returns a string representation of the various non-terminal symbols from the grammar as a sorted, comma-separated list enclosed in square brackets. For example, calling getSymbols() for the example grammar above would return [article, object, sentence, verb].
public String[] generate(String symbol, int times)
Return times random occurrences of the given symbol in a String[]. Each string should be compact: there should be exactly one space between each terminal and there should be no leading or trailing spaces. Throw an IllegalArgumentException if times is negative or if the symbol is not a non-terminal. When generating a non-terminal symbol, each of its rules should be applied with equal probability. Use the Random class to randomly choose between rules.

Development strategy

The most challenging method is generate, so write it last. In the generate method, the main goal is to generate a random occurrence of a given non-terminal symbol using the following algorithm.

  1. Choose a random expansion rule for the given non-terminal symbol.
  2. For each of the symbols in the rule, generate a random occurrence of that symbol.

The base case will be when the symbols in the rule chosen are not non-terminals. Make a recursive call to generate non-terminals. Use a loop to make multiple recursive calls, one for each non-terminal and introduce a private method to handle parameters for the recursive algorithm.

Remember that it is helpful to tackle difficult methods using iterative development by solving simpler versions of the problem first. Random programs can be difficult to validate correctness, and the generate method you will implement uses randomness to decide which rule for a given non-terminal to use. To help you debug and validate your output, the Output Comparison Tool has a single example of what rules would be chosen if the program always used the first rule. This results in the sentence, “the big dog hit the big dog”.

"the big dog hit the big dog"

While this is helpful for development, make sure the final solution is random! Use the Grammar Verifier to check that sentence output meets the grammar rules.

Check for correct whitespace in generate by using some non-whitespace character, such as ~, instead of spaces and inspecting the output visually. Add debugging println statements to print parameter values to see the arguments passed to each recursive calls.

Splitting strings

Use the split method in the String class to split strings apart to handle rules containing the | (pipe) or ` ` (space) characters. The split method takes a string delimiter (what to split by) as an argument and returns a String[] of the substrings split on the delimiter.

The delimiter passed to split is considered a regular expression. Regular expressions are a category of small programming languages that match patterns of text and are often used for string manipulations. For instance, “abc” is a regular expression that matches “a followed by b followed by c”. The following regular expressions will be helpful.

Splitting non-terminals from rules
To split a string line based on where ::= occurs, use the regular expression ::=, which literally matches the characters ::=.
String line = "example::=foo bar |baz";
String[] pieces = line.split("::=");
// ["example", "foo bar |baz"]
Splitting different rules
Splitting a string rules on the | character is more complicated than the “abc” example since | is part of the regular expression syntax. In order to escape the regular expression syntax (like we do with \n or \t), use \\| in Java.
String rules = "foo bar|baz |quux mumble";
String[] pieces = rules.split("\\|");
// ["foo bar", "baz ", "quux mumble"]
Splitting a single rule
To split a string rule on whitespace, indicate “at least one whitespace” with the Java string \\s+. \\s in Java indicates “a single whitespace of any kind” while the + afterwards indicates “one or more instances of the preceding character”.
String rule = "the quick brown fox";
String[] pieces = rule.split("\\s+");
// ["the", "quick", "brown", "fox"]

Trimming whitespace

If the string we want to split begins with a whitespace character, we’ll often get an extra empty string at the front of the resulting array. Use the trim method to remove leading and trailing whitespace.

String str = "   lots   of  spaces    \t";
String trimmedString = str.trim();
// "lots   of  spaces"

Complete example

<sentence>::=<nounp> <verbp>
<nounp>::=<det> <adjs> <noun>|<propnoun>
<propnoun>::=John|Jane|Sally|Spot|Fred|Elmo
<adjs>::=<adj>|<adj> <adjs>
<adj>::=big|green|wonderful|faulty|subliminal|pretentious
<det>::=the|a
<noun>::=dog|cat|man|university|father|mother|child|television
<verbp>::=<transverb> <nounp>|<intransverb>
<transverb>::=hit|honored|kissed|helped
<intransverb>::=died|collapsed|laughed|wept

"fred honored the green wonderful child"

Welcome to the cse143 random sentence generator.

What is the name of the grammar file? sentence.txt

Available symbols are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <sentence>
How many do you want me to generate? 5
Sally hit Jane
Spot hit John
Jane died
the green mother wept
the subliminal green man laughed

Available symbols are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <sentence>
Longer sentence.txt example
Welcome to the cse143 random sentence generator.

What is the name of the grammar file? sentence.txt

Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <det>
How many do you want me to generate? 5
the
a
a
the
a

Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <nounp>
How many do you want me to generate? 5
Jane
the wonderful pretentious green big father
Fred
the wonderful cat
a subliminal pretentious dog

Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <sentence>
How many do you want me to generate? 20
Jane kissed Spot
the wonderful wonderful pretentious television collapsed
Jane hit Fred
Jane helped a wonderful dog
Elmo helped the green wonderful man
a pretentious university helped Elmo
a wonderful green subliminal father hit Fred
Fred kissed Spot
Spot laughed
the green wonderful father collapsed
the big man helped John
a pretentious green faulty dog collapsed
Jane honored a green subliminal green child
Elmo hit Elmo
a green university died
the pretentious child honored a faulty wonderful subliminal television
Jane died
the faulty dog hit John
Elmo helped Fred
Elmo honored the pretentious big green father

Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)?
sentence2.txt example
Welcome to the cse143 random sentence generator.

What is the name of the grammar file? sentence2.txt

Available symbols to generate are:
[E, F1, F2, OP, T]
What do you want generated (return to quit)? T
How many do you want me to generate? 5
42
- y
x
x
( ( 1 ) )

Available symbols to generate are:
[E, F1, F2, OP, T]
What do you want generated (return to quit)? E
How many do you want me to generate? 10
x - 1
0
sin ( 1 + 92 + - 1 / 42 )
max ( y , 92 )
42 \% 1
- 42
92
1
92
42 - sin ( 1 )

Available symbols to generate are:
[E, F1, F2, OP, T]
What do you want generated (return to quit)?

Grading

Appropriately utilize recursion to implement the algorithm as described in the method summary. Simplify the recursive logic as much as possible to reduce the number of special cases and redundant code needed to solve the problem. Make sure that any additional “helper” methods have private access.

Control structure errors
Method errors
Data structure errors
  • Excessive inefficiency when using collections. For example, using a for-each loop to find a value in a map when get would return the value directly.

Reflection

Create a new file reflection.txt and write a paragraph-long reflection answering the following questions (along with anything else you find appropriate):

  1. What did you learn this week?
  2. What did you enjoy in the course this week?
  3. What did you find challenging or frustrating this week?
  4. What did you find particularly helpful for your learning this week?

Submission

Submit GrammarSolver.java and reflection.txt to Grade-It!

  1. SortedMap is a more specialized version of the Map interface that is implemented by TreeMap