Grammar Solver
Generating correct sentences with recursion and randomness.
Table of contents
The main goal of this assignment is to generate random valid sentences given a set of rules.
A formal language is a set of words and symbols along with a set of rules defining how those symbols may be used together. For example, “A boy threw the ball.” is a valid sentence, but “A threw boy ball the” makes no sense, because the words were put together in an invalid way.
Grammar
A grammar is a way of describing the syntax and symbols of a formal language. Grammars consist of two types of symbols (e.g., words, phrases, sentences): terminals and non-terminals. Any single word is a terminal symbol. When constructing sentences, we want to put words together in grammatically-correct ways using sentences, noun phrases, objects, etc. A non-terminal symbol represents a specific combinations of choices of terminals, such as a sentence.
For example, consider the following simple language consisting of the following terminals and non-terminals.
- Terminals
- the, a, boy, girl, runs, walks
- Non-terminals
- A sentence consists of three structures: article and object and verb.
- An article consists of either “the” or “a”.
- An object consists of either “boy” or “girl”.
- A verb consists of either “runs” or walks”.
This language contains the following valid sentences.
- the boy runs
- the boy walks
- a boy runs
- a boy walks
- the girl runs
- the girl walks
- a girl runs
- a girl walks
Backus-Naur Form (BNF) is a format for specifying grammars. Each line in a BNF grammar specification follows this format:
nonterminal ::= rule | rule | ... | rule
Each “rule” is some sequence of terminals or non-terminals separated by whitespace. The |
(pipe) character indicates a choice in possible rules. The grammar specified above would look like the following in BNF:
sentence::=article object verb
article::=the | a
object::=boy | girl
verb::=runs | walks
Notice that sentence
has multiple non-terminals put together in a sequence (requiring an article
then an object
then a verb
), whereas the others each consist of one of multiple choices.
In this assignment, we will use BNF to describe what constitutes a valid sentence according to the following formatting rules.
- Each line will contain exactly one occurrence of
::=
which is the separator between the name of the non-terminal and the choices. - A pipe (
|
) will separate each choice for the non-terminal. If there is only one rule (like withsentence
above), there will be no pipe on that line. - Whitespace separates tokens but doesn’t haven any special meaning. There will be at least one whitespace character between each part of a single rule. Extra whitespace should be removed.
- Case matters when comparing symbols. For example,
<S>
would not be considered the same non-terminal as<s>
. - If a symbol on the right never appears on the left of
::=
, it should be considered a terminal. - The text before
::=
will not be empty, does not contain a pipe character, and it does not contain any whitespace. - The text after
::=
will not be empty. Duplicate rules are allowed since including more than one copy of a rule would make it more likely to be selected.
GrammarSolver
GrammarMain
reads a BNF grammar input text file and passes its entire contents to GrammarSolver
as a list of strings. Implement the GrammarSolver
class that reads an input file with a grammar in BNF and uses recursion to randomly generate elements of the grammar. Break each string from the input list into symbols and rules so that it can output random elements of the grammar.
Method summary
public GrammarSolver(List<String> rules)
- Initializes a new grammar using the given BNF grammar rules where each rule corresponds to one line of text. Use regular expressions to break apart the rules and store them into a
SortedMap
1 where the keys are non-terminals and the values are rules. Do not modify the list of inputrules
. Throw anIllegalArgumentException
if the list is empty or if there are two or more entries in the grammar for the same non-terminal.
public boolean grammarContains(String symbol)
- Returns
true
if the givensymbol
is a non-terminal andfalse
otherwise. For example, for the example grammar above,grammarContains("sentence")
would returntrue
. However,grammarContains("foo")
orgrammarContains("boy")
(“boy” is a terminal in the language) would returnfalse
. public String getSymbols()
- Returns a string representation of the various non-terminal symbols from the grammar as a sorted, comma-separated list enclosed in square brackets. For example, calling
getSymbols()
for the example grammar above would return[article, object, sentence, verb]
. public String[] generate(String symbol, int times)
- Return
times
random occurrences of the givensymbol
in aString[]
. Each string should be compact: there should be exactly one space between each terminal and there should be no leading or trailing spaces. Throw anIllegalArgumentException
iftimes
is negative or if thesymbol
is not a non-terminal. When generating a non-terminal symbol, each of its rules should be applied with equal probability. Use theRandom
class to randomly choose between rules.
Development strategy
The most challenging method is generate
, so write it last. In the generate
method, the main goal is to generate a random occurrence of a given non-terminal symbol
using the following algorithm.
- Choose a random expansion rule for the given non-terminal
symbol
. - For each of the symbols in the rule, generate a random occurrence of that symbol.
The base case will be when the symbols in the rule chosen are not non-terminals. Make a recursive call to generate non-terminals. Use a loop to make multiple recursive calls, one for each non-terminal and introduce a private method to handle parameters for the recursive algorithm.
Remember that it is helpful to tackle difficult methods using iterative development by solving simpler versions of the problem first. Random programs can be difficult to validate correctness, and the generate
method you will implement uses randomness to decide which rule for a given non-terminal to use. To help you debug and validate your output, the Output Comparison Tool has a single example of what rules would be chosen if the program always used the first rule. This results in the sentence, “the big dog hit the big dog”.
While this is helpful for development, make sure the final solution is random! Use the Grammar Verifier to check that sentence output meets the grammar rules.
Check for correct whitespace in generate
by using some non-whitespace character, such as ~
, instead of spaces and inspecting the output visually. Add debugging println
statements to print parameter values to see the arguments passed to each recursive calls.
Splitting strings
Use the split
method in the String
class to split strings apart to handle rules containing the |
(pipe) or ` ` (space) characters. The split
method takes a string delimiter (what to split by) as an argument and returns a String[]
of the substrings split on the delimiter.
The delimiter passed to split
is considered a regular expression. Regular expressions are a category of small programming languages that match patterns of text and are often used for string manipulations. For instance, “abc” is a regular expression that matches “a followed by b followed by c”. The following regular expressions will be helpful.
- Splitting non-terminals from rules
- To split a string
line
based on where::=
occurs, use the regular expression::=
, which literally matches the characters::=
.String line = "example::=foo bar |baz"; String[] pieces = line.split("::="); // ["example", "foo bar |baz"]
- Splitting different rules
- Splitting a string
rules
on the|
character is more complicated than the “abc” example since|
is part of the regular expression syntax. In order to escape the regular expression syntax (like we do with\n
or\t
), use\\|
in Java.String rules = "foo bar|baz |quux mumble"; String[] pieces = rules.split("\\|"); // ["foo bar", "baz ", "quux mumble"]
- Splitting a single rule
- To split a string
rule
on whitespace, indicate “at least one whitespace” with the Java string\\s+
.\\s
in Java indicates “a single whitespace of any kind” while the+
afterwards indicates “one or more instances of the preceding character”.String rule = "the quick brown fox"; String[] pieces = rule.split("\\s+"); // ["the", "quick", "brown", "fox"]
Trimming whitespace
If the string we want to split begins with a whitespace character, we’ll often get an extra empty string at the front of the resulting array. Use the trim
method to remove leading and trailing whitespace.
String str = " lots of spaces \t";
String trimmedString = str.trim();
// "lots of spaces"
Complete example
<sentence>::=<nounp> <verbp>
<nounp>::=<det> <adjs> <noun>|<propnoun>
<propnoun>::=John|Jane|Sally|Spot|Fred|Elmo
<adjs>::=<adj>|<adj> <adjs>
<adj>::=big|green|wonderful|faulty|subliminal|pretentious
<det>::=the|a
<noun>::=dog|cat|man|university|father|mother|child|television
<verbp>::=<transverb> <nounp>|<intransverb>
<transverb>::=hit|honored|kissed|helped
<intransverb>::=died|collapsed|laughed|wept
Welcome to the cse143 random sentence generator.
What is the name of the grammar file? sentence.txt
Available symbols are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <sentence>
How many do you want me to generate? 5
Sally hit Jane
Spot hit John
Jane died
the green mother wept
the subliminal green man laughed
Available symbols are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <sentence>
Longer sentence.txt example
Welcome to the cse143 random sentence generator.
What is the name of the grammar file? sentence.txt
Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <det>
How many do you want me to generate? 5
the
a
a
the
a
Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <nounp>
How many do you want me to generate? 5
Jane
the wonderful pretentious green big father
Fred
the wonderful cat
a subliminal pretentious dog
Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)? <sentence>
How many do you want me to generate? 20
Jane kissed Spot
the wonderful wonderful pretentious television collapsed
Jane hit Fred
Jane helped a wonderful dog
Elmo helped the green wonderful man
a pretentious university helped Elmo
a wonderful green subliminal father hit Fred
Fred kissed Spot
Spot laughed
the green wonderful father collapsed
the big man helped John
a pretentious green faulty dog collapsed
Jane honored a green subliminal green child
Elmo hit Elmo
a green university died
the pretentious child honored a faulty wonderful subliminal television
Jane died
the faulty dog hit John
Elmo helped Fred
Elmo honored the pretentious big green father
Available symbols to generate are:
[<adj>, <adjs>, <det>, <intransverb>, <noun>, <nounp>, <propnoun>, <sentence>, <transverb>, <verbp>]
What do you want generated (return to quit)?
sentence2.txt example
Welcome to the cse143 random sentence generator.
What is the name of the grammar file? sentence2.txt
Available symbols to generate are:
[E, F1, F2, OP, T]
What do you want generated (return to quit)? T
How many do you want me to generate? 5
42
- y
x
x
( ( 1 ) )
Available symbols to generate are:
[E, F1, F2, OP, T]
What do you want generated (return to quit)? E
How many do you want me to generate? 10
x - 1
0
sin ( 1 + 92 + - 1 / 42 )
max ( y , 92 )
42 \% 1
- 42
92
1
92
42 - sin ( 1 )
Available symbols to generate are:
[E, F1, F2, OP, T]
What do you want generated (return to quit)?
Grading
Appropriately utilize recursion to implement the algorithm as described in the method summary. Simplify the recursive logic as much as possible to reduce the number of special cases and redundant code needed to solve the problem. Make sure that any additional “helper” methods have private
access.
- Control structure errors
- Extra recursive cases or extra base cases that are not needed. For example, structuring a solution that handles a base case preemptively rather than leaving it for a future recursive call.
- Method errors
- Data structure errors
- Excessive inefficiency when using collections. For example, using a for-each loop to find a value in a map when
get
would return the value directly.
- Excessive inefficiency when using collections. For example, using a for-each loop to find a value in a map when
Reflection
Create a new file reflection.txt
and write a paragraph-long reflection answering the following questions (along with anything else you find appropriate):
- What did you learn this week?
- What did you enjoy in the course this week?
- What did you find challenging or frustrating this week?
- What did you find particularly helpful for your learning this week?
Submission
Submit GrammarSolver.java
and reflection.txt
to Grade-It!
SortedMap
is a more specialized version of theMap
interface that is implemented byTreeMap
. ↩