Link Search Menu Expand Document

Language Generator

Generating sentences with recursion and randomness.1

In syntactocentric linguistics, recursive syntax (embedding phrases within phrases) has been hypothesized to be the critical feature of human language capacity. Although this claim has been challenged in recent decades, generating understandable human language remains a key challenge for computer programming given the infinite expressiveness of human language. Can we define a program to generate all possible English sentences?

Language Generator

Oct 26
Recursive Programming
BJP 12.3, 12.4
  1. Apply the three-step outline to define programs with a single recursive call.
  2. Define recursive programs by passing parameters to a private helper method.
Oct 27
SectionRecursive Programming
Oct 28
Structural Recursion
  1. Define public/private paired recursive programs to traverse linked nodes.
  2. Apply the x = change(x) pattern to recursively change linked node references.
Oct 29
SectionStructural Recursion
Oct 30
Generative Recursion
  1. Trace the execution of programs with more than one recursive call.
  2. Define generative recursive programs that create new data on each recursive call.

Most (if not all) human languages involve combining words in certain ways that correspond to valid sentences. The rules that determine valid and invalid sentences are part of the language’s grammar.

Formal language
A set of words and symbols along with a set of production rules defining how those symbols may be used together. For example, “A boy threw the ball.” is a valid sentence, but “A threw boy ball the” makes no sense, because the words were put together in an invalid manner.
Grammar
A way of describing the syntax and symbols of a formal language. When constructing sentences, words are put together in grammatically-correct ways using structures such as sentences, noun phrases, and objects. Grammars consist of two types of symbols that represent either individual words (terminal) or combinations of words such as phrases and sentences (non-terminal).

Generative grammar is a linguistic theory that supposes a system of rules can be followed to generate every possible grammatical expression in a formal language. For example, consider this formal language with the following terminals and non-terminals.

Terminals
the, a, boy, girl, runs, walks
Non-terminals
  • A sentence consists of an article followed by an object followed by a verb.
  • An article consists of either “the” or “a”.
  • An object consists of either “boy” or “girl”.
  • A verb consists of either “runs” or walks”.

This language contains the following valid sentences.

  • the boy runs
  • the boy walks
  • a boy runs
  • a boy walks
  • the girl runs
  • the girl walks
  • a girl runs
  • a girl walks

Specification

Implement a LanguageGenerator class that generates valid strings according to the production rules defined by the given Grammar. A Grammar consists of a Supplier field with a get() method that returns a Map<String, String[]> representing the production rules.

LanguageGenerator(Grammar grammar)
Constructs a new LanguageGenerator for the given grammar.
LanguageGenerator(Grammar grammar, Random random)
Constructs a new LanguageGenerator for the given grammar and source of randomness.
String generate(String target)
Returns a string generated by following the production rule for the given target. The resulting string should be compact: there should be exactly one space between each terminal and no leading or trailing spaces. Choose between potential production rules for the given target by calling random.nextInt so that each possibility is equally likely to be chosen.

The base case occurs when the target is a terminal symbol. Otherwise, in the recursive case (a non-terminal symbol):

  1. Choose a random production rule for the given non-terminal target symbol.
  2. For each symbol in the production rule, recursively generate an occurrence of that symbol.

Split a production rule on one or more whitespace characters by calling split("\\s+").

String text = "lots of spaces";
String[] terms = text.split("\\s+");
// ["lots", "of", "spaces"]

Call the trim method to remove leading and trailing whitespace from the final result.

String text = "   lots   of  spaces    ";
String trimmed = text.trim();
// "lots   of  spaces"

Web app

To launch the web app, Open Terminal , paste the following command, and press Enter.

javac -cp ".:/course/lib/*" Server.java && java -cp ".:/course/lib/*" Server; rm *.class

Then, open the Network dropdown and select host 0.0.0.0:8000.

When you’re done, return to the terminal and enter the key combination Ctrl C to close the server.

Grading

Mark the program to submit it for grading. In addition to the automated tests, the final submission will also undergo code review for internal correctness on comments, variable names, indentation, data fields, and code quality.

Control structure errors
Extra recursive cases or extra base cases that are not needed. For example, introducing additional “arm’s length” base cases that solve the problem preemptively.
Method errors
Unnecessary parameters with redundant information when it could be accessed from another source.
Data structure errors
Excessive inefficiency when using collections. For example, using a for-each loop to find a value in a map when get would return the value directly.
  1. Julie Zelenski. 1999. Random Sentence Generator. In Nifty Assignments 1999. https://www-cs-faculty.stanford.edu/~zelenski/rsg/