Link Search Menu Expand Document

Letter Inventory

Implementing a character vector data type with arrays.

Natural language processing (NLP) is an interdisciplinary area of computer science focused on programming computers to work with natural languages used by humans everyday. Since all data in computers are ultimately represented as numbers, a key challenge in NLP is vectorization: the process of turning a natural language text into a meaningful numeric representation.

A letter inventory is a data type that represents a character-by-character count vectorization for letters in the English alphabet. Each letter inventory keeps track of how many a’s, how many b’s, etc. are contained in a given string. For example, the letter inventory for the string “Washington State” corresponds to [aaeghinnosstttw]. The case of the letter is ignored, so ‘s’ and ‘S’ are exactly the same.

Letter Inventory

Oct 5
ArrayIntList
BJP 10.1, 15.1, 15.2
  1. Define classes with an array (storing primitive data types or strings) as a field.
  2. Define methods that reassign field values while maintaining class invariants.
  3. Document pre/post conditions and throw exceptions to report problems to clients.
Oct 6
SectionArrayIntList
Oct 7
Stacks and Queues
BJP 14
  1. Define static methods that accept multiple objects as parameters.
  2. Loop over each item in a queue using the add, remove, and size methods.
  3. Loop over each item in a stack, storing popped items in another queue or stack.
Oct 8
SectionStacks and Queues
Oct 9
Algorithm Analysis
BJP 13.2
  1. Apply the runtime analysis process to formally describe an algorithm’s runtime.
  2. Describe an example where a queue or stack would be preferred over a list.
  3. Explain why ArrayList does not implement the Queue interface.

Specification

Implement a LetterInventory data type that represents an inventory of the 26 letters in the English alphabet. It should be represented with an array.

LetterInventory(String data)
Constructs an inventory of the letters in the given string, ignoring the case of letters and ignoring any non-alphabetic characters. Store the inventory (how many a’, how many b’s, …) as an array of 26 counters (one for each letter) along with any other necessary data fields.
int get(char letter)
Returns a count of how many of this letter are in the inventory. Letter might be lowercase or uppercase. If a nonalphabetic character is passed, throw an IllegalArgumentException.
void set(char letter, int value)
Sets the count for the given letter to the given value. The given letter might be lowercase or uppercase. If a nonalphabetic character is passed or if value is negative, throw an IllegalArgumentException.
int size()
Returns the sum of all of the counts in this inventory. This operation should be fast in that it should store the size rather than having to compute it each time this method is called.
boolean isEmpty()
Returns true if this inventory is empty (all counts are 0). This operation should be fast in that it should not need to examine each of the 26 counts when it is called.
String toString()
Returns a String representation of the inventory with the letters all in lowercase and in sorted order and surrounded by square brackets. The number of occurrences of each letter should match its count in the inventory. For example, an inventory of 4 a’s, 1 b, 1 l and 1 m would be represented as [aaaablm].
LetterInventory add(LetterInventory other)
Constructs and returns a new LetterInventory object that represents the sum of this letter inventory and the other given LetterInventory. The counts for each letter should be added together. The two LetterInventory objects being added together (this and other) should not be changed by this method.
LetterInventory washington = new LetterInventory("Washington");
LetterInventory state = new LetterInventory("State");
System.out.println(washington.add(state));
washington corresponds to [aghinnostw], state corresponds to [aestt], and the result of adding them corresponds to [aaeghinnosstttw].
LetterInventory subtract(LetterInventory other)
Constructs and returns a new LetterInventory object that represents the result of subtracting the other inventory from this inventory (i.e., subtracting the counts in the other inventory from this object’s counts). If any resulting count would be negative, return null. The two LetterInventory objects being subtracted (this and other) should not be changed by this method.

The values of type char have corresponding integer values. There is a character with value 0, a character with value 1, a character with value 2, and so on. Different values of type char can be compared using the less-than and greater-than operators.

char c = 'w';
if (c >= 'a') {
    // Since 'w' >= 'a', we'll take this branch
    ...
}

Use char arithmetic to determine array indices and offsets. char represents characters as numbers. The lowercase letters are all grouped together: ‘a’ then ‘b’ then ‘c’ and so on. Likewise, uppercase letters are also grouped together: ‘A’ then ‘B’ then ‘C’ and so on. This numbering system allows us to determine the distance between two letters using subtraction: 'c' - 'a' is 2 since ‘c’ is 2 letters after ‘a’ in the alphabet. To programmatically compute the letter 8 alphabet positions after ‘a’, use addition and then cast the result to char. (The result is the letter ‘i’.)

char result = (char) ('a' + 8);

Getting the lowercase version of some text depends on whether the text is a String or a char. Since strings are objects in Java, we can call the toLowerCase instance method. But since char values are just numbers, Java provides the Character.toLowerCase static method instead.

String s = "Washington State";
String l = s.toLowerCase();
char c = 'W';
char l = Character.toLowerCase(c);

Development strategy

One of the most important techniques for software professionals is to develop code in stages rather than trying to write it all at once. This development strategy is formally known as iterative enhancement or stepwise refinement. Writing code in stages enables software developers to test each stage independently for correctness.

First, add stubs for each required operation so that we can run tests without implementing the entire class. An example constructor could throw an UnsupportedOperationException.

public LetterInventory(String s) {
    throw new UnsupportedOperationException("not implemented yet");
}

Then, write the program in three stages.

Constructing a LetterInventory and examining its contents
First, implement the constructor, size, and isEmpty methods. Then, implement the get method. Finally, implement the toString method.
Methods for modifying a LetterInventory
Add the set method, which allows the client to change the number of occurrences of an individual letter.
Complex behavior
Implement the add method and then implement the subtract method.

Web app

To launch the web app, Open Terminal , paste the following command, and press Enter.

javac Server.java && java Server data/english.txt; rm *.class

Then, open the Network dropdown and select host 0.0.0.0:8000.

When you’re done, return to the terminal and enter the key combination Ctrl C to close the server.

Grading

Mark the program to submit it for grading. In addition to the automated tests, the final submission will also undergo code review for internal correctness on comments, variable names, indentation, data fields, and code quality.

Commenting errors
Class header missing or doesn’t describe both student and program.
Method header missing or does not document pre and post-conditions.
Method header does not describe exceptions that are thrown when preconditions are violated, including the specific type of exception and the conditions under which it is thrown.
Method header does not document important behavior including subtle cases like an empty structure or a search value not found.
Implementation details included in comments for a class or its public methods.
Control structure errors
Bad Boolean Zen.
Data structure errors
Extra data structures (including strings) that aren’t necessary.
Bad usage of arrays, e.g. unusual indexing/usage.
Class design errors
Extraneous data fields.
Initializing non-final data fields outside of a constructor.
Non-private data fields.
Using a specific numerical value when the value can be obtained in a more general way. For example, even if an array called data is expected to be of length 100, code to manipulate it after construction should use data.length instead of 100.