Letter Inventory
Implementing a character vector data type with arrays.
Natural language processing (NLP) is an interdisciplinary area of computer science focused on programming computers to work with natural languages used by humans everyday. Since all data in computers are ultimately represented as numbers, a key challenge in NLP is vectorization: the process of turning a natural language text into a meaningful numeric representation.
A letter inventory is a data type that represents a character-by-character count vectorization for letters in the English alphabet. Each letter inventory keeps track of how many a’s, how many b’s, etc. are contained in a given string. For example, the letter inventory for the string “Washington State” corresponds to [aaeghinnosstttw]
. The case of the letter is ignored, so ‘s’ and ‘S’ are exactly the same.
Letter Inventory
- Oct 5
- ArrayIntList
- BJP 10.1, 15.1, 15.2
- Define classes with an array (storing primitive data types or strings) as a field.
- Define methods that reassign field values while maintaining class invariants.
- Document pre/post conditions and throw exceptions to report problems to clients.
- Oct 6
- SectionArrayIntList
- Oct 7
- Stacks and Queues
- BJP 14
- Define static methods that accept multiple objects as parameters.
- Loop over each item in a queue using the
add
,remove
, andsize
methods. - Loop over each item in a stack, storing popped items in another queue or stack.
- Oct 8
- SectionStacks and Queues
- Oct 9
- Algorithm Analysis
- BJP 13.2
- Apply the runtime analysis process to formally describe an algorithm’s runtime.
- Describe an example where a queue or stack would be preferred over a list.
- Explain why
ArrayList
does not implement theQueue
interface.
Specification
Implement a LetterInventory
data type that represents an inventory of the 26 letters in the English alphabet. It should be represented with an array.
LetterInventory(String data)
- Constructs an inventory of the letters in the given string, ignoring the case of letters and ignoring any non-alphabetic characters. Store the inventory (how many a’, how many b’s, …) as an array of 26 counters (one for each letter) along with any other necessary data fields.
int get(char letter)
- Returns a count of how many of this letter are in the inventory. Letter might be lowercase or uppercase. If a nonalphabetic character is passed, throw an
IllegalArgumentException
. void set(char letter, int value)
- Sets the count for the given letter to the given value. The given letter might be lowercase or uppercase. If a nonalphabetic character is passed or if value is negative, throw an
IllegalArgumentException
. int size()
- Returns the sum of all of the counts in this inventory. This operation should be fast in that it should store the size rather than having to compute it each time this method is called.
boolean isEmpty()
- Returns true if this inventory is empty (all counts are 0). This operation should be fast in that it should not need to examine each of the 26 counts when it is called.
String toString()
- Returns a
String
representation of the inventory with the letters all in lowercase and in sorted order and surrounded by square brackets. The number of occurrences of each letter should match its count in the inventory. For example, an inventory of 4 a’s, 1 b, 1 l and 1 m would be represented as[aaaablm]
. LetterInventory add(LetterInventory other)
- Constructs and returns a new
LetterInventory
object that represents the sum of this letter inventory and the other givenLetterInventory
. The counts for each letter should be added together. The twoLetterInventory
objects being added together (this
andother
) should not be changed by this method.LetterInventory washington = new LetterInventory("Washington"); LetterInventory state = new LetterInventory("State"); System.out.println(washington.add(state));
washington
corresponds to[aghinnostw]
,state
corresponds to[aestt]
, and the result of adding them corresponds to[aaeghinnosstttw]
. LetterInventory subtract(LetterInventory other)
- Constructs and returns a new
LetterInventory
object that represents the result of subtracting the other inventory from this inventory (i.e., subtracting the counts in the other inventory from this object’s counts). If any resulting count would be negative, returnnull
. The twoLetterInventory
objects being subtracted (this
andother
) should not be changed by this method.
The values of type char
have corresponding integer values. There is a character with value 0, a character with value 1, a character with value 2, and so on. Different values of type char
can be compared using the less-than and greater-than operators.
char c = 'w';
if (c >= 'a') {
// Since 'w' >= 'a', we'll take this branch
...
}
Use char
arithmetic to determine array indices and offsets. char
represents characters as numbers. The lowercase letters are all grouped together: ‘a’ then ‘b’ then ‘c’ and so on. Likewise, uppercase letters are also grouped together: ‘A’ then ‘B’ then ‘C’ and so on. This numbering system allows us to determine the distance between two letters using subtraction: 'c' - 'a'
is 2 since ‘c’ is 2 letters after ‘a’ in the alphabet. To programmatically compute the letter 8 alphabet positions after ‘a’, use addition and then cast the result to char
. (The result is the letter ‘i’.)
char result = (char) ('a' + 8);
Getting the lowercase version of some text depends on whether the text is a String
or a char
. Since strings are objects in Java, we can call the toLowerCase
instance method. But since char
values are just numbers, Java provides the Character.toLowerCase
static method instead.
String s = "Washington State";
String l = s.toLowerCase();
char c = 'W';
char l = Character.toLowerCase(c);
Development strategy
One of the most important techniques for software professionals is to develop code in stages rather than trying to write it all at once. This development strategy is formally known as iterative enhancement or stepwise refinement. Writing code in stages enables software developers to test each stage independently for correctness.
First, add stubs for each required operation so that we can run tests without implementing the entire class. An example constructor could throw an UnsupportedOperationException
.
public LetterInventory(String s) {
throw new UnsupportedOperationException("not implemented yet");
}
Then, write the program in three stages.
- Constructing a
LetterInventory
and examining its contents - First, implement the constructor,
size
, andisEmpty
methods. Then, implement theget
method. Finally, implement thetoString
method. - Methods for modifying a
LetterInventory
- Add the
set
method, which allows the client to change the number of occurrences of an individual letter. - Complex behavior
- Implement the
add
method and then implement thesubtract
method.
Web app
To launch the web app, Open Terminal
javac Server.java && java Server data/english.txt; rm *.class
Then, open the Network
When you’re done, return to the terminal and enter the key combination Ctrl
C
to close the server.
Grading
Mark the program to submit it for grading. In addition to the automated tests, the final submission will also undergo code review for internal correctness on comments, variable names, indentation, data fields, and code quality.
- Commenting errors
- Class header missing or doesn’t describe both student and program.
- Method header missing or does not document pre and post-conditions.
- Method header does not describe exceptions that are thrown when preconditions are violated, including the specific type of exception and the conditions under which it is thrown.
- Method header does not document important behavior including subtle cases like an empty structure or a search value not found.
- Implementation details included in comments for a class or its public methods.
- Method header missing or does not document pre and post-conditions.
- Control structure errors
- Bad Boolean Zen.
- Data structure errors
- Extra data structures (including strings) that aren’t necessary.
- Bad usage of arrays, e.g. unusual indexing/usage.
- Class design errors
- Extraneous data fields.
- Initializing non-final data fields outside of a constructor.
- Non-private data fields.
- Using a specific numerical value when the value can be obtained in a more general way. For example, even if an array called
data
is expected to be of length 100, code to manipulate it after construction should usedata.length
instead of 100. - Initializing non-final data fields outside of a constructor.