for (int i = 0; i < list.size(); i++) {
    System.out.println(list.get(i));
}
This code works, but it relies on a "get" method that can quickly access any element of the array. This is known as random access. If you knew that for the rest of your life you'd always be working with arrays, then you'd have little use for iterators. You'd just call the get method, because with arrays you get fast random access.
But many of the other data structures we will be looking at don't have this kind of quick access. Think of how a DVD works, quickly jumping to any scene, versus how a VHS tape works, requiring you to fast forward through every scene until you get to the one you want. For those other structures we will study, iterators will make a lot more sense. So iterators will seem a bit silly when tied to an array-based structure, but we'll eventually see much more interesting examples of iterators.
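To make the DVD/VHS contrast a little more concrete, here is a small sketch using java.util.LinkedList, one of the built-in structures without fast random access. Its get(i) has to walk the links to reach position i, so an index-based loop repeats that walk for every element, while an iterator just steps forward once per call on next. The class and method names are our own, and the example uses generics and autoboxing, which are described later in these notes.

```java
import java.util.Iterator;
import java.util.LinkedList;

public class LinkedListTraversal {
    // Sums the list by walking it once with an iterator. Calling list.get(i)
    // in a loop instead would make the LinkedList walk its links from one end
    // of the list on every call, like fast-forwarding a tape each time.
    public static int sum(LinkedList<Integer> list) {
        int total = 0;
        Iterator<Integer> itr = list.iterator();
        while (itr.hasNext()) {
            total += itr.next(); // returns the current element and advances one step
        }
        return total;
    }

    public static void main(String[] args) {
        LinkedList<Integer> list = new LinkedList<Integer>();
        list.add(3);
        list.add(8);
        list.add(12);
        System.out.println(sum(list)); // prints 23
    }
}
```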
In general, we think of an iterator as having three basic operations:
ArrayIntListIterator i = list.iterator();
while (i.hasNext()) {
    System.out.println(i.next());
}
This involves a new kind of object of type ArrayIntListIterator. We get one by calling a special method in the list class. Once we have our iterator, we can use it to go through the values in the list one at a time.
I also briefly mentioned that iterators often support a method called remove that allows you to remove the value that you most recently got from a call on next. For example, this variation of the code prints each value and removes any occurrences of the value 3:
ArrayIntListIterator i = list.iterator();
while (i.hasNext()) {
    int n = i.next();
    System.out.println(n);
    if (n == 3) {
        i.remove();
    }
}
This code examines each value in the list and removes all the occurrences of 3. We also looked at "tricky" cases for remove. What could cause it to fail? Someone mentioned that removing something twice might be a problem, as in:
while (i.hasNext()) {
    int n = i.next();
    if (n == 3) {
        i.remove();
        i.remove();
    }
}
It would also be a problem to try removing before next has been called.
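The lecture code for ArrayIntListIterator isn't reproduced here, so the following is only a sketch of how an array-based iterator could support hasNext, next, and remove, including both tricky cases. It assumes a hypothetical, simplified SimpleIntList class (fixed capacity, no resizing) rather than the course's ArrayIntList, and it follows the java.util.Iterator convention of throwing IllegalStateException for an illegal remove and NoSuchElementException when next is called with no values left.

```java
import java.util.NoSuchElementException;

// Hypothetical, simplified array-based list of ints (capacity 100, no resizing).
// It only illustrates how an iterator might work; it is not the course's ArrayIntList.
public class SimpleIntList {
    private int[] elementData = new int[100];
    private int size = 0;

    // Appends value to the end of the list.
    public void add(int value) {
        elementData[size] = value;
        size++;
    }

    public int size() {
        return size;
    }

    public int get(int index) {
        return elementData[index];
    }

    // Removes the value at index, shifting later values one slot to the left.
    public void remove(int index) {
        for (int i = index; i < size - 1; i++) {
            elementData[i] = elementData[i + 1];
        }
        size--;
    }

    public IntIterator iterator() {
        return new IntIterator();
    }

    // Inner class, so each iterator can see the enclosing list's fields.
    public class IntIterator {
        private int position = 0;         // index of the next value to return
        private boolean removeOK = false; // true only just after a call on next

        public boolean hasNext() {
            return position < size;
        }

        public int next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int result = elementData[position];
            position++;
            removeOK = true;
            return result;
        }

        // Removes the value most recently returned by next.
        public void remove() {
            if (!removeOK) {
                // remove called before any next, or twice in a row
                throw new IllegalStateException();
            }
            SimpleIntList.this.remove(position - 1);
            position--; // later values shifted left, so step back one slot
            removeOK = false;
        }
    }

    public static void main(String[] args) {
        SimpleIntList list = new SimpleIntList();
        int[] values = {3, 7, 3, 3, 9};
        for (int v : values) {
            list.add(v);
        }
        // Print each value and remove every 3, as in the loop above.
        IntIterator i = list.iterator();
        while (i.hasNext()) {
            int n = i.next();
            System.out.println(n);
            if (n == 3) {
                i.remove();
            }
        }
        System.out.println("size after removing 3s: " + list.size()); // prints 2
        // A second remove without an intervening next is rejected.
        IntIterator j = list.iterator();
        j.next();
        j.remove();
        try {
            j.remove();
        } catch (IllegalStateException e) {
            System.out.println("second remove rejected");
        }
    }
}
```

The removeOK flag is what rules out both problem cases: it starts out false, becomes true after each call on next, and goes back to false after a remove.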
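I then spent some time talking about the built-in ArrayList class. Remember that we're studying the ArrayIntList class as a way to understand the built-in ArrayList class. I first had to discuss a relatively new feature of Java known as "generics." We know that for arrays, it is possible to construct arrays that store different types of data: an int[], a String[], and so on. Generics give ArrayList the same flexibility. The class is written as ArrayList<E>, where E stands for the type of element you want to store.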
For example, suppose we want an ArrayList of Strings. We describe the type as:
ArrayList<String>
When we construct an ArrayIntList, we say:
ArrayIntList lst = new ArrayIntList();
Imagine replacing both occurrences of "ArrayIntList" with "ArrayList<String>" and you'll see how to construct an ArrayList<String>:
ArrayList<String> lst = new ArrayList<String>();
And in the same way that you would declare a method header for manipulating an ArrayIntList object:
public void doSomethingCool(ArrayIntList lst) { ... }
You can use ArrayList<String> in place of ArrayIntList to declare a method that takes an ArrayList<String> as a parameter:
public void doSomethingCool(ArrayList<String> list) { ... }
It can even be used as a return type if you want to have the method return an ArrayList:
public ArrayList<String> doSomethingCool(ArrayList<String> list) { ... }
Once you have declared an ArrayList<String>, you can manipulate it with the kinds of calls we have made on our ArrayIntList, but using Strings instead of ints:
ArrayList<String> list = new ArrayList<String>();
list.add("hello");
list.add("there");
list.add(0, "fun");
System.out.println(list);
which produces this output:
[fun, hello, there]
All of the methods we have seen with ArrayIntList are defined for ArrayList: the appending add, add at an index, remove, size, get, etc. So we could write the following loop to print each String from an ArrayList<String>:
for (int i = 0; i < lst.size(); i++) {
    System.out.println(lst.get(i));
}
I then spent a little time discussing the issue of primitive data versus objects. Even though we can construct an ArrayList<E> for any class E, we can't construct an ArrayList<int> because int is a primitive type, not a class. To get around this problem, Java has a set of classes that are known as "wrapper" classes that "wrap up" primitive values like ints to make them an object. It's very much like taking a candy and putting a wrapper around it. For the case of ints, there is a class known as Integer that can be used to store an individual int. Each Integer object has a single data field: the int that it wraps up inside.
Java 5 also has quite a bit of support that makes a lot of this invisible to programmers. If you want to put int values into an ArrayList, you have to remember to use the type ArrayList<Integer> rather than ArrayList<int>, but otherwise Java does a lot of things for you. For example, you can construct such a list and add simple int values to it:
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(18);
list.add(34);
In the two calls on add, we are passing simple ints as arguments to something that really requires an Integer. This is okay because Java will automatically "box" the ints for us (i.e., wrap them up in Integer objects). We can also refer to elements of this list and treat them as simple ints, as in:
int product = list.get(0) * list.get(1);
The calls on list.get return references to Integer objects, and normally you wouldn't be allowed to multiply two objects together. In this case Java automatically "unboxes" the values for you, unwrapping the Integer objects and giving you the ints that are contained inside.
Every primitive type has a corresponding wrapper class: Integer for int, Double for double, Character for char, Boolean for boolean, and so on.
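As a sketch of what autoboxing saves you from writing, the following code does the boxing and unboxing by hand with Integer.valueOf and intValue (the javac compiler generates calls like these for you, though the exact translation is a compiler detail). The class and method names are our own.

```java
import java.util.ArrayList;

public class BoxingDemo {
    // Multiplies the first two elements, unboxing by hand with intValue().
    public static int product(ArrayList<Integer> list) {
        return list.get(0).intValue() * list.get(1).intValue();
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.add(Integer.valueOf(18)); // explicit boxing; list.add(18) does this automatically
        list.add(Integer.valueOf(34));
        System.out.println(product(list)); // prints 612

        // The other wrapper classes follow the same valueOf / xxxValue pattern.
        Double d = Double.valueOf(2.5);
        Character c = Character.valueOf('x');
        Boolean b = Boolean.valueOf(true);
        System.out.println(d.doubleValue() + " " + c.charValue() + " " + b.booleanValue());
    }
}
```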
Then I mentioned that I hoped people are aware of the array initializer syntax where you can use curly braces to specify a set of values to use for initializing an array:
int[] data = {8, 27, 93, 4, 5, 15, 206};
This is a great way to define data to use for a testing program. I asked people how we'd find the product of this list, and people suggested the standard approach that uses an int to index the array:
int product = 1;
for (int i = 0; i < data.length; i++) {
    product *= data[i];
}
This approach works, but there is a simpler way to do this. If all you want to do is to iterate over the values of an array one at a time, you can use what is called a for-each loop:
int product = 1;
for (int n : data) {
    product *= n;
}
We generally read the for loop header as, "For each int n in data...". The choice of "n" is arbitrary. It defines a local variable for the loop. I could just as easily have called it "x" or "foo" or "value". Notice that in the for-each loop, I don't have to use any bracket notation. Instead, each time through the loop Java sets the variable n to the next value from the array. I also don't need to test for the length of the array. Java does that for you when you use a for-each loop.
There are some limitations of for-each loops. You can't use them to change the contents of the array. If you assign a value to the variable n, you are just changing a local variable inside the loop. It has no effect on the array itself.
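A short sketch of this limitation, with method names of our own choosing: the for-each version only reassigns its local variable, so the array is unchanged, while the indexed version stores back into data[i].

```java
import java.util.Arrays;

public class ForEachDemo {
    // Tries (and fails) to double every element with a for-each loop.
    public static void doubleWithForEach(int[] data) {
        for (int n : data) {
            n = n * 2; // changes only the local copy n, never data
        }
    }

    // Doubles every element using an index, which does change the array.
    public static void doubleWithIndex(int[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] = data[i] * 2;
        }
    }

    public static void main(String[] args) {
        int[] data = {8, 27, 93};
        doubleWithForEach(data);
        System.out.println(Arrays.toString(data)); // prints [8, 27, 93]
        doubleWithIndex(data);
        System.out.println(Arrays.toString(data)); // prints [16, 54, 186]
    }
}
```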
For-each loops work with arrays of any element type, and the same syntax also works for ArrayLists. For example, this code uses a for-each loop over an array of Strings to fill an ArrayList<String>:
String[] data2 = {"four", "score", "and", "seven", "years", "ago"};
ArrayList<String> lst = new ArrayList<String>();
for (String s : data2) {
    lst.add(s);
}
System.out.println(lst);
which produces this output:
[four, score, and, seven, years, ago]
I also mentioned that with the next programming assignment, we are asking you to start using more features from the Java class libraries. In particular, for this next programming assignment we are going to use a collection known as a SortedMap.
As an example, I asked people how we could write a program that would count all of the occurrences of various words in an input file. I had a copy of the text of Moby Dick that we looked at to think about this. I showed some starter code that constructs a Scanner object tied to a file:
import java.util.*;
import java.io.*;

public class WordCount {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("What is the name of the text file? ");
        String fileName = console.nextLine();
        Scanner input = new Scanner(new File(fileName));

        while (input.hasNext()) {
            String next = input.next();
            // process next
        }
    }
}
Notice that in the loop we use input.next() to read individual words and we have this in a while loop testing against input.hasNext(). I pointed out that we'll have trouble with things like capitalization and punctuation. I said that we should at least turn the string to all lowercase letters so that we don't count Strings like "The" and "the" as different words:
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    // process next
}
But I said that dealing with punctuation was more than I wanted to attempt in this program, so I decided that we'd live with the fact that Strings like "the" and "the," and "the." would be considered different words. We're looking for a fairly simple example here, so I didn't want to worry too much about punctuation.
To flesh out this code, we had to think about what kind of data structure to use to keep track of words and their frequencies. One person suggested that we use a hashtable. I said that this is related to the data abstraction known as a map.
The idea behind a map is that it keeps track of key/value pairs. In our case, we want to keep track of word/count pairs (what is the count for each different word). We often store data this way. For example, in the US we often use a person's social security number as a key to get information about them. I would expect that if I talked to the university registrar, they probably have the ability to look up students based on social security number to find their transcript.
In a map, there is only one value for any given key. If you look up a social security number and get three different student transcripts, that would be a problem. With the Java map objects, if you already have an entry in your map for a particular key, then putting a new key/value pair with that same key into the map will overwrite the old mapping.
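The overwriting behavior is easy to see in a small sketch. Besides replacing the old value, the java.util put method returns the value that was previously associated with the key (or null if there was none). The key "id-1" and the names are made-up placeholders, not real identifiers.

```java
import java.util.Map;
import java.util.TreeMap;

public class OverwriteDemo {
    // Puts two values for the same key and returns the resulting map.
    public static Map<String, String> demo() {
        Map<String, String> records = new TreeMap<String, String>();
        String first = records.put("id-1", "Alice"); // no previous value: returns null
        String second = records.put("id-1", "Bob");  // same key: replaces "Alice" and returns it
        System.out.println(first);  // prints null
        System.out.println(second); // prints Alice
        return records;
    }

    public static void main(String[] args) {
        Map<String, String> records = demo();
        System.out.println(records.get("id-1")); // prints Bob
        System.out.println(records.size());      // prints 1
    }
}
```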
We looked at an interface in the Java class libraries called Map that is a generic interface. That means that we have to supply type information. Its formal description is Map<K, V>. This is different from the Queue interface in that it has two type parameters. That's because the map has to know what type of keys you have and what type of values you have. In our case, we have some words (Strings) that we want to associate with some counters (ints). We can't actually use type int because it is a primitive type, but we can use its wrapper class, Integer.
We are going to use a slight variation of Map known as SortedMap. A SortedMap is one that keeps its keys in sorted order. For us, that would mean that the words from the file will be kept in sorted order, which is a nice feature to have. More importantly, you'll need to use a SortedMap for your homework assignment, so we want to practice using that one.
So our map would be of type SortedMap<String, Integer>. In other words, it's a map that keeps track of String/Integer pairs (this String goes to this Integer). SortedMap is the name of the interface, but it's not an actual implementation. The implementation we will use is TreeMap. So we can construct a map called "count" to keep track of our counts by saying:
SortedMap<String, Integer> count = new TreeMap<String, Integer>();
There are only a few methods that we'll be using from the SortedMap interface. The most basic allow you to put something into the map (an operation called put) and to ask the map for the current value of something (an operation called get).
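Here is a small sketch of put and get on a SortedMap, using made-up counts rather than real data from Moby Dick. It also shows two details worth knowing: the keys come back in sorted order, and get returns null for a key that isn't in the map. firstKey is another SortedMap method that we didn't use in lecture.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class SortedMapDemo {
    // Builds a small map with illustrative (not real) counts.
    public static SortedMap<String, Integer> sample() {
        SortedMap<String, Integer> count = new TreeMap<String, Integer>();
        count.put("whale", 3);
        count.put("ahab", 2);
        count.put("sea", 5);
        return count;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> count = sample();
        System.out.println(count);             // prints {ahab=2, sea=5, whale=3}
        System.out.println(count.get("sea"));  // prints 5
        System.out.println(count.get("ship")); // prints null because "ship" is not a key
        System.out.println(count.firstKey());  // prints ahab, the smallest key
    }
}
```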
I asked what code we need to record the word in our map. Someone suggested using the put method to assign it to a count of 1. So our loop becomes:
SortedMap<String, Integer> wordCounts = new TreeMap<String, Integer>();
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    wordCounts.put(next, 1);
}
This doesn't quite work, but it's getting closer. Each time we encounter a word, it adds it to our map, associating it with a count of 1. This will figure out what the unique words are, but it won't have the right counts for them.
I asked people to think about what to do if a word has been seen before. In that case, we want to increase its count by 1. That means we have to get the old value of the count and add 1 to it:
wordCounts.get(next) + 1
and make this the new value of the counter:
wordCounts.put(next, wordCounts.get(next) + 1);
So we have two different calls on put. We want to call the first one when the word is first seen and call the second one if it's already been seen. Someone suggested using an if/else for this. The only question is what test to use. The SortedMap includes a method called containsKey that tests whether or not a certain value is a key stored in the map. Using this method, we modified our code to be:
SortedMap<String, Integer> wordCounts = new TreeMap<String, Integer>();
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    if (!wordCounts.containsKey(next)) {
        wordCounts.put(next, 1);
    } else {
        wordCounts.put(next, wordCounts.get(next) + 1);
    }
}
The first time we see a word, we call the put method and say that the map should associate the word with a count of 1. Later we call put again with a higher count. And we keep calling put every time the count goes up. What happens to the old values that we had put in the map previously? The way the map works, each key is associated with only one value. So when you call put a second or third time, you are wiping out the old association. The new key/value pair replaces the old key/value pair in the map.
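The containsKey test matters because of how get and autoboxing interact. For a word that isn't in the map yet, get returns null, and trying to unbox null for the + 1 throws a NullPointerException. The sketch below demonstrates that and, for comparison, uses Map.merge, a method added in Java 8 (so it was not available when these notes were written) that performs the same "insert 1 or add 1" step in a single call. The class and method names are our own.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class CountPitfall {
    // Returns true if adding 1 to the count of an unseen word throws.
    public static boolean unseenWordThrows() {
        SortedMap<String, Integer> counts = new TreeMap<String, Integer>();
        try {
            // get returns null here, and unboxing null for "+ 1" throws.
            counts.put("whale", counts.get("whale") + 1);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Counting with a containsKey test, as in the lecture code.
    public static SortedMap<String, Integer> countWithContainsKey(String[] words) {
        SortedMap<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : words) {
            if (!counts.containsKey(word)) {
                counts.put(word, 1);
            } else {
                counts.put(word, counts.get(word) + 1);
            }
        }
        return counts;
    }

    // Same counts with merge (Java 8+): stores 1 for a new key, otherwise
    // combines the old count with 1 using Integer::sum.
    public static SortedMap<String, Integer> countWithMerge(String[] words) {
        SortedMap<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(unseenWordThrows()); // prints true
        String[] words = {"the", "whale", "the"};
        System.out.println(countWithContainsKey(words)); // prints {the=2, whale=1}
        System.out.println(countWithMerge(words));       // prints {the=2, whale=1}
    }
}
```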
One more point I made about the SortedMap interface is that you can associate
just about anything with just about anything. In the word counting program, we
associated strings with integers. You could also associate strings with
strings. One thing you can't do is to have multiple values for a single key in
a map. For example, if you decide to associate strings with strings, then any
given string can be associated with just a single string. But there's no
reason that you can't have the second value be structured in some way. You can
associate strings with arrays or strings with ArrayLists.

Then we talked about how to print the results. Clearly we need to iterate over
the entries in the map. One way to do this is to request what is known as the
"key set". The key set is the set of all keys contained in the map. The Java
documentation says that keySet returns a Set<K>, which for our map is a
Set<String>. We can iterate over it with a for-each loop:
for (String word : wordCounts.keySet()) {
// process word
}
We would read this as, "for each String word that is in wordCounts.keySet()..."
To process the word, we simply print it out along with its count. How do we
get its count? By calling the get method of the map:
for (String word : wordCounts.keySet()) {
System.out.println(wordCounts.get(word) + "\t" + word);
}
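Putting the pieces together, here is one possible complete version. It is a sketch, not the commented handout solution: the helper names countWords and printCounts are our own, and the minimum-frequency prompt only mirrors the feature of the handout version described next. Because punctuation is not stripped, a token like "sea." is counted separately from "sea".

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;

public class WordCountSketch {
    // Counts each whitespace-separated token from input, converted to lowercase.
    public static SortedMap<String, Integer> countWords(Scanner input) {
        SortedMap<String, Integer> wordCounts = new TreeMap<String, Integer>();
        while (input.hasNext()) {
            String next = input.next().toLowerCase();
            if (!wordCounts.containsKey(next)) {
                wordCounts.put(next, 1);
            } else {
                wordCounts.put(next, wordCounts.get(next) + 1);
            }
        }
        return wordCounts;
    }

    // Prints count<TAB>word for each word occurring at least min times; the
    // words come out alphabetically because the keys come from a SortedMap.
    public static void printCounts(SortedMap<String, Integer> wordCounts, int min) {
        for (String word : wordCounts.keySet()) {
            int count = wordCounts.get(word);
            if (count >= min) {
                System.out.println(count + "\t" + word);
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("What is the name of the text file? ");
        String fileName = console.nextLine();
        System.out.print("Minimum number of occurrences for printing? ");
        int min = console.nextInt();
        Scanner input = new Scanner(new File(fileName));
        printCounts(countWords(input), min);
    }
}
```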
I didn't try to print all of the words in Moby Dick because it would
have produced too much output. Instead, I had it show me the counts of words
in the program itself. Obviously for large files we want some mechanism to
limit the output. At that point I passed out the handout with my commented
solution. In that version, I include some extra code that asks for a minimum
frequency to use. We ran that on Moby Dick and saw this list of words
that occur at least 500 times:
What is the name of the text file? moby.txt
Minimum number of occurrences for printing? 500
4571 a
1354 all
587 an
6182 and
563 are
1701 as
1289 at
973 be
1691 but
1133 by
1522 for
1067 from
754 had
741 have
1686 he
552 him
2459 his
1746 i
3992 in
512 into
1555 is
1754 it
562 like
578 my
1073 not
506 now
6408 of
933 on
775 one
675 or
882 so
599 some
2729 that
14092 the
602 their
506 there
627 they
1239 this
4448 to
551 upon
1567 was
644 were
500 whale
552 when
547 which
1672 with
774 you
Although I show the output here as being lined up, it didn't look that way in
jGRASP. For some reason jGRASP is handling tab characters badly in output.
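As mentioned earlier, a map's values can themselves be structured, for example as ArrayLists. Here is a hypothetical sketch that associates each word with an ArrayList of the line numbers on which it appears; the class name, method name, and sample text are our own.

```java
import java.util.ArrayList;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;

public class LineIndex {
    // Maps each lowercased word to the line numbers (starting at 1) where it
    // appears. A word repeated on the same line is recorded once per occurrence.
    public static SortedMap<String, ArrayList<Integer>> index(Scanner input) {
        SortedMap<String, ArrayList<Integer>> lines = new TreeMap<String, ArrayList<Integer>>();
        int lineNumber = 0;
        while (input.hasNextLine()) {
            lineNumber++;
            Scanner words = new Scanner(input.nextLine());
            while (words.hasNext()) {
                String word = words.next().toLowerCase();
                if (!lines.containsKey(word)) {
                    lines.put(word, new ArrayList<Integer>()); // first sighting: empty list
                }
                lines.get(word).add(lineNumber); // one key, many values via the list
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("call me ishmael\nsome years ago\ncall");
        System.out.println(index(input));
    }
}
```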
Stuart Reges
Last modified: Mon Apr 7 16:02:47 PDT 2008