I first discussed the built-in ArrayList class. We began by looking at this client code:
ArrayList<String> list = new ArrayList<String>();
list.add("four");
list.add("score");
list.add("seven");
list.add("years");
list.add("what was next?");
list.add("ago");
list.add(2, "and");
list.remove(5);
System.out.println("list = " + list);
System.out.println(list.indexOf("seven"));
which produces this output:
list = [four, score, and, seven, years, ago]
3
All of the methods we have seen with ArrayIntList are defined for ArrayList:
the appending add, add at an index, remove, size, get, etc. I asked what's
wrong with this code and someone pointed out that I should be using an
interface for the type:
List<String> list = new ArrayList<String>();
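Declaring the variable as a List<String> means the rest of the code relies only on the methods in the List interface. Here is a minimal sketch. The class ListInterfaceDemo and its describe method are hypothetical and do not come from the lecture. A method whose parameter is a List<String> accepts an ArrayList or a LinkedList without any change:

```java
import java.util.*;

class ListInterfaceDemo {
    // Hypothetical helper: works for any List because it uses only List methods.
    static String describe(List<String> list) {
        return list.size() + " words, first = " + list.get(0);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("four");
        list.add("score");
        System.out.println(describe(list));   // 2 words, first = four

        // Switching implementations changes only the constructor call.
        list = new LinkedList<String>(list);
        System.out.println(describe(list));   // 2 words, first = four
    }
}
```

Only the constructor call names a concrete class, so switching implementations touches a single line.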
Then we talked about how to loop over the structure. We can use the size and
get methods to write a loop that looks a lot like an array processing loop:
for (int i = 0; i < list.size(); i++) {
    System.out.println(list.get(i));
}
This is often a reasonable way to manipulate a list, but it relies on the "get"
method being able to quickly access any element of the structure. This
property is known as random access. We say that arrays, and the
ArrayList and ArrayIntList classes that are built on top of them, are random
access structures because any element can be reached directly by its index.
If you knew that for the rest of your life you'd always be working with
arrays, then you'd have little use for iterators. You'd just call the get
method because with arrays you get fast random access.
But not all data structures have quick access. If we used a LinkedList instead of an ArrayList, the loop above would be very expensive because each call on get has to walk through the list from one end to reach the requested index. So what would normally be an O(n) operation would become an O(n^2) operation with the code above.
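The Java library records the random access property with the marker interface java.util.RandomAccess, which ArrayList implements and LinkedList does not. The lecture doesn't use it. The sketch below uses a hypothetical class RandomAccessDemo with a sum method, and shows one way a method could use an indexed loop only when get is fast. Otherwise it falls back to an iterator, which is the approach discussed next:

```java
import java.util.*;

class RandomAccessDemo {
    // Sums a list of integers. For lists that advertise fast indexed access
    // (the RandomAccess marker interface), use get(i); otherwise walk the
    // list with an iterator so that each step costs O(1).
    static int sum(List<Integer> list) {
        int total = 0;
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i);
            }
        } else {
            Iterator<Integer> itr = list.iterator();
            while (itr.hasNext()) {
                total += itr.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<Integer>(Arrays.asList(3, 1, 4, 1, 5));
        List<Integer> linked = new LinkedList<Integer>(array);
        System.out.println(array instanceof RandomAccess);   // true
        System.out.println(linked instanceof RandomAccess);  // false
        System.out.println(sum(array) + " " + sum(linked));  // 14 14
    }
}
```

In ordinary client code, the simpler choice is to use an iterator or a for-each loop for every kind of list.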
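I said that I wanted to explore a different approach using what is known as an iterator. In general, we think of an iterator as having three basic operations: hasNext (is there another value to examine?), next (return the next value), and remove (remove the value most recently returned by next). Here is the printing loop written with an iterator: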
Iterator<String> i = list.iterator();
while (i.hasNext()) {
    System.out.println(i.next());
}
This involves a new kind of object of type Iterator<String>. Iterator<E> is an
interface in the java.util package. Notice that we ask the list to construct
the iterator for us by calling its method called "iterator". Once we have our
iterator, we use a while loop to print out the next value as long as there is a
next value to process.
Then I discussed the for-each loop. It is implemented using an iterator but provides a simpler syntax for situations where you simply want to go through all of the data in your collection from beginning to end. We can rewrite our printing loop as:
for (String s : list) {
    System.out.println(s);
}
We generally read the for-each header as, "For each String s in list...". The
choice of "s" is arbitrary. It defines a local variable for the loop. I could
just as easily have called it "x" or "foo" or "value". This for-each loop is
implemented by constructing an iterator and executing the same code that we had
previously.
There are some limitations of for-each loops. You can't use them to change which values are stored in the list. If you assign a value to the variable s, you are just changing a local variable inside the loop. It has no effect on the list itself.
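Here is a quick sketch of this limitation. It assumes a list of strings that we want to convert to uppercase, and the class ForEachLimitDemo and its methods are hypothetical. Assigning to the loop variable leaves the list unchanged. One option the lecture didn't cover is a ListIterator, obtained from the list's listIterator method. Its set method replaces the value most recently returned by next:

```java
import java.util.*;

class ForEachLimitDemo {
    // Assigning to the loop variable only changes a local variable.
    static List<String> tryUppercaseWithForEach(List<String> list) {
        for (String s : list) {
            s = s.toUpperCase();   // no effect on the list
        }
        return list;
    }

    // A ListIterator can replace the element most recently returned by next().
    static List<String> uppercaseWithListIterator(List<String> list) {
        ListIterator<String> itr = list.listIterator();
        while (itr.hasNext()) {
            String s = itr.next();
            itr.set(s.toUpperCase());
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("four", "score"));
        System.out.println(tryUppercaseWithForEach(list));    // [four, score]
        System.out.println(uppercaseWithListIterator(list));  // [FOUR, SCORE]
    }
}
```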
Next, I mentioned that we will be looking at a collection known as a Set. Java has an interface Set<E> that is implemented by HashSet<E> and TreeSet<E>. The HashSet is a bit faster, but doesn't keep the values in any particular order. The TreeSet keeps values in sorted order.
For example, to make a set of integers using an array of data, we can say:
int[] data = {18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3};
Set<Integer> numbers = new TreeSet<Integer>();
for (int n : data) {
    numbers.add(n);
}
System.out.println("set = " + numbers);
This produced the following output:
set = [-3, 3, 4, 18, 42, 72, 97]
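The sketch below shows the ordering difference between the two implementations by adding the same array to a TreeSet and to a HashSet. The class SetOrderDemo and its fill helper are hypothetical names. Both sets hold the same seven values, but only the TreeSet guarantees sorted order. The HashSet's iteration order is unspecified, so its printed order is not something to rely on:

```java
import java.util.*;

class SetOrderDemo {
    // Hypothetical helper: adds every array value to the given set.
    static Set<Integer> fill(Set<Integer> set, int[] data) {
        for (int n : data) {
            set.add(n);
        }
        return set;
    }

    public static void main(String[] args) {
        int[] data = {18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3};
        Set<Integer> tree = fill(new TreeSet<Integer>(), data);
        Set<Integer> hash = fill(new HashSet<Integer>(), data);
        System.out.println("tree = " + tree);   // tree = [-3, 3, 4, 18, 42, 72, 97]
        // Same values, but the HashSet's iteration order is unspecified.
        System.out.println("hash = " + hash);
        // Set equality compares contents, not implementation or order.
        System.out.println(tree.equals(hash));  // true
    }
}
```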
There are two major differences between a set and a list. Sets don't allow
duplicates, so duplicate values like 42 and 4 in the array appear just
once in the set. Sets also don't allow the client to control the order of
elements. The TreeSet class keeps things in sorted order, so the numbers will
always be in that order. If you want to control the order, then you should use
a list instead.
Sets have many of the same methods that lists do. You can add to a set, get its size, ask for an iterator, and use it with a for-each loop. But a set has no notion of indexing. So you can't remove at an index; instead you remove a specific value. And you can't get the value at a specific index; instead you use an iterator or a for-each loop.
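Here is a short sketch of set methods that work with values rather than indexes. SetMethodsDemo and its helpers are hypothetical names. The add method returns false when the value is already present. The sketch also shows a pitfall that matters when switching between sets and lists of Integer. Set has only remove(Object), so numbers.remove(97) removes the value 97. List<Integer> also has remove(int), and that method removes by index:

```java
import java.util.*;

class SetMethodsDemo {
    static Set<Integer> makeSet() {
        return new TreeSet<Integer>(Arrays.asList(-3, 3, 4, 18, 42, 72, 97));
    }

    // List.remove(int) removes by index; List.remove(Object) removes by value.
    static List<Integer> removeFromList() {
        List<Integer> list = new ArrayList<Integer>(Arrays.asList(10, 20, 30));
        list.remove(1);                    // removes the element at index 1 (the value 20)
        list.remove(Integer.valueOf(30));  // removes the value 30
        return list;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = makeSet();
        System.out.println(numbers.add(4));        // false: 4 is already present
        System.out.println(numbers.contains(97));  // true
        numbers.remove(97);                        // a set has no indexes, so this removes the value 97
        System.out.println(numbers);               // [-3, 3, 4, 18, 42, 72]
        System.out.println(removeFromList());      // [10]
    }
}
```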
We saw that we could remove a specific value from a set by naming it, and that we could use the set in a for-each loop, as in:
numbers.remove(97);
for (int n : numbers) {
    System.out.println(n);
}
This produced the following output:
-3
3
4
18
42
72
Notice that this produced each of the numbers from the original version of the
set but without the value 97.
Then we talked about how to remove values from a set using an iterator. You can remove specific values directly, as we did with 97, but more often we want to examine each value in the set and remove the values that have a certain property. We do this by calling the iterator's remove method, which removes the value most recently returned by a call on the iterator's next method.
For example, we wrote this loop to remove the values from the set that are multiples of 3:
Iterator<Integer> i2 = numbers.iterator();
while (i2.hasNext()) {
    int n = i2.next();
    if (n % 3 == 0) {
        System.out.println("removing " + n);
        i2.remove();
    }
}
System.out.println("set = " + numbers);
This produced the following output:
removing -3
removing 3
removing 18
removing 42
removing 72
set = [4]
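In other words, we ended up removing everything but the value 4. We briefly
discussed the limitations of the remove method: you can call it only after a call on next, and you can call it at most once per call on next. Breaking either rule causes the iterator to throw an IllegalStateException.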
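Here is a minimal sketch of an iterator rejecting a second remove when there is no call on next in between. The class IteratorRemoveDemo is a hypothetical name. The sketch also shows Collection.removeIf, which has been available since Java 8 but was not covered in lecture. It performs the same multiples-of-3 filtering without an explicit iterator:

```java
import java.util.*;

class IteratorRemoveDemo {
    // Returns true if calling remove() twice in a row throws IllegalStateException.
    static boolean secondRemoveThrows() {
        Set<Integer> numbers = new TreeSet<Integer>(Arrays.asList(3, 4, 6));
        Iterator<Integer> itr = numbers.iterator();
        itr.next();          // returns 3
        itr.remove();        // allowed: removes 3
        try {
            itr.remove();    // not allowed: no call on next since the last remove
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Same filtering as the multiples-of-3 loop, using Collection.removeIf.
    static Set<Integer> removeMultiplesOf3(Set<Integer> numbers) {
        numbers.removeIf(n -> n % 3 == 0);
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(secondRemoveThrows());         // true
        Set<Integer> numbers = new TreeSet<Integer>(Arrays.asList(-3, 3, 4, 18, 42, 72));
        System.out.println(removeMultiplesOf3(numbers));  // [4]
    }
}
```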
Then we began a discussion of a program to examine a file of words. As an example, I asked people how we could write a program that would count the number of unique words in an input file. I had a copy of the text of Moby Dick that we looked at to think about this. I showed some starter code that constructs a Scanner object tied to a file:
import java.util.*;
import java.io.*;
public class WordCount {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("What is the name of the text file? ");
        String fileName = console.nextLine();
        Scanner input = new Scanner(new File(fileName));
        while (input.hasNext()) {
            String next = input.next();
            // process next
        }
    }
}
Notice that in the loop we use input.next() to read individual words and we
have this in a while loop testing against input.hasNext(). I pointed out that
we'll have trouble with things like capitalization and punctuation. I said
that we should at least turn the string to all lowercase letters so that we
don't count Strings like "The" and "the" as different words:
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    // process next
}
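Lowercasing doesn't deal with punctuation, so a token like "whale," would still be counted separately from "whale". One possible approach uses a hypothetical normalize helper that is not part of the lecture's code. It strips characters that are neither letters nor digits from both ends of each token, and it keeps inner apostrophes as in "don't":

```java
class WordNormalizer {
    // Hypothetical helper: lowercase a token and trim leading/trailing
    // characters that are not letters or digits. Returns "" if nothing is left.
    static String normalize(String token) {
        String word = token.toLowerCase();
        int start = 0;
        int end = word.length();
        while (start < end && !Character.isLetterOrDigit(word.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(word.charAt(end - 1))) {
            end--;
        }
        return word.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(normalize("The"));     // the
        System.out.println(normalize("whale,"));  // whale
        System.out.println(normalize("\"Call"));  // call
        System.out.println(normalize("don't"));   // don't
    }
}
```

In the reading loop you would then process normalize(input.next()) and skip it when the result is empty, as it is for a token like "--".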
So how do we count the words? Someone suggested that a Set would be the
perfect structure to solve this problem. It eliminates duplicates, so it will
keep track of how many different words there are. So we changed the loop to
be:
Set<String> words = new HashSet<String>();
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    words.add(next);
}
System.out.println("Total words = " + words.size());
We could have used a TreeSet, but we decided to use the somewhat faster HashSet
because we didn't need to keep the words in sorted order.
Here is a sample log of execution:
What is the name of the text file? moby.txt
Total words = 30368
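The loop only needs a Scanner, so it can be tried without a file. This sketch uses a hypothetical class UniqueWords and moves the loop into a method that accepts any Scanner. It then runs that method on a short string. The 30368 figure above comes from the Moby Dick file and isn't reproduced here. Note that the size of the set counts distinct words, not total words:

```java
import java.util.*;

class UniqueWords {
    // Same loop as WordCount, but reading from any Scanner.
    static Set<String> uniqueWords(Scanner input) {
        Set<String> words = new HashSet<String>();
        while (input.hasNext()) {
            words.add(input.next().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("The cat saw the other cat");
        Set<String> words = uniqueWords(input);
        System.out.println("Total words = " + words.size());  // Total words = 4
        // Copying into a TreeSet gives alphabetical order for display.
        System.out.println(new TreeSet<String>(words));       // [cat, other, saw, the]
    }
}
```

Copying the HashSet into a TreeSet only when printing is one way to get alphabetical output while still reading the words into the faster HashSet.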
I said that in the next lecture we would continue this example, but we would
count the occurrences of each word, which will require a different kind of
collection called a map.