CSE143 Notes 4/3/06

Binary Search Performance; Iterators

We first need to spend a couple of minutes on compareTo() and equals(), which we glossed over last time.

Binary search evaluation. How fast is it?

Informally, it looks like we do a lot less work, measured by the number of items in a list we need to examine when we're searching for a value. Let's see if we can be a bit more precise about this.

What we'd like to figure out is, given a list of size n, how much work does it take to search for a value using binary search? One way to approach the problem is to look at it the other way around: given a fixed number of comparisons, how big a list can we search?

                k = # comparisons                        n = list size

 

 

 

 

 

 

 

 

 

OK, given this information, what is the relationship between k and n? That is, what is k as a function of n?
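One way to check a proposed answer is to count, by experiment, how many items binary search examines. The following is only a sketch for illustration, not code from the course: the class name ProbeCount and its methods are made up here, the list is an int array holding 0, 1, ..., n-1, and each item examined is counted as one comparison. It searches for a value larger than everything in the array, which is a worst case.

```java
class ProbeCount {
    // Binary search over a sorted array. Instead of returning the position
    // of the target, it returns how many array elements it examined.
    static int countProbes(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        int probes = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes++;                       // we look at sorted[mid]
            if (sorted[mid] == target) {
                return probes;
            } else if (sorted[mid] < target) {
                lo = mid + 1;               // discard the left half
            } else {
                hi = mid - 1;               // discard the right half
            }
        }
        return probes;
    }

    // Worst case for a list of size n: search for a value that is
    // larger than every element (assumed test data: 0, 1, ..., n-1).
    static int worstCaseProbes(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return countProbes(a, n);
    }

    public static void main(String[] args) {
        int[] sizes = {1, 3, 7, 15, 1023, 1000000};
        for (int i = 0; i < sizes.length; i++) {
            System.out.println("n = " + sizes[i]
                + "  items examined = " + worstCaseProbes(sizes[i]));
        }
    }
}
```

Under these assumptions, k probes are enough for any list of up to 2^k - 1 items, so k grows roughly like log2(n): a list of 1023 items needs 10 probes, and a list of a million items needs 20.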

 

 

 

 

 

 

Comparing linear and binary search. So now we have an idea of the cost of binary search in a (sorted) list of n items. How does that compare to linear search (while (k < n && item not found yet) k++)?

                Algorithm                       cost to search a list of size n

                linear search

                binary search

Graph:

 

 

 

 

 

 

 

Bottom line: binary search isn't just 2x or 5x or 10x faster than linear search. Its running time is a different kind of function than the running time of linear search, and that difference is independent of implementation details, the particular kind of processor we're using, how much memory we've got, or other details. This is a first example of a computational complexity result, comparing two algorithms abstractly without reference to lots of low-level coding specifics. We'll return to this idea throughout the course as a way of comparing and characterizing algorithms.
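To see the "different kind of function" point concretely, we can double the list size repeatedly and watch what happens to each algorithm's cost. This is a rough sketch under assumptions, not course code: the class LinearVsBinary and its method names are invented for this example, the data is the sorted int array 0, 1, ..., n-1, and cost is measured as the number of items examined when the value is not in the list.

```java
class LinearVsBinary {
    // Linear search: examine items from the front until the target is
    // found or the array runs out. Returns the number of items examined.
    static int linearProbes(int[] a, int target) {
        int probes = 0;
        for (int k = 0; k < a.length; k++) {
            probes++;
            if (a[k] == target) {
                break;
            }
        }
        return probes;
    }

    // Binary search on a sorted array. Returns the number of items examined.
    static int binaryProbes(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        int probes = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes++;
            if (sorted[mid] == target) {
                break;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes;
    }

    // Assumed test data: the sorted array 0, 1, ..., n-1.
    static int[] sortedArray(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("n\tlinear\tbinary");
        for (int n = 1000; n <= 1024000; n *= 2) {
            int[] a = sortedArray(n);
            int missing = n;   // larger than every element, so never found
            System.out.println(n + "\t" + linearProbes(a, missing)
                + "\t" + binaryProbes(a, missing));
        }
    }
}
```

In this experiment, each doubling of n doubles the linear search count but adds only one to the binary search count.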

 

Iterators

One of our reasons for looking at simple examples like the StringList and SortedStringList classes is to get a concrete idea of the basics behind container or collection classes: general-purpose classes that are used to hold collections of data. These are needed so often that most programming languages these days include a library of generally useful ones. Java, for example, includes things like ArrayList, which is a general version of our StringList that can hold objects of any type.

One operation that we often need once we have a collection of items is the ability to go through the collection and process the items in it one at a time. For collections like ArrayList and StringList, where the items are stored in an underlying data structure (array) that permits efficient access to individual elements by their location, we could always do this to access the items by their position:

  for (int k = 0; k < s.size(); k++) {
    // process s.get(k) here
  }

While this works fine for array-based collections, it isn't a general solution. In some data structures like a linked list (something we'll get to shortly), the get(k) operation is very inefficient - access to individual items in the list by their position is slow. For some other collections, the notion of position doesn't even make sense. A set of items is a collection, but there is no notion of a first item in a set, then a second, and so on.

What we want is a general mechanism to "process all the items in a collection one by one", in a way that works efficiently with any kind of collection. The mechanism to do this is an iterator. The idea is that we can ask a collection to give us an iterator object that can be used to access the items in the collection. Any iterator provides the following capabilities in some form or another:

  check whether there are more items left to process
  return the current item
  advance to the next item

Java provides iterators for all of its collection classes, but in a slightly different form. Java's iterators combine the "return the current object" and "advance to the next object" operations into a single method. The Java versions of these operations are

  hasNext()     returns true if there are more items to process
  next()        returns the next item and advances the iterator past it

The classic versions of the Java collections like ArrayList can store arbitrary objects, but as a result, you have to use a cast when you retrieve an item from the collection to specify the actual type of the object you've retrieved. It's a bit of extra noise, but not too bad. So using the classic version of ArrayList, here is how to create a list, fill it with a few strings, then retrieve them one by one.

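  // needs: import java.util.ArrayList; import java.util.Iterator;
  ArrayList lst = new ArrayList();
  lst.add("some strings");        // add some strings
  ...
  Iterator it = lst.iterator();   // ask the list for an iterator
  while (it.hasNext()) {          // any items left?
    String nextString = (String) it.next();   // next() returns Object, so cast
    // ...process nextString...
  }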

Picture: (in particular, what information does the iterator object need to know to do its job?)
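As one possible answer to the question above, here is a minimal sketch of an iterator for an array-based list of strings. It is an illustration only: SimpleStringList is a made-up stand-in (the actual StringList class from lecture is not shown in these notes), and its field and class names are assumptions. The point is that the iterator needs to know two things: which list it belongs to, and the position of the next item to hand out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical array-based list of strings, similar in spirit to StringList.
class SimpleStringList {
    private String[] items = new String[10];
    private int size = 0;

    // Add s to the end of the list, growing the array if it is full.
    void add(String s) {
        if (size == items.length) {
            items = Arrays.copyOf(items, 2 * items.length);
        }
        items[size] = s;
        size++;
    }

    // Hand out a new iterator positioned at the start of this list.
    Iterator<String> iterator() {
        return new StringListIterator();
    }

    // Because this is an inner class, each iterator object automatically
    // knows which list it belongs to. Its only other state is a position.
    private class StringListIterator implements Iterator<String> {
        private int position = 0;   // index of the next item to return

        public boolean hasNext() {
            return position < size;
        }

        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String result = items[position];
            position++;             // return the item and advance in one step
            return result;
        }

        // This sketch does not support removal.
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}
```

Client code would use it just like the ArrayList iterator: call iterator() once, then loop while hasNext() is true, calling next() each time around.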

 

 

 

 

 

 

 

 

Java 5 (the latest version at the time of these notes) introduced the notion of generic containers (and generic types for all sorts of things besides containers). The idea is that instead of declaring something like a plain ArrayList, which can hold objects of any type, we can specify the kinds of objects we want the container to hold by putting the type name between angle brackets in the declaration. The compiler then knows what next() returns, so no cast is needed.

  ArrayList<String> lst = new ArrayList<String>();
  lst.add("some strings");   // add some strings
  ...
  Iterator<String> it = lst.iterator();
  while (it.hasNext()) {
    String nextString = it.next();   // no cast needed
    // ...process nextString...
  }

We'll come back to generic types and explore their significance later. For now, the idea is to get the hang of the iterator pattern.
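To get a feel for the pattern, it may help to see the same loop work unchanged on different kinds of collections, including a linked list (where get(k) would be slow) and a set (where position has no meaning). The sketch below is illustrative only: the class IterateAnything and the method totalLength are invented for this example, while ArrayList, LinkedList, HashSet, Collection, and Iterator are the standard java.util types.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;

class IterateAnything {
    // Add up the lengths of all the strings in any collection of strings.
    // The loop never asks for an item by position; it only uses the iterator.
    static int totalLength(Collection<String> items) {
        int total = 0;
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            String s = it.next();
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] words = {"ab", "cde", "f"};
        Collection<String> arrayList = new ArrayList<String>();
        Collection<String> linkedList = new LinkedList<String>();
        Collection<String> set = new HashSet<String>();
        for (int k = 0; k < words.length; k++) {
            arrayList.add(words[k]);
            linkedList.add(words[k]);
            set.add(words[k]);
        }
        // The same method handles all three kinds of collection.
        System.out.println(totalLength(arrayList));
        System.out.println(totalLength(linkedList));
        System.out.println(totalLength(set));
    }
}
```

The set may hand out its items in a different order than the lists do, but the client loop doesn't need to know or care.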

One issue: what happens if the underlying collection is changed while an iteration is in progress? Say someone adds a new item to the list, or deletes one. While there are various ways to handle this, Java's answer is that such changes are not allowed, since they could leave the iteration in a strange state where the meaning of hasNext() or next() is not clear. If someone changes a list directly and then an iterator operation is attempted, a ConcurrentModificationException is normally thrown.
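A small experiment shows this behavior with ArrayList. This is just a sketch: the class ModifyDuringIteration and the method failsFast are names made up for the example. It starts an iteration, changes the list directly (not through the iterator), and then tries to continue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;

class ModifyDuringIteration {
    // Returns true if continuing an iteration after the list has been
    // changed directly causes a ConcurrentModificationException.
    static boolean failsFast() {
        ArrayList<String> lst = new ArrayList<String>();
        lst.add("a");
        lst.add("b");
        lst.add("c");

        Iterator<String> it = lst.iterator();
        it.next();              // iteration is now in progress
        lst.add("d");           // change the list behind the iterator's back

        try {
            it.next();          // the iterator notices the change here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("exception thrown: " + failsFast());
    }
}
```

Note that the exception shows up at the next iterator operation, not at the add() call that caused the problem.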

Java iterators support one other method: remove(). It removes the last item returned by next() from the underlying collection during an iteration, and in that sense it is the one exception to the general rule that no changes may be made to a container in the middle of an iteration. For example, we can eliminate all occurrences of "bad words" from a list of strings as follows (using the old-style ArrayList collection):

  ArrayList lst = new ArrayList();
  lst.add("some strings");   // add some strings
  ...
  Iterator it = lst.iterator();                   // retrieve an iterator object
  while (it.hasNext()) {
    String s = (String)it.next();
    if (s.equals("bad words")) {
      it.remove();
    }
  }

The remove() method throws an exception (IllegalStateException) if it is called at the wrong time. Examples are calling remove() before retrieving an object with next(), or calling remove() twice without a next() between the two calls. For more details, it's worth looking at the description of the ArrayList iterator() method and the Iterator interface in the standard library JavaDoc pages.
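Both of these cases can be tried out directly. In the sketch below, the class RemoveRules and its method names are made up for illustration; the behavior shown is that of the standard ArrayList iterator, which throws IllegalStateException in both situations.

```java
import java.util.ArrayList;
import java.util.Iterator;

class RemoveRules {
    // Build a small list to experiment on (hypothetical test data).
    static ArrayList<String> sampleList() {
        ArrayList<String> lst = new ArrayList<String>();
        lst.add("a");
        lst.add("b");
        lst.add("c");
        return lst;
    }

    // Case 1: remove() before any call to next().
    static boolean removeBeforeNextFails() {
        Iterator<String> it = sampleList().iterator();
        try {
            it.remove();        // nothing has been returned yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Case 2: two remove() calls with no next() between them.
    static boolean doubleRemoveFails() {
        Iterator<String> it = sampleList().iterator();
        it.next();
        it.remove();            // fine: removes "a"
        try {
            it.remove();        // no new item has been returned since
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("remove before next fails: " + removeBeforeNextFails());
        System.out.println("double remove fails: " + doubleRemoveFails());
    }
}
```

The rule of thumb: each remove() must be paired with exactly one preceding next().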