CSE143 Notes for Monday, 1/9/06

I began by mentioning that the following pithy statement summarizes several of the key concepts of object oriented programming: "An object encapsulates state and behavior." We had discussed what state and behavior are. For a radio, the states included on/off, volume setting, station setting, am/fm and so on. The behavior is that it plays music and that it allows us to change these settings.

In programming, state usually means variables (data) and behavior usually means methods (what are called functions or procedures or subprograms in other languages). In older programming languages, we didn't attempt to directly combine our data and our algorithms (our state and our behavior). With objects, we try to put the two together.

Then I turned back to the IntList class. I began by drawing some pictures for client code like the following:

        IntList list1 = new IntList(45);
        IntList list2 = new IntList(3495);
        list1.add(8);
        list2.add(37);
        ...

I reminded people that variables like list1 and list2 are of type IntList, which means that they can store a reference to an IntList but they are not actually IntList objects. As an analogy, imagine that each IntList object has a cell phone and you contact the IntList object by calling it on the phone. The variables list1 and list2 store the cell phone number. That's not the same thing as the object itself. But by storing the cell phone number, the variables know how to get in touch with the object.

So after the first two lines of code execute, we have a situation like the following:

                         +-------------------+          [0] [1] [2] ... [44]
              +---+      |             +---+ |         +---+---+---+---+---+
        list1 | +-+--->  | elementData | +-+-+-------> |   |   |   |...|   |
              +---+      |             +---+ |         +---+---+---+---+---+
                         |      +---+        |
                         | size | 0 |        |
                         |      +---+        |
                         +-------------------+

                         +-------------------+          [0] [1] [2] ...[3494]
              +---+      |             +---+ |         +---+---+---+---+---+
        list2 | +-+--->  | elementData | +-+-+-------> |   |   |   |...|   |
              +---+      |             +---+ |         +---+---+---+---+---+
                         |      +---+        |
                         | size | 0 |        |
                         |      +---+        |
                         +-------------------+

Two IntList objects have been constructed. Each time we construct one, we set up its data fields (the array and size). In the first case, we said that the location of the newly constructed IntList object should be stored in the variable list1 and in the second case we said that the location of the newly constructed IntList object should be stored in the variable list2. In constructing the two IntList objects, we also constructed arrays. These are separate objects that are referred to by the IntList objects.

It makes sense that after these objects have been constructed, we could refer to list1.size to get to the size data field in the first object and list2.size to refer to the size data field of the second object. The client isn't allowed to do this because the data fields are declared to be private (to properly encapsulate the object). But it would be theoretically possible to refer to these. I mentioned it mostly to draw a contrast to the way that the IntList code refers to this.size. Remember that "this" is a special Java word that refers to the implicit parameter. The client code above has two calls on add, the first call adding to list1 and the second call adding to list2. When the first call is made, Java sets "this" to be list1. So when the add method says:

        this.size++;

That is incrementing list1.size (the size data field of the object that list1 is referring to). In the second call, because we said "list2.add(37);", Java sets "this" to be list2. So the line of code above causes list2.size to be incremented.
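To see this in action, here is a minimal sketch. The class TinyIntList is a hypothetical, pared-down stand-in for IntList (it has only one constructor, the appending add, and size), so treat it as an illustration rather than the actual class from lecture. Each call on add increments the size field of whichever object the call was made on.

```java
// Hypothetical pared-down stand-in for IntList, just enough to show "this".
class TinyIntList {
    private int[] elementData;
    private int size;

    public TinyIntList(int capacity) {
        this.elementData = new int[capacity];
        this.size = 0;
    }

    // pre : size() < capacity
    // post: appends the given value to the end of the list
    public void add(int value) {
        this.elementData[this.size] = value;
        this.size++;  // the size field of the object the call was made on
    }

    // post: returns the current number of elements in the list
    public int size() {
        return this.size;
    }
}

class ThisDemo {
    public static void main(String[] args) {
        TinyIntList list1 = new TinyIntList(45);
        TinyIntList list2 = new TinyIntList(3495);
        list1.add(8);    // "this" is list1 during this call
        list2.add(37);   // "this" is list2 during this call
        list2.add(12);
        System.out.println(list1.size());  // prints 1
        System.out.println(list2.size());  // prints 2
    }
}
```

The two objects keep separate counts because each one has its own copy of the size field.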

I repeated my point that we are using the IntList class as a case study to understand how objects work in Java and, in particular, how data structure objects work in Java (what are known as "collections" or "collections classes"). I reminded people of the idea of using IntList as a kind of "software cadaver" that we are cutting open to examine. We're trying to reach a point where we understand the Java class known as ArrayList. That class is sufficiently complex that I'm using IntList instead and I'm going to be showing successively more complex versions of it as we learn more about how these classes work.

I reminded people that we started out with two variables and five code fragments that we examined in our first discussion section. We turned the two variables into data fields and we turned the five code fragments into methods and we put all of this inside a class called IntList.

I took a moment to remind people that in Java you can "overload" a method. The idea is that you can have more than one method of the same name. For example, among those five original methods, we have two add methods:

        // post: appends the given value to the end of the list
        public void add(int value)

        // post: inserts the given value at the given index, shifting
        //       subsequent values right
        public void add(int index, int value)

The first is the appending add that takes just a value and that appends it to the end of the list. The second is the inserting add that takes both an index and a value and that inserts the value in the middle of the list. Both methods are called add. This is okay because they have different signatures. The signature of a method is defined by the method name and the number and types of its parameters. These methods both have the same name, but they still have different signatures because one takes a single int argument and the other takes two int arguments. The compiler can tell from the call on the method which one you want to use.
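As a rough sketch of how the two add methods might sit side by side, consider the following. The headers and comments come from the notes above, but the class name OverloadIntList, the fixed capacity of 100, and the method bodies are assumptions made for illustration.

```java
// Hypothetical class; the bodies are one plausible way to write the two
// overloaded add methods, not necessarily the code used in lecture.
class OverloadIntList {
    private int[] elementData = new int[100];  // capacity chosen arbitrarily
    private int size = 0;

    // pre : size() < capacity
    // post: appends the given value to the end of the list
    public void add(int value) {
        elementData[size] = value;
        size++;
    }

    // pre : 0 <= index <= size() and size() < capacity
    // post: inserts the given value at the given index, shifting
    //       subsequent values right
    public void add(int index, int value) {
        for (int i = size; i > index; i--) {
            elementData[i] = elementData[i - 1];
        }
        elementData[index] = value;
        size++;
    }

    // pre : 0 <= index < size()
    // post: returns the value at the given index
    public int get(int index) {
        return elementData[index];
    }

    // post: returns the current number of elements in the list
    public int size() {
        return size;
    }
}

class OverloadDemo {
    public static void main(String[] args) {
        OverloadIntList list = new OverloadIntList();
        list.add(8);       // one int argument: the appending add
        list.add(37);
        list.add(1, 5);    // two int arguments: the inserting add
        // the list now holds 8, 5, 37
        System.out.println(list.get(1));  // prints 5
    }
}
```

The compiler picks between the two methods purely from the number of arguments in each call.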

In addition to the five original methods, we added two constructor methods that would allow clients to create actual IntList objects. One is a "real" constructor that specifies how to initialize the data fields of the object:

        public IntList(int capacity) {
            this.elementData = new int[capacity];
            this.size = 0;
        }

The other constructor calls this constructor using the "this" keyword. Whenever Java sees the keyword "this" followed by parentheses, it interprets that as one constructor calling another constructor. You can include this only as the first line of a constructor, as in the other IntList constructor:

        public IntList() {
            this(DEFAULT_CAPACITY);
        }

Java can tell that this is a call on the other constructor because it includes an int inside parentheses. In other words, this is the zero-argument constructor calling the one-argument constructor. This is the normal pattern in Java classes. There is normally one "real" constructor that has the detailed code and any other constructors simply call this primary constructor.

To make sure that the IntList was properly encapsulated, we declared the two variables to be private data fields. This prevents the client from changing the variables directly. But what about the downside? The most obvious one is that by declaring the variables to be private, we have prevented any client from using them. But clients might have good reasons to want to manipulate these values. For example, we want clients to be able to find out how many elements are in the list. But instead of providing direct access to a variable like "size", we provide a method that allows the client to access the value. That's why the original IntList class had two extra methods called get and size that were included to allow the client to have limited access to the private data fields.

I spent some time talking about the concept of preconditions and postconditions. This is a way of describing the contract that a method has with the client. Preconditions are assumptions the method makes. They are a way of describing any dependencies that the method has ("this has to be true for me to do my work"). Postconditions describe what the method accomplishes assuming the preconditions are met ("I'll do this as long as the preconditions are met.").

I have included pre/post comments on all of my IntList methods. I encourage people to use this style of commenting. It is not required, but if you use a different style, be sure that you have addressed the preconditions and postconditions of each method in the comments for the method.

As an example, I pointed out that methods like "get" that are passed an index assume that the index is legal. The method wouldn't know how to get something that is at a negative index or at an index that is beyond the size of the list. Whenever you find a method that has this kind of dependence, you should document it as a precondition of the method.
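Here is a sketch of what this commenting style might look like on a few methods. The class name DocumentedIntList is hypothetical and the exact wording of the comments is an assumption; the point is that get states its dependence on a legal index as a precondition, while size has no precondition at all.

```java
// Hypothetical pared-down class used only to illustrate pre/post comments.
class DocumentedIntList {
    private int[] elementData;
    private int size;

    // pre : capacity >= 0
    // post: constructs an empty list with the given capacity
    public DocumentedIntList(int capacity) {
        elementData = new int[capacity];
        size = 0;
    }

    // pre : size() < capacity
    // post: appends the given value to the end of the list
    public void add(int value) {
        elementData[size] = value;
        size++;
    }

    // post: returns the current number of elements in the list
    public int size() {
        return size;
    }

    // pre : 0 <= index < size()
    // post: returns the value at the given index
    public int get(int index) {
        return elementData[index];
    }
}
```

A client who reads the comment on get knows that passing a negative index, or an index of size() or more, breaks the contract.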

Then I spent some time talking about the binary search algorithm. In the original version of IntList we had an indexOf method that searches for a value by starting at index 0 and going sequentially from element to element (0, 1, 2, 3, etc.) until it finds the value. This is known as linear search or sequential search. It works, but it can be very time consuming if the list has a lot of values in it.
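For comparison with what follows, here is a minimal sketch of a sequential search. It is written as a static method on an array plus a size so that it stands alone; the class name LinearSearchSketch and the convention of returning -1 when the value is not found are assumptions, not details taken from the IntList class.

```java
class LinearSearchSketch {
    // pre : 0 <= size <= data.length
    // post: returns the index of the first occurrence of value among the
    //       first size elements of data, or -1 if it is not found
    //       (the -1 convention is an assumption)
    public static int indexOf(int[] data, int size, int value) {
        for (int i = 0; i < size; i++) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {8, 37, 5, 12};
        System.out.println(indexOf(data, 4, 5));   // prints 2
        System.out.println(indexOf(data, 4, 99));  // prints -1
    }
}
```

In the worst case the loop looks at every one of the size elements before giving up.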

The binary search algorithm is a better approach to use when the values appear in sorted order. For example, if you were searching for a name like "Smith" in the Manhattan phone book, you wouldn't start on page 1 and go sequentially through every value. You would open to somewhere in the middle and use the alphabetical order to decide which half to keep searching. Binary search applies that same idea to sorted data.

So how does this work? We need a couple of variables to keep track of exactly what part of the array we're looking at. Initially we're looking at the entire list, but as we throw away one half or another half of the candidates, we have to keep track of what's left to look at. Each time through we want to look at the middle value in our range of values. For example, suppose we are searching for the location of the number 2398 and we find that the middle value is 4026. What does that mean? Because the numbers are sorted, if 2398 is anywhere in the list, it must be in the first half. So that would allow us to throw away half of the candidates to search. What if the number in the middle had been 1034? Then we'd know that if 2398 is in the list, it must be in the second half.

That's the basic idea. Each time through the loop we manage to eliminate half of the possibilities by looking at the middle value and comparing it to the number we are searching for.
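The idea can be sketched in code as follows. This is one common way to write binary search, not necessarily the version we will add to IntList; the class name BinarySearchSketch, the variable names low, high and mid, and the convention of returning -1 when the value is not found are all assumptions.

```java
class BinarySearchSketch {
    // pre : 0 <= size <= data.length and the first size elements of data
    //       are in sorted (nondecreasing) order
    // post: returns an index at which value appears among the first size
    //       elements, or -1 if it is not found (the -1 is an assumption)
    public static int binarySearch(int[] data, int size, int value) {
        int low = 0;           // first index still under consideration
        int high = size - 1;   // last index still under consideration
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (data[mid] == value) {
                return mid;
            } else if (data[mid] < value) {
                low = mid + 1;   // value can only be in the second half
            } else {
                high = mid - 1;  // value can only be in the first half
            }
        }
        return -1;  // the range is empty, so the value is not in the list
    }

    public static void main(String[] args) {
        int[] data = {1034, 2398, 4026, 5000, 7777};
        System.out.println(binarySearch(data, 5, 2398));  // prints 1
        System.out.println(binarySearch(data, 5, 3000));  // prints -1
    }
}
```

In the first call, the middle value is 4026, so the search throws away the second half, just as in the example above.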

I asked people how many steps this might take. For example, imagine that there are 1000 values in the list. How many times would you have to divide 1000 in half to get it down to 1? The answer is that you would need around 10 divisions by 2 to get down to 1. That's because 2^10 is approximately equal to 10^3 (1024 versus 1000). What if the list had one million values? One million is the square of 10^3 (10^6), which means it would take around 20 times through the loop to find the value (the square of 2^10 is 2^20). It's obviously much faster to look at 20 different values than to look at a million different values, so the binary search technique is going to be much faster as the list becomes longer.
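The arithmetic can be checked with a short sketch that counts how many times a number can be cut in half before it reaches 1. The class and method names here are hypothetical, and the choice to round up on odd numbers (so that the count reflects the larger of the two halves) is an assumption made to match the "around 10" and "around 20" estimates.

```java
class HalvingSketch {
    // pre : n >= 1
    // post: returns the number of times n must be halved, rounding up
    //       each time, before it reaches 1
    public static int halvings(int n) {
        int steps = 0;
        while (n > 1) {
            n = (n + 1) / 2;  // half of n, rounded up
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(halvings(1000));     // prints 10
        System.out.println(halvings(1000000));  // prints 20
    }
}
```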

Stuart Reges

Last modified: Fri Jan 13 10:23:48 PST 2006