CSE143 Notes for Friday, 3/30/18

I first discussed the built-in ArrayList class. Remember that we're studying the ArrayIntList class as a way to understand the built-in ArrayList class. We know that for arrays, it is possible to construct arrays that store different types of data:

But arrays are a special case in Java. If they didn't already exist, we couldn't easily add them to Java. Instead, Java now allows you to declare generic classes and generic interfaces. For example, the ArrayList class is similar to an array. Instead of declaring ordinary ArrayList objects, we declare ArrayList<E> where E is some type (think of E as being short for "Element type"). The "E" is a type parameter that can be filled in with the name of any class.

For example, suppose, we want an ArrayList of Strings. We describe the type as:

        ArrayList<String>
When we construct an ArrayIntList, we say:
        ArrayIntList list = new ArrayIntList();
Imagine replacing both occurrences of "ArrayIntList" with "ArrayList<String>" and you'll see how to construct an ArrayList<String>:
        ArrayList<String> list = new ArrayList<String>();
Once you have declared an ArrayList<String>, you can manipulate it with the kinds of calls we have made on our ArrayIntList but using Strings instead of ints:

        ArrayList<String> list = new ArrayList<String>();
        list.add("four");
        list.add("score");
        list.add("seven");
        list.add("years");
        list.add("what was next?");
        list.add("ago");
        System.out.println("list = " + list);
which produces this output:

        list = [four, score, seven, years, what was next?, ago]
Obviously this isn't the correct quote. We could fix the individual calls on the appending add method, but it was a more interesting exercise to explore how to fix the list after it is constructed.

The first problem is that there is a missing word. The word "and" should appear between "score" and "seven". We want to include it at index 2 in the structure, so we used this line of code to do so:

        list.add(2, "and");
The other problem is that it includes the string "what was next?". This was originally at index 4 of the list, but now that we have inserted a new value, it has been shifted over so that it appears at index 5. So we added this line of code:

        list.remove(5);
With those two lines of code added after constructing the list, the output is now correct:

        list = [four, score, and, seven, years, ago]
All of the methods we have seen with ArrayIntList are defined for ArrayList: the appending add, add at an index, remove, size, get, etc.

Then I discussed the idea of interfaces. An interface is a description of a set of behaviors. For example, all of the behaviors we have just discussed that are included in the ArrayList<E> class are also included in an interface known as List<E>. The List<E> interface says that a list has to have an appending add method, a method to add at an index, a method to remove at an index, a get method, a size method, an indexOf method, and so on.

But there are different ways to implement a list. We have been looking at how to implement it using an array. Later we will look at how to implement it using something called a linked list. To make your programs flexible, you should declare your variables, parameters, fields, and method return types using interfaces. So instead of saying:

        ArrayList<String> list = new ArrayList<String>();
you should instead say:
        List<String> list = new ArrayList<String>();
With this declaration, the variable list is more flexible. It can store a reference to any list, not just an ArrayList. I mentioned that the best analogy I have for interfaces is that they are similar to how we use the concept of certification. You can't claim to be a certified doctor unless you have been trained to do certain specific tasks. Similarly, to be a certified teacher you have to know how to behave like a teacher, to be a certified nurse you have to know how to behave like a nurse, and so on. In Java, if you want to claim to be a certified List<E>, then you have to have several different methods. I then mentioned that this is an idea that has been used throughout the collections classes in Java (the java.util package). This idea is stressed by Joshua Bloch, the author of Effective Java. Joshua Bloch was the primary architect of the collections framework and has influenced much of the development work for Java.

In the collections framework, Bloch was careful to define data structure abstractions with interfaces. For example, there are interfaces for List, Set and Map which are abstractions that we'll be discussing in the coming weeks. His item 64 is, "Refer to objects by their interfaces," which begins by saying:

You should favor the use of interfaces over classes to refer to objects. If appropriate interface types exist, then parameters, return values, variables and fields should all be declared using interface types. The only time you really need to refer to an object's class is when you're creating it with a constructor.
The middle sentence was in bold face in the book, indicating how important Bloch thinks this is, and I've reproduced that here.

For now, this will mostly be a style issue for us. In a few weeks we will look at how interfaces are actually defined. For now, just realize that we will require you to use interface types for defining variables, fields, parameters, and method return types.

Then I said that I wanted to explore all of the ways to traverse the list, printing the values one per line. The most basic is to use the standard traversal loop for a list that uses calls on the size and get methods:

        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
        }
I mentioned that another way to do this is with a foreach loop. Chapter 7 describes how to use the foreach loop for arrays and chapter 10 describes how to use it for an ArrayList. It is the best structure to use when you are performing a "read only" operation on a list that doesn't require keeping track of the index.

The foreach loop syntax is fairly simple:

        for (String s : list) {
            System.out.println(s);
        }
We read this loop as, "for each string s that is in list..." The choice of "s" is arbitrary. It defines a local variable for the loop. I could just as easily have called it "x" or "foo" or "value". Each time through the loop Java sets the variable s to the next value from the list. You don't have to test for the size of the list or to use a call on the get method. Java does that for you when you use a for-each loop.

There are some limitations of for-each loops. You can't use them to change the contents of the list. If you assign a value the variable s, you are just changing a local variable inside the loop. It has no effect on the list itself.

Then we explored a different approach to traversing a list using what is known as an iterator. In Java there are three fundamental operations that an iterator performs:

We wrote the following code to use an iterator for printing our list elements:

        Iterator<String> itr = list.iterator();
        while (itr.hasNext()) {
            String s = itr.next();
            System.out.println(s);
        }
Then I then spent a little time discussing the issue of primitive data versus objects. Even though we can construct an ArrayList<E> for any class E, we can't construct an ArrayList<int> because int is a primitive type, not a class. To get around this problem, Java has a set of classes that are known as "wrapper" classes that "wrap up" primitive values like ints to make them an object. It's very much like taking a candy and putting a wrapper around it. For the case of ints, there is a class known as Integer that can be used to store an individual int. Each Integer object has a single data field: the int that it wrapped up inside.

Java also has quite a bit of support that makes a lot of this invisible to programmers. If you want to put int values into an ArrayList, you have to remember to use the type ArrayList<Integer> rather than ArrayList<int>, but otherwise Java does a lot of things for you. For example, you can construct such a list and add simple int values to it:

        List<Integer> numbers1 = new ArrayList<Integer>();
        numbers.add(18);
        numbers.add(34);
In the two calls on add, we are passing simple ints as arguments to something that really requires an Integer. This is okay because Java will automatically "box" the ints for us (i.e., wrap them up in Integer objects). We can also refer to elements of this list and treat them as simple ints, as in:

        int product = numbers.get(0) * numbers.get(1);
The calls on list.get return references to Integer objects and normally you wouldn't be allowed to multiply two objects together. In this case Java automatically "unboxes" the values for you, unwrapping the Integer objects and giving you the ints that are contained inside.

Every primitive type has a corresponding wrapper class: Integer for int, Double for double, Character for char, Boolean for boolean, and so on.

Then I mentioned that we will be looking at a kind of structure known as a Set. There is an interface that defines the behaviors of a set known as Set<E>. For now, all of the sets we will construct all of our sets using the TreeSet<E> class. For example, I used an array of data to initialize both a list and a set by adding values from the array to each:

        int[] data = {18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3};
        List<Integer> numbers1 = new ArrayList<Integer>();
        Set<Integer> numbers2 = new TreeSet<Integer>();

        for (int n : data) {
            numbers1.add(n);
            numbers2.add(n);
        }
        System.out.println("numbers1 = " + numbers1);
        System.out.println("numbers2 = " + numbers2);
This produced the following output:

        numbers1 = [18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3]
        numbers2 = [-3, 3, 4, 18, 42, 72, 97]
There are two major differences between a set and a list. Sets don't allow duplicates. So the duplicate values like 42 and 4 in the array appear just once in the set. Sets also don't allow the client to control the order of elements. The TreeSet class keeps things in sorted order. So the numbers will always be in that order. If you want to control the order, then you should use a list instead.

Sets have many of the same methods that lists do. You can add to a set, get its size, ask for an iterator, use it with a foreach loop. But it doesn't have a notion of indexing. So you can't remove at an index. Instead you remove a specific value. For example, we wrote this code to remove the value 42 from the set:

        numbers2.remove(42);
        System.out.println("numbers2 = " + numbers2);
After executing this line of code, the set no longer had 42 in it:

        numbers2 = [-3, 3, 4, 18, 72, 97]
If you don't know exactly what values you want to remove from a set, you typically use an iterator to do the removal. We began by writing this code as an attempt to remove all of the multiples of 3 from the set:

        Iterator<Integer> itr2 = numbers2.iterator();
        while (itr2.hasNext()) {
            int n = itr2.next();
            if (n % 3 == 0) {
                numbers2.remove(n);
            }
        }
This code doesn't work. It throws a ConcurrentModificationException. Java has a rule that you can't call a mutating method on a collection while you are iterating over it. You can potentially talk to two different objects: the set or the iterator. What Java doesn't want you to do is to ask the set to change its contents while you are also talking to an iterator.

The solution is to ask the iterator to do the removal so that all of your communication is with that one object:

        Iterator<Integer> itr2 = numbers2.iterator();
        while (itr2.hasNext()) {
            int n = itr2.next();
            if (n % 3 == 0) {
                itr2.remove();
            }
        }
        System.out.println("numbers2 = " + numbers2);
This code worked and produced the following output:
        now numbers2 = [4, 97]
This is the approach you need to take when you want to both examine and remove values from a set. Because you are not allowed to alter a set while you are iterating over it, you also can't modify it with a foreach loop. That is why the foreach loop is appropriate only if you are doing a "read only" operation.

An iterator is what we would call a lightweight object. You can use the iterator to gain access to everything in the structure, but it doesn't store the data itself. I gave the analogy that this is like going to a pharmacy and you'd really like to just jump over the counter and grab your prescription, but instead you have to talk to the a person behind the counter. The person behind the counter has access to everything in the pharmacy. But that person is not the pharmacy. The person has access to the pharmacy and you (the client) talk to the person behind the counter to get things done. That's how an iterator works. It has full access to the underlying structure and it keeps track of how much of the structure it has traversed, but that's not the same thing as being the structure.

I said that this would be much clearer in section when we practice writing code that manipulates sets. Chapter 13 also has a useful table of set operations.


Stuart Reges
Last modified: Fri Mar 30 16:22:40 PDT 2018