ArrayList<String> list = new ArrayList<String>();
As we discussed in the previous lecture, we should use the interface List for declaring the variable's type:
List<String> list = new ArrayList<String>();
Java also provides a shorthand that we will use. In defining the variable, we said that it is a list storing strings. It seems redundant to repeat the fact that the elements are of type String when we go to construct the ArrayList. Obviously we want these to match. Java is willing to fill in the appropriate type for you on the right-hand side to make it match the type mentioned on the left-hand side:
List<String> list = new ArrayList<>();
The two characters "<>" are known as the diamond operator. We will be using the diamond operator in all future examples and you are encouraged to use it as well. It greatly simplifies the code that you'll write.
I then spent a little time discussing the issue of primitive data versus objects. Even though we can construct an ArrayList<E> for any class E, we can't construct an ArrayList<int> because int is a primitive type, not a class. To get around this problem, Java has a set of classes known as "wrapper" classes that "wrap up" primitive values like ints to make them objects. It's very much like taking a candy and putting a wrapper around it. For the case of ints, there is a class known as Integer that can be used to store an individual int. Each Integer object has a single data field: the int that is wrapped up inside.
Java also has quite a bit of support that makes a lot of this invisible to programmers. If you want to put int values into an ArrayList, you have to remember to use the type ArrayList<Integer> rather than ArrayList<int>, but otherwise Java does a lot of things for you. For example, you can construct such a list and add simple int values to it:
List<Integer> numbers = new ArrayList<>();
numbers.add(18);
numbers.add(34);
In the two calls on add, we are passing simple ints as arguments to something that really requires an Integer. This is okay because Java will automatically "box" the ints for us (i.e., wrap them up in Integer objects). We can also refer to elements of this list and treat them as simple ints, as in:
int product = numbers.get(0) * numbers.get(1);
The calls on numbers.get return references to Integer objects and normally you wouldn't be allowed to multiply two objects together. In this case Java automatically "unboxes" the values for you, unwrapping the Integer objects and giving you the ints that are contained inside.
Every primitive type has a corresponding wrapper class: Integer for int, Double for double, Character for char, Boolean for boolean, and so on.
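As a rough illustration of what boxing and unboxing amount to, the sketch below writes the conversions out by hand with Integer.valueOf and intValue. The class name BoxingDemo and the method name productOfFirstTwo are made up for this example, and the code the compiler actually generates may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    // Multiplies the first two elements, unboxing each Integer by hand.
    public static int productOfFirstTwo(List<Integer> numbers) {
        Integer first = numbers.get(0);
        Integer second = numbers.get(1);
        return first.intValue() * second.intValue();
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(Integer.valueOf(18)); // explicit boxing, same effect as numbers.add(18)
        numbers.add(34);                  // autoboxing does the wrapping for us
        System.out.println(productOfFirstTwo(numbers)); // prints 612
    }
}
```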
Then I mentioned that we will be looking at a kind of structure known as a Set. There is an interface known as Set<E> that defines the behaviors of a set. For now, we will construct all of our sets using the TreeSet<E> class. For example, I used an array of data to initialize both a list and a set by adding values from the array to each:
int[] data = {18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3};
List<Integer> numbers1 = new ArrayList<>();
Set<Integer> numbers2 = new TreeSet<>();
for (int n : data) {
    numbers1.add(n);
    numbers2.add(n);
}
System.out.println("numbers1 = " + numbers1);
System.out.println("numbers2 = " + numbers2);
This produced the following output:
numbers1 = [18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3]
numbers2 = [-3, 3, 4, 18, 42, 72, 97]
There are two major differences between a set and a list. Sets don't allow duplicates. So the duplicate values like 42 and 4 in the array appear just once in the set. Sets also don't allow the client to control the order of elements. The TreeSet class keeps things in sorted order. So the numbers will always be in that order. If you want to control the order, then you should use a list instead.
Sets have many of the same methods that lists do. You can add to a set, get its size, ask for an iterator, and use it with a foreach loop. But a set doesn't have a notion of indexing, so you can't remove at an index. Instead you remove a specific value. For example, we wrote this code to remove the value 42 from the set:
numbers2.remove(42);
System.out.println("numbers2 = " + numbers2);
After executing this line of code, the set no longer had 42 in it:
numbers2 = [-3, 3, 4, 18, 72, 97]
If you don't know exactly what values you want to remove from a set, you typically use an iterator to do the removal. We began by writing this code as an attempt to remove all of the multiples of 3 from the set:
Iterator<Integer> itr2 = numbers2.iterator();
while (itr2.hasNext()) {
    int n = itr2.next();
    if (n % 3 == 0) {
        numbers2.remove(n);
    }
}
This code doesn't work. It throws a ConcurrentModificationException. Java has a rule that you can't call a mutating method on a collection while you are iterating over it. You can potentially talk to two different objects: the set or the iterator. What Java doesn't want you to do is to ask the set to change its contents while you are also talking to an iterator.
The solution is to ask the iterator to do the removal so that all of your communication is with that one object:
Iterator<Integer> itr2 = numbers2.iterator();
while (itr2.hasNext()) {
    int n = itr2.next();
    if (n % 3 == 0) {
        itr2.remove();
    }
}
System.out.println("now numbers2 = " + numbers2);
This code worked and produced the following output:
now numbers2 = [4, 97]
This is the approach you need to take when you want to both examine and remove values from a set. Because you are not allowed to alter a set while you are iterating over it, you also can't modify it with a foreach loop. That is why the foreach loop is appropriate only if you are doing a "read only" operation.
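The iterator-based removal can be packaged as a small self-contained program. This is only a sketch: the class name RemoveMultiples and the method removeMultiples are invented for illustration, and the starting values are the ones the set held after 42 was removed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class RemoveMultiples {
    // Removes every multiple of k from the set by asking the iterator to do the removal.
    public static void removeMultiples(Set<Integer> numbers, int k) {
        Iterator<Integer> itr = numbers.iterator();
        while (itr.hasNext()) {
            int n = itr.next();
            if (n % k == 0) {
                itr.remove(); // safe: all communication goes through the iterator
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> numbers = new TreeSet<>(Arrays.asList(-3, 3, 4, 18, 72, 97));
        removeMultiples(numbers, 3);
        System.out.println("numbers = " + numbers); // numbers = [4, 97]
    }
}
```

Newer versions of Java (8 and later) also offer a removeIf method on collections that handles this pattern in one call, but that was not part of this lecture.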
An iterator is what we would call a lightweight object. You can use the iterator to gain access to everything in the structure, but it doesn't store the data itself. I gave the analogy of going to a pharmacy: you'd really like to just jump over the counter and grab your prescription, but instead you have to talk to the person behind the counter. The person behind the counter has access to everything in the pharmacy. But that person is not the pharmacy. The person has access to the pharmacy and you (the client) talk to the person behind the counter to get things done. That's how an iterator works. It has full access to the underlying structure and it keeps track of how much of the structure it has traversed, but that's not the same thing as being the structure.
I said that this would be much clearer in section when we practice writing code that manipulates sets. Chapter 13 also has a useful table of set operations.
I said that in the next programming assignment we are going to return to being clients of Java's built-in collection classes and we are going to use a new kind of structure known as a map. The idea behind a map is that it keeps track of key/value pairs. We often store data this way. For example, in the US we often use a person's social security number as a key to get information about them. I would expect that if I talked to the university registrar, they probably have the ability to look up students based on social security number or student number or uwnetid to find their transcript.
In a map, there is only one value for any given key. If you look up a social security number and get three different student transcripts, that would be a problem. With the Java map objects, if you already have an entry in your map for a particular key, then any attempt to put a new key/value pair with that same key into the map will overwrite the old mapping.
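A minimal sketch of this overwriting behavior follows. The class name OverwriteDemo, the student number "1234567", and the transcript strings are all hypothetical values chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class OverwriteDemo {
    // Builds a map in which the same key is put twice.
    public static Map<String, String> build() {
        Map<String, String> transcripts = new TreeMap<>();
        transcripts.put("1234567", "first transcript");
        transcripts.put("1234567", "second transcript"); // same key: replaces the old value
        return transcripts;
    }

    public static void main(String[] args) {
        Map<String, String> transcripts = build();
        System.out.println(transcripts.size()); // 1
        System.out.println(transcripts);        // {1234567=second transcript}
    }
}
```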
We looked at an interface in the Java class libraries called Map that is a generic interface. That means that we have to supply type information. Its formal description is Map<K, V>. This is different from the List and Set interfaces in that it has two different types. That's because the map has to know what type of keys you have and what type of values you have.
I first showed a short program that constructs a map that associates courses with instructors.
import java.util.*;
public class Instructor {
    public static void main(String[] args) {
        Map<String, String> instructors = new TreeMap<>();
        instructors.put("cse143a", "Hunter Schafer");
        instructors.put("cse143b", "Hunter Schafer");
        instructors.put("cse143x", "Stuart Reges");
        instructors.put("cse142a", "Brett Wortzman");
        instructors.put("cse142b", "Brett Wortzman");
        System.out.println("instructors = " + instructors);
    }
}
This map would allow you to quickly look up the name of an instructor given the course number. One limitation is that if we have two instructors for a course, we can't put two entries into the map. It wouldn't work to say something like:
instructors.put("cse143a", "Hunter Schafer");
instructors.put("cse143a", "Stuart Reges");
In this case the map would only remember the second association. You would have to instead say something like:
instructors.put("cse143a", "Hunter Schafer, Stuart Reges");
The original program, with one instructor per course, produced the following output:
instructors = {cse142a=Brett Wortzman, cse142b=Brett Wortzman, cse143a=Hunter Schafer, cse143b=Hunter Schafer, cse143x=Stuart Reges}
It is useful to print maps as a way to debug your code, so learning to read this output is helpful. The map is described by a series of key/value pairs with an equals sign ("=") separating keys and values. Mathematicians would probably prefer a notation like "cse142a=>Brett Wortzman" because they often use an arrow to show a key mapping to a value.
As a second example, I asked how we could write a program that would count the number of unique words in an input file. I had a copy of the text of Moby Dick that we looked at to think about this. I showed some starter code that constructs a Scanner object tied to a file:
import java.util.*;
import java.io.*;
public class WordCount {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("What is the name of the text file? ");
        String fileName = console.nextLine();
        Scanner input = new Scanner(new File(fileName));
        while (input.hasNext()) {
            String next = input.next();
            // process next
        }
    }
}
Notice that in the loop we use input.next() to read individual words and we have this in a while loop testing against input.hasNext().
So how do we count the words? Someone suggested that a set would be the perfect structure to solve this problem. It eliminates duplicates, so it will keep track of how many different words there are. So we changed the code to be:
Set<String> words = new TreeSet<>();
while (input.hasNext()) {
    String next = input.next();
    words.add(next);
}
System.out.println("Total words = " + words.size());
Here is a sample log of execution:
What is the name of the text file? moby.txt
Total words = 32019
One limitation of this version is that it pays attention to capitalization. So it considers the words "whale" and "Whale" and "WHALE" to be different words. To fix that, we modified the line that reads in a word so that it converts the word to its lowercase equivalent:
String next = input.next().toLowerCase();
It didn't make much difference, as we saw from this execution:
What is the name of the text file? moby.txt
Total words = 30368
Someone pointed out that we still haven't dealt with punctuation. It is considering "whale" and "whale." and "whale," to be different words. I didn't want to deal with it in this program, but I mentioned that the chapter 10 case study discusses this and shows you how to configure the Scanner so that it ignores those punctuation characters.
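One possible way to make a Scanner skip punctuation is to change its delimiter with useDelimiter, as sketched below. This is an assumption about how it could be done, not a reproduction of the textbook's case study. The class name DelimiterDemo, the method uniqueWords, and the particular regular expression are chosen for illustration; the expression treats any run of characters other than letters and apostrophes as a separator.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class DelimiterDemo {
    // Returns the distinct lowercase words in the text, ignoring punctuation.
    public static Set<String> uniqueWords(String text) {
        Scanner input = new Scanner(text);
        // Assumed delimiter: one or more characters that are not letters or apostrophes.
        input.useDelimiter("[^a-zA-Z']+");
        Set<String> words = new TreeSet<>();
        while (input.hasNext()) {
            words.add(input.next().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("Whale, whale. WHALE! The whale's tail"));
        // [tail, the, whale, whale's]
    }
}
```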
This program counts the number of unique words, but not how often each individual word occurs. To keep track of word counts, we can use a map. We have some words (Strings) that we want to associate with some counters (ints). We can't actually use type int because it is a primitive type, but we can use type Integer.
So our map would be of type Map<String, Integer>. In other words, it's a map that keeps track of String/Integer pairs (this String goes to this Integer). Map is the name of the interface, but it's not an actual implementation. The implementation we will use is TreeMap. So we can construct a map called "count" to keep track of our counts by saying:
Map<String, Integer> count = new TreeMap<>();
The most basic methods in the Map interface are the ones that allow you to put something into the map (an operation called put) and to ask the map for the current value of something (an operation called get).
I asked what code we need to record a word in our map the first time we see it. Someone suggested using the put method to assign it to a count of 1. So our loop becomes:
Map<String, Integer> count = new TreeMap<>();
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    count.put(next, 1);
}
This doesn't quite work, but it's getting closer. Each time we encounter a word, it adds it to our map, associating it with a count of 1. This will figure out what the unique words are, but it won't have the right counts for them.
I asked people to think about what to do if a word has been seen before. In that case, we want to increase its count by 1. That means we have to get the old value of the count and add 1 to it:
count.get(next) + 1
and make this the new value of the counter:
count.put(next, count.get(next) + 1);
So we have two different calls on put. We want to call the first one when the word is first seen and call the second one if it's already been seen. Someone suggested using an if/else for this. The only question is what test to use. The Map interface includes a method called containsKey that tests whether or not a certain value is a key stored in the map. Using this method, we modified our code to be:
Map<String, Integer> count = new TreeMap<>();
while (input.hasNext()) {
    String next = input.next().toLowerCase();
    if (!count.containsKey(next)) {
        count.put(next, 1);
    } else {
        count.put(next, count.get(next) + 1);
    }
}
The first time we see a word, we call the put method and say that the map should associate the word with a count of 1. Later we call put again with a higher count. And we keep calling put every time the count goes up. What happens to the old values that we had put in the map previously? The way the map works, each key is associated with only one value. So when you call put a second or third time, you are wiping out the old association. The new key/value pair replaces the old key/value pair in the map.
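To see the counting loop at work without a file, the sketch below runs the same if/else logic over a Scanner tied to a short string. The class name CountDemo, the method countWords, and the sample sentence are made up for illustration.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class CountDemo {
    // Counts how many times each lowercase word appears in the input.
    public static Map<String, Integer> countWords(Scanner input) {
        Map<String, Integer> count = new TreeMap<>();
        while (input.hasNext()) {
            String next = input.next().toLowerCase();
            if (!count.containsKey(next)) {
                count.put(next, 1);                   // first time we see the word
            } else {
                count.put(next, count.get(next) + 1); // replace the old count
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("the whale saw The other whale");
        System.out.println(countWords(input));
        // {other=1, saw=1, the=2, whale=2}
    }
}
```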
Then we talked about how to print the results. Clearly we need to iterate over
the entries in the map. One way to do this is to request what is known as the
"key set". The key set is the set of all keys contained in the map. The Java
documentation says that it will be of type Set<K>, where K is the key type
(String for our map).
Then I mentioned that I wanted to explore a sample program that will constitute
a medium hint for the programming assignment. We will begin looking at the
program in this lecture and finish it up in the next lecture.
The sample program involves keeping track of friendships. You could think of
it as keeping track of Facebook friends. One of the first questions that comes
up is how do we represent friendships? For example, are friendships
bidirectional? If person A is friends with person B, does that mean that
person B is friends with person A? For our purposes, we will assume the answer
is yes. If we were trying to represent something like "is attracted to", then
we'd come to a different conclusion, but for friends, just like on Facebook and
other social networking sites, friendship goes both ways.
I said that a good way to visualize friendships is to draw a graph in which
each person is represented with a node (an oval) and each friendship is
represented by an edge connecting two nodes (a line drawn between two ovals).
I am using a program called Graphviz, which is an open-source graph viewer. For example,
here is a sample friendship graph:
This information is stored in a file with lines that list pairs of friendships,
as in:
For example, here is a sample execution using our data file for finding the
connection between Ashley and Stuart:
Here is a sample execution where the connection is not found, asking for a
connection between Stuart and Bart:
We looked at one more example that involved a fairly long chain:
If we want a structure that keeps track of this kind of friendship, then we
want to use names as keys into the structure. We ask the structure, "Who are
the friends of Samantha?" or "Who are the friends of Ashley?". So a name, a
String, will be used as the key. But what should it return? If we map a
String to a String, then we can store only one friendship. We want to be able
to return more than one friendship. Someone suggested that we want to use a
set. That is exactly right.
The idea is that we want to have a map that converts a String into a Set of
String values. Given the name of a person, we can get a Set with the names of
that person's friends. For our sample file:
To fill up this structure, we need to process the input file. Remember that
the input file has lines that have two names separated by a "--", as in:
I mentioned that this is a good place to introduce an extra method because
we're going to do the same thing twice. So we replaced the comment above with
the following two lines of code:
Then we thought about the first call to addTo when name1 is "Ashley" and name2
is "Christopher". Someone pointed out that we need a set to keep track of
Ashley's friends, so we wrote this line of code:
The question is how to get back to the original set. The answer is to talk to
the map to ask for its entry for Ashley. We can store this information in a
variable by saying:
To figure out when to execute the three lines of code versus when to execute
the two lines of code we can use an if/else just as we did in the word counting
program to choose between the two cases by checking whether the map already
contains name1 as a key:
This completes the task of constructing the friends map. A simple way to see
what is in the map is to print it in main:
for (String word : count.keySet()) {
    // process word
}
We would read this as, "for each String word that is in count.keySet()..."
To process the word, we simply print it out along with its count. How do we
get its count? By calling the get method of the map:
for (String word : count.keySet()) {
    System.out.println(count.get(word) + "\t" + word);
}
I didn't try to print all of the words in Moby Dick because it would
have produced too much output. Instead, I had it show me the counts of words
in the program itself. Obviously for large files we want some mechanism to
limit the output. On the calendar I will put a version that includes some
extra code that asks for a minimum frequency to use. We ran that on Moby
Dick and saw this list of words that occur at least 500 times:
What is the name of the text file? moby.txt
Minimum number of occurrences for printing? 500
4571 a
1354 all
587 an
6182 and
563 are
1701 as
1289 at
973 be
1691 but
1133 by
1522 for
1067 from
754 had
741 have
1686 he
552 him
2459 his
1746 i
3992 in
512 into
1555 is
1754 it
562 like
578 my
1073 not
506 now
6408 of
933 on
775 one
675 or
882 so
599 some
2729 that
14092 the
602 their
506 there
627 they
1239 this
4448 to
551 upon
1567 was
644 were
500 whale
552 when
547 which
1672 with
774 you
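The version with the minimum-frequency prompt is not shown in these notes, so the following is only a guess at how such a filter might look. The class name MinFrequencyDemo, the method report, and the tiny sample counts are all hypothetical; the real program reads the threshold from the console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinFrequencyDemo {
    // Builds one "count<TAB>word" line for each word that occurs at least min times.
    public static List<String> report(Map<String, Integer> count, int min) {
        List<String> lines = new ArrayList<>();
        for (String word : count.keySet()) {
            if (count.get(word) >= min) {
                lines.add(count.get(word) + "\t" + word);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> count = new TreeMap<>();
        count.put("a", 3);      // made-up counts, not taken from Moby Dick
        count.put("whale", 2);
        count.put("zebra", 1);
        for (String line : report(count, 2)) {
            System.out.println(line); // prints the lines for "a" and "whale" only
        }
    }
}
```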
One final point I made about the Map interface is that you can associate
just about anything with just about anything. In the word counting program, we
associated strings with integers. You could also associate strings with
strings. One thing you can't do is to have multiple values for a single key in
a map. For example, if you decide to associate strings with strings, then any
given string can be associated with just a single string. But there's no
reason that you can't have the second value be structured in some way. You can
associate strings with arrays or strings with Lists.
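As a small sketch of a structured value, the code below maps a course name to a List of instructor names, which is one way to record two instructors for one course. The class name MultiInstructors and the method build are invented for this example, and the two-instructor pairing is the hypothetical one from the earlier instructors discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiInstructors {
    // Associates one course with a list holding both of its instructors.
    public static Map<String, List<String>> build() {
        Map<String, List<String>> instructors = new TreeMap<>();
        List<String> staff = new ArrayList<>();
        staff.add("Hunter Schafer");
        staff.add("Stuart Reges");
        instructors.put("cse143a", staff); // still one value per key, but the value is a list
        return instructors;
    }

    public static void main(String[] args) {
        System.out.println(build()); // {cse143a=[Hunter Schafer, Stuart Reges]}
    }
}
```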
graph {
Ashley -- Christopher
Ashley -- Emily
Ashley -- Joshua
Bart -- Lisa
Bart -- Matthew
Christopher -- Andrew
Emily -- Joshua
Jacob -- Christopher
Jessica -- Ashley
JorEl -- Zod
KalEl -- JorEl
Kyle -- Lex
Kyle -- Zod
Lisa -- Marge
Matthew -- Lisa
Michael -- Christopher
Michael -- Joshua
Michael -- Jessica
Samantha -- Matthew
Samantha -- Tyler
Sarah -- Andrew
Sarah -- Christopher
Sarah -- Emily
Tyler -- Kyle
Stuart -- Jacob
}
Then I demonstrated what the Friends program is supposed to do. It is supposed
to use this data to find how far one person is from another. So starting with
a given person, it finds that person's friends, then the friends of those
friends, then the friends of the friends of the friends, and so on. It reports
how far it has to go to find a connection and if it runs out of people, it
simply reports that the connection couldn't be found.
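The level-by-level search described here could be sketched as follows. This is one possible approach under stated assumptions, not necessarily how the finished Friends program works. The class name DistanceSketch, the methods distance, link, and sampleFriends, and the choice to return -1 for "not found" are all hypothetical; the friendships map is assumed to map each name to the set of that person's friends.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DistanceSketch {
    // Returns how many friendship links separate start from target, or -1 if none exist.
    public static int distance(Map<String, Set<String>> friends, String start, String target) {
        Set<String> seen = new TreeSet<>();    // everyone reached so far
        Set<String> current = new TreeSet<>(); // the people exactly `steps` away
        seen.add(start);
        current.add(start);
        int steps = 0;
        while (!current.isEmpty()) {
            if (current.contains(target)) {
                return steps;
            }
            Set<String> next = new TreeSet<>();
            for (String person : current) {
                Set<String> theirFriends = friends.get(person);
                if (theirFriends != null) {
                    for (String friend : theirFriends) {
                        if (seen.add(friend)) { // add returns true only for a new name
                            next.add(friend);
                        }
                    }
                }
            }
            current = next;
            steps++;
        }
        return -1; // ran out of people without reaching the target
    }

    // Hypothetical helper: records a friendship in both directions.
    private static void link(Map<String, Set<String>> friends, String a, String b) {
        if (!friends.containsKey(a)) {
            friends.put(a, new TreeSet<>());
        }
        if (!friends.containsKey(b)) {
            friends.put(b, new TreeSet<>());
        }
        friends.get(a).add(b);
        friends.get(b).add(a);
    }

    // A few pairs taken from the sample data file, enough to try the method.
    public static Map<String, Set<String>> sampleFriends() {
        Map<String, Set<String>> friends = new TreeMap<>();
        link(friends, "Ashley", "Christopher");
        link(friends, "Jacob", "Christopher");
        link(friends, "Stuart", "Jacob");
        link(friends, "Bart", "Lisa");
        return friends;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = sampleFriends();
        System.out.println(distance(friends, "Ashley", "Stuart")); // 3
        System.out.println(distance(friends, "Stuart", "Bart"));   // -1
    }
}
```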
Welcome to the cse143 friend finder.
starting name? Ashley
target name? Stuart
Starting with Ashley
1 away: [Christopher, Emily, Jessica, Joshua]
2 away: [Andrew, Jacob, Michael, Sarah]
3 away: [Stuart]
found at a distance of 3
It finds that Ashley has four direct friends (Christopher, Emily, Jessica, and
Joshua). Those friends have four friends (Andrew, Jacob, Michael, Sarah).
Those four friends have a friend named Stuart. So the program reports that it
found Stuart 3 away from Ashley.
Welcome to the cse143 friend finder.
starting name? Stuart
target name? Bart
Starting with Stuart
1 away: [Jacob]
2 away: [Christopher]
3 away: [Andrew, Ashley, Michael, Sarah]
4 away: [Emily, Jessica, Joshua]
5 away: []
not found
The program goes two levels farther than it did before, finding that it runs
out of people when it gets 5 away from Stuart. At that point it knows that
there is no connection between Stuart and Bart.
Welcome to the cse143 friend finder.
starting name? Bart
target name? JorEl
Starting with Bart
1 away: [Lisa, Matthew]
2 away: [Marge, Samantha]
3 away: [Tyler]
4 away: [Kyle]
5 away: [Lex, Zod]
6 away: [JorEl]
found at a distance of 6
I asked people what kind of structure would be useful for keeping track of this
kind of data and someone said a map. But what kind of map? Someone suggested
that it would be good to keep track of the neighbors for each person. The
neighbors are the friends. For example, Ashley's friends are Christopher,
Emily, Jessica, and Joshua.
"Andrew" => maps to => [Christopher, Sarah]
"Ashley" => maps to => [Christopher, Emily, Jessica, Joshua]
"Bart" => maps to => [Lisa, Matthew]
"Christopher" => maps to => [Andrew, Ashley, Jacob, Michael, Sarah]
"Emily" => maps to => [Ashley, Joshua, Sarah]
"Jacob" => maps to => [Christopher, Stuart]
"Jessica" => maps to => [Ashley, Michael]
"JorEl" => maps to => [KalEl, Zod]
"Joshua" => maps to => [Ashley, Emily, Michael]
"KalEl" => maps to => [JorEl]
"Kyle" => maps to => [Lex, Tyler, Zod]
"Lex" => maps to => [Kyle]
"Lisa" => maps to => [Bart, Marge, Matthew]
"Marge" => maps to => [Lisa]
"Matthew" => maps to => [Bart, Lisa, Samantha]
"Michael" => maps to => [Christopher, Jessica, Joshua]
"Samantha" => maps to => [Matthew, Tyler]
"Sarah" => maps to => [Andrew, Christopher, Emily]
"Stuart" => maps to => [Jacob]
"Tyler" => maps to => [Kyle, Samantha]
"Zod" => maps to => [JorEl, Kyle]
Our first challenge, then, is to write code to construct such a structure. If
it maps a String to a Set<String>, then it would be of this type:
Map<String, Set<String>>
To construct one, we have to ask for a new TreeMap of this type:
Map<String, Set<String>> friends = new TreeMap<>();
Notice that the diamond operator greatly simplifies this line of code.
Ashley -- Christopher
I showed the following code to read lines of input and find the ones that
contain names:
while (input.hasNextLine()) {
    String line = input.nextLine();
    if (line.contains("--")) {
        Scanner lineData = new Scanner(line);
        String name1 = lineData.next();
        lineData.next();  // this skips the "--" token
        String name2 = lineData.next();
        // process name1 and name2
    }
}
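To try the line-splitting logic on its own, the sketch below applies it to individual strings. The class name ParseLineDemo and the method parsePair are made up for illustration, and the code assumes, as the sample file suggests, that each name is a single token with whitespace on both sides of the "--".

```java
import java.util.Scanner;

public class ParseLineDemo {
    // Returns {name1, name2} for a line like "Ashley -- Christopher",
    // or null for lines such as "graph {" that contain no "--".
    public static String[] parsePair(String line) {
        if (!line.contains("--")) {
            return null;
        }
        Scanner lineData = new Scanner(line);
        String name1 = lineData.next();
        lineData.next(); // skips the "--" token
        String name2 = lineData.next();
        return new String[] {name1, name2};
    }

    public static void main(String[] args) {
        String[] pair = parsePair("    Ashley -- Christopher");
        System.out.println(pair[0] + " and " + pair[1]); // Ashley and Christopher
        System.out.println(parsePair("graph {"));        // null
    }
}
```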
This was not the interesting part of the code because we saw file processing in
cse142. The interesting part is to think of how to process the two names. How
do we update our friends map given a new friendship? Friendships are
bidirectional, so we have to be careful to add the friendship in both
directions. If there is an Ashley--Christopher friendship, then we have to
make sure that Ashley's set of friends includes Christopher and we have
to make sure that Christopher's set of friends includes Ashley.
addTo(friends, name1, name2);
addTo(friends, name2, name1);
So then we turned to the task of writing the addTo method. It takes the map
and the two names as parameters, so it looks like this:
public static void addTo(Map<String, Set<String>> friends, String name1,
                         String name2) {
    ...
}
So far in our code we have constructed one object--the map. I asked the class
what the map's size is and the answer is 0. We constructed the map, but never
added anything to it.
Set<String> theFriends = new TreeSet<>();
The first thing we need to do is to set up the association in the map between
Ashley's name and this set:
friends.put(name1, theFriends);
Now the map has a size of 1. The set still has a size of 0. That's because we
never added anything to the set. We'll get there. The following diagram
indicates where we are now:
friends ==> {"Ashley" ==> []}
The map has one entry that keeps track of the fact that the name "Ashley" is
associated with an empty set. We don't want that set to be empty. We want to
add "Christopher" to that set, so we included this line of code:
theFriends.add(name2);
That leaves us in this situation:
friends ==> {"Ashley" ==> ["Christopher"]}
Putting the lines of code together, we have:
Set<String> theFriends = new TreeSet<>();
friends.put(name1, theFriends);
theFriends.add(name2);
We have added one entry to the map for Ashley and recorded her friend
Christopher. Then we make a second call on addTo with the names reversed. We
end up creating another set. We told the map to associate Christopher's name
with this new set. And we told the set to remember that Ashley is one of
Christopher's friends. That leaves us in this state:
friends ==> {"Ashley" ==> ["Christopher"],
             "Christopher" ==> ["Ashley"]}
Think about what happens next. The input file has the pairing "Ashley" and
"Emily". Suppose we execute the exact same three lines of code again. The
first thing we would do is to create a brand new set for keeping track of
Ashley's friends. But we already have a set for keeping track of Ashley's
friends. Only that set knows that Christopher is one of Ashley's friends. If
we make a brand new set, we'll only know about the new friendship.
Set<String> theFriends = friends.get(name1);
We now want the set to also remember that Emily is a friend of Ashley. So we
say:
theFriends.add(name2);
This leaves us with the following situation:
friends ==> {"Ashley" ==> ["Christopher", "Emily"],
             "Christopher" ==> ["Ashley"]}
That completes what we want to do for this call on addTo. Then we call addTo
again with the names reversed. That means that name1 is "Emily". The map has
no entry for Emily, so we go back to executing the three lines of code we had
before. We construct yet another set to keep track of Emily's friends. Then we
tell the map to associate "Emily" with this set. And then we tell the newly
constructed set to add "Ashley" to the set. That leaves us in this state:
friends ==> {"Ashley" ==> ["Christopher", "Emily"],
             "Christopher" ==> ["Ashley"],
             "Emily" ==> ["Ashley"]}
We still have just one object keeping track of all of this data. Our map is
filling that role. But inside the map, it keeps references to the three sets
we constructed. One set is keeping track of Ashley's friends and now has two
entries. A second set records the fact that Christopher has a friend named
Ashley. And the third set records the fact that Emily has a friend named
Ashley.
if (!friends.containsKey(name1)) {
    Set<String> theFriends = new TreeSet<>();
    friends.put(name1, theFriends);
    theFriends.add(name2);
} else {
    Set<String> theFriends = friends.get(name1);
    theFriends.add(name2);
}
In the next lecture we will discuss how this can be simplified, but for now,
this is a reasonable way to write the code.
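Putting the pieces together, the sketch below places the if/else inside the addTo method and calls it for the first two friendships in the sample file, once in each direction. The class name AddToDemo is made up for illustration, and the names are passed in directly rather than read from a file.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class AddToDemo {
    // Records that name2 is one of name1's friends.
    public static void addTo(Map<String, Set<String>> friends, String name1,
                             String name2) {
        if (!friends.containsKey(name1)) {
            // first friendship for name1: make a new set and register it in the map
            Set<String> theFriends = new TreeSet<>();
            friends.put(name1, theFriends);
            theFriends.add(name2);
        } else {
            // name1 already has a set: ask the map for it and add to it
            Set<String> theFriends = friends.get(name1);
            theFriends.add(name2);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = new TreeMap<>();
        addTo(friends, "Ashley", "Christopher");
        addTo(friends, "Christopher", "Ashley");
        addTo(friends, "Ashley", "Emily");
        addTo(friends, "Emily", "Ashley");
        System.out.println(friends);
        // {Ashley=[Christopher, Emily], Christopher=[Ashley], Emily=[Ashley]}
    }
}
```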
Map<String, Set<String>> friends = readFile(input);
System.out.println(friends);
The output is hard to read, but if you look closely, you'll see that it is
correctly capturing all of the friendship relationships:
{Andrew=[Christopher, Sarah], Ashley=[Christopher, Emily, Jessica,
Joshua], Bart=[Lisa, Matthew], Christopher=[Andrew, Ashley, Jacob,
Michael, Sarah], Emily=[Ashley, Joshua, Sarah], Jacob=[Christopher,
Stuart], Jessica=[Ashley, Michael], JorEl=[KalEl, Zod], Joshua=[Ashley,
Emily, Michael], KalEl=[JorEl], Kyle=[Lex, Tyler, Zod], Lex=[Kyle],
Lisa=[Bart, Marge, Matthew], Marge=[Lisa], Matthew=[Bart, Lisa,
Samantha], Michael=[Christopher, Jessica, Joshua], Samantha=[Matthew,
Tyler], Sarah=[Andrew, Christopher, Emily], Stuart=[Jacob],
Tyler=[Kyle, Samantha], Zod=[JorEl, Kyle]}
I included a version of this part of the code on the calendar. We will
complete this program in the next lecture.
Stuart Reges
Last modified: Wed Oct 30 18:52:10 PDT 2019