CSE143 Notes for Friday, 4/8/11

We continued our discussion of the sample program to find distances between friends. We first discussed how to fix the program for generating a file of friendships. We had been using this code to pick pairs of names at random from our array of names:

        Random r = new Random();
        for (int count = 0; count < names.length; count++) {
            int i = r.nextInt(names.length);
            int j = r.nextInt(names.length);
            output.println("    " + names[i] + " -- " + names[j]);
        }
There were several problems with this approach. First, it allowed a person to be listed as being friends with themselves by picking values for i and j that are equal. Second, it generates duplicates that are difficult to notice because the names appear in opposite order. Both of these problems can be fixed by making sure that the first name being printed always appears before the second name:

        Random r = new Random();
        for (int count = 0; count < names.length; count++) {
            int i = r.nextInt(names.length);
            int j = r.nextInt(names.length);
            if (i < j)
                output.println("    " + names[i] + " -- " + names[j]);
            else if (i > j)
                output.println("    " + names[j] + " -- " + names[i]);
        }
But this version still has problems. It still allows duplicates to occur because it might select the same pair of names more than once. And whenever i and j are equal, it fails to produce a line of output, so the file can end up with fewer lines than we intended.

This turns out to be an interesting case to use a set. With our array of 16 names, we are trying to produce 16 lines of output that have no duplicates. The way to do that is to build up a set of 16 lines of output and then to print those lines:

        Random r = new Random();
        Set<String> data = new TreeSet<String>();
        while (data.size() < names.length) {
            int i = r.nextInt(names.length);
            int j = r.nextInt(names.length);
            if (i < j)
                data.add("    " + names[i] + " -- " + names[j]);
            else if (i > j)
                data.add("    " + names[j] + " -- " + names[i]);
        }
        for (String s : data)
            output.println(s);
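
The set-based loop can be packaged as a method for experimenting. The following is only a sketch: the class name FriendPairs, the helper randomPairs, its count parameter, and the seed are all made up for illustration and were not part of the lecture code. It assumes the names in the array are distinct, and it adds one guard the loop above does not have: n names allow only n * (n - 1) / 2 distinct pairs, so asking for more than that would keep the while loop running forever.

```java
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

class FriendPairs {
    // Hypothetical helper: returns `count` distinct lines of the form "a -- b",
    // where a always appears before b in the names array.
    // Assumes the names in the array are distinct.
    static Set<String> randomPairs(String[] names, int count, Random r) {
        int n = names.length;
        int maxPairs = n * (n - 1) / 2;  // number of distinct unordered pairs
        if (count > maxPairs) {
            throw new IllegalArgumentException(
                "only " + maxPairs + " distinct pairs are possible");
        }
        Set<String> data = new TreeSet<String>();
        while (data.size() < count) {
            int i = r.nextInt(n);
            int j = r.nextInt(n);
            if (i < j) {
                data.add(names[i] + " -- " + names[j]);
            } else if (i > j) {
                data.add(names[j] + " -- " + names[i]);
            }
            // when i == j nothing is added, so the loop simply tries again
        }
        return data;
    }

    public static void main(String[] args) {
        String[] names = {"Ashley", "Emily", "Jacob", "Kyle"};
        // A fixed seed (an arbitrary choice) makes the run repeatable.
        for (String s : randomPairs(names, 4, new Random(143))) {
            System.out.println("    " + s);
        }
    }
}
```

With the 16 names from lecture there are 120 possible pairs, so asking for 16 lines is safe. The guard only matters for very small arrays, such as one with two names, where only one pair exists.
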
We ran this program to produce an initial set of data. We made some minor edits to the file to end up with this:

        graph {
            Ashley -- Christopher
            Ashley -- Emily
            Ashley -- Joshua
            Christopher -- Andrew
            Emily -- Joshua
            Jacob -- Christopher
            Jessica -- Ashley
            Michael -- Christopher
            Michael -- Joshua
            Michael -- Jessica
            Samantha -- Matthew
            Samantha -- Tyler
            Sarah -- Andrew
            Sarah -- Christopher
            Sarah -- Emily
            Tyler -- Kyle
            Stuart -- Jacob
        }
When we viewed the file in Graphviz, it drew the friendship graph described by these lines (the image is not reproduced in these notes).

Then I demonstrated what the Friends program is supposed to do. It is supposed to use this data to find how far one person is from another. So starting with a given person, it finds that person's friends, then the friends of those friends, then the friends of the friends of the friends, and so on. It reports how far it has to go to find a connection and if it runs out of people, it simply reports that the connection couldn't be found.

For example, here is a sample execution using our data file for finding the connection between Ashley and Stuart:

        Welcome to the cse143 friend finder.
        starting name? Ashley
        target name? Stuart
        
        Starting with Ashley
            1 away: [Christopher, Emily, Jessica, Joshua]
            2 away: [Andrew, Jacob, Michael, Sarah]
            3 away: [Stuart]
        found at a distance of 3
It finds that Ashley has four friends. Those friends lead to four new people who are 2 away from Ashley. And one of those people (Jacob) is friends with Stuart. So the program reports that it found Stuart at a distance of 3.

Here is a sample execution where the connection is not found, asking for a connection between Ashley and Samantha:

        Welcome to the cse143 friend finder.
        starting name? Ashley
        target name? Samantha
        
        Starting with Ashley
            1 away: [Christopher, Emily, Jessica, Joshua]
            2 away: [Andrew, Jacob, Michael, Sarah]
            3 away: [Stuart]
            4 away: []
        not found
The program goes one level farther than it did before, finding that it runs out of people when it gets 4 away from Ashley. At that point it knows that there is no connection between Ashley and Samantha.

I asked people what kind of structure would be useful for keeping track of this kind of data and someone said a map. But what kind of map? Someone suggested that it would be good to keep track of the neighbors for each person. The neighbors are the friends. For example, Ashley's friends are Christopher, Emily, Jessica, and Joshua.

If we want a structure that keeps track of this kind of friendship, then we want to use names as keys into the structure. We ask the structure, "Who are the friends of Samantha?" or "Who are the friends of Ashley?". So a name, a String, will be used as the key. But what should it return? If we map a String to a String, then we can store only one friendship. We want to be able to return more than one friendship. Someone suggested that we want to use a set. That is exactly right.

The idea is that we want to have a map that converts a String into a Set of String values. Given the name of a person, we can get a Set with the names of that person's friends. For our sample file:

        "Andrew"        => maps to => [Christopher, Sarah]
        "Ashley"        => maps to => [Christopher, Emily, Jessica, Joshua]
        "Christopher"   => maps to => [Andrew, Ashley, Jacob, Michael, Sarah]
        "Emily"         => maps to => [Ashley, Joshua, Sarah]
        "Jacob"         => maps to => [Christopher, Stuart]
        "Jessica"       => maps to => [Ashley, Michael]
        "Joshua"        => maps to => [Ashley, Emily, Michael]
        "Kyle"          => maps to => [Tyler]
        "Matthew"       => maps to => [Samantha]
        "Michael"       => maps to => [Christopher, Jessica, Joshua]
        "Samantha"      => maps to => [Matthew, Tyler]
        "Sarah"         => maps to => [Andrew, Christopher, Emily]
        "Stuart"        => maps to => [Jacob]
        "Tyler"         => maps to => [Kyle, Samantha]
Our first challenge, then, is to write code to construct such a structure. If it maps a String to a Set<String>, then it would be of this type:

        Map<String, Set<String>>
To construct one, we have to ask for a new TreeMap of this type:

        Map<String, Set<String>> friends = new TreeMap<String, Set<String>>();
That is a rather complex line of code, but the main complexity comes from what we are putting inside the "<" and ">" characters.

To fill up this structure, we need to process the input file. Remember that the input file has lines that have two names separated by a "--", as in:

    Ashley -- Christopher
We wrote the following code to read lines of input and find the ones that contain names:

        while (input.hasNextLine()) {
            String line = input.nextLine();
            if (line.contains("--")) {
                Scanner lineData = new Scanner(line);
                String name1 = lineData.next();
                lineData.next();  // this skips the "--" token
                String name2 = lineData.next();
                // process name1 and name2
            }
        }
This was not the interesting part of the code because we saw file processing in cse142. The interesting part is to think of how to process the two names. How do we update our friends map given a new friendship? Friendships are bidirectional, so we have to be careful to add the friendship in both directions. If there is an Ashley -- Christopher friendship, then we have to make sure that Ashley's set of friends includes Christopher and we have to make sure that Christopher's set of friends includes Ashley.

I mentioned that this is a good place to introduce an extra method because we're going to do the same thing twice. So we replaced the comment above with the following two lines of code:

        addTo(friends, name1, name2);
        addTo(friends, name2, name1);
So then we turned to the task of writing the addTo method. It takes the map and the two names as parameters, so it looks like this:

        public static void addTo(Map<String, Set<String>> friends, String name1, 
                                 String name2) {
            ...
        }
If we're trying to add name2 to the set for name1, then in general we want to:

        get the set for name1
        add name2 to that set
Here is a first attempt:

        Set<String> names = friends.get(name1);
        names.add(name2);
This is a good start. Remember that the whole point of the map is to associate a name with a set of names. So in the first line of code we ask the map to give us the set of names associated with name1. In the second line, we add name2 to that set.

Although we can write the code in this way as two lines of code, most programmers would write this as one line of code. There is no need to introduce the local variable called names. So we can instead write this as:

        friends.get(name1).add(name2);
But there is a problem with this approach. It assumes that there is a set of names associated with name1. Initially the map is empty. And if we call get for a key that is not in the map, then we get the value null back. That would cause a NullPointerException if we tried to treat it as a set that we can add something to.
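
The failure is easy to reproduce in isolation. This is a small sketch, with a made-up class name (MissingKeyDemo) and helper (addFailsOnEmptyMap), that calls get on an empty map and then tries to add to the result:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MissingKeyDemo {
    // Hypothetical helper: returns true if adding to the set for a missing
    // key fails with a NullPointerException.
    static boolean addFailsOnEmptyMap() {
        Map<String, Set<String>> friends = new TreeMap<String, Set<String>>();
        try {
            friends.get("Ashley").add("Christopher");  // get returns null here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = new TreeMap<String, Set<String>>();
        System.out.println(friends.get("Ashley"));  // prints null
        System.out.println(addFailsOnEmptyMap());   // prints true
    }
}
```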

The very first time we see a name, we want to put it into the map. When we do that, we want to associate it with a brand new set that can be used to store the names of that person's friends:

        friends.put(name1, new TreeSet<String>());
But we only want to do this once. For example, if we did this every time we went to add a friendship for this person, then we would always have a set with just one name in it. The first time we see name1, we want to make this set. Then every other time we simply want to add a new name to the existing set. So we need to include a test that constructs the set only the first time we see name1:

        if (!friends.containsKey(name1)) {
            friends.put(name1, new TreeSet<String>());
        }
        friends.get(name1).add(name2);
This is the complete code for the addTo method. It constructs a new set each time it sees a name for the first time. And every time it executes, it adds name2 to the set for name1.
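
Putting the file-reading loop and addTo together gives something that can be run on its own. The sketch below is one way to arrange it, not necessarily how the lecture program was organized: the class name FriendMapBuilder and the helper readFriends are made up, and the input is a three-line excerpt supplied as a String rather than the real data file.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FriendMapBuilder {
    // Same logic as the addTo method developed above.
    static void addTo(Map<String, Set<String>> friends, String name1,
                      String name2) {
        if (!friends.containsKey(name1)) {
            friends.put(name1, new TreeSet<String>());
        }
        friends.get(name1).add(name2);
    }

    // Hypothetical helper: reads "name1 -- name2" lines and records each
    // friendship in both directions. Lines without "--" are skipped.
    static Map<String, Set<String>> readFriends(Scanner input) {
        Map<String, Set<String>> friends = new TreeMap<String, Set<String>>();
        while (input.hasNextLine()) {
            String line = input.nextLine();
            if (line.contains("--")) {
                Scanner lineData = new Scanner(line);
                String name1 = lineData.next();
                lineData.next();  // this skips the "--" token
                String name2 = lineData.next();
                addTo(friends, name1, name2);
                addTo(friends, name2, name1);
            }
        }
        return friends;
    }

    public static void main(String[] args) {
        // A short excerpt of the data file, supplied as a String.
        String data = "graph {\n"
                    + "    Ashley -- Christopher\n"
                    + "    Ashley -- Emily\n"
                    + "    Jacob -- Christopher\n"
                    + "}\n";
        Map<String, Set<String>> friends = readFriends(new Scanner(data));
        for (String name : friends.keySet()) {
            System.out.println(name + " => " + friends.get(name));
        }
    }
}
```

For this excerpt the map ends up with four keys: Ashley maps to [Christopher, Emily], Christopher to [Ashley, Jacob], Emily to [Ashley], and Jacob to [Christopher]. On Java 8 or later, the body of addTo could instead be written as the single call friends.computeIfAbsent(name1, k -> new TreeSet<String>()).add(name2), although that is not what we wrote in lecture.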

This completes the task of constructing the friends map. The challenge then is to use it to explore friends at various distances. To solve this problem, we will end up using several sets of names. At any given time, we will be exploring a new set of friends that are at the next distance away. We will continue searching until we either find the target name or run out of people to search. So the overall structure of the method is as follows:

        Set<String> newFriends = new TreeSet<String>();
        newFriends.add(name1);
        int distance = 0;
        while (!newFriends.contains(name2) && !newFriends.isEmpty()) {
            distance++;
            // find friends one further away
        }
Inside the loop, we want to use the current set of newFriends to find the next group of newFriends. We can do so simply by adding all of the friends of these friends to a new set and then replacing newFriends with that new set:

        Set<String> newNewFriends = new TreeSet<String>();
        for (String friend : newFriends) {
            newNewFriends.addAll(friends.get(friend));
        }
        newFriends = newNewFriends;
This provides a pretty good solution to the problem. If we throw in some statements to print out what is happening, we end up with this solution:

        Set<String> newFriends = new TreeSet<String>();
        newFriends.add(name1);
        int distance = 0;
        System.out.println();
        System.out.println("Starting with " + name1);
        while (!newFriends.contains(name2) && !newFriends.isEmpty()) {
            distance++;
            Set<String> newNewFriends = new TreeSet<String>();
            for (String friend : newFriends) {
                newNewFriends.addAll(friends.get(friend));
            }
            newFriends = newNewFriends;
            System.out.println("    " + distance + " away: " + newFriends);
        }
        if (newFriends.contains(name2)) {
            System.out.println("found at a distance of " + distance);
        } else {
            System.out.println("not found");
        }
But notice what happens when we run this version of the program:

        Welcome to the cse143 friend finder.
        starting name? Stuart
        target name? Joshua
        
        Starting with Stuart
            1 away: [Jacob]
            2 away: [Christopher, Stuart]
            3 away: [Andrew, Ashley, Jacob, Michael, Sarah]
            4 away: [Andrew, Christopher, Emily, Jessica, Joshua, Sarah, Stuart]
        found at a distance of 4
It is getting the right answer, but the intermediate answers are not correct. It indicates, for example, that Stuart is 2 away from Stuart. That's because it is including the possibility of going from Stuart to Jacob and then from Jacob back to Stuart. In a similar way, it is saying that Christopher is 2 away and Christopher is 4 away. In this case it came up with the right answer, but allowing this kind of duplication makes the program run more slowly and it leads to an infinite loop when there is no connection between people. That's because when you allow duplicates, it just keeps finding more and more friends when it looks 5 away, 6 away, 7 away, and so on.
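
The runaway behavior can be seen safely by capping the number of levels. The sketch below is only an illustration: the class NaiveExpansion, the helpers addBoth, sample, and expand, and the levels cap are all made up, and the data is a tiny graph with just the Stuart, Jacob, Christopher chain plus one unconnected friendship.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class NaiveExpansion {
    // Hypothetical helper: records a friendship in both directions.
    static void addBoth(Map<String, Set<String>> friends, String a, String b) {
        if (!friends.containsKey(a)) friends.put(a, new TreeSet<String>());
        if (!friends.containsKey(b)) friends.put(b, new TreeSet<String>());
        friends.get(a).add(b);
        friends.get(b).add(a);
    }

    // A tiny graph: Stuart -- Jacob -- Christopher, and separately
    // Samantha -- Tyler, who cannot be reached from Stuart.
    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> friends = new TreeMap<String, Set<String>>();
        addBoth(friends, "Stuart", "Jacob");
        addBoth(friends, "Jacob", "Christopher");
        addBoth(friends, "Samantha", "Tyler");
        return friends;
    }

    // Expands the set of friends `levels` times WITHOUT removing explored
    // names, and returns the set found at each level.
    static List<Set<String>> expand(Map<String, Set<String>> friends,
                                    String start, int levels) {
        List<Set<String>> result = new ArrayList<Set<String>>();
        Set<String> newFriends = new TreeSet<String>();
        newFriends.add(start);
        for (int distance = 1; distance <= levels; distance++) {
            Set<String> newNewFriends = new TreeSet<String>();
            for (String friend : newFriends) {
                newNewFriends.addAll(friends.get(friend));
            }
            newFriends = newNewFriends;
            result.add(newFriends);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> levels = expand(sample(), "Stuart", 6);
        for (int i = 0; i < levels.size(); i++) {
            System.out.println("    " + (i + 1) + " away: " + levels.get(i));
        }
    }
}
```

The sets alternate between [Jacob] and [Christopher, Stuart] and never become empty. A search from Stuart for Samantha using the loop above would therefore never satisfy either of its stopping conditions.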

The solution is to introduce yet another set to keep track of people who have already been explored. Then when we form a new set of friends to consider, we remove the names of people who have already been explored. And we'll have to add the new people to the set of explored people so that we won't explore them in the future. The code below adds three lines to the previous version: the declaration of oldFriends, the call on oldFriends.addAll, and the call on newNewFriends.removeAll:

        Set<String> oldFriends = new TreeSet<String>();
        Set<String> newFriends = new TreeSet<String>();
        newFriends.add(name1);
        int distance = 0;
        System.out.println();
        System.out.println("Starting with " + name1);
        while (!newFriends.contains(name2) && !newFriends.isEmpty()) {
            distance++;
            oldFriends.addAll(newFriends);
            Set<String> newNewFriends = new TreeSet<String>();
            for (String friend : newFriends) {
                newNewFriends.addAll(friends.get(friend));
            }
            newNewFriends.removeAll(oldFriends);
            newFriends = newNewFriends;
            System.out.println("    " + distance + " away: " + newFriends);
        }
        if (newFriends.contains(name2)) {
            System.out.println("found at a distance of " + distance);
        } else {
            System.out.println("not found");
        }
This completes the program.
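
To check the final loop against the sample runs shown earlier, it helps to have a version that returns the distance instead of printing it. The sketch below makes several assumptions that were not part of the lecture code: the class name FriendDistance, the helpers distanceBetween and sampleFriends, the choice of -1 to mean "not found", and the decision to treat a starting name that is not in the map as not found. The search itself is the same level-by-level exploration as above, which is a form of breadth-first search.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FriendDistance {
    static void addTo(Map<String, Set<String>> friends, String name1,
                      String name2) {
        if (!friends.containsKey(name1)) {
            friends.put(name1, new TreeSet<String>());
        }
        friends.get(name1).add(name2);
    }

    // Builds the map for the 17 friendships in the sample data file.
    static Map<String, Set<String>> sampleFriends() {
        String[][] pairs = {
            {"Ashley", "Christopher"}, {"Ashley", "Emily"},
            {"Ashley", "Joshua"}, {"Christopher", "Andrew"},
            {"Emily", "Joshua"}, {"Jacob", "Christopher"},
            {"Jessica", "Ashley"}, {"Michael", "Christopher"},
            {"Michael", "Joshua"}, {"Michael", "Jessica"},
            {"Samantha", "Matthew"}, {"Samantha", "Tyler"},
            {"Sarah", "Andrew"}, {"Sarah", "Christopher"},
            {"Sarah", "Emily"}, {"Tyler", "Kyle"}, {"Stuart", "Jacob"}
        };
        Map<String, Set<String>> friends = new TreeMap<String, Set<String>>();
        for (String[] pair : pairs) {
            addTo(friends, pair[0], pair[1]);
            addTo(friends, pair[1], pair[0]);
        }
        return friends;
    }

    // Hypothetical helper: returns the number of friendship steps from name1
    // to name2, or -1 if no chain of friends connects them.
    static int distanceBetween(Map<String, Set<String>> friends, String name1,
                               String name2) {
        if (!friends.containsKey(name1)) {
            return -1;  // assumption: an unknown starting name is "not found"
        }
        Set<String> oldFriends = new TreeSet<String>();
        Set<String> newFriends = new TreeSet<String>();
        newFriends.add(name1);
        int distance = 0;
        while (!newFriends.contains(name2) && !newFriends.isEmpty()) {
            distance++;
            oldFriends.addAll(newFriends);
            Set<String> newNewFriends = new TreeSet<String>();
            for (String friend : newFriends) {
                newNewFriends.addAll(friends.get(friend));
            }
            newNewFriends.removeAll(oldFriends);
            newFriends = newNewFriends;
        }
        if (newFriends.contains(name2)) {
            return distance;
        } else {
            return -1;
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = sampleFriends();
        System.out.println(distanceBetween(friends, "Ashley", "Stuart"));    // 3
        System.out.println(distanceBetween(friends, "Ashley", "Samantha"));  // -1
        System.out.println(distanceBetween(friends, "Stuart", "Joshua"));    // 4
    }
}
```

The first two results agree with the sample executions for Ashley, and the third agrees with the distance of 4 reported for Stuart and Joshua. Because every friendship is stored in both directions, every name that appears in some set is also a key in the map, so the call on friends.get inside the loop never returns null once the starting name is known to be a key.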


Stuart Reges
Last modified: Sun Apr 10 19:29:55 PDT 2011