CSE341 Notes for Wednesday, 5/22/24

I started by talking about this Java program:

        import java.util.*;
        
        public class Tree1 {
            public static void main(String[] args) {
                Set<Integer> s = new TreeSet<>();
                s.add(23);
                s.add(15);
                s.add(37);
                s.add(7);
                s.add(20);
                s.add(25);
                s.add(16);
                s.add(22);
                for (int n : s){
                    System.out.println(n);
                }
            }
        }

The TreeSet is implemented as a binary search tree, so it constructs something like the following (not exactly the same, because TreeSet uses a self-balancing red-black tree that may rearrange nodes to keep the tree balanced):

                                 +----+
                                 | 23 |
                                 +----+
                               /        \
                             /            \
                      +----+                +----+
                      | 15 |                | 37 |
                      +----+                +----+
                     /      \              /
                    /        \            /
                +----+      +----+    +----+
                |  7 |      | 20 |    | 25 | 
                +----+      +----+    +----+
                           /      \
                          /        \
                      +----+      +----+
                      | 16 |      | 22 |
                      +----+      +----+

I asked how the foreach loop is implemented. Someone mentioned that your class has to provide an iterator. So how does the iterator work? Someone said it needs to traverse the tree using an inorder traversal. That's true, but remember that the iterator gives values one at a time when you call its method next(). Someone suggested doing a complete traversal and storing the results to "flatten" the tree. We could do that, but it would require extra storage proportional to the size of the tree, and it would do the entire traversal up front even when the client only wants the first few values, as we'll discuss below.

I asked what the first value returned is. Someone said 7. Why 7? Because it is the leftmost value in the tree. Okay, so the iterator finds the leftmost value in the tree and returns it. Then what happens when you call next again? Someone said it returns 15. How does it get there? Someone said it has to go to the parent of the current node, which means we need extra parent links in the tree. Okay, so it goes to the parent and returns that value. Then what? Then it has to explore the right subtree of the node with 15, which requires finding the leftmost value in that subtree (16).
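The step-by-step process just described can be sketched in Ruby, the language used for the rest of these notes. This sketch swaps in a different technique from the parent links mentioned above: it keeps an explicit stack of the nodes whose values have not been returned yet. The Node struct, the StackIterator class, and its has_next? and next_value methods are hypothetical names chosen for the illustration.

```ruby
# Hypothetical node type: a value plus left and right subtrees.
Node = Struct.new(:data, :left, :right)

# External inorder iterator that uses an explicit stack instead of parent links.
class StackIterator
  def initialize(root)
    @stack = []
    push_left_spine(root)   # the first value returned is the leftmost one
  end

  def has_next?
    !@stack.empty?
  end

  def next_value
    raise StopIteration, "no more values" if @stack.empty?
    node = @stack.pop
    # Before returning this node's value, queue up the leftmost path of its right subtree.
    push_left_spine(node.right)
    node.data
  end

  private

  # Push node and all of its left descendants onto the stack.
  def push_left_spine(node)
    while node
      @stack.push(node)
      node = node.left
    end
  end
end

# The tree from the diagram above, built by hand.
root = Node.new(23,
                Node.new(15, Node.new(7),
                         Node.new(20, Node.new(16), Node.new(22))),
                Node.new(37, Node.new(25)))

it = StackIterator.new(root)
values = []
values << it.next_value while it.has_next?
p values   # prints [7, 15, 16, 20, 22, 23, 25, 37]
```

Even without parent links, the iterator has to do careful bookkeeping between calls, which is the kind of complexity the yield-based approach later in these notes avoids.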

This gets very complicated very quickly. Wouldn't it be nice if we could just write an ordinary recursive inorder traversal of the tree? I showed the following code, adapted from the source code for Java's TreeMap class (which TreeSet uses internally), and you can see the complex way it moves around the tree to find the next value to return in this "successor" method:

        private Entry successor(Entry t) {
            if (t == null)
                return null;
            else if (t.right != null) {
                Entry p = t.right;
                while (p.left != null)
                    p = p.left;
                return p;
            } else {
                Entry p = t.parent;
                Entry ch = t;
                while (p != null && ch == p.right) {
                    ch = p;
                    p = p.parent;
                }
                return p;
            }
        }

Ruby gives us a solution that is simple and efficient. I returned to the binary search tree code we discussed in the previous lecture. In particular, I wanted to focus on the print method that we implemented using a private helper method:

        def print
          print_helper(@overall_root)
        end

        def print_helper(root)
          if root
            print_helper(root.left)
            puts root.data
            print_helper(root.right)
          end
        end

I changed the name to "inorder" and "inorder_helper" and changed the call on puts in the helper method to a call on yield:

        class Tree
          ...
          def inorder
            inorder_helper(@overall_root)
          end
          ...
          private
          ...
          def inorder_helper(root)
            if root
              inorder_helper(root.left)
              yield root.data
              inorder_helper(root.right)
            end
          end
          ...
        end

We tried calling this method with a block:

        t = Tree.new
        t.inorder {|n| puts n}

This gave an error message. Remember that a method that uses yield has to be called with a block. We're providing a block here, but inside the class there are three places where we call the helper method without providing a block: the initial call on the overall root and the two recursive calls. So we had to modify each of those calls to pass a block. And what should each block do with the values it gets? It simply passes each value along by yielding it:

        class Tree
          ...
          def inorder
            inorder_helper(@overall_root) {|n| yield n}
          end
          ...
          private
          ...
          def inorder_helper(root)
            if root
              inorder_helper(root.left) {|n| yield n}
              yield root.data
              inorder_helper(root.right) {|n| yield n}
            end
          end
          ...
        end
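The rule that a method using yield must receive a block can be seen in isolation in the following sketch (the method names are made up for illustration). Without a block, the first yield raises LocalJumpError; the built-in block_given? lets a method check for a block before yielding.

```ruby
# Hypothetical method that hands the values 1 and 2 to its block.
def yield_one_and_two
  yield 1
  yield 2
end

# With a block, each yield runs the block.
seen = []
yield_one_and_two { |n| seen << n }
p seen   # prints [1, 2]

# Without a block, the first yield raises LocalJumpError.
begin
  yield_one_and_two
rescue LocalJumpError => e
  puts "error: #{e.message}"   # prints "error: no block given (yield)"
end

# block_given? lets a method check for a block before yielding.
def safe_yield_one_and_two
  return :no_block unless block_given?
  yield 1
  yield 2
end

p safe_yield_one_and_two   # prints :no_block
```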

Now we were able to make the call on inorder passing it a block. But we weren't able to use a foreach loop with the tree, as in:

        for n in t
          puts n
        end

There was a simple fix for this. Ruby's for loop works by calling the each method of the object being iterated over, so we simply renamed the public method from "inorder" to "each", and then the foreach loop worked as well.
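For experimenting outside the lecture code, here is a self-contained sketch of the finished class. The Node struct and the insert method are assumptions (the previous lecture's versions are not reproduced in these notes), and this insert sends duplicates to the right; the traversal is the block-forwarding version above with the public method named each.

```ruby
# Hypothetical node type; the course's own node class may differ.
Node = Struct.new(:data, :left, :right)

class Tree
  def initialize
    @overall_root = nil
  end

  # Assumed insert: smaller values go left, others (including duplicates) go right.
  def insert(value)
    @overall_root = insert_helper(@overall_root, value)
  end

  # Yields the values in sorted (inorder) order; a for loop calls this method.
  def each
    inorder_helper(@overall_root) { |n| yield n }
  end

  private

  def insert_helper(root, value)
    return Node.new(value) if root.nil?
    if value < root.data
      root.left = insert_helper(root.left, value)
    else
      root.right = insert_helper(root.right, value)
    end
    root
  end

  def inorder_helper(root)
    if root
      inorder_helper(root.left) { |n| yield n }    # values smaller than root.data
      yield root.data                              # then root.data itself
      inorder_helper(root.right) { |n| yield n }   # then values >= root.data
    end
  end
end

t = Tree.new
[23, 15, 37, 7, 20, 25, 16, 22].each { |n| t.insert(n) }

values = []
for n in t        # Ruby turns this into a call on t.each
  values << n
end
p values          # prints [7, 15, 16, 20, 22, 23, 25, 37]
```

An equivalent design captures the block as a parameter (def inorder_helper(root, &block)) and passes it down with &block, which avoids creating a new block at every level of the recursion.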

We then redefined print using a foreach loop:

        def print()
          for n in self
            puts n
          end
        end

We loaded this new version into irb and tested it out. First we created a tree and inserted 25 random values:

        >> t = Tree.new
        => #<Tree:0xb7eb0a5c @overall_root=nil>
        >> 25.times{t.insert(rand(100))}
        => 25

We found that print still worked just fine.

But now we could also write variants of print by using the each iterator, like printing each number doubled:

        >> t.each {|n| puts 2 * n}
        4
        4
        30
        32
        38
        46
        46
        64
        76
        84
        86
        94
        102
        122
        128
        136
        140
        146
        154
        158
        160
        166
        176
        180
        192
        => nil

We were also able to use the iterator to find the sum of the numbers:


        >> sum = 0
        => 0
        >> t.each {|n| sum += n}
        => nil
        >> sum
        => 1282

I pointed out that not only was this iterator fairly easy to define, but it is also efficient. We would describe it as lazy in the sense that it doesn't compute a value until it needs it. For example, we reset the sum to 0 and wrote this variant that breaks out of the computation as soon as the sum becomes greater than 100:

        sum = 0
        t.each do |n|
          puts n
          sum += n
          break if sum > 100
        end

When we ran it, it produced this output:

        2
        2
        15
        16
        19
        23
        23
        32

We found that it had set sum to 132 and then stopped, after processing only 8 of the 25 values. The alternative mentioned earlier, precomputing the entire traversal before returning anything, would have visited all 25 values. For a computation like this one that breaks out early, that approach would be wasteful, especially for a large tree.
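To make the cost difference concrete, here is a hedged sketch that counts how many values each approach produces. The CountingTree class, its produced counter, and its insert rule (smaller values left, others right) are hypothetical; the 25 sample values are the ones from the irb session above, recovered by halving the doubled output and inserted in an arbitrary order.

```ruby
# Hypothetical node type.
Node = Struct.new(:data, :left, :right)

# A binary search tree instrumented to count how many values its traversal hands out.
class CountingTree
  attr_reader :produced

  def initialize(values)
    @root = nil
    @produced = 0
    values.each { |v| @root = insert(@root, v) }
  end

  # Lazy: each value is handed to the block as soon as the traversal reaches it.
  def each
    walk(@root) { |n| yield n }
  end

  # Eager: flattens the entire tree into an array before returning anything.
  def to_flat_array
    result = []
    walk(@root) { |n| result << n }
    result
  end

  private

  def insert(root, v)
    return Node.new(v) if root.nil?
    if v < root.data
      root.left = insert(root.left, v)
    else
      root.right = insert(root.right, v)
    end
    root
  end

  # Inorder traversal that counts every value it produces.
  def walk(root)
    return unless root
    walk(root.left) { |n| yield n }
    @produced += 1
    yield root.data
    walk(root.right) { |n| yield n }
  end
end

values = [42, 15, 77, 2, 61, 90, 23, 32, 88, 19, 70, 51, 96,
          2, 38, 64, 83, 16, 47, 73, 23, 80, 43, 68, 79]

lazy = CountingTree.new(values)
lazy_sum = 0
lazy.each do |n|
  lazy_sum += n
  break if lazy_sum > 100
end
puts "lazy:  sum=#{lazy_sum}, values produced=#{lazy.produced}"    # sum=132, values produced=8

eager = CountingTree.new(values)
eager_sum = 0
eager.to_flat_array.each do |n|
  eager_sum += n
  break if eager_sum > 100
end
puts "eager: sum=#{eager_sum}, values produced=#{eager.produced}"  # sum=132, values produced=25
```

Both versions compute the same sum, but the eager one has to produce every value in the tree before the loop can break.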

I mentioned that Python has a yield keyword (used to write generator functions) and that you can write very similar code in Python so that you can say things like:

        >>> import tree2
        >>> t = tree2.Tree()
        >>> for n in [23, 15, 37, 7, 20, 25, 16, 22]:
        ...     t.insert(n)
        ... 
        >>> for n in t.inorder():
        ...     print(n)
        ... 
        7
        15
        16
        20
        22
        23
        25
        37

I said that I would include the code for tree2-2.txt on the calendar for today.
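Ruby can also turn an internal iterator like each into an external one that hands out values on demand, roughly analogous to calling next() on a Python generator or a Java iterator. A minimal sketch using the standard enum_for method and Enumerator#next, on a hypothetical Squares class that only defines each:

```ruby
# Hypothetical collection with an internal iterator.
class Squares
  def initialize(limit)
    @limit = limit
  end

  def each
    (1..@limit).each { |i| yield i * i }
    self
  end
end

# enum_for(:each) wraps the each method in an Enumerator.
e = Squares.new(4).enum_for(:each)
first = e.next    # runs each just far enough to produce 1
second = e.next   # resumes and produces 4
rest = []
loop { rest << e.next }   # loop stops quietly when next raises StopIteration
p [first, second, rest]   # prints [1, 4, [9, 16]]
```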

While we were in the Python interpreter, I pointed out a funny capability. We know that in Ruby we can use the #{...} form to embed an expression in a string that should be evaluated and then turned into a string, as in:

        >> s = "(#{2 + 3}, #{3 + 4 * 5})"
        => "(5, 23)"

Python borrowed this feature starting with Python 3.6, although you just use the curly braces and don't include the number sign:

        >>> s = f"({2 + 3}, {3 + 4 * 5})"
        >>> s
        '(5, 23)'

This is known as an "f-string" (notice the f before the opening quote). You will find that these languages often borrow features from one another, so even though you are studying a language like Ruby, you might be learning things that apply to other languages like Python.

Then I returned to our discussion of OO in Ruby. I pointed out an interesting behavior. I defined a subclass of Range called Range2 whose each method yields twice the normal value:

        class Range2 < Range
          def each
            super {|n| yield 2 * n}
          end
        end

This produced the expected behavior that a foreach loop now produces doubled values when you use a Range2 object:

        >> x = Range2.new(1, 10)
        => 1..10
        >> y = 1..10
        => 1..10
        >> for n in x do puts n end
        2
        4
        6
        8
        10
        12
        14
        16
        18
        20
        => 1..10
        >> for n in y do puts n end
        1
        2
        3
        4
        5
        6
        7
        8
        9
        10
        => 1..10

But we also noticed that now map behaves differently:

        >> x.map {|n| n}
        => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
        >> y.map {|n| n}
        => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

as does the any? member function:

        >> x.any? {|n| n % 2 == 1}
        => false
        >> y.any? {|n| n % 2 == 1}
        => true

What's going on? The map and any? methods come from the Enumerable module, which Range includes, and they are written in terms of each. This is an important aspect of OO design that we didn't have time to discuss in CSE123 or CSE143. We talked about how to design your code for a client who will call your methods. We didn't talk about how to design your code for a client who will extend your class through inheritance.
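A class of our own that includes Enumerable behaves the same way. Here is a sketch with two made-up classes, Digits and DoubledDigits, where the subclass overrides only each:

```ruby
# Hypothetical collection: Enumerable supplies map, any?, select, etc. on top of each.
class Digits
  include Enumerable

  def initialize(*values)
    @values = values
  end

  def each
    @values.each { |v| yield v }
    self
  end
end

# Overriding each in a subclass changes every Enumerable method at once.
class DoubledDigits < Digits
  def each
    super { |v| yield 2 * v }
  end
end

plain = Digits.new(1, 2, 3)
doubled = DoubledDigits.new(1, 2, 3)
p plain.map { |n| n }           # prints [1, 2, 3]
p doubled.map { |n| n }         # prints [2, 4, 6]
p plain.any? { |n| n.odd? }     # prints true
p doubled.any? { |n| n.odd? }   # prints false
```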

I then talked about how this is implemented. How does the Range object's map method "know" to call the new version of each? We've spent so much time getting students to understand this notion in Java that perhaps it seems obvious. This ability to override a method is what we refer to as polymorphism, dynamic dispatch, late binding, runtime binding, etc.

But what is the mechanism? In Java and Ruby the model is that each method has an additional unstated parameter. In Java we refer to it as "this". In Ruby we refer to it as "self". By knowing which object a method was called on, we can figure out at runtime which version of a method like each to call. This extra parameter is often referred to as the "implicit parameter."
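One way to see the implicit parameter at work is to write a small version of map ourselves. Its body calls each with no explicit receiver, which means self.each, so the each that runs is chosen by the object my_map was called on. The module MyMap, its my_map method, and reopening Range to include it are choices made for this sketch; Range2 is repeated so the example runs on its own.

```ruby
# Hypothetical mixin: my_map only assumes that self responds to each.
module MyMap
  def my_map
    result = []
    each { |x| result << yield(x) }   # same as self.each { ... }
    result
  end
end

class Range
  include MyMap
end

# Subclass that overrides each, as in the Range2 example.
class Range2 < Range
  def each
    super { |n| yield 2 * n }
  end
end

plain = (1..5).my_map { |n| n }
doubled = Range2.new(1, 5).my_map { |n| n }
p plain     # prints [1, 2, 3, 4, 5]
p doubled   # prints [2, 4, 6, 8, 10]
```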

Then I briefly discussed Ruby blocks as closures. Blocks in Ruby provide the two key elements of a closure:

code to be executed at some later time
preserving the context in which that code appeared

They aren't true closures, though, because they can't be manipulated as first-class entities. We can't, for example, store a block in a variable. But Ruby provides a mechanism for converting a block into an object using the lambda method:

        >> lambda {|n| 2 * n}
        => #<Proc:0xb76aad08@(irb):1>

This expression is constructing an object of type Proc, as we can see below:

        >> x = lambda {|n| 2 * n}
        => #<Proc:0xb76a441c@(irb):2>
        >> x.class
        => Proc

Once constructed, you can invoke the block using the "call" member function:

        >> x.call 13
        => 26
        >> x.call 25
        => 50
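To see the second element of a closure (preserving the context in which the code appeared), here is a small sketch in which a lambda keeps using a local variable after the method that created it has returned. The make_counter and capture methods are hypothetical names used only for illustration; capture shows that a method can also receive its block as a Proc object through an & parameter.

```ruby
# Returns a lambda that remembers (and updates) the local variable count.
def make_counter
  count = 0
  lambda { count += 1 }
end

c1 = make_counter
c2 = make_counter
c1.call
c1.call
third = c1.call
first_of_c2 = c2.call
p third         # prints 3
p first_of_c2   # prints 1, because c2 has its own count

# An & parameter turns the block passed to a method into a Proc object.
def capture(&block)
  block
end

doubler = capture { |n| 2 * n }
p doubler.class     # prints Proc
p doubler.call(13)  # prints 26
```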

We then discussed what are known as mixins. This is one of the most interesting features of Ruby.

Before looking at Ruby mixins, I spent a few minutes discussing Java's inheritance model. I asked people what you get when class B extends class A in Java. The answer is that you get two different things:

code reuse: the methods and fields of the superclass are inherited
a subtype relationship: B objects are considered to be of type A

These are really two different things. Java also has interfaces, which allow you to define subtype relationships without any code reuse. So in Java you can have one code reuse relationship and any number of subtype relationships, but you can't get the code reuse relationship without the subtype.

C++ is an interesting contrast. C++ supports multiple inheritance. With multiple inheritance, you can get multiple code reuse relationships. But it turns out that multiple inheritance is rather messy. For example, Arthur Riel in his book Object-Oriented Design Heuristics includes as item 54:

54. If you have an example of multiple inheritance in your design, assume you have made a mistake and then prove otherwise.

C++ also has a notion of private inheritance where you have code reuse but no subtype relationship.

Ruby offers something in between. It has single inheritance, just as Java does. Subtyping doesn't matter in Ruby because it uses duck typing (Ruby doesn't care what kind of duck you are as long as you can quack in an appropriate manner when asked to do so). So the only issue in Ruby is code reuse. We've seen that inheritance of classes is similar in Ruby to what we saw in Java. Mixins offer an alternative. You can define a mixin by defining a module that contains a set of methods. For example, I wrote the following mixin that defines two methods that allow sequences to be stuttered:

        module Stutterable
          def stutter
            result = []
            for n in self
              result.push n
              result.push n
            end
            result
          end
        
          def stutter_each
            for n in self
              yield n
              yield n
            end
          end
        end

You use the word "module" instead of "class". Once you have defined this module, you can include it in classes by saying:

        include Stutterable

It is almost as if the actual code from the module is included. For example, we went into the interpreter and reopened the Array class to include the module:

        >> class Array
        >>   include Stutterable
        >>   end
        => Array
        >> x = [1, 2, 3]
        => [1, 2, 3]
        >> x.stutter
        => [1, 1, 2, 2, 3, 3]
        >> x.stutter_each {|n| puts n}
        1
        1
        2
        2
        3
        3
        => [1, 2, 3]

and we added it to the Range class:

        >> class Range
        >>   include Stutterable
        >>   end
        => Range
        >> x = (1..5)
        => 1..5
        >> x.stutter
        => [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
        >> x.stutter_each {|n| puts n}
        1
        1
        2
        2
        3
        3
        4
        4
        5
        5
        => 1..5
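Two details are worth checking. First, include does not literally copy the code; it inserts the module into the class's ancestor chain, which is where method lookup finds stutter. Second, the module only assumes that self can be used in a for loop (that is, that it has an each method), so it works for any such class. The Countdown class below is a made-up example, and the stutter method is repeated from above so the sketch runs on its own.

```ruby
module Stutterable
  def stutter
    result = []
    for n in self
      result.push n
      result.push n
    end
    result
  end
end

# Hypothetical class that knows nothing about Stutterable except that it has each.
class Countdown
  include Stutterable

  def initialize(start)
    @start = start
  end

  def each
    @start.downto(1) { |n| yield n }
    self
  end
end

p Countdown.new(3).stutter      # prints [3, 3, 2, 2, 1, 1]
p Countdown.ancestors.take(3)   # prints [Countdown, Stutterable, Object]
```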

I mentioned that the two most common mixins are Comparable and Enumerable. For example, we modified the Point class to implement a method called <=> (sometimes referred to as the "spaceship operator"), which is the Ruby equivalent of Java's compareTo method. We had it compare two points by their distance from the origin. To make this more efficient, we introduced a class variable called @@origin so that we don't construct a new origin point on every comparison. The double at-sign indicates that it's a class variable rather than an instance variable (i.e., one shared value for the entire class, like a static field in Java):

        class Point
          include Comparable

          def initialize(x = 0, y = 0)
            @x = x
            @y = y
          end

          attr_accessor :x, :y

          def to_s
            "(#{x}, #{y})"
          end

          def distance(other)
            Math.sqrt((x - other.x) ** 2 + (y - other.y) ** 2)
          end

          # one shared origin point for the whole class
          @@origin = Point.new

          # negative, zero, or positive depending on which point is closer to the origin
          def <=>(other)
            distance(@@origin) <=> other.distance(@@origin)
          end
        end

What the mixin gets us is several extra methods built on top of the <=> method: <, <=, ==, >, >=, and between? (plus clamp in Ruby 2.4 and later). For example, now we can say:

        >> p1 = Point.new(3, 5)
        => #<Point:0xb8052298 @y=5, @x=3>
        >> p2 = Point.new(5, 3.1)
        => #<Point:0xb804e2b0 @y=3.1, @x=5>
        >> p1 < p2
        => true
        >> p1 <= p2
        => true
        >> p1 > p2
        => false
        >> p1 >= p2
        => false

So this is an example of code reuse without using the inheritance mechanism. Instead, the Comparable module defines several methods in terms of the single <=> method that we wrote. This is a very convenient way to build up new functionality.
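Here is a sketch that exercises the other Comparable methods, using a trimmed copy of the Point class so that it runs on its own. The sample points are made up; (3, 4) and (5, 0) are both exactly 5 units from the origin, so Comparable's == treats them as equal. The last line uses Array#sort, which is not part of Comparable but relies on the same <=> method.

```ruby
class Point
  include Comparable

  attr_accessor :x, :y

  def initialize(x = 0, y = 0)
    @x = x
    @y = y
  end

  def to_s
    "(#{x}, #{y})"
  end

  def distance(other)
    Math.sqrt((x - other.x) ** 2 + (y - other.y) ** 2)
  end

  @@origin = Point.new

  # Compare points by their distance from the origin.
  def <=>(other)
    distance(@@origin) <=> other.distance(@@origin)
  end
end

a = Point.new(3, 4)   # distance 5.0
b = Point.new(5, 0)   # distance 5.0
c = Point.new(6, 8)   # distance 10.0

puts a == b                                  # true (== comes from Comparable)
puts b.between?(Point.new(1, 1), c)          # true
puts c.clamp(Point.new(1, 1), b)             # (5, 0), since c is farther than b
puts [c, a, Point.new(1, 1)].sort.join(" ")  # (1, 1) (3, 4) (6, 8)
```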

I said that we would finish this discussion in the next lecture.

Stuart Reges

Last modified: Fri May 24 13:33:19 PDT 2024