CSE341 Notes for Monday, 3/5/07

I spent some more time discussing what you can do with blocks in Ruby because I think this is one of the most interesting aspects of the language. As an example, I asked people to consider what you'd have to do to implement an iterator for a binary tree in a language like Java. For example, if you have a binary search tree, you might want to traverse the data using an inorder traversal. We often write code that prints the nodes in this order because it gives the values in sorted order.

So how would we implement a binary tree inorder iterator for a Java binary tree? There were several suggestions. Someone said that we could store parent links in the tree and we could move around in the tree from node to node. That works, but it requires keeping those extra parent links and the code to move from node to node is a bit tricky. This is the approach that Sun takes in implementing the TreeMap and TreeSet classes. Here is a bit of source code from TreeMap.java that has the code that moves from one node to the next using an inorder traversal:

        private Entry<K,V> successor(Entry<K,V> t) {
            if (t == null)
                return null;
            else if (t.right != null) {
                Entry<K,V> p = t.right;
                while (p.left != null)
                    p = p.left;
                return p;
            } else {
                Entry<K,V> p = t.parent;
                Entry<K,V> ch = t;
                while (p != null && ch == p.right) {
                    ch = p;
                    p = p.parent;
                }
                return p;
            }
        }

Another suggestion was to keep a stack and to simulate the call stack ourselves. That could work as well, although that is also rather tricky. Someone suggested that we could keep a counter of the most recent value we had generated and then each time we were asked for the next value, we could traverse from the beginning to the appropriate spot. That would be highly inefficient and would turn an O(n) process into an O(n²) process. One final idea was to do the complete traversal in advance and store the result in some kind of data structure like an ArrayList and then we could iterate over the ArrayList.
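
To make the explicit-stack suggestion concrete, here is a minimal sketch written in Ruby rather than Java. The names SketchNode, InorderIterator, has_next? and next_value are hypothetical, chosen only for this illustration. The stack holds the nodes whose left subtrees are already finished, which is exactly the bookkeeping the recursive version gets for free from the call stack.

```ruby
# Hypothetical minimal node type for this sketch (not the Node class from section).
SketchNode = Struct.new(:data, :left, :right)

# External inorder iterator that keeps its own stack instead of recursing.
class InorderIterator
  def initialize(root)
    @stack = []
    push_left_spine(root)
  end

  def has_next?
    !@stack.empty?
  end

  # Returns the next value in sorted order.
  def next_value
    raise StopIteration, "no more nodes" unless has_next?
    node = @stack.pop
    push_left_spine(node.right)
    node.data
  end

  private

  # Push a node and all of its left descendants, so the top of the stack
  # is always the next node to visit.
  def push_left_spine(node)
    while node
      @stack.push(node)
      node = node.left
    end
  end
end

# Build the tree    5
#                  / \
#                 3   8
#                  \
#                   4
root = SketchNode.new(5, SketchNode.new(3, nil, SketchNode.new(4)), SketchNode.new(8))
it = InorderIterator.new(root)
values = []
values << it.next_value while it.has_next?
p values   # prints [3, 4, 5, 8]
```

Even in this small form, the iterator has to manage state that the recursive traversal never mentions, which is the sense in which this approach is tricky.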

All of these solutions work, but none of them is both simple and efficient. Ruby gives us a solution that is both. We had developed this code in section for a binary search tree:

        class Node
          def initialize(data = nil, left = nil, right = nil)
            @data = data
            @left = left
            @right = right
          end
        
          def left
            return @left
          end
        
          def left= l
            @left = l
          end
        
          def right
            return @right
          end
        
          def right= r
            @right = r
          end
        
          def data
            return @data
          end
        
          def data= d
            @data = d
          end
        end
        
        class Tree
          def initialize()
            @overallRoot = nil
          end
        
          def insert(v)
            @overallRoot = insert_helper(v, @overallRoot)
          end
        
          def print()
            print_helper(@overallRoot)
          end
        
          private
          def insert_helper(v, root)
            if root == nil
              root = Node.new(v)
            elsif v < root.data then
              root.left = insert_helper(v, root.left)
            else
              root.right = insert_helper(v, root.right)
            end
            return root
          end
        
          def print_helper(root)
            if root != nil then
              print_helper root.left
              puts root.data
              print_helper root.right
            end
          end
        end
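
As an aside, the six hand-written accessor methods in Node can be generated by Ruby's attr_accessor. The sketch below uses a hypothetical class name, CompactNode, so that it does not clash with the class above. It is meant to be equivalent in behavior, not a replacement for the code we wrote in section.

```ruby
# Hypothetical compact version of the Node class; attr_accessor defines
# data, data=, left, left=, right and right= for us.
class CompactNode
  attr_accessor :data, :left, :right

  def initialize(data = nil, left = nil, right = nil)
    @data = data
    @left = left
    @right = right
  end
end

n = CompactNode.new(10)
n.left = CompactNode.new(5)
puts n.left.data   # prints 5
```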

We can define an iterator called inorder that looks a lot like the current print and print_helper methods. The big difference is that instead of calling puts to print values, they will call yield to generate values:

        class Tree
          ...
        
          def inorder
            inorder_helper(@overallRoot) {|n| yield n}
          end
        
          private
          def inorder_helper(root)
            if root then
              inorder_helper(root.left) {|n| yield n}
              yield root.data
              inorder_helper(root.right) {|n| yield n}
            end
          end
        
          ...
        end

Once this method is written, we can call it with a block. In fact, print can now be redefined as a call on this iterator:

        def print()
          inorder {|n| puts n}
        end
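
One possible variation, sketched here under assumptions of my own rather than as the version from lecture: instead of wrapping the block in a new {|n| yield n} at every level of recursion, the method can accept the block as an explicit &block parameter and pass it straight down. If inorder also returns an Enumerator when it is called without a block, and the class includes Enumerable, then methods like map and inject come along for free. The names SketchTree and @overall_root are hypothetical.

```ruby
class SketchTree   # hypothetical stand-in for the Tree class above
  include Enumerable

  Node = Struct.new(:data, :left, :right)

  def initialize
    @overall_root = nil
  end

  def insert(v)
    @overall_root = insert_helper(v, @overall_root)
  end

  # With no block, return an Enumerator. Otherwise hand the caller's block
  # straight down the recursion rather than re-yielding at every level.
  def inorder(&block)
    return to_enum(:inorder) unless block
    inorder_helper(@overall_root, &block)
  end
  alias each inorder   # Enumerable is built on top of each

  private

  def insert_helper(v, root)
    return Node.new(v) if root.nil?
    if v < root.data
      root.left = insert_helper(v, root.left)
    else
      root.right = insert_helper(v, root.right)
    end
    root
  end

  def inorder_helper(root, &block)
    return if root.nil?
    inorder_helper(root.left, &block)
    block.call(root.data)
    inorder_helper(root.right, &block)
  end
end

t = SketchTree.new
[50, 20, 70, 10, 30].each { |v| t.insert(v) }
p t.inorder.to_a                     # prints [10, 20, 30, 50, 70]
p t.map { |n| 2 * n }                # prints [20, 40, 60, 100, 140]
p t.inject(0) { |sum, n| sum + n }   # prints 180
```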

We loaded this new version into irb and tested it out. First we created a tree and inserted 25 random values:

        >> t = Tree.new
        => #<Tree:0xb7eb0a5c @overallRoot=nil>
        >> 25.times{t.insert(rand(100))}
        => 25

We found that print still worked just fine:

        >> t.print
        2
        2
        15
        16
        19
        23
        23
        32
        38
        42
        43
        47
        51
        61
        64
        68
        70
        73
        77
        79
        80
        83
        88
        90
        96
        => nil

But now we could specify variants of print by using the inorder iterator, like printing each number doubled:

        >> t.inorder {|n| puts 2 * n}
        4
        4
        30
        32
        38
        46
        46
        64
        76
        84
        86
        94
        102
        122
        128
        136
        140
        146
        154
        158
        160
        166
        176
        180
        192
        => nil

We were also able to use the iterator to find the sum of the numbers:


        >> sum = 0
        => 0
        >> t.inorder {|n| sum += n}
        => nil
        >> sum
        => 1282

I pointed out that not only was this iterator fairly easy to define, it is also highly efficient. We would describe it as lazy in the sense that it doesn't compute a value until it needs it. For example, we reset the sum to be 0 and wrote this variant that breaks out of the computation as soon as the sum becomes greater than 100:

        t.inorder do |n|
          puts n
          sum += n
          break if sum > 100
        end

When we ran it, it produced this output:

        2
        2
        15
        16
        19
        23
        23
        32

We found that it had set sum to 132 and then stopped. As we noted earlier, one approach is to precompute the entire traversal before it begins. For a computation like the one above that breaks out early, that would be very expensive.
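
To see the laziness directly, the following sketch counts how many values the traversal produces before the block breaks out. It is self-contained and uses hypothetical names (CountingTree, produced, walk), and it builds a tree holding 1 through 1000 rather than the random values from lecture.

```ruby
class CountingTree   # hypothetical class for this illustration only
  Node = Struct.new(:data, :left, :right)

  attr_reader :produced   # how many values the traversal has handed out

  def initialize(sorted_values)
    @root = build(sorted_values)
    @produced = 0
  end

  def inorder(&block)
    walk(@root, &block)
  end

  private

  # Build a balanced search tree from an already sorted array.
  def build(sorted)
    return nil if sorted.empty?
    mid = sorted.size / 2
    Node.new(sorted[mid], build(sorted[0...mid]), build(sorted[(mid + 1)..-1]))
  end

  def walk(node, &block)
    return if node.nil?
    walk(node.left, &block)
    @produced += 1
    block.call(node.data)
    walk(node.right, &block)
  end
end

tree = CountingTree.new((1..1000).to_a)
sum = 0
tree.inorder do |n|
  sum += n
  break if sum > 100
end
puts "sum = #{sum}, values produced = #{tree.produced}"
# prints sum = 105, values produced = 14
```

Only 14 of the 1000 values are ever generated. A version that copied the whole traversal into an array first would have touched all 1000 nodes before the block saw the first one.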

I gave one other quick example of this kind of computation in Ruby. There is a library known as "mathn" that has some interesting math extensions. For example, it has a class called Prime that can be used to generate the prime numbers in sequence:

        >> require "mathn"
        => true
        >> p = Prime.new
        => #
        >> p.next
        => 2
        >> p.next
        => 3
        >> p.next
        => 5
        >> p.next
        => 7
        >> 10.times {puts p.next}
        11
        13
        17
        19
        23
        29
        31
        37
        41
        43
        => 10

It also has an each method that generates primes with no upper limit. It obviously can't precompute them, so it computes each one only as it is needed. Because each never stops on its own, we have to include a call on break or return when we use it, as in this code, which computes the sum of the primes up to 10000:

        require "mathn"
        p = Prime.new
        sum = 0
        p.each do |n|
          break if n > 10000
          sum += n
        end
        puts sum

This reports that the sum of the primes up to 10000 is equal to 5736396.
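
The same on-demand behavior can be sketched without mathn by building the stream with Ruby's Enumerator class. This is an illustration under my own assumptions, not how the library's Prime class is implemented. It uses simple trial division against the primes found so far.

```ruby
# An endless stream of primes. Nothing is computed until a consumer asks
# for the next value, so the infinite loop is harmless.
primes = Enumerator.new do |out|
  found = []       # primes produced so far
  candidate = 2
  loop do
    # candidate is prime if no earlier prime q with q * q <= candidate divides it
    if found.none? { |q| q * q <= candidate && (candidate % q).zero? }
      found << candidate
      out << candidate
    end
    candidate += 1
  end
end

p primes.first(5)   # prints [2, 3, 5, 7, 11]

sum = 0
primes.each do |n|
  break if n > 10000
  sum += n
end
puts sum            # prints 5736396, matching the result reported above
```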

We spent the rest of the class discussing the idea of a continuation. This is one of the most confusing but also one of the most powerful ideas we have discussed. Continuations are more commonly used in Scheme, but since we have been spending so much time in Ruby, I decided to show them in Ruby instead.

I started by talking about the idea of handling errors. In Java we have the ability to throw an exception and the ability to add try/catch code to handle exceptions. Ruby has similar constructs, but we could instead use a continuation. The idea is to capture the state of the computation at a particular point in time.

For example, suppose we wrote the following code to read in a file that is supposed to have a series of integers, one per line. If it encounters something that doesn't look like an integer, it generates an error message. Notice that I had to do something more than just calling the to_i method because the Ruby to_i method is very forgiving and returns a 0 if it encounters a string that can't be converted to an integer. Instead, I check to see if a string matches the to_s of the integer I get from to_i (the strip method eliminates any leading or trailing white space--think of it as a more powerful version of chomp):

        def process
          infile = File.open("data.txt")
          sum = 0
          for line in infile
            value = line.strip.to_i
            if value.to_s == line.strip then
              sum += value
            else
              print "Error: ", line
            end
          end
          sum
        end
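
The following short sketch shows how forgiving to_i is, along with one alternative to the to_s comparison: Kernel#Integer, which raises ArgumentError on malformed input. The helper name strict_int is hypothetical. Note that the two checks are not identical. For example, Integer accepts "007" and "+5", while the to_s comparison used above rejects them.

```ruby
puts "42".to_i      # prints 42
puts "foo".to_i     # prints 0
puts "12abc".to_i   # prints 12

# Hypothetical helper: returns the integer, or nil if the text is not
# a well-formed base-10 integer.
def strict_int(text)
  Integer(text.strip, 10)
rescue ArgumentError
  nil
end

p strict_int("17\n")   # prints 17
p strict_int("foo")    # prints nil
p strict_int("12abc")  # prints nil
```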

I included the following in data.txt:

When I ran it, it generated three error messages, as expected:

        >> process
        Error: foo
        Error: bar
        Error: baz
        => 1168

Then I made a change to the code. I said that instead of just printing an error message, I would stop the execution. When you throw an exception in Java, that throws away the state of the computation. Instead, I called a method known as callcc ("call with current continuation") as a way to preserve the state of the computation. You provide the code to execute as a block. The parameter to the block is an object of type Continuation that represents the current state of the computation. I asked people how we could get out of the method and someone said we could call return:

        def process
          infile = File.open("data.txt")
          sum = 0
          for line in infile
            value = line.strip.to_i
            if value.to_s == line.strip then
              sum += value
            else
              print "Error: ", line
              value = callcc {|cont| return cont}
              sum += value
              print "reentering with ", value, "\n"
            end
          end
          sum
        end

When we went to execute this in irb, it reached the first error and printed it, but then it exited the function, returning a Continuation object:

        >> x = process
        Error: foo
        => #<Continuation:0xb7f59ad0>

The idea is that the computation has been suspended exactly at the point where we called the method callcc. All of the state of the computation is captured in the Continuation object. It knows that we were inside a for loop, that we were reading a file, that we were computing a sum, that we had just generated an error message, and that we then captured the state of the computation with the call on callcc.

We were able to go back into the computation by using a method named "call":

        >> x.call 17
        reentering with 17
        Error: bar
        => #<Continuation:0xb7f58298>

Notice that it generated the message about "reentering". This was generated by the print statement in the original code. It then went back into the computation and processed more values until it reached the second error condition when it read "bar". It again suspended the computation and reset the Continuation object to represent the state of the computation at that point in time. From irb, we were again able to restart the computation and to supply a new value to replace the faulty value:

        >> x.call 29
        reentering with 29
        Error: baz
        => #<Continuation:0xb7f56ad8>

We again saw the message about reentering the computation and it went back to adding up the values until it hit the third illegal value "baz". That caused it to again suspend because of the call on callcc and it updated the Continuation object to keep track of the computation at that point in time. We reentered one last time:

        >> x.call 3
        reentering with 3
        => 1217

This time the code finished executing normally rather than hitting the call on callcc, so we instead got the sum of the values in the list along with the values we supplied when we reentered the computation.
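
For comparison, here is a sketch of the same suspend-and-reenter pattern using Ruby's Fiber class instead of callcc. This is a different mechanism swapped in for illustration: a fiber can only be resumed forward from its most recent suspension point, whereas a Continuation object can be called again and again. The input lines are hypothetical stand-ins, since the contents of data.txt are not reproduced here.

```ruby
# Hypothetical input standing in for the lines of data.txt.
LINES = ["10\n", "foo\n", "20\n", "bar\n", "30\n"]

processor = Fiber.new do
  sum = 0
  LINES.each do |line|
    text = line.strip
    value = text.to_i
    if value.to_s == text
      sum += value
    else
      # Suspend here and hand the error message to the caller. The argument
      # of the next resume becomes the replacement value.
      value = Fiber.yield("Error: #{text}")
      sum += value
    end
  end
  sum   # the block's final value is returned by the last resume
end

first  = processor.resume       # runs until the first bad line
second = processor.resume(17)   # reenter with 17, runs until the second bad line
total  = processor.resume(29)   # reenter with 29, runs to completion

puts first    # prints Error: foo
puts second   # prints Error: bar
puts total    # prints 106
```

As with the continuation, the loop position and the running sum survive each suspension, so the computation picks up exactly where it stopped.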

This is a very powerful mechanism. I said that we wouldn't have time to explore all of the interesting ways in which it is used, but I said that Scheme programmers in particular use this technique often to implement what is known as continuation passing style.
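
To give a flavor of continuation passing style without callcc, here is a small sketch using ordinary lambdas. The function names are hypothetical. In this style a function never returns its answer to its caller in the usual sense. Instead it takes an extra parameter k that represents the rest of the computation, and it delivers its result by calling k.

```ruby
# Direct style: the result comes back through the return value.
def factorial(n)
  n <= 1 ? 1 : n * factorial(n - 1)
end

# Continuation passing style: k is "what to do with the answer".
def factorial_cps(n, k)
  if n <= 1
    k.call(1)
  else
    # The new continuation multiplies by n and then continues with k.
    factorial_cps(n - 1, lambda { |result| k.call(n * result) })
  end
end

puts factorial(5)                          # prints 120
factorial_cps(5, lambda { |v| puts v })    # prints 120

# Supplying a different continuation reuses the same computation
# for a different purpose.
doubled = factorial_cps(5, lambda { |v| 2 * v })
puts doubled                               # prints 240
```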

I gave one last example just to underscore the idea that a continuation can be used to capture a computation in the middle of the computation. I said to consider a method called weird that would print the value of 2 to the third power:

        def weird
          puts 2 ** 3
        end

This method says to raise 2 to the power of 3 and then pass that value to the puts method. Imagine capturing the computation in the middle, right where the value 3 appears. Instead of having a 3, we can put a call on callcc that still evaluates to 3:

        def weird
          puts 2 ** (callcc do |cont|
                       3
                     end)
        end

The callcc block ends with the value 3, so that's what it evaluates to. But it also captures the computation at that moment. We can access that Continuation object using the parameter called "cont". I surrounded this method with a class definition that set an instance variable @x to this Continuation object and that included an accessor for @x:

        class Weird
          def initialize
            @x = nil
          end
        
          def x
            @x
          end
        
          def weird
            puts 2 ** (callcc do |cont|
              @x = cont
              3
            end)
            puts "weird?"
          end
        end

I added an extra call on puts with the message "weird?" to underscore what we'd see in a moment. I opened the file in irb, created a Weird object and called the weird method:

        >> w = Weird.new
        => #<Weird:0xb7f842e4 @x=nil>
        >> w.weird
        8
        weird?
        => nil

None of that was particularly surprising. The code still prints the value of 2 to the power of 3, although now it also prints "weird?" afterwards. But in our code, we set the instance variable @x to be the value of the Continuation object at the point where we were supplying the exponent for 2 and we have an accessor method to get at this instance variable:

        >> w.x
        => #<Continuation:0xb7f83240>

What does this variable represent? It was constructed just before we said to raise 2 to the third power. As a result, we can supply a different power to raise 2 to:

        >> w.x.call 4
        16
        weird?
        => nil

This causes the computation to pick up where it left off but using the value 4 rather than using the value 3 as the exponent of 2. Notice that it still produces the "weird?" message because that was part of the code left to be executed after the call on callcc. We were able to have it compute other exponents as well or the original exponent of 3:

        >> w.x.call 5
        32
        weird?
        => nil
        >> w.x.call 3
        8
        weird?
        => nil
        >> w.x.call 2
        4
        weird?
        => nil
        >> w.x.call 18
        262144
        weird?
        => nil

I've heard it said that callcc is the functional equivalent of a goto. That's an interesting way to look at it. The key idea is that a continuation captures the full state of a computation at a particular point in the computation.

Stuart Reges

Last modified: Mon Mar 12 17:05:17 PDT 2007