CSE341 Notes for Friday, 12/4/09

I began by discussing I/O in Ruby. As we saw last time, just as there is a "puts" method to write a line of output, there is a "gets" method that reads a line of input from the user:

        >> x = gets
        hello there
        => "hello there\n"

You can define a variable that is tied to an external input file by calling File.open:

        >> infile = File.open("utility.sml")
        => #<File:utility.sml>

Here I'm reading our file of ML utility functions that we used earlier in the quarter. To get a line of text, you can call gets on this object, as in:

        >> infile.gets
        => "(* Stuart Reges *)\n"
        >> infile.gets
        => "(* 1/17/07      *)\n"
        >> 3.times {puts infile.gets}
        (*              *)
        (* Collection of utility functions *)
        
        => 3

You can also use a for-each loop, as in:

        for str in infile
            puts str
            puts str
        end

This will echo each line of the input file twice. Or you can read the whole thing into an array by saying:

        lines = infile.readlines

Keep in mind, though, that the file object keeps track of where it is in the file, so you might need to open the file again to read it more than once.
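Here is a minimal sketch of that file-position behavior. It writes a small throwaway file to the system temp directory (the file name sample.txt and the helper first_then_all are hypothetical). It also uses two standard pieces of Ruby not shown above. The block form of File.open closes the file when the block ends. IO#rewind moves back to the start of the file without reopening it.

```ruby
require "tmpdir"

# Hypothetical helper: read one line, then the rest, then rewind and read everything.
def first_then_all(path)
  # The block form of File.open closes the file when the block finishes
  # and returns the block's value.
  File.open(path) do |infile|
    first = infile.gets        # first line, including its "\n"
    rest = infile.readlines    # continues from where gets stopped
    infile.rewind              # back to the start of the file, no reopen needed
    all = infile.readlines     # the whole file again
    [first, rest, all]
  end
end

# Write a small sample file so the example is self-contained.
path = File.join(Dir.tmpdir, "sample.txt")
File.write(path, "first line\nsecond line\nthird line\n")

first, rest, all = first_then_all(path)
puts first        # prints: first line
puts rest.length  # prints: 2
puts all.length   # prints: 3
```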

We then reviewed these console and file-reading operations, and I mentioned that I particularly like the readlines method, as in:

        lst = File.open("hamlet.txt").readlines

This reads the entire contents of Hamlet into an array of strings, one string per line. We were then able to ask questions like how many lines the file has or what the 101st line is (index 100, since arrays are zero-based):

        irb(main):002:0> lst.length
        => 4463
        irb(main):003:0> lst[100]
        => "  Hor. Well, sit we down,\r\n"

I asked people how we could write code to count the number of occurrences of various words in the file. We'd want to split each line on whitespace, which we can do by calling the string split method with no arguments, as in:

        irb(main):004:0> lst[100].split
        => ["Hor.", "Well,", "sit", "we", "down,"]
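The no-argument form of split suits lines like this one, which start with spaces and end with "\r\n". The short comparison below uses a variable name, line, chosen only for illustration. Split with no argument ignores leading and trailing whitespace, and Ruby treats a single-space argument the same way. Splitting on an explicit regular expression instead keeps an empty first field for the leading spaces.

```ruby
line = "  Hor. Well, sit we down,\r\n"

# No argument: split on runs of whitespace, ignoring leading/trailing whitespace.
words = line.split
p words                      # ["Hor.", "Well,", "sit", "we", "down,"]

# A single space string behaves the same way as no argument.
p line.split(" ") == words   # true

# An explicit regex keeps an empty first field because the line starts with spaces.
p line.split(/\s+/)          # ["", "Hor.", "Well,", "sit", "we", "down,"]
```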

To store the counts for each word, we need some kind of data structure. In Java we'd use a Map to associate words with counts. In Ruby we can use a hash table (the Hash class):

        irb(main):005:0> count = Hash.new
        => {}

As we saw in an earlier lecture, we can use the square bracket notation to refer to the elements of the table. For example, to increment the count for the word "hamlet", we're going to want to execute a statement like this:

        count["hamlet"] += 1

Unfortunately, when we tried this out, it generated an error:

        irb(main):006:0> count["hamlet"] += 1
        NoMethodError: undefined method `+' for nil:NilClass
                from (irb):6
                from :0

That's because there is no entry in the table for "hamlet". But Ruby allows us to specify a default value for table entries that gets around this:

        irb(main):007:0> count = Hash.new 0
        => {}
        irb(main):008:0> count["hamlet"] += 1
        => 1
        irb(main):009:0> count
        => {"hamlet"=>1}
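One detail about the default value is worth knowing. Reading a missing key returns the default, but it does not add an entry to the hash. The += works because count["hamlet"] += 1 expands to count["hamlet"] = count["hamlet"] + 1, and the assignment is what stores the key. A small sketch (the key "ophelia" is just an example):

```ruby
count = Hash.new(0)        # missing keys read as 0

puts count["ophelia"]      # prints 0 (the default)
p count.key?("ophelia")    # false: reading did not create an entry
p count.empty?             # true

# += expands to count["hamlet"] = count["hamlet"] + 1; the assignment stores the key.
count["hamlet"] += 1
count["hamlet"] += 1
p count["hamlet"]          # 2
p count.size               # 1
```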

Using this approach, it was very easy to count the occurrences of the various words in the lst array:

        irb(main):007:0> count = Hash.new 0
        => {}
        irb(main):010:0> for line in lst do
        irb(main):011:1*     for word in line.split do
        irb(main):012:2*       count[word.downcase] += 1
        irb(main):013:2>     end
        irb(main):014:1>   end

After doing this, we could ask for the number of distinct words in the file and the count for individual words like "hamlet":

        irb(main):022:0> count.length
        => 7234
        irb(main):023:0> count["hamlet"]
        => 28

The File object can be used with a for-each loop, so we could have written the same code without setting up the array called lst:

        irb(main):024:0> count = Hash.new 0
        => {}
        irb(main):025:0> for line in File.open("hamlet.txt") do
        irb(main):026:1*     for word in line.split do
        irb(main):027:2*       count[word.downcase] += 1
        irb(main):028:2>     end
        irb(main):029:1>   end
        => #<File:hamlet.txt>
        irb(main):030:0> count.length
        => 7234
        irb(main):031:0> count["hamlet"]
        => 28
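The same loop can be packaged as a method. This is a minimal sketch, and the method name word_counts is our own choice. It uses File.foreach, which reads the file a line at a time and closes it when done. A small sample file written to the temp directory stands in for hamlet.txt.

```ruby
require "tmpdir"

# Hypothetical helper: map each lowercased, whitespace-separated word to its count.
def word_counts(path)
  count = Hash.new(0)
  File.foreach(path) do |line|
    line.split.each { |word| count[word.downcase] += 1 }
  end
  count
end

path = File.join(Dir.tmpdir, "words.txt")
File.write(path, "The cat\nthe dog saw THE cat\n")

count = word_counts(path)
puts count.length    # prints 4 (the, cat, dog, saw)
puts count["the"]    # prints 3
```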

The key point here is that it is possible to write just a few lines of Ruby code to express a fairly complex operation to be performed. We'd expect no less from a popular scripting language.

I then spent a few minutes showing people the Bagels and Jotto programs that are included in homework 9.

I then discussed the idea of writing an iterator for the binary tree class. How would we implement an inorder iterator for a binary tree in Java? There were several suggestions. One idea was to do the complete traversal in advance, store the result in a data structure like an ArrayList, and then iterate over the ArrayList. That is simple, but it does all of the work up front, even if the client wants only a few values. Another suggestion was to keep a stack and simulate the call stack ourselves. That could work as well, although it is rather tricky to get right.
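For comparison, here is a minimal sketch of the explicit-stack idea. It is written in Ruby rather than Java to match the rest of these notes. The TreeNode struct, the StackInorderIterator class, and its has_next?/next_value methods are hypothetical names, modeled loosely on Java's hasNext and next. The stack holds the nodes whose left subtrees are still being explored. That is the information the recursive call stack would otherwise hold.

```ruby
# Minimal node type for this sketch only.
TreeNode = Struct.new(:data, :left, :right)

# External inorder iterator that simulates the call stack with an explicit stack.
class StackInorderIterator
  def initialize(root)
    @stack = []
    push_left_spine(root)
  end

  def has_next?
    !@stack.empty?
  end

  def next_value
    raise StopIteration if @stack.empty?
    node = @stack.pop
    # After visiting node, the leftmost path of its right subtree comes next.
    push_left_spine(node.right)
    node.data
  end

  private

  # Push node and all of its left descendants; the top of the stack is the smallest.
  def push_left_spine(node)
    while node
      @stack.push(node)
      node = node.left
    end
  end
end

# 5 at the root, 3 and 8 as children, 1 and 4 under 3.
root = TreeNode.new(5, TreeNode.new(3, TreeNode.new(1), TreeNode.new(4)), TreeNode.new(8))
it = StackInorderIterator.new(root)
values = []
values << it.next_value while it.has_next?
p values   # [1, 3, 4, 5, 8]
```

Each call does a constant amount of work on average, and the stack never holds more nodes than the height of the tree. The bookkeeping in push_left_spine is the tricky part.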

Sun's approach in TreeMap is to keep a parent link in each node and move around the tree from node to node. That works, but it requires the extra parent links, and the code to move from one node to the next is a bit tricky. Here is the method from TreeMap.java that finds the next node in an inorder traversal:

        private Entry<K,V> successor(Entry<K,V> t) {
            if (t == null)
                return null;
            else if (t.right != null) {
                Entry<K,V> p = t.right;
                while (p.left != null)
                    p = p.left;
                return p;
            } else {
                Entry<K,V> p = t.parent;
                Entry<K,V> ch = t;
                while (p != null && ch == p.right) {
                    ch = p;
                    p = p.parent;
                }
                return p;
            }
        }
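To make the parent-link approach concrete, here is a sketch that translates the successor logic above into Ruby. The ParentNode struct and the insert_with_parent and leftmost helpers are hypothetical. They exist only so the example can build a tree whose nodes know their parents. The identity test uses equal?, which plays the role of Java's == on references. Struct's own == compares field values instead.

```ruby
# Node with a parent link, like TreeMap's Entry (names are illustrative).
ParentNode = Struct.new(:data, :left, :right, :parent)

# Insert v into the binary search tree rooted at root, recording parent links.
# Returns the (possibly new) root.
def insert_with_parent(root, v)
  return ParentNode.new(v) if root.nil?
  node = root
  loop do
    side = v < node.data ? :left : :right
    child = node[side]
    if child.nil?
      node[side] = ParentNode.new(v, nil, nil, node)
      return root
    end
    node = child
  end
end

# Leftmost (smallest) node of a subtree.
def leftmost(node)
  node = node.left while node && node.left
  node
end

# Ruby version of TreeMap's successor method.
def successor(t)
  return nil if t.nil?
  return leftmost(t.right) if t.right   # smallest node of the right subtree
  # Otherwise climb until we come up from a left child.
  parent = t.parent
  child = t
  while parent && child.equal?(parent.right)
    child = parent
    parent = parent.parent
  end
  parent
end

root = nil
[5, 3, 8, 1, 4].each { |v| root = insert_with_parent(root, v) }

values = []
node = leftmost(root)
while node
  values << node.data
  node = successor(node)
end
p values   # [1, 3, 4, 5, 8]
```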

All of these solutions work, but none of them is simple and efficient. Ruby gives us a solution that is simple and efficient. At that point we ran out of time, so I said that we'd complete it in the next lecture.

Here is the code we had developed in section for a binary search tree:

        class Tree
          def initialize()
            @overallRoot = nil
          end
        
          def insert(v)
            @overallRoot = insert_helper(v, @overallRoot)
          end
        
          def print()
            print_helper(@overallRoot)
          end
        
          private # beginning of private definitions
        
          class Node
            def initialize(data = nil, left = nil, right = nil)
              @data = data
              @left = left
              @right = right
            end
        
            attr_accessor :data, :left, :right
          end
        
          def insert_helper(v, root)
            if root == nil
              root = Node.new(v)
            elsif v < root.data then
              root.left = insert_helper(v, root.left)
            else
              root.right = insert_helper(v, root.right)
            end
            return root
          end
        
          def print_helper(root)
            if root != nil then
              print_helper root.left
              puts root.data
              print_helper root.right
            end
          end
        end

We can define an iterator called each that looks a lot like the current print and print_helper methods. The big difference is that instead of calling puts to print values, they will call yield to generate values:

        class Tree
          ...
        
          def each
            inorder_helper(@overallRoot) {|n| yield n}
          end
        
          private
            def inorder_helper(root)
              if root then
                inorder_helper(root.left) {|n| yield n}
                yield root.data
                inorder_helper(root.right) {|n| yield n}
              end
            end
          ...
        end

Given this method, we can call it with a block. In fact, print can now be redefined as a call on this iterator:

        def print()
          each {|n| puts n}
        end
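Because the class now has an each method, it can also mix in Ruby's Enumerable module. Enumerable builds methods like map, sum, first, and to_a on top of each. Below is a condensed, self-contained sketch of the Tree class along those lines. It departs from the section code in a few assumed ways. Node is a Struct, and each passes its block along with &block instead of wrapping it in {|n| yield n}. Called without a block, each returns an Enumerator.

```ruby
class Tree
  include Enumerable   # map, select, sum, first, to_a, ... are all built on each

  Node = Struct.new(:data, :left, :right)

  def initialize
    @overallRoot = nil
  end

  def insert(v)
    @overallRoot = insert_helper(v, @overallRoot)
  end

  # Inorder iterator; returns an Enumerator when no block is given.
  def each(&block)
    return enum_for(:each) unless block_given?
    inorder_helper(@overallRoot, &block)
    self
  end

  private

  def insert_helper(v, root)
    return Node.new(v) if root.nil?
    if v < root.data
      root.left = insert_helper(v, root.left)
    else
      root.right = insert_helper(v, root.right)
    end
    root
  end

  def inorder_helper(root, &block)
    return if root.nil?
    inorder_helper(root.left, &block)
    yield root.data
    inorder_helper(root.right, &block)
  end
end

t = Tree.new
[50, 20, 80, 10, 30].each { |v| t.insert(v) }
p t.to_a                 # [10, 20, 30, 50, 80]
p t.map { |n| 2 * n }    # [20, 40, 60, 100, 160]
p t.sum                  # 190
p t.first(2)             # [10, 20]; first stops the traversal early
```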

We loaded this new version into irb and tested it out. First we created a tree and inserted 25 random values:

        >> t = Tree.new
        => #<Tree:0xb7eb0a5c @overallRoot=nil>
        >> 25.times{t.insert(rand(100))}
        => 25

We found that print still worked just fine.

But now we could write variants of print by passing different blocks to the each iterator, such as printing each number doubled:

        >> t.each {|n| puts 2 * n}
        4
        4
        30
        32
        38
        46
        46
        64
        76
        84
        86
        94
        102
        122
        128
        136
        140
        146
        154
        158
        160
        166
        176
        180
        192
        => nil

We were also able to use the iterator to find the sum of the numbers:

        >> sum = 0
        => 0
        >> t.each {|n| sum += n}
        => nil
        >> sum
        => 1282

I pointed out that not only was this iterator fairly easy to define, it is also highly efficient. We would describe it as lazy in the sense that it doesn't compute a value until it needs it. For example, we reset the sum to be 0 and wrote this variant that breaks out of the computation as soon as the sum becomes greater than 100:

        sum = 0
        t.each do |n|
          puts n
          sum += n
          break if sum > 100
        end

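When we ran it, it printed the values in order until the running sum exceeded 100, and we found that it had set sum to 132 and then stopped.

As we noted earlier, one approach is to precompute the entire traversal before iteration begins. For a computation like the one above that breaks out early, that would be wasteful, since it would visit every node in the tree even though we needed only the first few values.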
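To see the difference concretely, here is a sketch that counts how many node values each approach produces before an early break. The tree holds the numbers 1 through 1000. The helpers build, eager_inorder, and lazy_inorder are assumptions made for this illustration, and so is the use of a one-element array as a counter.

```ruby
Node = Struct.new(:data, :left, :right)

# Build a balanced binary search tree from a sorted array (helper for this sketch).
def build(sorted)
  return nil if sorted.empty?
  mid = sorted.length / 2
  Node.new(sorted[mid], build(sorted[0...mid]), build(sorted[(mid + 1)..-1]))
end

# Eager: produce every value into an array before the caller sees any of them.
def eager_inorder(root, visits, out = [])
  return out if root.nil?
  eager_inorder(root.left, visits, out)
  visits[0] += 1
  out << root.data
  eager_inorder(root.right, visits, out)
  out
end

# Lazy: yield each value as it is reached, so a break stops the traversal.
def lazy_inorder(root, visits, &block)
  return if root.nil?
  lazy_inorder(root.left, visits, &block)
  visits[0] += 1
  yield root.data
  lazy_inorder(root.right, visits, &block)
end

root = build((1..1000).to_a)

lazy_visits = [0]
lazy_sum = 0
lazy_inorder(root, lazy_visits) do |n|
  lazy_sum += n
  break if lazy_sum > 100
end

eager_visits = [0]
eager_sum = 0
eager_inorder(root, eager_visits).each do |n|
  eager_sum += n
  break if eager_sum > 100
end

puts lazy_sum          # 105 (1 + 2 + ... + 14)
puts lazy_visits[0]    # 14
puts eager_visits[0]   # 1000
```

Both loops compute the same sum, but the eager version produced all 1000 values to use only 14 of them.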

I gave one other quick example of this kind of computation in Ruby. There is a library known as "mathn" that has some interesting math extensions. For example, it has a class called Prime that can be used to generate the prime numbers in sequence:

        >> require "mathn"
        => true
        >> p = Prime.new
        => #<Prime:0xb7cfa794 @counts=[], @primes=[], @seed=1>
        >> p.next
        => 2
        >> p.next
        => 3
        >> p.next
        => 5
        >> p.next
        => 7
        >> 10.times {puts p.next}
        11
        13
        17
        19
        23
        29
        31
        37
        41
        43
        => 10

It has an each method that can generate an unbounded sequence of primes. Obviously it doesn't precompute them; it computes each one only when it is needed. Because the loop would otherwise never end, we have to include a break or return to stop it, as in this code, which computes the sum of the primes up to 10000:

        require "mathn"
        p = Prime.new
        sum = 0
        p.each do |n|
          break if n > 10000
          sum += n
        end
        puts sum

This reports that the sum of the primes up to 10000 is equal to 5736396.
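Newer Ruby versions no longer include mathn in the standard library, so the code above may not run as written. Here is a self-contained sketch of the same idea using Enumerator, which runs its block only as far as needed to produce the next value. The method name prime_enumerator is hypothetical. The primality test is simple trial division by the primes found so far, not what mathn does internally.

```ruby
# Infinite, lazily computed sequence of primes, in the spirit of mathn's Prime.
def prime_enumerator
  Enumerator.new do |yielder|
    primes = []      # primes produced so far
    candidate = 2
    loop do
      # candidate is prime if no earlier prime up to its square root divides it
      is_prime = primes.take_while { |prime| prime * prime <= candidate }
                       .none? { |prime| (candidate % prime).zero? }
      if is_prime
        primes << candidate
        yielder << candidate   # hand out the value; resume here on the next request
      end
      candidate += 1
    end
  end
end

gen = prime_enumerator
p [gen.next, gen.next, gen.next, gen.next]   # [2, 3, 5, 7]

# Same shape as the mathn version: iterate and break out explicitly.
sum = 0
prime_enumerator.each do |n|
  break if n > 10000
  sum += n
end
puts sum   # 5736396
```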

I also briefly mentioned that Wikipedia has a nice entry on probabilistic primality testing using a technique known as Miller-Rabin. I had considered giving this as a Ruby assignment, but I was thwarted by the fact that the Wikipedia page includes sample Ruby code. I copied and pasted it into irb, and we found that we could compute the same sum of primes using the new prime? method:

        >> sum = 0
        => 0
        >> for n in 1..10000 do
        >>     sum += n if n.prime?
        >>   end
        => 1..10000
        >> sum
        => 5736396
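For reference, here is a minimal sketch of deterministic Miller-Rabin in Ruby. This is not the Wikipedia code mentioned above. It is a standalone method, and the name miller_rabin_prime? is hypothetical. It tests the bases 2, 3, 5, and 7, which is known to give correct answers for every n below 3,215,031,751. It relies on Integer#pow with a modulus argument, available since Ruby 2.5.

```ruby
# Deterministic Miller-Rabin for n < 3,215,031,751 (bases 2, 3, 5, 7).
def miller_rabin_prime?(n)
  bases = [2, 3, 5, 7]
  return false if n < 2
  return true if bases.include?(n)
  return false if bases.any? { |b| (n % b).zero? }

  # Write n - 1 as d * 2**s with d odd.
  d = n - 1
  s = 0
  while d.even?
    d /= 2
    s += 1
  end

  # n passes base a if a**d == 1 (mod n) or a**(d * 2**r) == n - 1 (mod n) for some r < s.
  bases.all? do |a|
    x = a.pow(d, n)
    next true if x == 1 || x == n - 1
    (s - 1).times.any? { x = x.pow(2, n); x == n - 1 }
  end
end

sum = 0
(1..10000).each { |n| sum += n if miller_rabin_prime?(n) }
puts sum   # 5736396
```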

Stuart Reges

Last modified: Fri Dec 4 15:25:34 PST 2009