Solutions to HW #2
CSE 326, Winter 2001
Donald Chinn

1. The idea is that the functions should each have two components. One component ensures that the functions grow toward infinity (any such function will do). The second component oscillates in such a way that it dominates the first component for a while. The two functions are constructed in such a way that one function dominates for a while, then the other dominates for a while.

Here's one possible pair of functions:

    f(N) = (1 + log N) + N * (sin N + 1)
    g(N) = (1 + log N) + N * (cos (N + pi/2) + 1)

To see that f(N) = O(g(N)) is NOT true, suppose it were. Then there exist c and n_0 such that f(N) <= c g(N) for all N > n_0.

Note that, starting at N = pi/2, sine and cosine oscillate in such a way that (sin N + 1) and (cos (N + pi/2) + 1) have values of 2 and 0, respectively, every 2*pi radians. They are 0 and 2, respectively, starting at N = 3*pi/2 and repeating every 2*pi radians. Thus, for *any* number n, there is a number M > n where (sin M + 1) = 2 and (cos (M + pi/2) + 1) = 0. Since log M = o(M), we can choose such an M that is greater than n_0 and large enough that 2*M > (c-1)*(1 + log M), so that

    f(M) = (1 + log M) + 2*M
         > (1 + log M) + (c-1)*(1 + log M)
         = c*(1 + log M)
         = c*g(M).

This choice of M contradicts our assumption that f(N) = O(g(N)). A symmetric argument proves that g(N) = O(f(N)) is not true.

Technically speaking, N and n_0 are integers, so we might not be able to find an integer M such that, for example, sin M = 1. However, we can easily change the functions to use sin (N * pi/2) and cos (N * pi/2 + pi/2) instead, so that the extreme values occur at integers.

Finally, there was nothing special about log N and N, except that log N = o(N). Any two functions with that property could have been used to define f(N) and g(N).

2. Assumptions:

a. In the 10000-name table, assume that the frequent and infrequent names are distributed randomly throughout the table. (It might be the case that the frequent items just happen to be at locations in the array that cause binary search to end quickly, but we will NOT make that optimistic assumption here. Similarly, we will not make the pessimistic assumption that the frequent names are in "bad" locations in the array.)

b. In both the one-table and the two-table scheme, assume that half of the entries in the table require the worst case number of iterations of binary search, that a quarter require the worst case minus 1, etc. This will greatly simplify the calculations (and will be only slightly inaccurate, anyway).

In binary search on the single 10000-item array, 5000 items will take about log 10000 iterations, 2500 items will take log 10000 - 1 iterations, etc. That is, the average number of iterations is:

    sum from i = 1 to log 10000 of (1/2^i) * (log 10000 - i + 1)

      = sum from i = 1 to log 10000 of [ (log 10000 + 1)/2^i - i/2^i ]

      ~= (log 10000 + 1) - 2  =  log 10000 - 1  ~= 12.29

(using the facts that sum 1/2^i ~= 1 and sum i/2^i ~= 2).

In the two-table case, a frequent name (probability 0.8) takes an average search of the 2000-entry table, while an infrequent name (probability 0.2) takes a full, unsuccessful search of the 2000-entry table followed by an average search of the 8000-entry table. So the average number of iterations is about:

    0.8 * (log 2000 - 1) + 0.2 * (log 2000 + (log 8000 - 1))
      ~= 0.8 * (9.97) + 0.2 * (10.97 + 11.97)
      ~= 12.56

So, the two-table scheme will in fact require more iterations through the loop of binary search. Therefore, the extra effort to code it is not justified. Even if we used ceiling(log 10000), ceiling(log 8000), and ceiling(log 2000) in the calculations above, the two-table scheme would be only slightly better (about 0.5 iterations) than the one-table scheme, which would not justify the extra code. The extra code involves checking whether the search in the frequent (small) table was successful and, if not, then searching the infrequent (big) table. The extra work there will offset the 0.5 iterations in savings.

The point here is that the calculations themselves are not as important as the method used to calculate/estimate the running times.
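These estimates are easy to check mechanically. Below is a small C++ sketch (not part of the original solution; log2of and avgIterations are names we introduce) that computes both averages under assumption (b):

#include <cmath>
#include <cstdio>

double log2of(double x)
{
    return std::log(x) / std::log(2.0);
}

// Average number of binary search iterations on an n-entry table
// under assumption (b): half the entries need the full log n
// iterations, a quarter need log n - 1, and so on.
double avgIterations(double n)
{
    double logn = log2of(n);
    double avg = 0.0;
    for (int i = 1; i <= (int) logn; i++)
        avg += (logn - i + 1) / std::pow(2.0, i);
    return avg;
}

int main()
{
    // One-table scheme: average search of the 10000-entry table.
    double one = avgIterations(10000);

    // Two-table scheme: 80% of searches use only the small table;
    // the other 20% search it fully (about log 2000 iterations)
    // and then search the big table.
    double two = 0.8 * avgIterations(2000)
               + 0.2 * (log2of(2000) + avgIterations(8000));

    std::printf("one table:  %.2f iterations\n", one);   // about 12.29
    std::printf("two tables: %.2f iterations\n", two);   // about 12.56
    return 0;
}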
3. We use two iterators, one that advances one link per iteration (call it A) and one that advances two links per iteration (call it B). If there is no cycle in the linked list, then B will run off the end of the list eventually. If there is a cycle, then A and B will both get caught in the cycle, and B will eventually point to the same ListNode as A: B gains one node on A per iteration, so the gap between them (measured around the cycle) shrinks by one each iteration and must eventually reach zero.

So, the algorithm is to have two iterators, one advancing one link per iteration (A) and one advancing two links per iteration (B). If B is ever past the end of the list, then return FALSE. If A and B ever point to the same ListNode, then return TRUE. Otherwise, we advance the iterators. Note that we must check B after each of its two advances, so that we never advance an iterator that is already past the end. The code would look like this:

bool List::HasCycle() const
{
    ListItr a = first();
    ListItr b = first();

    while (true) {
        if (b.IsPastEnd())
            return false;
        b.advance();
        if (b.IsPastEnd())
            return false;
        b.advance();
        a.advance();

        if (a == b)
            return true;
    }
}

where we have

bool ListItr::operator== (const ListItr & rhs) const
{
    return (current == rhs.current);
}

4. Claim: Every N-node binary tree has N+1 NULL links as children.

Proof: The proof of the claim is by induction on N.

Basis step: N = 1. The tree has one node and two NULL links as children, and 2 = N + 1.

Induction step: Suppose the claim is true for all m with 1 <= m <= N, where N >= 1. Consider any (N+1)-node binary tree T, and suppose it has q NULL links. Since T has at least two nodes, it has a leaf that is not its root. If we remove one such leaf from T, then the resulting binary tree T' loses the two NULL links of the removed leaf, but the leaf's parent gains a NULL link it did not have in T. So, T' has q - 2 + 1 = q - 1 NULL links as children. T' has N nodes, so by the induction hypothesis, T' has N+1 NULL links as children. Therefore, q - 1 = N + 1, and so q = (N+1) + 1. Thus, every (N+1)-node binary tree has N+2 NULL links as children.
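As a quick sanity check of the claim (not part of the original solution), here is a small recursive counter, assuming a BinaryNode type with left and right child pointers like the one used in problem 5 below. It returns N + 1 for any N-node tree: the N nodes hold 2N child pointers in all, and exactly N - 1 of them are non-NULL (one pointing to each non-root node).

// Counts the NULL links hanging off the subtree rooted at t.
// An empty subtree is itself one NULL link; otherwise the NULL
// links of the tree are exactly those of its two subtrees.
int CountNullLinks(const BinaryNode * t)
{
    if (t == NULL)
        return 1;
    return CountNullLinks(t->left) + CountNullLinks(t->right);
}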
5. The idea is that preorder, postorder, and inorder traversals won't work, at least not by themselves. But a breadth-first search (using a queue) will. Breadth-first search works by visiting all nodes at depth i before any node at depth i+1, for i >= 0.

void BinaryTree::PrintInLevelOrder()
{
    Queue q;
    BinaryNode * node;

    if (root != NULL)
        q.Enqueue(root);
    while (!q.IsEmpty()) {
        node = q.Dequeue();
        node->LevelOrder(q);
    }
}

void BinaryNode::LevelOrder(Queue & q)
{
    Print();
    if (left != NULL)
        q.Enqueue(left);
    if (right != NULL)
        q.Enqueue(right);
}

(The queue holds pointers to BinaryNodes, so Dequeue returns a pointer.) The way the algorithm works is that we start at the root and insert it into the queue. Then we process nodes in the queue until there are no more nodes in the queue. Whenever we process a node, we print out its name and then insert its children into the queue. Each node enters the queue exactly once (after its parent is processed) and exits exactly once (only children of processed nodes can enter the queue; since a node's parent enters the queue only once, the node itself enters only once).

Note that at any point in the algorithm, the queue consists of zero or more pointers to nodes at level i (for some i) at the head of the queue, followed by zero or more pointers to nodes at level i+1, and followed by no other items. Thus, for any i >= 0, all nodes of level i are printed before any node of level i+1. Processing each node requires O(1) steps (enqueue, dequeue, and print are all constant-time operations), so the entire algorithm takes O(N) time for an N-node binary tree.

Another, more direct approach to this problem is to do a postorder traversal to determine the height h of the tree. Then we allocate an array of size h+1 (one entry for each depth from 0 to h), where each element of the array is a pointer to a linked list of pointers to nodes of the tree. We now do any kind of traversal to visit every node again, keeping track along the way of what depth we are at. When we encounter a node, we insert it into the linked list of the appropriate depth. This can be done in constant time per node because we know what depth we are at (and so we know how to index into the array) and we can insert at the head of the list in constant time. (If nodes within a level must appear left to right, keep a pointer to the tail of each list and append there instead, which is still constant time.) Finally, we iterate through our array of linked lists, starting at index 0, printing out each node.

So, the first postorder traversal takes O(N) time, the second traversal visits each node in the tree once, taking a constant amount of time per visit, and the final step takes O(N) time. This gives a total of O(N) time for the entire algorithm.
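Here is a sketch of this two-pass version (not part of the original solution; the function names are ours, a minimal BinaryNode stands in for the class above, and std::vector/std::list stand in for the array of linked lists):

#include <cstdio>
#include <list>
#include <vector>

struct BinaryNode {                // minimal stand-in for the BinaryNode above
    const char * name;
    BinaryNode * left;
    BinaryNode * right;
};

// Pass 1 (postorder): the height of the tree. A one-node tree has
// height 0; an empty tree has height -1.
int Height(const BinaryNode * t)
{
    if (t == NULL)
        return -1;
    int hl = Height(t->left);
    int hr = Height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

// Pass 2 (any traversal; here, preorder): put each node into the
// bucket for its depth.
void Collect(BinaryNode * t, int depth,
             std::vector< std::list<BinaryNode *> > & levels)
{
    if (t == NULL)
        return;
    levels[depth].push_back(t);    // appending keeps left-to-right order
    Collect(t->left, depth + 1, levels);
    Collect(t->right, depth + 1, levels);
}

void PrintInLevelOrder(BinaryNode * root)
{
    if (root == NULL)
        return;

    // One bucket per depth, 0 through h.
    std::vector< std::list<BinaryNode *> > levels(Height(root) + 1);
    Collect(root, 0, levels);

    // Final step: print the buckets in order of increasing depth.
    for (int i = 0; i < (int) levels.size(); i++)
        for (std::list<BinaryNode *>::iterator it = levels[i].begin();
             it != levels[i].end(); ++it)
            std::printf("%s\n", (*it)->name);
}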