Solutions to HW #2
CSE 326, Winter 2001
Donald Chinn

1. The idea is that the functions should each have two components. One component ensures that the functions grow toward infinity (any such function will do). The second component oscillates in such a way that it dominates the first component for a while. The two functions are constructed in such a way that one function dominates for a while, then the other dominates for a while.

Here's one possible pair of functions:

    f(N) = (1 + log N) + N * (sin N + 1)
    g(N) = (1 + log N) + N * (cos (N + pi/2) + 1)

To see that f(N) = O(g(N)) is NOT true, suppose it were. Then there exist c and n_0 such that f(N) <= c g(N) for all N > n_0.

Note that, starting at N = pi/2, sine and cosine oscillate in such a way that (sin N + 1) and (cos (N + pi/2) + 1) have values of 2 and 0, respectively, every 2*pi radians. They are 0 and 2, respectively, starting at N = 3*pi/2 and repeating every 2*pi radians. Thus, for *any* number n, there is a number M > n where (sin M + 1) = 2 and (cos (M + pi/2) + 1) = 0. Since log M = o(M), we can choose such an M that is greater than n_0 and large enough that 2*M > (c-1)*(1 + log M), so that

    f(M) = (1 + log M) + 2*M
         > (1 + log M) + (c-1)*(1 + log M)
         = c*(1 + log M)
         = c*g(M).

This choice of M contradicts our assumption that f(N) = O(g(N)). A symmetric argument proves that g(N) = O(f(N)) is not true.

Technically speaking, N and n_0 are integers, so we might not be able to find an integer M such that, for example, sin M = 1. However, we can easily change the functions to use sin (N * pi/2) and cos (N * pi/2 + pi/2) instead, so that the extreme values occur at integers.

Finally, there was nothing special about log N and N, except that log N = o(N). Any two functions with that property could have been used to define f(N) and g(N).

2. Assumptions:

a. In the 10000-name table, assume that the frequent and infrequent names are distributed randomly throughout the table. (It might be the case that the frequent items just happen to be at locations in the array that cause binary search to end quickly, but we will NOT make that optimistic assumption here. Similarly, we will not make the pessimistic assumption that the frequent names are in "bad" locations in the array.)

b. In both the one-table and the two-table scheme, assume that half of the entries in the table require the worst case number of iterations of binary search, that a quarter require the worst case minus 1, etc. This will greatly simplify the calculations (and will be only slightly inaccurate, anyway).

In binary search on the single 10000-item array, 5000 items will take about log 10000 iterations, 2500 items will take log 10000 - 1 iterations, etc. That is, the average number of iterations is:

    sum from i = 1 to log 10000 of (1/2^i) * (log 10000 - i + 1)

      = sum from i = 1 to log 10000 of [ (log 10000 + 1)/2^i - i/2^i ]

      ~= (log 10000 + 1) - 2  =  log 10000 - 1  ~= 12.29

(using the facts that sum 1/2^i ~= 1 and sum i/2^i ~= 2).

In the two-table case, a frequent name (probability 0.8) takes an average search of the 2000-entry table, while an infrequent name (probability 0.2) takes a full, unsuccessful search of the 2000-entry table followed by an average search of the 8000-entry table. So the average number of iterations is about:

    0.8 * (log 2000 - 1) + 0.2 * (log 2000 + (log 8000 - 1))
      ~= 0.8 * (9.97) + 0.2 * (10.97 + 11.97)
      ~= 12.56

So, the two-table scheme will in fact require more iterations through the loop of binary search. Therefore, the extra effort to code it is not justified. Even if we used ceiling(log 10000), ceiling(log 8000), and ceiling(log 2000) in the calculations above, the two-table scheme would be only slightly better (about 0.5 iterations) than the one-table scheme, which would not justify the extra code. The extra code involves checking whether the search in the frequent (small) table was successful and, if not, then searching the infrequent (big) table. The extra work there will offset the 0.5 iterations in savings.

The point here is that the calculations themselves are not as important as the method used to calculate/estimate the running times.
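These estimates are easy to check mechanically. Below is a small C++ sketch (not part of the original solution; log2of and avgIterations are names we introduce) that computes both averages under assumption (b):

#include <cmath>
#include <cstdio>

double log2of(double x)
{
    return std::log(x) / std::log(2.0);
}

// Average number of binary search iterations on an n-entry table
// under assumption (b): half the entries need the full log n
// iterations, a quarter need log n - 1, and so on.
double avgIterations(double n)
{
    double logn = log2of(n);
    double avg = 0.0;
    for (int i = 1; i <= (int) logn; i++)
        avg += (logn - i + 1) / std::pow(2.0, i);
    return avg;
}

int main()
{
    // One-table scheme: average search of the 10000-entry table.
    double one = avgIterations(10000);

    // Two-table scheme: 80% of searches use only the small table;
    // the other 20% search it fully (about log 2000 iterations)
    // and then search the big table.
    double two = 0.8 * avgIterations(2000)
               + 0.2 * (log2of(2000) + avgIterations(8000));

    std::printf("one table:  %.2f iterations\n", one);   // about 12.29
    std::printf("two tables: %.2f iterations\n", two);   // about 12.56
    return 0;
}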
3. We use two iterators, one that advances one link per iteration (call it A) and one that advances two links per iteration (call it B). If there is no cycle in the linked list, then B will run off the end of the list eventually. If there is a cycle, then A and B will both get caught in the cycle, and B will eventually point to the same ListNode as A: B gains one node on A per iteration, so the gap between them (measured around the cycle) shrinks by one each iteration and must eventually reach zero.

So, the algorithm is to have two iterators, one advancing one link per iteration (A) and one advancing two links per iteration (B). If B is ever past the end of the list, then return FALSE. If A and B ever point to the same ListNode, then return TRUE. Otherwise, we advance the iterators. Note that we must check B after each of its two advances, so that we never advance an iterator that is already past the end. The code would look like this:

bool List::HasCycle() const
{
    ListItr a = first();
    ListItr b = first();

    while (true) {
        if (b.IsPastEnd())
            return false;
        b.advance();
        if (b.IsPastEnd())
            return false;
        b.advance();
        a.advance();

        if (a == b)
            return true;
    }
}

where we have

bool ListItr::operator== (const ListItr & rhs) const
{
    return (current == rhs.current);
}

4. Claim: Every N-node binary tree has N+1 NULL links as children.

Proof: The proof of the claim is by induction on N.

Basis step: N = 1. The tree has one node and two NULL links as children, and 2 = N + 1.

Induction step: Suppose the claim is true for all m with 1 <= m <= N, where N >= 1. Consider any (N+1)-node binary tree T, and suppose it has q NULL links. Since T has at least two nodes, it has a leaf that is not its root. If we remove one such leaf from T, then the resulting binary tree T' loses the two NULL links of the removed leaf, but the leaf's parent gains a NULL link it did not have in T. So, T' has q - 2 + 1 = q - 1 NULL links as children. T' has N nodes, so by the induction hypothesis, T' has N+1 NULL links as children. Therefore, q - 1 = N + 1, and so q = (N+1) + 1. Thus, every (N+1)-node binary tree has N+2 NULL links as children.
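As a quick sanity check of the claim (not part of the original solution), here is a small recursive counter, assuming a BinaryNode type with left and right child pointers like the one used in problem 5 below. It returns N + 1 for any N-node tree: the N nodes hold 2N child pointers in all, and exactly N - 1 of them are non-NULL (one pointing to each non-root node).

// Counts the NULL links hanging off the subtree rooted at t.
// An empty subtree is itself one NULL link; otherwise the NULL
// links of the tree are exactly those of its two subtrees.
int CountNullLinks(const BinaryNode * t)
{
    if (t == NULL)
        return 1;
    return CountNullLinks(t->left) + CountNullLinks(t->right);
}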
5. The idea is that preorder, postorder, and inorder traversals won't work, at least not by themselves. But a breadth-first search (using a queue) will. Breadth-first search works by visiting all nodes at depth i before any node at depth i+1, for i >= 0.

void BinaryTree::PrintInLevelOrder()
{
    Queue q;
    BinaryNode * node;

    if (root != NULL)
        q.Enqueue(root);
    while (!q.IsEmpty()) {
        node = q.Dequeue();
        node->LevelOrder(q);
    }
}

void BinaryNode::LevelOrder(Queue & q)
{
    Print();
    if (left != NULL)
        q.Enqueue(left);
    if (right != NULL)
        q.Enqueue(right);
}

(The queue holds pointers to BinaryNodes, so Dequeue returns a pointer.) The way the algorithm works is that we start at the root and insert it into the queue. Then we process nodes in the queue until there are no more nodes in the queue. Whenever we process a node, we print out its name and then insert its children into the queue. Each node enters the queue exactly once (after its parent is processed) and exits exactly once (only children of processed nodes can enter the queue; since a node's parent enters the queue only once, the node itself enters only once).

Note that at any point in the algorithm, the queue consists of zero or more pointers to nodes at level i (for some i) at the head of the queue, followed by zero or more pointers to nodes at level i+1, and followed by no other items. Thus, for any i >= 0, all nodes of level i are printed before any node of level i+1. Processing each node requires O(1) steps (enqueue, dequeue, and print are all constant-time operations), so the entire algorithm takes O(N) time for an N-node binary tree.

Another, more direct approach to this problem is to do a postorder traversal to determine the height h of the tree. Then we allocate an array of size h+1 (one entry for each depth from 0 to h), where each element of the array is a pointer to a linked list of pointers to nodes of the tree. We now do any kind of traversal to visit every node again, keeping track along the way of what depth we are at. When we encounter a node, we insert it into the linked list of the appropriate depth. This can be done in constant time per node because we know what depth we are at (and so we know how to index into the array) and we can insert at the head of the list in constant time. (If nodes within a level must appear left to right, keep a pointer to the tail of each list and append there instead, which is still constant time.) Finally, we iterate through our array of linked lists, starting at index 0, printing out each node.

So, the first postorder traversal takes O(N) time, the second traversal visits each node in the tree once, taking a constant amount of time per visit, and the final step takes O(N) time. This gives a total of O(N) time for the entire algorithm.
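Here is a sketch of this two-pass version (not part of the original solution; the function names are ours, a minimal BinaryNode stands in for the class above, and std::vector/std::list stand in for the array of linked lists):

#include <cstdio>
#include <list>
#include <vector>

struct BinaryNode {                // minimal stand-in for the BinaryNode above
    const char * name;
    BinaryNode * left;
    BinaryNode * right;
};

// Pass 1 (postorder): the height of the tree. A one-node tree has
// height 0; an empty tree has height -1.
int Height(const BinaryNode * t)
{
    if (t == NULL)
        return -1;
    int hl = Height(t->left);
    int hr = Height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

// Pass 2 (any traversal; here, preorder): put each node into the
// bucket for its depth.
void Collect(BinaryNode * t, int depth,
             std::vector< std::list<BinaryNode *> > & levels)
{
    if (t == NULL)
        return;
    levels[depth].push_back(t);    // appending keeps left-to-right order
    Collect(t->left, depth + 1, levels);
    Collect(t->right, depth + 1, levels);
}

void PrintInLevelOrder(BinaryNode * root)
{
    if (root == NULL)
        return;

    // One bucket per depth, 0 through h.
    std::vector< std::list<BinaryNode *> > levels(Height(root) + 1);
    Collect(root, 0, levels);

    // Final step: print the buckets in order of increasing depth.
    for (int i = 0; i < (int) levels.size(); i++)
        for (std::list<BinaryNode *>::iterator it = levels[i].begin();
             it != levels[i].end(); ++it)
            std::printf("%s\n", (*it)->name);
}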