CSE326 Spring 2002 Quiz Section

CSE326 Spring 2002 Quiz Section

GROUP WORK: Scenarios for 5/23/02

1. The system you are designing requires a sort routine. Unfortunately, memory usage is at such a high cost on this machine (say it is a small, embedded device) that you cannot afford to have such (if any overhead) in your sorting algorithm.

Which sorting routine would you choose?

a) Quick Sort

b) Insertion Sort

c) Merge Sort

d) Heap Sort

Discussion Points:

Quick Sort:

- very fast in practice

- recursion costs memory, but tail recursion or iterative quicksort could make this cost acceptable

Insertion Sort:

- simple mathematical operations

- only constant memory overhead

- no recursion

- is O(n²) in the worst-case, but maybe our data comes presorted

- if n is small, this might still beat an n log n algorithm

Merge Sort:

- uses a lot of memory

- if done in-place, very efficient (guaranteed O(n log n)).

Heap Sort:

- simple mathematical operations (only compares, bit shifts, and increments)

- guaranteed O(n log n)

- only constant memory overhead

- no recursion

2. You need a data structure with the following functionality. Items are inserted, looked up, and removed on a fairly frequent basis. More importantly, every hour, in order to save space, you remove the least used data from the structure. For example, if data A was accessed only once in the past hour, you might want to remove it.

Assuming it is up to you to decide what removing old data means, which data structure is a good starting point for you to work from?

a) AVL Tree

b) Sorted List

c) Splay Tree

d) Hash Table

Discussion Points:

- This depends a lot on how we define what old data is. Is it what we accessed most recently or just data that is old.

- Splay Trees intrinsically place the least used data deeper in the tree. When cleaning, one could just delete all nodes beneath a certain depth.

- Hash Tables allow for very efficient basic operations (insert, find, remove, etc.) The only issue is removing the last used items. Assuming each is timestamped with the last time it was used, one could traverse across every item in the table and remove what is “old.” While this is costly, it’s a rare operation.

- AVL trees allow for efficient operation, but maintaining the balance after deleting several nodes at once could be costly.

3. We propose a new operation to the dictionary ADT. RangeRestrict(x,y) should remove all items in the dictionary whose values are less than x or greater than y. For example, if the dictionary contains real numbers, after a call to RangeRestrict, only the numbers in the closed interval [x,y] would be in the dictionary. You may assume that x and y are in the dictionary.

Which data structure would provide an efficient implementation for both the traditional dictionary operations as well as RangeRestriction?

a) Sorted List

b) Hash Table

c) Splay Trees

d) AVL Trees

Discussion Points:

- In a tree with the binary search tree property, the following is true: the left subtree of x and the right subtree of y are what you want to delete.

- Balanced trees offer fast implementation of the traditional dictionary operations.

- With a Splay Tree, one can do a Find(x), DeleteLeftSubtree(x), Find(y), DeleteRightSubtree(y). Very efficient.

- With an AVL tree, recovering the balance of the tree after a RangeRestrict does not seem easily accomplishable (unless we do a lazy removal).

- Hashtables offer fast basic operations, but a RangeRestrict would require traversing across the entire table. This remove could also seriously slow down probes if open addressing is used.