Link

Data Structures

Table of contents

  1. Partial sums
  2. Dynamic connectivity
  3. Describing algorithms

Submit your solutions to the following questions as a PDF on Gradescope.

Partial sums

Given a double[] A containing N real numbers as input, the partial sums problem specifies the following API describing the basic operations that we need.

add(int i, double y)
Add the value y to the ith number.
partialSum(int i)
Return the sum of the first i numbers, A[0] through A[i - 1].

There are no insertions or deletions; the only change is to the values of the numbers. You may introduce one additional array of size N. Assume for simplicity that N is a power of 2.

For each of the following conditions, give a crisp and concise English description of a partial sums algorithm that meets the requirements.

  1. Runtime for add is in O(1) while partialSum is in O(N).
  2. Runtime for add is in O(N) while partialSum is in O(1).
  3. Runtime for add is in O(log N) while partialSum is in O(log N).

Dynamic connectivity

Given an int N specifying the total number of sites (indexed 0 through N - 1), the dynamic connectivity problem specifies the following API describing the basic operations that we need.

connect(int p, int q)
Add a connection between sites p and q.
isConnected(int p, int q)
Returns true if and only if sites p and q are in the same component.

The connect and isConnected operations model an equivalence relation, which means that it is:

  • Reflexive: p is connected to p.
  • Symmetric: If p is connected to q, then q is connected to p.
  • Transitive: If p is connected to q and q is connected to r, then p is connected to r.

You may introduce 2 arrays of size N.

Hint
Consider initializing each index of an array to the value of the corresponding site, e.g. initialize array index 17 to the value 17.

For each of the following conditions, give a crisp and concise English description of a dynamic connectivity algorithm that meets the requirements.

  1. Runtime for connect is in O(N) while isConnected is in O(1).
  2. Runtime for connect is in O(log N) while isConnected is in O(log N).

    Try first designing an algorithm that runs in O(N) for both operations. Then, add an invariant to ensure efficient connection of two components and eliminate worst-case linear-time array state.


Describing algorithms

A crisp and concise description of an algorithm provides enough detail for another CSE 332 student to implement the algorithm.

As an example of how to describe an algorithm, consider the following data type for implementing a least recently used (LRU) cache.1

An LRU cache is a data type that stores up to N distinct keys. If the data type is full when adding a new key to the cache, the LRU cache first removes the key that was least recently cached.

cache(Key key)
If there are N keys in the cache and the given key is not already in the cache, remove the key that was least recently used as an argument to cache and add the given key to the LRU cache.
inCache(Key key)
Returns true if and only if the key is in the LRU cache.

Suppose we want to design an LRU cache implementation where both operations usually complete in constant time, with the assumption of a good hash function that distributes keys evenly.

It’s not sufficient just to say, “Use a hash table and a linked list.” While this hints at a high-level approach, it’s not clear how the reader could implement the operations in the given time requirements.

A better description explains how the data structures can be applied to solve the problem.

We use two data structures: a doubly-linked list and a separate-chaining hash table.

  • A doubly-linked list containing the N keys in the cache, with the most-recently cached key at the front and the least-recently cached key at the end.
  • A separate-chaining hash table containing the N keys in the cache, where the value is a reference to the node in the doubly-linked list containing that key.

To implement inCache(Key key), use the hash table to determine whether the key is in the cache.

Implement cache(Key key) in three cases.

  1. If the key is already in the cache, move the corresponding linked list node from its current position to the front of the list.
  2. If the LRU cache is not yet full, add a new linked list node at the front of the list and create a corresponding entry in the hash table.
  3. Otherwise, remove the last node from the linked list and the corresponding entry from the hash table. Then, add a new node to the front of the linked list with the given key and a corresponding entry in the hash table.

Note that the complete case-by-case breakdown for cache isn’t entirely necessary. The reader can infer the second case from the description of the invariants defining the relationship between the two data structures. It can also be inferred that removing an item from the linked list should also remove the corresponding entry from the hash table, though here we choose to include it for clarity. If this clause is needed in many places, we can mention it once at the beginning when declaring the data structure invariants.

  1. Kevin Wayne. COS 226 Fall 2012 Midterm Exam. 2012. LRU cache. https://www.cs.princeton.edu/courses/archive/fall19/cos226/exams/mid-f12.pdf#page=8