Object Composition
Hash tables and affordance analysis.
Hash Tables
- Explain and trace hash table algorithms such as insertion and search.
- Evaluate how a hash table will behave in response to a given data type.
- Analyze the runtime of a hash table with a given bucket data structure.
Java’s TreeSet
and TreeMap
classes are implemented using red-black trees that guarantee logarithmic time for most individual element operations. This is a massive improvement over sequential search: a balanced search tree containing over 1 million elements has a height on the order of about 20!
But it turns out that we can do even better by studying Java’s HashSet
and HashMap
classes, which are implemented using hash tables. Under certain conditions, hash tables can have constant time for most individual element operations—regardless of whether your hash table contains 1 million, 1 billion, 1 trillion, or even more elements.
Data-indexed integer set case study
Arrays store data contiguously in memory where each array index is next to each other. Unlike linked lists, arrays do not need to follow references throughout memory. Indexing into an array is a constant time operation as a result.
DataIndexedIntegerSet
uses this idea of an array to implement a set of integers. It maintains a boolean[] present
of length 2 billion, one index for almost every non-negative integer. The boolean value stored at each index present[x]
represents whether or not the integer x
is in the set.
void add(int x)
- Adds
x
to the set by assigningpresent[x] = true
. boolean contains(int x)
- Returns the value of
present[x]
. void remove(int x)
- Removes
x
from the set by assigningpresent[x] = false
.
While this implementation is simple and fast, it takes a lot of memory to store the 2-billion length present
array—and it doesn’t even work with negative integers or other data types. How might we generalize this concept to support strings?
Data-indexed string set case study
Let’s design a DataIndexedStringSet
using a 27-element array: 1 array index for each lowercase letter in the English alphabet starting at index 1. A word starting with the letter ‘a’ goes to index 1, ‘b’ goes to index 2, ‘c’ goes to index 3, etc. To study the behavior of this program, let’s consider these three test cases.
DataIndexedStringSet set = new DataIndexedStringSet();
set.add("cat");
System.out.println(set.contains("car"));
DataIndexedStringSet set = new DataIndexedStringSet();
set.add("cat");
set.add("car");
set.remove("cat");
System.out.println(set.contains("car"));
DataIndexedStringSet set = new DataIndexedStringSet();
set.add("cat");
set.add("dog");
System.out.println(set.contains("dog"));
Some of the test cases work as we would expect a set to work, but other test cases don’t work.
Which of these test cases will behave correctly?
Only the third test case works as expected. “cat” and “car” collide because they share the same index, so changes to “cat” will also affect “car” and vice versa. This is unexpected because the set should treat the two strings as different from each other, but it turns out that changing the state of one will also change the state of the other.
A collision occurs when two or more elements share the same index. One way we can avoid collisions in a DataIndexedStringSet
is to be more careful about how we assign each English word. If we make sure every English word has its own unique index, then we can avoid collisions!
Suppose the string “a” goes to 1, “b” goes to 2, “c” goes to 3, …, and “z” goes to 26. Then the string “aa” should go to 27, “ab” to 28, “ac” to 29, and so forth. Since there are 26 letters in the English alphabet, we call this enumeration scheme base 26. Each string gets its own number, so as strings get longer and longer, the number assigned to each string grows larger and larger as well.
Using base 26 numbering, we can compute the index for “cat” as (3 ∙ 262) + (1 ∙ 261) + (20 ∙ 260) = 2074. Breaking down this equation, we see that “cat” is a 3-letter word so there are 3 values added together:
- (3 ∙ 262)
- 3 represents the letter ‘c’, which is the 3rd letter in the alphabet.
- 26 is raised to the 2nd power for the two subsequent letters.
- (1 ∙ 261)
- 1 represents the letter ‘a’, which is the 1st letter in the alphabet.
- 26 is raised to the 1st power for the one subsequent letter.
- (20 ∙ 260)
- 20 represents the letter ‘t’, which is the 20th letter in the alphabet.
- 26 is raised to the 0th power for the zero subsequent letters.
So long as we pick a base that’s at least 26, this hash function is guaranteed to assign each lowercase English word a unique integer. But it doesn’t work with uppercase characters or punctuation. Furthermore, if we want to support other languages than English, we’ll need an even larger base. For example, there are 40,959 characters in the Chinese language alone. If we wanted to support all the possible characters in the Java built-in char
type—which includes emojis, punctuation, and characters across many languages—we would need base 65536. Hash values generated this way will grow exponentially with respect to the length of the string!
Separate chaining hash tables
In practice, collisions are unavoidable. Instead of trying to avoid collisions by assigning a unique number to each element, separate chaining hash tables handle collisions by replacing the boolean[] present
with an array of buckets, where each bucket can store zero or more elements.
Since separate chaining addresses collisions, we can now use smaller arrays. Instead of directly using the hash code as the index into the array, take the modulus of the hash code to compute an index into the array. Separate chaining turns the process from a single step to multiple steps:
- Call the hash function on the given element to get a hash code, aka the number for that element.
- Compute the bucket index by taking the modulus of the hash code by the array length.
- Search the bucket for the given element.
Separate chaining comes with a cost. The worst case runtime for adding, removing, or finding an item is now in Θ(Q) where Q is the size of the largest bucket.
In general, the runtime of a separate chaining hash table is determined by a number of factors. We’ll often need to ask a couple of questions about the hash function and the hash table before we can properly analyze or predict the data structure performance.
- What is the hash function’s likelihood of collisions?
- What is the strategy for resizing the hash table.
- What is the runtime for searching the bucket data structure.
To make good on our promise of a worst case constant time data structure for sets and maps, here’s the assumptions we need to make.
- Assume the hash function distributes items uniformly.
- Assume the hash table resizes by doubling, tripling, etc. upon exceeding a constant load factor.
- Assume we ignore the time it takes to occasionally resize the hash table.
Affordance Analysis
- Describe how abstractions can embody values and structure social relations.
- Identify the affordances of a program abstraction such as a class or interface.
- Evaluate affordances by applying the 3 value-sensitive design principles.
In the previous module focusing on implementing tree data structures and studying sorting algorithms, we learned methods for runtime analysis in order to help us evaluate the quality of an idea. The goal of runtime analysis was to produce a (1) predictive and (2) easy-to-compare description of the running time of an algorithm.
For the remaining half of the course, we’ll be studying a different category of data structures and algorithms called graphs. Unlike the study of tree data structures and sorting algorithms that focused on abstract inputs, the study of graph data structures and graph algorithms is often closely coupled with discussions about real-world data and problems. Runtime analysis will remain an important tool, but we also need methods of analysis to address questions at the intersection of algorithm design and the social responsibility that comes with it.
Affordance analysis is a method of analysis that measures which actions and outcomes that an abstraction makes more likely (affordances), and then evaluates the effects of these affordances on people and power hierarchies.
- Affordances
- Properties of objects that make specific outcomes more likely through the possible actions that it presents to users. In the context of designed physical objects, for example, a screwdriver affords turning screws (what it was designed for) but also affords other uses like digging narrow holes for planting seeds.
- Likewise, disaffordances are properties that make certain outcomes less likely through the possible actions that it presents to users. A screwdriver disaffords ladling soup.
Affordance analysis is a process composed of two steps. First, identify affordances for the specific outcomes that they make more likely. Then, evaluate affordances according to how they shape social relations in particular applications.
Identify affordances
In Java, a class
or interface
defines public methods that afford certain actions. Many of these actions will seem quite value-neutral when defined in isolation from potential applications. For example:
- We can
add
,remove
, or check for an item at a given index in a list. - We can
add
,removeMin
, orpeekMin
from a priority queue according to their priority values. - In
Autocomplete
, theallMatches
method returns a list of all terms that match the given prefix.
In “The Limits of Correctness” (1985), Brian Cantwell Smith observes that a “correct” program is really only correct with respect to the programmer’s model of the problem:
When you design and build a computer system, you first formulate a model of the problem you want it to solve, and then construct the computer program in its terms. For example, if you were to design a medical system to administer drug therapy, you would need to model a variety of things: the patient, the drug, the absorption rate, the desired balance between therapy and toxicity, and so on and so forth. The absorption rate might be modelled as a number proportional to the patient’s weight, or proportional to body surface area, or as some more complex function of weight, age, and sex.
The choice of which model to use—such as (1) patient weight, (2) body surface area, or (3) weight + age + sex—encodes ideas about what is relevant or irrelevant. The results of software programs are determined by what and how we choose to model the problem.
Programming is the manipulation of this model by using calculation to change the rules and representations “at some particular level of abstraction […] So the drug model mentioned above would probably pay attention to the patients’ weight, but ignore their tastes in music.” When we choose to use certain data structures or algorithms, we are choosing to the model the problem in a certain way at a particular level of abstraction. We decide what is relevant or irrelevant in the model.
For example, we might ask: What are some affordances of comparison sorting algorithms? In comparison sorting, elements are rearranged according to an ordering relation defined by the compareTo
method. It is assumed that these pairwise comparisons can be used to order all the elements on a line from least to greatest.
A Science and Technology Studies (STS) perspective might even suggest that sorting in and of itself—before any specific examples are introduced—is value-laden because sorting implies a way to order items on a line. Implementing compareTo
is like assigning student grades: it takes all the complex dimensions of a human being and reduces them to a number compared against other peoples’ numbers. Using a sorting algorithm requires a programmer to select the aspects of an object that are most desirable for the task. This can be especially morally problematic when it is done without consent.
Value-sensitive design
Data is a key part of affordance analysis. But you might have noticed that your computing education rarely actually considers data on its own merit:
- When verifying the correctness of your programs, we write code that generates dummy data or reads an example dataset without explanation about the data, who or what it aims to represent, and why the data was chosen.
- When analyzing the running time of your programs, data matters to the extent that it helps us in case analysis, asymptotic analysis, or experimental analysis. Data is reduced to variables like N or abstract inputs.
To evaluate affordances, we will need to think about how each affordance interacts with the world by thinking through more realistic data and problems.
It helps to understand how managers, designers, and engineers are already thinking about the social implications of technologies. One framework for thinking through design is value-sensitive design (VSD), a “theory and method that accounts for human values in a principled and structured manner throughout the design process.”
Technologies that benefit some people can harm others. Batya Friedman, a Professor in the iSchool and one of the inventors of the method, surfaces this in a value scenario for SafetyNet, an imaginary service that helps people avoid “unsafe” areas.
SafetyNet is not just a hypothetical either. David Ingram and Cyrus Farivar write for NBC News: “Inside Citizen: The public safety app pushing surveillance boundaries.” It wouldn’t be hard to imagine a future where UW Alert partners with a tech company to create an app with functions similar to SafetyNet.
This SafetyNet value scenario imagines what the future could hold for different peoples and how the introduction of a technology changes how people behave. Batya’s value scenario highlights several different perspectives in evaluating a system by:
- foregrounding human values (what is important to people in their lives),
- considering the impacts of pervasive uptake (adoption, usage) of a technology,
- and identifying direct and indirect stakeholders (people with concern or interest).
What kinds of human values could we consider? Batya suggests, as a starting point: privacy, trust, security, safety, community, freedom from bias, autonomy, identity, ownership, freedom of expression, dignity, calmness, compassion, respect, peace, wildness, sustainability, healing.
One key idea in this framework is the concept of value tensions: that human values do not exist in isolation and instead involve a balance between values that can come into tension when considering different individuals, groups, or societies. The goal of thinking through value tensions is to recognize the diverse values that exist in tension and work toward solutions that can engage with all of them.
Evaluate affordances
Whereas Batya’s SafetyNet value scenario focuses more on the large-scale overall product design, in this class, we focus on the details of how programs are implemented. As designers and implementers of interfaces, we might wonder about our responsibility in thinking through social implications.
Earlier, we identified that in the Autocomplete
interface, the allMatches
method returns a list of all terms that match the given prefix. Note that the method does not guarantee any particular order of results.
List<CharSequence> allMatches(CharSequence prefix)
How might we evaluate autocomplete in Husky Maps using the value-sensitive design perspectives?
- Human values
- The list of results produced by autocomplete provides users with everything that matches their search query. But the
Autocomplete
interface does not personalize these results at all: given a search query, everyone sees the same results. There is no parameter or field for storing the personalization information. Results could be more helpful if they were tuned to your profile.- Personalization, however, is not always a win. Personalization requires personal data that, in some cases, people may not actually want to disclose even if they “Agree” to the terms and conditions of using an app. Also, the value of convenience and helpfulness to the individual user can conflict with the goal of everyone receiving the same information. Hyper-personalized content feeds can lead to two people receiving drastically different information about the world.
- Pervasive uptake
- When everyone sees the same results, there can be material impacts if money and attention is directed in ways that reinforce existing social inequities. If everyone follows the same first-few suggestions, then we drive money and attention to those few places. This can also have downstream effects on public policy if it changes our perceptions of what we think of as important or valuable places. For example, public spaces like parks might lose our attention (and, therefore, public funding) if our app never recommended them.
- Direct and indirect stakeholders
- We can certainly imagine different situations around who might receive benefits or harms from the pervasive uptake of this technology. You can draw different examples by thinking about how different datasets could lead to different outcomes. Another perspective is to think about who does the work of managing or inputting the data. Is the data on places provided by the government, by individual owners of each place, by volunteers gathering data from the world? How might people want to change how places are represented or named in the app? Who is given the right to make these decisions?
You could also apply this same set of perspectives but instead focus on sorting the output list. Sorting requires selecting an abstraction for objects so that they can be ordered on a line. Husky Maps sorts autocomplete suggestions using importance values from the OpenStreetMap project. Combining these two pieces of information (the affordances of sorting and the example dataset) can lead to another affordance analysis.
Thinking through each perspective separately can help you assemble a more cohesive argument that considers the different potential impacts of software design. But in your final affordance analysis, you might find that your arguments combine ideas from 2 or all 3 perspectives. This recognizes the interconnectedness between the different ideas in your analysis.
Beyond value-sensitive design
Value-sensitive design, however, is just one theory or method for evaluating a technology. In Design Justice, Sasha Costanza-Chock critiques VSD:
VSD is descriptive rather than normative: it urges designers to be intentional about encoding values in designed systems but does not propose any particular set of values at all, let alone an intersectional understanding of racial, gender, disability, economic, environmental, and decolonial justice. VSD never questions the standpoint of the professional designer, doesn’t call for community inclusion in the design process (let alone community accountability or control), and doesn’t require an impact analysis of the distribution of material and symbolic benefits that are generated through design. Values are treated as disembodied abstractions, to be codified in libraries from which designers might draw to inform project requirements. In other words, in VSD we are meant to imagine that incorporating values into design can be accomplished largely by well-meaning expert designers.
VSD helped us think about data and problems on their own merit rather than as abstract concepts. But, in presenting evaluation as a process for designers to just think-through and analyze on their own—without consulting real people—values are treated as abstract concepts rather than situated in real human experiences.
If you’d like to learn more approaches to evaluating affordances, consider taking courses in (1) human–computer interaction, (2) qualitative research methods, or (3) community-engaged research.