Optimizing algorithms with a composition of data structures.
- Learning objectives
Social media platforms rely on algorithms to draw attention to the user-generated content that is most likely to engage. Content moderation plays an understated but integral role in determining which pieces of content will be shown to the platform’s users. Unlike traditional publishers, online platforms are protected under Section 230 from civil liability for user-generated content.
No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.
Furthermore, if platforms choose to moderate, they are not held liable for their moderation decisions. Despite the centrality of content moderation, many social media platforms have marginalized and outsourced content moderation to third-party vendors. Content moderators review content flagged by machine learning systems and reported by users.
In this project, we will compare 4 implementations of priority queues for moderating a stream of user-generated content. To get started, join your project team on Canvas. One team member should Fork the starter workspace to create a new workspace and Share it with other team members’ UW emails.
- Design and compose multiple data structures to solve complex problems.
- Analyze the affordances and values embodied within abstractions.
- Critique the implications of content moderation on information society.
java.util standard library includes a binary heap
PriorityQueue class. Unlike the Java collection interfaces
PriorityQueue is a class rather than an interface!
The elements of the priority queue are ordered according to their natural ordering, or by a
Comparatorprovided at queue construction time, depending on which constructor is used. A priority queue does not permit
nullelements. A priority queue relying on natural ordering also does not permit insertion of non-comparable objects (doing so may result in
The head of this queue is the least element with respect to the specified ordering. If multiple elements are tied for least value, the head is one of those elements – ties are broken arbitrarily. The queue retrieval operations
elementaccess the element at the head of the queue.
For this project, we will introduce the
ExtrinsicMinPQ interface: a more complex version of the
MinPQ interface presented in the lesson. Priority values are extrinsic to each item: rather than relying on the
compareTo method, the priority value must be given as an argument to the
void add(T item, double priority)
- Adds an item with the given priority value.
boolean contains(T item)
- Returns true if this priority queue contains the given item; false otherwise.
- Returns the item with the lowest priority value.
- Removes and returns the item with the lowest priority value.
void changePriority(T item, double priority)
- Updates the priority of the given item.
- Returns the number of items in this priority queue.
- Returns true if this priority queue is empty; false otherwise. This is a default method provided by the interface that calls the
MinPQ interface presented in the lesson,
ExtrinsicMinPQ can only contain unique items. There cannot be two copies of the same item, though it’s possible for multiple items to be assigned the same priority value. In this case, the priority queue may break ties arbitrarily.
PriorityNode class represents an item together with its extrinsic priority value. This is useful for working with Java collections that need to specify particular data types. Rather than maintain two lists (one for each element and the other for each priority value), we can instead use a single list of priority nodes.
List<PriorityNode<T>> items = new ArrayList<>(); items.add(new PriorityNode<>(item, 0));
PriorityNode instances are considered equal if their items are equal. (Their
priority values do not need to be equal!) The following
contains call will return
true, and the
remove call will successfully remove the priority node added earlier.
items.contains(new PriorityNode<>(item, 1)); items.remove(new PriorityNode<>(item, 2));
- Each unique priority value is associated with the set of items that share that priority value in a red-black tree. Retrieving a lowest-priority item involves searching for the leftmost set of items in the tree and then choosing any element from that set. The red-black tree is supported by a
HashMapthat associates each item with its priority value in order to speed-up
Design and implement 3 other implementations of the
- Items are added to an array or
ArrayListin an arbitrary order. Most operations need to search over the entire array or
- A priority queue that delegates all method calls to the Java
PriorityQueue. Each instance of
HeapMinPQonly contains one field assigned to a Java
PriorityQueueinstance, and every
ExtrinsicMinPQoperation is implemented by calling the underlying
- An enhanced binary heap priority queue supported by a
HashMapthat associates each item with its heap index to speed-up
changePriority. In order to access the heap array, it is necessary to implement our own custom binary heap. Identify the relevant portions of the
MinPQclass and modify the code to efficiently implement the
ExtrinsicMinPQinterface. Each heap array update will require a corresponding
- Explain one part of the project implementation that you’re particularly proud of developing.
Open Terminal and enter the following command to compile and run the
Moderator class. Use the keyboard shortcut Ctrl+C to stop program execution at any time. Once the
Moderator class has finished execution, the
rm command removes the compiled class files. The
clear command clears the console.
- The provided datasets contain text that may be considered profane, vulgar, or offensive. The
Moderatorclass prioritizes the most toxic content.
javac Moderator.java && java Moderator; rm *.class clear
By default, the
Moderator class uses the
DoubleMapMinPQ reference implementation. Modify the class to test your own implementation for debugging purposes and check that it lines up with the matches returned by
For debugging, use print statements to output the state of important variables. Study this 4-minute video on debugging with print statements. Note how they spend almost all of the time diagnosing the problem and developing a thorough explanation for why the program is not working as expected. The actual bugfix only takes a few seconds of typing. This is normal practice for all programmers, though programmers who have seen similar bugs before will be able to identify and resolve problems more quickly.
To compare several implementations at once, run the provided
ModeratorMultiTest, which validates the implementations against
DoubleMapMinPQ. If some of your implementations are not ready for testing, comment them out of the
javac ModeratorMultiTest.java && java ModeratorMultiTest; rm *.class
Prioritizing review of the most “toxic” content alone is not a universal good for everyone. How does the prioritization of certain content affect the way society defines truth? What does that prioritization emphasize, and who does it benefit? Consider the positionality of these actors that interact with the platform.
- Users of the platform, particularly the ways in which certain identities may be further marginalized.
- Content moderators who are each reviewing several hundred pieces of toxic content every day.
- Legal teams who want to mitigate government regulation and legislation that are not aligned with corporate interests.
- Users who want to leverage the way certain content is prioritized in order to shape public opinion.
- Give an affordance analysis of priority queues for content moderation. Pick a couple conflicts from among these actors and describe potential modifications to the priority values (or the priority queue as a whole) that would help resolve the conflict by combining their values. Are modifications to the priority values or priority queue alone sufficient to address the conflicts? If they’re not sufficient, briefly describe another approach that might be better.
- Give a big-theta bound for the worst-case runtime of the
changePrioritymethods of each implementation, including
DoubleMapMinPQ, with respect to N, the size of the priority queue. Explain the runtime of each implementation in 1 or 2 sentences while referencing the code.
PriorityQueue relies on a resizing array-representation binary heap.
TreeMap rely on a self-balancing red-black tree.
HashMap rely on a resizing separate-chaining hash table with a tree bucket optimization that is applied when the size of a linked list bucket exceeds the
TREEIFY_THRESHOLD (8 elements).
* This map usually acts as a binned (bucketed) hash table, but * when bins get too large, they are transformed into bins of * TreeNodes, each structured similarly to those in * java.util.TreeMap. [...]
For hash tables, assume the hash function distributes elements evenly so that the size of each bin is constant. (Consider how this assumption affects the tree bin analysis.) For all array-representation data structures, assume that a
changePriority call will not require a resize.
Connect content moderation to discussions and opinions occurring in real-world contexts. Here are a few articles and essays to begin your research.
- To Apply Machine Learning Responsibly, We Use It in Moderation
- The New York Times comment moderation team raises important concerns about implicit bias in the machine-learning systems that rate content. How can the New York Times leverage algorithms that rely on machine learning outputs while mitigating risks?
- Custodians of the Internet
- What are platforms’ legal obligations to the public interest? If platforms operate an attention economy, then “content moderation constitutes the platform.”
- Who Moderates the Social Media Giants? A Call to End Outsourcing
- Without content moderation, “platforms would be inundated by harmful content.” What are the consequences of outsourcing content moderation to third-party vendors?
- Synthesize at least one of the above articles together with a couple additional sources. Examine the positionality of the various articles by considering some of the following questions (and any of your own questions). How does the author’s lived experiences lead them to their point of view? How do the distribution of benefits and harms affect different people and society?
The final deliverable for your project is a video no longer than about 10 minutes that addresses all of the tasks labeled Video
All project team members must be present in the video, which you can do by recording it over a Zoom meeting. The presentation video must be organized around presentation slides. Upload your video to the Canvas assignment. After the submission is complete, add a submission comment with two links:
- Your Ed Workspace.
- Your presentation slides.