Link Search Menu Expand Document

Content Moderation

Optimizing algorithms with a composition of data structures.

Table of contents

  1. Learning objectives
  2. Design
    1. Interface
    2. Reference implementation
    3. Implementation task
    4. Run, check, debug
  3. Analyze
    1. Affordance analysis
    2. Asymptotic analysis
  4. Critique
  5. Submission

Social media platforms rely on algorithms to draw attention to the user-generated content that is most likely to engage. Content moderation plays an understated but integral role in determining which pieces of content will be shown to the platform’s users. Unlike traditional publishers, online platforms are protected under Section 230 from civil liability for user-generated content.

No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.

Furthermore, if platforms choose to moderate, they are not held liable for their moderation decisions. Despite the centrality of content moderation, many social media platforms have marginalized and outsourced content moderation to third-party vendors. Content moderators review content flagged by machine learning systems and reported by users.

In this project, we will compare 4 implementations of priority queues for moderating a stream of user-generated content. To get started, join your project team on Canvas. One team member should Fork the starter workspace to create a new workspace and Share it with other team members’ UW emails.

Learning objectives

  • Design and compose multiple data structures to solve complex problems.
  • Analyze the affordances and values embodied within abstractions.
  • Critique the implications of content moderation on information society.

Design

Interface

The java.util standard library includes a binary heap PriorityQueue class. Unlike the Java collection interfaces Set, Map, and Queue, PriorityQueue is a class rather than an interface!

The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used. A priority queue does not permit null elements. A priority queue relying on natural ordering also does not permit insertion of non-comparable objects (doing so may result in ClassCastException).

The head of this queue is the least element with respect to the specified ordering. If multiple elements are tied for least value, the head is one of those elements – ties are broken arbitrarily. The queue retrieval operations poll, remove, peek, and element access the element at the head of the queue.

For this project, we will introduce the ExtrinsicMinPQ interface: a more complex version of the MinPQ interface presented in the lesson. Priority values are extrinsic to each item: rather than relying on the compareTo method, the priority value must be given as an argument to the add and changePriority methods.

void add(T item, double priority)
Adds an item with the given priority value.
boolean contains(T item)
Returns true if this priority queue contains the given item; false otherwise.
T peekMin()
Returns the item with the lowest priority value.
T removeMin()
Removes and returns the item with the lowest priority value.
void changePriority(T item, double priority)
Updates the priority of the given item.
int size()
Returns the number of items in this priority queue.
boolean isEmpty()
Returns true if this priority queue is empty; false otherwise. This is a default method provided by the interface that calls the size method.

Unlike the MinPQ interface presented in the lesson, ExtrinsicMinPQ can only contain unique items. There cannot be two copies of the same item, though it’s possible for multiple items to be assigned the same priority value. In this case, the priority queue may break ties arbitrarily.

The PriorityNode class represents an item together with its extrinsic priority value. This is useful for working with Java collections that need to specify particular data types. Rather than maintain two lists (one for each element and the other for each priority value), we can instead use a single list of priority nodes.

List<PriorityNode<T>> items = new ArrayList<>();
items.add(new PriorityNode<>(item, 0));

Two PriorityNode instances are considered equal if their items are equal. (Their priority values do not need to be equal!) The following contains call will return true, and the remove call will successfully remove the priority node added earlier.

items.contains(new PriorityNode<>(item, 1));
items.remove(new PriorityNode<>(item, 2));

Reference implementation

DoubleMapMinPQ
Each unique priority value is associated with the set of items that share that priority value in a red-black tree. Retrieving a lowest-priority item involves searching for the leftmost set of items in the tree and then choosing any element from that set. The red-black tree is supported by a HashMap that associates each item with its priority value in order to speed-up contains and changePriority.

Implementation task

Design and implement 3 other implementations of the ExtrinsicMinPQ interface.

UnsortedArrayMinPQ
Items are added to an array or ArrayList in an arbitrary order. Most operations need to search over the entire array or ArrayList.
HeapMinPQ
A priority queue that delegates all method calls to the Java PriorityQueue. Each instance of HeapMinPQ only contains one field assigned to a Java PriorityQueue instance, and every ExtrinsicMinPQ operation is implemented by calling the underlying PriorityQueue.
OptimizedHeapMinPQ
An enhanced binary heap priority queue supported by a HashMap that associates each item with its heap index to speed-up contains and changePriority. In order to access the heap array, it is necessary to implement our own custom binary heap. Identify the relevant portions of the MinPQ class and modify the code to efficiently implement the ExtrinsicMinPQ interface. Each heap array update will require a corresponding HashMap update.
Video
Explain one part of the project implementation that you’re particularly proud of developing.

Run, check, debug

Open Terminal and enter the following command to compile and run the Moderator class. Use the keyboard shortcut Ctrl+C to stop program execution at any time. Once the Moderator class has finished execution, the rm command removes the compiled class files. The clear command clears the console.

Warning
The provided datasets contain text that may be considered profane, vulgar, or offensive. The Moderator class prioritizes the most toxic content.
javac Moderator.java && java Moderator; rm *.class
clear

By default, the Moderator class uses the DoubleMapMinPQ reference implementation. Modify the class to test your own implementation for debugging purposes and check that it lines up with the matches returned by DoubleMapMinPQ.

For debugging, use print statements to output the state of important variables. Study this 4-minute video on debugging with print statements. Note how they spend almost all of the time diagnosing the problem and developing a thorough explanation for why the program is not working as expected. The actual bugfix only takes a few seconds of typing. This is normal practice for all programmers, though programmers who have seen similar bugs before will be able to identify and resolve problems more quickly.

To compare several implementations at once, run the provided ModeratorMultiTest, which validates the implementations against DoubleMapMinPQ. If some of your implementations are not ready for testing, comment them out of the implementations map.

javac ModeratorMultiTest.java && java ModeratorMultiTest; rm *.class

Analyze

Affordance analysis

Prioritizing review of the most “toxic” content alone is not a universal good for everyone. How does the prioritization of certain content affect the way society defines truth? What does that prioritization emphasize, and who does it benefit? Consider the positionality of these actors that interact with the platform.

  • Users of the platform, particularly the ways in which certain identities may be further marginalized.
  • Content moderators who are each reviewing several hundred pieces of toxic content every day.
  • Legal teams who want to mitigate government regulation and legislation that are not aligned with corporate interests.
  • Users who want to leverage the way certain content is prioritized in order to shape public opinion.
Video
Give an affordance analysis of priority queues for content moderation. Pick a couple conflicts from among these actors and describe potential modifications to the priority values (or the priority queue as a whole) that would help resolve the conflict by combining their values. Are modifications to the priority values or priority queue alone sufficient to address the conflicts? If they’re not sufficient, briefly describe another approach that might be better.

Asymptotic analysis

Video
Give a big-theta bound for the worst-case runtime of the removeMin and changePriority methods of each implementation, including DoubleMapMinPQ, with respect to N, the size of the priority queue. Explain the runtime of each implementation in 1 or 2 sentences while referencing the code.

PriorityQueue relies on a resizing array-representation binary heap. TreeSet and TreeMap rely on a self-balancing red-black tree. HashSet and HashMap rely on a resizing separate-chaining hash table with a tree bucket optimization that is applied when the size of a linked list bucket exceeds the TREEIFY_THRESHOLD (8 elements).

 * This map usually acts as a binned (bucketed) hash table, but
 * when bins get too large, they are transformed into bins of
 * TreeNodes, each structured similarly to those in
 * java.util.TreeMap. [...]

For hash tables, assume the hash function distributes elements evenly so that the size of each bin is constant. (Consider how this assumption affects the tree bin analysis.) For all array-representation data structures, assume that a removeMin or changePriority call will not require a resize.

Critique

Connect content moderation to discussions and opinions occurring in real-world contexts. Here are a few articles and essays to begin your research.

To Apply Machine Learning Responsibly, We Use It in Moderation
The New York Times comment moderation team raises important concerns about implicit bias in the machine-learning systems that rate content. How can the New York Times leverage algorithms that rely on machine learning outputs while mitigating risks?
Custodians of the Internet
What are platforms’ legal obligations to the public interest? If platforms operate an attention economy, then “content moderation constitutes the platform.”
Who Moderates the Social Media Giants? A Call to End Outsourcing
Without content moderation, “platforms would be inundated by harmful content.” What are the consequences of outsourcing content moderation to third-party vendors?
Video
Synthesize at least one of the above articles together with a couple additional sources. Examine the positionality of the various articles by considering some of the following questions (and any of your own questions). How does the author’s lived experiences lead them to their point of view? How do the distribution of benefits and harms affect different people and society?

Submission

The final deliverable for your project is a video no longer than about 10 minutes that addresses all of the tasks labeled Video

All project team members must be present in the video, which you can do by recording it over a Zoom meeting. The presentation video must be organized around presentation slides. Upload your video to the Canvas assignment. After the submission is complete, add a submission comment with two links:

  1. Your Ed Workspace.
  2. Your presentation slides.