Link Search Menu Expand Document

Binary Trees

Table of contents

  1. Text Classifier
  2. Binary trees
  3. Traversals

Text Classifier

Online abuse and harassment stops people from engaging in conversation. One area of focus is the study of negative online behaviors, such as toxic comments: user-written comments that are rude, disrespectful or otherwise likely to make someone leave a discussion. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments. In 2018, the Conversation AI team, a research initiative founded by Jigsaw and Google (both part of Alphabet), organized a public competition called the Toxic Comment Classification Challenge to build better machine learning systems for detecting different types of of toxicity like threats, obscenity, insults, and identity-based hate.

Warning

Machine learning models are trained on human-selected and human-generated datasets. Such models reproduce the bias inherent in the data. The included data representation algorithms also encode overly-simplistic assumptions about natural language by treating each occurrence of a word as independent from all other words in the text. Any usage of a word, no matter the context, is considered equally toxic. Don’t use this in a real system!

When the Conversation AI team first built toxicity models, they found that the models incorrectly learned to associate the names of frequently attacked identities with toxicity. In 2019, the Conversation AI team ran another competition about Unintended Bias in Toxicity Classification, focusing on building models that detect toxicity across a range of diverse conversations.

The provided datasets contain text that may be considered profane, vulgar, or offensive.

Text Classifier

Nov 9
Binary Trees
BJP 17.1, 17.2
  1. Run pre-order, in-order, and post-order traversals on a binary tree.
  2. Define methods that recursively traverse binary trees.
  3. Define methods that recursively modify binary tree node data values.
Nov 10
SectionBinary Trees
Nov 11
Holiday
Nov 12
SectionMore Binary Trees
Nov 13
Binary Search Trees
BJP 17.3, 17.4
  1. Apply the binary search tree invariant to search for values and add new values.
  2. Apply the x = change(x) pattern to recursively change binary tree references.
  3. Explain why binary search trees would be preferred over binary search on arrays.

Toxic comment classification is a special case of a more general problem in machine learning known as text classification. Discussion forums use text classification to determine whether comments should be flagged as inappropriate. Email software uses text classification to determine whether incoming mail is sent to the inbox or filtered into the spam folder.1

Spam email classifier

The simplest possible text classifier returns true for every input string: every string is classified as toxic (or spammy).

public static boolean classify(String text) {
    return true;
}

One way to add more nuance is by introducing conditional statements so that the behavior of the method depends on the input string.

public static boolean classify(String text) {
    if (text.toLowerCase().contains("free")) {
        if (text.toLowerCase().contains("won")) {
            ...
        } else {
            ...
        }
    } else {
        if (...) {
            ...
        } else {
            ...
        }
    }
}

But if there are infinitely many sentences, then there are also potentially infinitely many ways in which text can be interpreted as toxic (or spammy). To manage this complexity, we’ll introduce a new recursive data type known as a decision tree.

  1. Google Developers. Oct 1, 2018. Text classification. In Machine Learning Guides. https://developers.google.com/machine-learning/guides/text-classification 

Binary trees

Decision trees represent a specific application of a more general data structure known as a binary tree. If linked nodes are analogous to single structural recursion, then binary trees are analogous to multiple (degree 2) structural recursion.

Binary tree
A tree where each node has either 0, 1, or 2 children.

Tree Terminology

public class IntTree {
    private IntTreeNode overallRoot;

    private static class IntTreeNode {
        public int data;
        public IntTreeNode left;
        public IntTreeNode right;
    }
}

Like linked nodes, each IntTreeNode is recursively defined. A binary tree is either empty or non-empty.

Empty binary tree
The root IntTreeNode is null.
Non-empty binary tree
The root IntTreeNode contains some data, a left subtree, and a right subtree. Each subtree is also a binary tree.

Traversals

Since each node in a binary tree has two children, recursive traversal is the best way to explore the values in a tree. Similar to recursive linked list traversal, we’ll want to introduce a public/private pair to maintain a reference to the root of the current subtree. There are three ways of traversing a binary tree depending on when the current node is processed in comparison to its children. Consider the following IntTree.

overallRoot 17 41 9 40 29 81
Pre-order traversal
Process the current root node then process the left and right children.
public void print() {
    print(overallRoot);
}

private void print(IntTreeNode root) {
    if (root != null) {
        System.out.print(root.data + " ");
        print(root.left);
        print(root.right);
    }
}
17 41 29 9 81 40
In-order traversal
Process the left child, then process the current root node, and then process the right child.
public void print() {
    print(overallRoot);
}

private void print(IntTreeNode root) {
    if (root != null) {
        print(root.left);
        System.out.print(root.data + " ");
        print(root.right);
    }
}
29 41 17 81 9 40
Post-order traversal
Process the left and right children and then process the current root node.
public void print() {
    print(overallRoot);
}

private void print(IntTreeNode root) {
    if (root != null) {
        print(root.left);
        print(root.right);
        System.out.print(root.data + " ");
    }
}
29 41 81 40 9 17