Link

Autocomplete

Find all terms beginning with a given prefix, sorted by weight.1

Learning Goals

  • Develop Java programs independently.

    One of the themes for the following assignments is not only writing your own tests, but also writing your own programs. In previous homeworks, we provided significant guidance to help you identify what subproblems to solve and how to solve them. In this homework, we will fade away some of this scaffolding and encourage greater independence.

  • Practice reading through an existing codebase.

    Future assignments (and other future programming endeavors) often start from some pre-existing codebase—it’s very rare that you start any real-world project completely from scratch. To provide context for the interface that you will implement in this homework, we’ve included a small part of the framework that we’ll use in our HuskyMaps project, along with a simple-but-inefficient implementation of the same interface.

  • Formally introduce asymptotic runtime requirements and exception handling.

    A program can be correct on all inputs, but too slow to be useful. In the previous homework, we informally introduced timing and memory tests as a means of providing feedback. For this assignment, we will enforce runtime requirements since our implementation needs to be fast if we want to use it as part of HuskyMaps. Additionally, as our programs begin to take real world inputs, they also need to handle malformed data.

Table of contents

  1. Getting Started
  2. Introduction
  3. Term
  4. ArraySearcher API
  5. LinearSearcher
  6. BinaryRangeSearcher
  7. Testing
  8. Command Line Interface
  9. Interactive GUI
  10. Submission

Getting Started

In IntelliJ, pull the skeleton repository to get the autocomplete assignment. If IntelliJ doesn’t give you the option to run tests for some reason, try looking back at the instructions from Week 2 to refresh Gradle.

Introduction

Write a program to implement autocomplete for a given set of terms, where a term is a query string and an associated non-negative weight. That is, given a prefix, find all queries that start with the given prefix, in descending order of weight.

Autocomplete is pervasive in modern applications. As the user types, the program predicts the complete query (typically a word or phrase) that the user intends to type. IMDB uses it to display the names of movies as the user types; search engines use it to display suggestions as the user enters web search queries; cell phones use it to speed up text input.

In these examples, the application predicts how likely it is that the user is typing each query and presents to the user a list of the top-matching queries, in descending order of weight. These weights are determined by historical data, such as box office revenue for movies, frequencies of search queries from other Google users, or the typing history of a cell phone user. For the purposes of this assignment, you will have access to a set of all possible queries and associated weights (and these queries and weights will not change).

The performance of autocomplete functionality is critical in many systems. For example, consider a search engine which runs an autocomplete application on a server farm. According to one study, the application has only about 50ms to return a list of suggestions for it to be useful to the user. Moreover, in principle, it must perform this computation for every keystroke typed into the search bar and for every user!

In this assignment, you will implement autocomplete by sorting the terms by query string; binary searching to find all query strings that start with a given prefix; and sorting the matching terms by weight.

Term

In the autocomplete/ folder, implement a class DefaultTerm that represents an autocomplete term—a query string and an associated integer weight. You must implement the following API, which supports comparing terms in a few different ways:

SignatureDescription
DefaultTerm(String query, long weight)Initializes a term with the given query string and weight.
String query()Returns this term’s query.
long weight()Returns this term’s weight.
int queryOrder(Term that)Compares to another term in lexicographic order by query.
int reverseWeightOrder(Term that)Compares to another term in descending order by weight.
int matchesPrefix(String prefix)Compares this term’s query to the given prefix.

See the interface’s method comments for more details on edge cases and exception handling, and see Java’s documentation for more details on the comparison methods and the int values they return.

Hint
The Java standard library has some methods that may be able to do the heavy-lifting for you. Look for methods named compare in classes like String and Long.

For testing purposes, it will also be helpful to write toString and equals methods. IntelliJ can automatically generate these methods for you through the code generation dialog (Code > Generate…), though you’ll need to first declare the class’s instance variables. Note that the generated code might not pass style checks: in this case, add additional braces or clean-up the code so that it’s presentable.

The string comparison functions must take time proportional to the number of characters needed to resolve the comparison.

ArraySearcher API

The ArraySearcher<T, U> interface allows clients to search through an array for all items that match some target pattern.

Navigate to the arrayutils folder. In this project, we’ll use this interface to search our Term objects for terms that match some String prefix, so you can think of T as Term and U as String. While two generic types might look scary at first, there will be plenty of guidance throughout the spec and the skeleton code to help you work with them.

You will develop a class that implements the ArraySearcher interface.

SignatureDescription
T[] findAllMatches(U target)Finds all items that match the target and returns an object representing them.

LinearSearcher

Read through the provided implementation LinearSearcher<T, U> of the ArraySearcher<T, U> interface. It has the following method and constructor in addition to the findAllMatches method:

SignatureDescription
public static <T, U> LinearSearcher<T, U>
forArray(T[] array, Matcher<T, U> matchUsing)
Creates a LinearSearcher for the given array of items that matches items using the Matcher matchUsing.
protected LinearSearcher(T[] array, Matcher<T, U> matcher)(Only meant to be called by forArray, to be consistent with BinaryRangeSearcher.)

When findAllMatches is called, LinearSearcher scans the entire array and collects all of the items that match target according to the matcher. We allow the use of a matcher to make our code flexible about what it considers to be equal (for example, for Seattle and Si'ahl to match the same target, even if they aren’t equal strings).

This code simple and fairly easy to test as a result, but it’s also not very fast, particularly on large arrays.

BinaryRangeSearcher

Recall that every linear search through NN items has a worst case runtime of Θ(N)\Theta(N), in which case your algorithm has to look at every element in the array. How can we improve our searcher? Turns out that if we put in the one-time effort to sort the input, we can use a binary search instead to reduce the time complexity of each query to Θ(log2(N))\Theta(\log_2(N)).

Develop a class BinaryRangeSearcher<T, U> that implements ArraySearcher<T, U> with a much faster but harder-to-verify binary search algorithm. As with LinearSearcher, your class should have the following method and constructor, and the findAllMatches method:

SignatureDescription
public static <T, U> BinaryRangeSearcher<T, U>
forUnsortedArray(T[] array, Comparator<T> sortUsing,
Matcher<T, U> matchUsing)
Creates a BinaryRangeSearcher for the given array of items that matches items using the Matcher matchUsing. First sorts the array in place using the Comparator sortUsing.
protected BinaryRangeSearcher(T[] array, Matcher<T, U> matcher)(Only meant to be called by forUnsortedArray, which has a more descriptive name.)

The method forUnsortedArray emphasizes that the class first takes in an unsorted array and sorts it according to the given Comparator. (Remember that we can’t run a binary search on an unsorted array.) Afterwards, when the findAllMatches method is called, the BinaryRangeSearcher runs two binary searches to find the first and last items that match the target.

Tip
If you don’t remember how to write code for a binary search, feel free to consult other resources (e.g., past coursework, internet). We allow you to consult online code snippets, and in exchange ask that you cite all sources that you have used as a comment.
Tip
Since you’ll be writing code to find the first and last of a sequence of items, your search method won’t be able to return early and will need to search all the way until the search interval is empty.

Your implementation must achieve each of the following runtime requirements.

  • The runtime of the constructor must be in O(N)\mathcal{O}(N), where NN is the number of items in the array.
  • The runtime of findAllMatches must be in O(logN)\mathcal{O}(\log N), where NN is the number of items in the array.

Testing

We provide tests for this assignment that cover most of the edge cases and cases where your code is expected to throw exceptions, but we don’t provide many tests that can check whether your code works as expected on regular inputs.

For LinearSearcher, the provided tests are reasonably effective, since its code is so simple; however, for BinaryRangeSearcher, you’ll probably want a larger test suite to check that all branches of your code work correctly. We’ll definitely be running additional tests on your code during grading, so try writing some of your own tests in the AbstractArraySearcherTests class. These tests won’t be graded, but should be very useful for debugging your own code. (By writing tests in the abstract test class, you can also run the same tests on LinearSearcher, which you can use to sanity check the correctness of your own tests.)

You may not share tests with any other students—all code in your possession, including tests, should be your own work. You can discuss different edge cases and share ideas about test cases, but test code falls under the same academic honesty policy as any other homework code.

Command Line Interface

The Autocomplete class provides a command line interface for a user to find prefix matches from autocomplete/data/cities.txt. Feel free to check out the code to see how your BinaryRangeSearcher gets used; or, you can run it to sanity check your code.

Interactive GUI

The AutocompleteGUI class provides a GUI for Autocomplete. It presents the top K matching terms in real time. When the user selects a term, the GUI opens up the results from a Google search for that term in a browser.

AutocompleteGUI Demo

Submission

Commit and push your changes to GitLab before submitting your homework to Gradescope.

  1. Kevin Wayne. 2016. Autocomplete-Me. In Nifty Assignments 2016. http://nifty.stanford.edu/2016/wayne-autocomplete-me/

    Matthew Drabick and Kevin Wayne. 2019. Autocomplete. In COS 226: Algorithms and Data Structures, Spring 2019. https://www.cs.princeton.edu/courses/archive/spring19/cos226/assignments/autocomplete/specification.php