Link

Point Sets

Composing and testing a complex data structure from scratch.1

Learning Goals

  • Compose algorithms via iterative improvement.

    Algorithms can be tricky to implement. One way we can manage the complexity of an algorithm is to develop it in stages, starting from something simple that works and then progressing to more and more complex optimizations while ensuring the program works at every step along the way. This way, we can maintain correctness as we add complexity to our programs.

  • Encapsulate complex behavior beneath a simple interface.

    The best abstractions hide complex behavior beneath a simple interface. Programs often gradually grow more and more complex, so a key challenge of implementing complex data structures is in managing this complexity while maintaining the illusion of a simple interface.

Table of contents

  1. Getting Started
  2. Point
  3. PointSet API
  4. NaivePointSet
  5. KDTreePointSet
  6. Testing
  7. Optional Extension: Boid Simulation
  8. Submission

Getting Started

Pull the skeleton repository to get the pointset assignment.

Point

We provide a class, Point, that represents a location with an x and y coordinate. You may not modify this class. The key methods in this class are:

  • public double x(): Returns the x coordinate value.
  • public double y(): Returns the y coordinate value.
  • public double distanceSquaredTo(Point p): Returns the squared Euclidean distance between two points.
  • public double distanceSquaredTo(double px, double py): Returns the squared Euclidean distance between two points.
Note
The distance methods returns squared distances, not the actual distance.

PointSet API

We also provide an interface called PointSet that represents a set of such points. It has only one method.

  • public Point nearest(double x, double y): Returns the point in the set nearest to the given (x, y) coordinate.

The PointSet will be able to search through the list of Point objects stored, finding the point closest to a given coordinate. In HuskyMaps, we’ll be using this to find the roadway that is nearest to the user’s mouse-click location.

Point p1 = new Point(1.1, 2.2); // Point with x = 1.1, y = 2.2
Point p2 = new Point(3.3, 4.4);
Point p3 = new Point(-2.9, 4.2);

PointSet nn = new NaivePointSet(List.of(p1, p2, p3));
int x = 3.0, y = 4.0;           // Mouse-click at (3, 4)
Point ret = nn.nearest(x, y);   // ret == p2
ret.getX();                     // Evaluates to 3.3
ret.getY();                     // Evaluates to 4.4

NaivePointSet

Before you create your complex-but-fast KDTreePointSet class, you will first create a simple-but-slow linear-time implementation of the PointSet interface. The goal of creating this class is to provide an alternative, albeit slower, solution that you can use to easily verify if the result of your k-d tree’s nearest is correct.

Note that your NaivePointSet class should be immutable, meaning that it should be impossible to add or or remove points from it—even if the user modifies the constructor’s input list of points after after instantiating the PointSet. (Since the Point class is immutable, this is as simple as making a defensive, shallow copy of the list.)

NaivePointSet will only be tested for correctness, not efficiency.

For reference, our solution for this class is 32 lines total from top to bottom. You should not spend significantly more than half an hour to complete this part.

KDTreePointSet

Now, write the KDTreePointSet class with the same constructor and method signature as NaivePointSet. Your k-d tree should guarantee O(log N) access time on real-world data. You don’t need to worry about the absolute worst-case runtime.

As with NaivePointSet, your KDTreePointSet class should be immutable. You may assume that the input to the constructor has random ordering (i.e., is not sorted in any way). While k-d trees can theoretically work for any number of dimensions, your implementation will only work for the 2-dimensional case when our points have only x and y coordinates.

Tips

For implementing nearest, we recommend that you write a simple version and verify that it works before adding in the various optimizations from the pseudocode. For example, you might start by writing a nearest method that recursively traverses the entire tree in Ω(N) time. Then, you might add code so that it always visits the closer side before checking the further side (changing traversal order without improving runtime). Then, after verifying that this works, you could finally implement the optimization of only traversing the farther side if necessary, bringing the runtime down to O(log N). By doing things this way, we can test and iterate on our program one idea one at a time instead of expecting everything to work immediately.

Commit frequently. Don’t be afraid to throw away your solution and start from a previous commit (or from scratch) if you feel that debugging is not getting anywhere. Restarting gives us a fresh perspective, but we get to keep all of our experiences with developing the program that we picked along the way so that our next try will be more robust. Our solution to the nearest method is about 30 lines of code, though it’s possible to efficiently solve the problem in fewer lines.

Testing

We will not provide any skeleton tests nor autograder messages (beyond a basic sanity check) for this homework. You will be responsible for writing your own tests and ensuring the correctness of your code. While you need to submit your tests to the autograder, they won’t be graded.

There are a number of different ways that you can construct tests for this part of the project.

  1. Use the provided NearestNeighborVisualizer with the debugger.
  2. Write hand-crafted unit tests with a few example points.
  3. Write randomized tests.

To use the NearestNeighborVisualizer, we need to pass a filename to the main method. Modify the Run Configuration and add a path like data/points/circle10.txt to the Program Arguments field.

Our suggestion is to use randomized testing which will allow you to test your code on a large sample of points. This should encompass most, if not all, of the edge cases you might run into. With randomized tests, you can generate a large number of random doubles to construct random points to add to your tree, as well as points to query nearest. To verify the correctness of the results, compare the results of KDTreePointSet.nearest to the results to NaivePointSet.nearest. If our implementation can pass a very large number of randomly-generated scenarios, we can have some confidence in the correctness of our data structure and its algorithms.

Optional Extension: Boid Simulation

Implement the KDTreePointMap class in the boids subpackage. The nearest method should return the k-nearest points efficiently. This will allow you to run the BoidSimulator.2 Behold their flocking majesty.

Submission

Commit and push your changes to GitLab before submitting your homework to Gradescope.

  1. Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab 

  2. Josh Hug and Evan Sparano. 2013. Boid Simulator. In COS 226, Fall 2019. https://www.cs.princeton.edu/courses/archive/fall19/cos226/assignments/kdtree/specification.php