Handout P6

Contents:

Getting Started

As usual, the source code that you need can be found by updating your cse331 directory from SVN.

Introduction

In this problem set you will assemble the modules you developed in problem sets 2 through 4, as well as some additional modules, to build Husky Maps, a Google Maps-like system for obtaining directions between street addresses. Husky Maps obtains its data from Tiger databases provided by the U.S. Census bureau (the 2006 databases are made publicly available here). It has both textual and graphical user interfaces, plus a programmatic interface, which enables it to be used by other programs.

Although you will be reusing code from other problem sets, you should go about your design as if you were developing Husky Maps from scratch, then reuse as appropriate.

You are not required to use the modules you developed previously. If you would like to, you may start from a clean slate and develop Husky Maps from scratch. (This is probably not a good idea, but you can consider it.) If you choose to do this, we urge you to discuss your new design with your TA; getting feedback on your ideas is essential to doing quality work and avoiding mistakes.

Even though you have done most of the work needed for the Husky Maps system in previous problem sets, you may still wish to improve on it, given the expertise and understanding you have acquired during the quarter. If you chose representations of abstract data types that are not efficient enough, you may need to rework them. The representation you chose for Graph, for example, may not scale to large graphs.

In this document we will say "Husky Maps does such and such" as shorthand for "When you finish, Husky Maps will do such and such."

The Problem

Husky Maps takes as input two addresses — a starting address and a destination address — each consisting of an address number, street name, and zipcode (e.g., 4114 University Way NE 98105). If there is a path from the start to the destination, Husky Maps gives directions; if not, it reports that no path between the two addresses exists. Husky Maps does not account for one-way streets, because such information is not available in the Tiger database. The database is from 2006, so don't be surprised if there are occasional discrepancies between it and reality.

On startup, Husky Maps reads a collection of database files containing the map information, and optionally a set of zipcodes to which searches will be confined. The course staff provides files in three directories (all subdirectories of /cse/courses/cse331/tigerdb), giving several different databases; including tiny (a small database that includes only Island County), small (Snohomish County), and medium (King County). There are other databases available; look in the /cse/courses/cse331/tigerdb directory to find them. The staff also provides a StreetSegReader class that reads a Tiger database and provides an Iterator of StreetSegments.

Text-Based Interface (15 points)

The class ps6.TextUI implements a text-based interface (also known as a command line interface or CLI). When invoked from the command line as:

java ps6.TextUI database-directory zipcode1 zipcode2 ... zipcodeN

with the first argument giving the database directory, and additional arguments giving zero or more zipcodes, the main method loads all the database files from the specified directory with the file extension .zip, using only street segments that are in one of the given zipcodes. If no zipcodes are given as arguments, there is no filtering, and all street segments are included. (Some street segments in the database are missing (both) zipcodes; these should always be included. See specs for DirectionsFinder. There may be other issues in the exact definition of filtering that you should resolve in your problem analysis). If the attempt to load fails (say, because the path given was invalid), Husky Maps prints:

Database error

and immediately exits. Otherwise, it enters a user interaction loop.

The user interaction loop queries the user for two addresses, outputs directions for traveling between those addresses, and then repeats. The address queries have the following format:

starting number? user-input
starting street? user-input
starting zipcode? user-input
destination number? user-input
destination street? user-input
destination zipcode? user-input
walking or driving [w/d]? user-input

If the starting number is -1, then the program exits immediately, without requesting a starting street, starting zipcode, or any destination information.

IMPORTANT NOTE: You should not cause your program to exit by invoking System.exit(). Instead, you should cause your program to terminate by breaking out of the interactive loop with return or break. If you fail to do so and invoke System.exit(), some of your JUnit tests will fail to run.

All seven inputs should be read before any other checking is performed.

If the starting zipcode does not appear in the database, the directions output should be:

No such zipcode: number street zipcode

Otherwise, if the starting street does not appear in the specified zipcode, the directions output should be:

No such street: number street zipcode

Otherwise, if the starting number does not appear on the street in the specified zipcode, the output should be:

No such number: number street zipcode

Otherwise, the second zipcode, street, and number should be checked, with error output as specified above.

If the input for type of directions is not 'w' or 'd', the output should be:

Invalid direction type: direction-type

If there is no way to get from the first address to the second, the directions output should be:

No path from number street zipcode to number street zipcode

Essentially, the text UI outputs the first (and only the first) of these error messages that applies if there is an error. In the case of an error, the text UI should not exit, but should instead reenter the interactive loop from the beginning.

Otherwise, the directions output should be as follows:

For driving directions:

Start at number street zipcode
directions-line
...
number street zipcode is on your left/right
Trip length: n.n miles

For walking directions:

Start at number street zipcode
directions-line
...
number street zipcode is on your left/right
Trip time: n minutes

The format of each directions-line (of which there is one or more) is as specified by DrivingRouteFormatter or WalkingRouteFormatter depending on the direction type the user has chosen. The turn specified in the first directions line is always either a "left" or a "right" (not "slight left", etc.), assuming that travel begins at the starting address facing the street; likewise, the last line will state that the destination is on the left or the right as you are traveling towards the destination on the prescribed path. The trip length should be given to the nearest tenth of a mile. When reporting trip time, you should round the time to the nearest minute. For walking, assume a speed of 20 minutes per mile.

Please note that the case where both the starting and the destination addresses are on the same street segment warrants special attention for both the first and last directions line. We do not specify the appropriate directions for the special case when the start and end addresses have identical fractional distances along the segment and you are free to print anything reasonable.

If ps6.TextUI is invoked as specified above, it should not produce any other output to System.out, System.err, or elsewhere. You may implement other behaviors, such as debugging output or more detailed directions, by interpreting command-line flags that turn on such behavior, by providing other classes with main methods, by turning off all debugging output before submitting your solution, or by some other mechanism. IMPORTANT: The default behavior of TextUI needs to conform to the above specification exactly in order to pass automated testing.

We have provided a partially implemented TextUI for you to complete and also provided you with two files textui.test and textui.expected for checking your output format. To test your TextUI, change directory to <YourWorkspace>/cse331/bin/ and run:

 java ps6.TextUI /cse/courses/cse331/tigerdb/tiny < ../src/ps6/test/textui.test > ../src/ps6/test/textui.actual 

Your goal is to have textui.actual printed in the same format as textui.expected.

Programmatic Interface (15 points)

In this problem set, the choice of internal components (classes and interfaces) and their specifications is up to you. However, in order to allow us to test your system, we require that it provide a programmatic interface: a set of classes that allows Husky Maps to be executed from another program. Providing such an interface is essential anyway for cleanly separating the user interface from the rest of the system, and it allows you to do your own testing more easily.

See the specifications of the required programmatic interface.

You may change the specs for any of the classes that you defined in the previous problem sets, for example, in your Graph ADT (Problem Set 3) or RoutePath (Problem Set 4). However, you must conform to the staff-provided requirements. For example, you may not modify the specifications of the fully specified classes in Problem Set 2, i.e., RouteFormatter, DrivingRouteFormatter and WalkingRouteFormatter, and some types and methods were also specified in later problem sets.

Reading the Tiger Database

StreetSegReader

The staff-provided StreetSegReader class can be used to read street segments from Tiger database directories. Please read the specifications for StreetSegReader carefully.

The specification of StreetSegReader refers to a Tiger Database Directory. We have provided a collection of directories holding Tiger Databases for you to use, which are located in the /cse/courses/cse331/tigerdb/ directory. Some of the databases are very large; thus, we have organized some of the directories by size (large, medium, small, tiny). This way you can experiment using a small database and then work your way up to larger ones.

Some properties of the collections of GeoSegments you have worked with in the past will not hold with the StreetSegments produced from the Tiger Database. For example, StreetSegments often share the same name (as opposed to having one GeoSegment named Stevens1, the next Stevens2, and so on). Many StreetSegments in the Tiger Database don't even have a name associated with them! (In that case, the name will be the empty string, StreetSegReader will automatically assign the String "(unnamed street)" to it as the street name; users cannot look up addresses on such a street.) So be careful in what you assume when working with the Tiger Database. Also, you want to avoid tests that involve segments in the Duplicates List.

Please note that street numbers are often not unique for a given street. Hence, you will need to take the zipcode into account when you check if a street segment contains an address to make sure that you get the right segment.

Segment list

Every database directory contains a file segments.txt.gz that lists the street segments present and their number ranges and zipcodes. You may find this useful in generating queries and to help you with debugging. Note: there can be many lines with the same street name and zipcodes. Since this file is meant to be more of assistance to you than necessary for the problem set, only use it if it helps. Under no circumstances should segments.txt.gz, segments.txt, or any TigerDB zip file be placed in your repository.

Each line of segments.txt.gz takes the form:

streetName startPt endPt length [leftZip] [rightZip] [leftNumbers] [rightNumbers]

where the items in square brackets are optional.

Do not attempt to use segments.txt to load the database; that is what the StreetSegReader is for. These files are rather large, so a quick way to filter out the StreetSegments of interest is to use the grep command. For example, if you want a listing of the streets with the name "University Way NE" in /cse/courses/cse331/tigerdb/medium, you can try:

zcat /cse/courses/cse331/tigerdb/medium/segments.txt.gz | grep "University Way NE"

(As an alternative to zcat, you can use gunzip to temporarily create flat text files.)

Here are some addresses of popular locations that you may want to test in queries (they are also listed in addresses.txt):

Shultzy's Sausage         : 4114        University Way NE       98105
Sketchy Safeway           : 4732        Brooklyn Ave NE         98105
Top Pot Doughnuts         : 6855        35th Ave NE             98115
U-District Farmers Market : 4519        University Way NE       98105
Space Needle              : 203         6th Ave N               98109
Seattle Art Museum        : 1300        1st Ave                 98101
Trabant Coffee & Chai     : 1309        NE 45th St              98105
Dick's Drive-in           : 115         Broadway E              98102

Duplicates list

Each directory in /cse/courses/cse331/tigerdb contains a killfile.txt file which may be helpful in identifying sets of StreetSegments which have the same end points, but which are not equal. Lines starting with WARNING indicate segments which meet these warning criteria. Running tests which use any of the warning streets in killfile.txt may lead to results which are not consistent over time. The StreetSegmentReader will also (by default) implicitly remove StreetSegments marked KILL in the killfile.

Note: at this time, the kill files are empty. We will update the kill files as students and staff find and report problems.

Testing

Validate

The ant validate command uses your test suite, in SpecificationTests.java . We supply you with basic tests including the following tests, which use the tiny database:

From: 950 NW 2nd Ave 98277
To: 473 SW Fairhaven Dr 98277

Start at 950 NW 2nd Ave 98277
Turn left onto NW 2nd Ave and walk for 4 minutes.
Turn right onto NW Fairhaven Dr and walk for 2 minutes.
Turn slight right onto SW Fairhaven Dr and walk for 7 minutes.
473 SW Fairhaven Dr 98277 is on your right
Trip time: 13 minutes

From: 841 SE Pioneer Way 98277
To: 1403 Monroe Landing Rd 98277

Start at 841 SE Pioneer Way 98277
Turn left onto SE Pioneer Way and go 0.5 miles.
Continue onto W Pioneer Way and go 0.6 miles.
Turn slight left onto State Route 20 and go 1.6 miles.
Turn left onto Monroe Landing Rd and go 0.2 miles.
1403 Monroe Landing Rd 98277 is on your right
Trip length: 2.9 miles

From: 2430 Fairway Ln 98277
To: 1614 Swantown Rd 98277

Start at 2430 Fairway Ln 98277
Turn left onto Fairway Ln and go 0.2 miles.
Turn sharp left onto Swantown Rd and go 1.1 miles.
1614 Swantown Rd 98277 is on your right
Trip length: 1.3 miles

The fact that we supply these few tests does not mean you should not supply more of your own tests in SpecificationTests, however.

Testing Strategy (15 points)

Testing Husky Maps will involve both unit testing for the new modules and also integration testing to confirm that all the modules work together to produce the correct results, i.e., return the correct directions. You already have some experience with both unit testing and integration testing from previous problem sets. Unit testing involves writing JUnit tests that verify the correctness of the methods in each module and you should do this systematically.

For integration testing, your main goal is to check that Husky Maps returns the correct directions for the route between two end points. You would want to be systematic about this and divide the testing domain into subdomains; for example, you may wish to test routes that involve end points on the same segment, end points on different segments and also testing the system on end points that are of increasing distances apart, etc.

The main difficulty that you would encounter in coming up with test cases is that it is not easy to determine what the correct answers should be. One way to construct test cases is to use information from the segments.txt.gz files for simple cases involving short routes. For longer routes, it would probably be easier to construct test cases based on landmarks that are familiar to you. Google Maps is another way to sanity-check your results. However, Google Maps uses slightly different street data than the 2006 Tiger database that we are using for thi problem set, and it is always possible that there is an error in one or both of the datasets.

To test TextUI, you can either use the provided testing infrastructure or create test files similar to textui.test (so that you don't have to type the same input commands into TextUI repeatedly) and run them from the command line.

To use our testing infrastructure, you can create a class that stores your test queries similar to ValidateQueries. Add your test cases by creating appropriate TestRecords. The programmatic test suite is designed to only load a database once per run of the test suite, whereas the UI tests will reload the database for each test. Therefore, you may find it inefficient to perform UI tests on databases other than the tiny database.

Requirements

Your implementation should have reasonable performance. We specify the performance requirements here in terms of some locations whose addresses are given in the section on Database Contents below.

On an instructional workstation, using a 128-megabyte Java heap, your implementation should meet the following performance requirements (on other faster platforms, you should do even better; these are only the minimal requirements):

98105 98115
Shultzy's -> Top Pot Doughnuts : 10 seconds

These are conservative numbers. A program that is carefully designed and uses good representations of the datatypes you built in problem sets 2 to 4 should run an order of magnitude faster. Don't be surprised if a query is fast in one direction but slow in another. For your convenience, we have included files with these searches in the same format as the textui.test in the tests directory.

If you find that your implementation performs badly, don't start optimizing all over the place. If you do so, you're unlikely to improve the performance dramatically, and you'll probably introduce bugs into your code. Instead, you should figure out where the bottlenecks are, and develop a strategy for handling them. Read our advice on representations below. Remember, after you implement improvements to your code, make sure to run regression tests.

Graphical User Interface

Prototyping a GUI (23 points)

In Problem Set 7, you will build a Graphical User Interface (GUI) for Husky Maps. In this problem we ask you to put some preliminary thought into the design of that interface. Specifically, we want you to construct a paper mockup of the GUI you plan to build. You can use paper and pencil, or any drawing software you desire. However, you must add a file called gui.pdf to your answers/ folder and add it to SVN. This file must be the size of a sheet of paper, must be in portrait format, and must not be larger than 500 kilobytes.

You should also describe your GUI in a file called gui.txt. Please discuss at least 2 important design decisions you made, as well as at least 2 alternative decisions you considered and why your design is superior. Try to be concise. Place this file in your answers/ folder and add it to SVN.

Husky Maps takes as input two addresses — a starting address and a destination address — each consisting of a building number, street name, and zipcode (e.g., 1300 1st Ave 98101). If there is a path from the start to the destination, Husky Maps gives directions; if not, it reports that no path between the two addresses exists. On startup, Husky Maps reads a collection of database files containing the map information (it reads these files from a directory), and optionally a set of zipcodes to which searches will be confined.

Your GUI should provide at least the following functionality:

There exists a great deal of freedom in the design of any graphical user interface. While building your paper mock up, please try to adhere to the guidelines presented in the relevant usability lectures. Try to keep your GUI simple. Your TA will be providing you with feedback about your GUI that you can use in order to update your design before Problem Set 7.

Hints

Strategy

One possible approach to designing and implementing Husky Maps is:

Graph Construction

The main functionality of StreetSegReader is to provide you with an Iterator of StreetSegments. You will have to determine on the connectivity between StreetSegments based on the coordinates of their end points. It is quite common for several StreetSegments to meet at the same point (i.e. at intersections).

The StreetSegReader provides no one-way street information, and it only produces a single StreetSegment object for each segment of a street in the database. Also, because your implementation of Graph in Problem Set 3 is directed, you may need to generate a reverse segment for each segment returned by StreetSegReader when you create the network graph.

Efficiency

In general, a focus on correctness is far more important than a focus on performance, especially when only minor efficiency improvements are at stake. However, dramatic, order-of-magnitude performance improvements may be worth making even before your code is fully debugged, because they may make debugging faster and easier, thus contributing to the correctness goal. For example, if your code is taking minutes to load the tiny database — something you may do a lot during testing — then your testing will be slow and unpleasant. Reducing that time to a fraction of a second will make your debugging more effective.

Choice of Representations

The representation of an abstract type can dramatically affect performance and scalability. Even with correct implementations of the modules you built in the earlier problem sets, you may be unable to assemble a Husky Maps program that can load the entire database and respond to queries in a reasonable time. Here are some hints to help you identify where you might have problems, and how to overcome them.

Hash Functions

Objects that are used as keys in hash tables need reasonable hash functions to prevent collisions. In the worst case (for example, a hash function that always returns the same value), lookups in a hash table degenerate from unit cost to linear cost. Advice on constructing reasonable hash functions can be found in the Effective Java text.

In short, there are three properties which a good hash function should have:

  1. It must obey the hash code invariant. a.equals(b) ⇒ a.hashCode()==b.hashCode().
  2. It is nice if it has good coverage. a.hashCode()==b.hashCode() usually means a.equals(b).
  3. It is important that hashCode be fast. You may want to consider storing the hash code of an object in a private field so your hashCode method can simply return it.

Make sure that hashCode() for GeoPoint, GeoSegment and StreetSegment are implemented efficiently.

Turning off checkRep()

If your implementation seems to take forever to load the tiny database, consider turning off (commenting out or short-circuiting with an if statement) checkRep() in your Graph implementation. You can also disable assertions via java -da. However, please ensure that Graph is mostly debugged before you do so.

Java Heap Size

You will may need to increase the Java heap size in order to load the entire medium database and handle queries on it. Increasing the maximum Java heap size can be done in the JDK with the command line argument -Xmxsize. We recommend a 128-megabyte heap for Husky Maps; thus the argument would be -Xmx128M. Run java -X for more information on this and other non-standard options to the Java virtual machine.

To run your program with a larger than default heap size in Eclipse, instead of simply right-clicking on TextUI and choosing Run As -> Java Application, choose Run As -> Run.... In the pop-up, click on the Arguments tab and type in -Xmxsize into the VM Arguments textArea.

Fractional Segments

Your Husky Maps program as submitted for grading must assume that the entire length of both the start and finish street segments are traversed.

This is not a realistic assumption, and you may optionally account for traversing only part of a segment (for extra credit). Consider traveling from the Apartment to the Coffeehouse:

A graphic demonstrating need for precision at start and end street segments

If your path length included all (or none) of the segments of Penny Lane and Broadway between Abbey Road and Side St, then you may incorrectly report the route that goes left onto Penny Lane, right onto Side St, and right onto Broadway, rather than the shorter route which goes right onto Penny Lane, left onto Abbey Road, and left onto Broadway.

If you do implement fractional segments, make sure that it is not turned on by default. Otherwise, because the auto-grader currently assumes that fractional segments are not implemented, your program will fail all the auto-graded tests. Instead, you should explain in your documentation how your TA can activate and test your fractional segment feature.

Multiple shortest paths

It is possible, though rare, for there to be multiple shortest paths between two locations. This can happen for two separate reasons:

Your code is required to return a shortest path, but which shortest path it returns is not specified. The test cases used for validate and for grading your solution have unique shortest paths, so you should not need to worry about this issue.

However, you may find that running your own route-finding queries on different JVMs may produce different paths (of the same length), because of differences in the implementation of iterators on the different JVMs. If this happens (and only if this happens), you should investigate to determine whether the two paths are really of equal length, or there is a bug in your implementation which causes the different results.

Here are three approaches to debugging such a problem:

Miscellaneous

There are two ways to leave an address: by initially turning right, or by initially turning left. Similarly, there are two ways to approach an address: with the destination on your right, or the destination on your left. So, when DirectionsFinder.getDirections uses the PathFinder you created in Problem Set 3 to search for the shortest path, the start and goal sets should each contain both the start and end segments and their corresponding reverse segments.

You should remove trailing white spaces from the user input (tip: String.trim() will do this). You can also use String.replaceAll( "\n", "") to strip newline characters, if necessary, to meet the specifications for Directions. Although you are not allowed to change the interface for RouteFormatter, you can convert the String returned by RouteFormatter into lines of directions using String.split("\n").

Zipcode filtering is meant to be easy: you have a set of zipcodes from user input and StreetSegmentReader provides you with an Iterator of StreetSegments. For each segment, you use it to build your graph if its zipcode is in the zipcode set; if not, you ignore it. Another possible approach is to create a new ZipcodeFilter (which implements ps6.StreetSegmentFilter) and modify StreetSegReader so that it returns an iterator of only the segments you need. The latter is likely to be significantly more complicated.

The start and destination addresses may appear on the same street; they may even appear on the same street segment. Husky Maps should still produce correct output. In the second case, there would be only one directions-line (as described in the directions section). The trip does not contain any U-turns and the trip distance is the entire length of the street segment. Keep in mind that Husky Maps assumes buildings are evenly spaced apart along a street segment and think about when you might need to use orderStatistic from StreetNumberSet.

The Java platform has a relaxed specification for the behavior of floating point operations. This should not affect you in most cases (it is one reasons you were told to use ints to represent latitude and longitude), but can possibly cause your code to arrive at different numeric results depending on the platform you are using. It can also, in certain cases, cause your program to arrive at different results over a single run of your code. To avoid all of these problems use the strictfp keyword in front of class in all classes which perform floating point arithmetic; for instance:

public strictfp class GeoSegment {
  ...
}

Collaboration (1 point)

Please answer the following questions in a file named collaboration.txt in your answers/ directory.

State whether or not you collaborated with other students. If you did collaborate with other students, put their names and a brief description of how you collaborated.

Reflection (1 point)

Please answer the following questions in a file named reflection.txt in your answers/ directory:

  1. How many hours did you spend on the following components of this problem set? (Only include time when you are completely focused on CSE 331. For example, if you worked an hour in your noisy dorm lounge, or you were periodically reading email or IMs, don't count that hour.)
  2. In retrospect, what could you have done better to reduce the time you spent solving this problem set?
  3. What could the CSE 331 staff have done better to improve your learning experience in this problem set?
  4. What do you know now that you wish you had known before beginning the problem set?

Returnin (25 points)

After you receive your corrected problem set back from your TA, you will have until March 11, 2011 to resubmit it electronically. The procedure for this is detailed in the General Information document.

You will only receive credit if you pass all the tests, and you have a fixed number of trials without penalty. See the returnin331 documentation for details.

What to Turnin

The following files should be checked into SVN before you run ant validate:

Provided Classes

Take a look at the Javadoc specifications for the classes provided.

Hints

If you're having trouble, the forums are a good place to look for help.

Errata

Since this problem set was released, the location of the textui.test and textui.expected files has changed, as has the command suggested above for using them.

Q & A

Q: Where do I begin?

Start by reading over the whole assignment. There are only two major classes to implement in the problem set: TextUI and DirectionsFinder. However, in order to implement these classes, you may have to create other classes (e.g. a class that implements the Directions interface).

Q: How do I run the test suite on Windows or at home?

Change the Tiger database path in test/TestRecord.java. For example, if you are using Windows in the instructional lab, set the path as follows:

  private static final String tigerBaseDir = "O:/cse/courses/cse331/tigerdb";

Remember to change the path back to its original value before running ant validate and submitting your assignment!

If you wish to run the tests at home, you will need to copy the Tiger database to your home computer. Don't check it into your Subversion repository, because it is extremely large (which is why the staff didn't put it there to begin with).