HW2

Get the starter code. Be sure to get all the files in that folder. (You can click "download this folder" on gitlab.)

Due: Thursday, October 19, 11:59pm Pacific Time

In this homework, we will work with JSON data in OCaml. We will manipulate it, print it, and (optionally, as a challenge problem), parse it. We will also play with some real data.

Your solutions must use pattern-matching. You may not use the functions List.hd, List.tl, Option.is_some, Option.get, fst, snd, etc.

For any problem, if the handout asks you for a binding with a specific type, a more general type is also allowed, provided the code is functionally correct.

List order does not matter unless specifically stated in the problem.

This homework is more difficult than Homework 1. There are 26 required problems and 12 challenge problems. But several of the problems have one-line solutions.

Background

Dune

The code and directory structure for this homework are more complex than the first homework. We are using the dune build system. We will teach you how to use dune in this document. Remember to run the command from inside the hw2 folder, or else they will not work.

  • Instead of utop, you should type dune utop to run your code. Then use #use as usual.
  • dune build compiles your code into a library that you can then load from dune utop or run the tests with.
  • dune test compiles your code and runs the test file hw2test.ml.

JSON

JSON is a human-readable data serialization format, originally inspired by Javascript's Object Notation. JSON is commonly used to represent data, for example the King County Metro real-time transit data used in this assignment.

No previous experience with JSON is expected or needed, but if you are curious or confused, the ultimate reference is RFC 7159. We will make some modest simplifying assumptions about JSON, so we will not quite be RFC compliant, but we will be close.

A JSON value is one of seven things:

  • a number (floating point, e.g., 3.14 or -0.2)
  • a string (e.g., "hello")
  • false
  • true
  • null
  • an array of JSON values, written between square brackets and commas (e.g., [1, "world", null])
  • an "object", which is a sequence of field/value pairs
    • a field is always a string literal (e.g., "foo")
    • a value is any JSON value
    • objects are written with curly braces, commas, and colons, e.g., {"foo" : 3.14, "bar" : [1, "world", null], "ok" : true}

However, our OCaml code will not use this concrete JSON notation, but rather use this variant type to represent JSON values:

type json = 
| Num of float
| String of string
| False
| True
| Null
| Array of json list
| Object of (string * json) list

The data files

The bus data files are needed only for problems 21 through 26, but you may find them useful for testing other places as well. In case you are curious, the data is real and comes from King County Metro. The dataset is real-time and is the same feed used by Google Maps and OneBusAway to display route timing information. The most recent data can be viewed here. Note that this data updates every few seconds. We provide a snapshot from the past, so that we're all using the same data. This snapshot can be found in JSON format in the file complete_bus.json.

That file contains around 1000 records. We have also included small and medium subsets of the data, containing 10 and 100 records, respectively, available in the files small_bus.json and medium_bus.json.

Converting the textual representation of JSON data into the structured representation given by the OCaml type json is the problem of parsing. Parsing JSON is an interesting problem, which we encourage you to explore with the challenge problems. (They are easier than you might expect!)

To facilitate using the data without needing to parse it yourself, we have included the pre-parsed data files parsed_small_bus.ml, parsed_medium_bus.ml, and parsed_complete_bus.ml, each of which is a valid OCaml file that binds a single variable name to a value of type json.

We will explore the data further below, but for now it suffices to say that it consists of a JSON object that contains an array of vehicle records, each of which has several fields, such as the route name and number, and the position in latitude and longitude.

The Problems

Part 0: Warm-up

The first problem is about a silly, useless JSON value to get you used to working with the json type.

  1. Write a function make_silly_json that takes an int i and returns a json. The result should represent a JSON array of JSON objects where every object in the array has two fields, "n" and "b". Every object's "b" field should hold true (i.e., True). The first object in the array should have an "n" field holding the JSON number i.0 (i.e., the integer i converted to a floating point JSON number), the next object should have an "n" field holding (i - 1).0 and so on where the last object in the array has an "n" field holding 1.0.

    Sample solution is less than 10 lines.

    Write tests to convince your reader that your function is correct.

    Hint: There's a function in the OCaml standard library called float_of_int that converts an integer to a float.

    Hint: Use a helper function that does most of the work.

Part 1: Printing JSON values

  1. Write a function concat_with that takes a separator string and a list of strings, and returns the string that consists of all the strings in the list concatenated together, separated by the separator. The separator should only be between elements, not at the beginning or end.

    Sample solution is 5 lines.

    Write tests to convince your reader that your function is correct.

    Hint: Use ^ to concatenated two strings.

  2. Write a function quote_string that takes a string and returns a string that is the same except there is an additional " character at the beginning and end.

    Sample solution is 1 line.

    Write tests to convince your reader that your function is correct.

  3. Write a function string_of_json that converts a json into the proper string encoding in terms of the syntax described in the Background section.

    Sample solution is 25 lines.

    Write tests to convince your reader that your function is correct.

    Hint: The two previous problems are useful.

    Hint: Use a helper function for arrays and a helper function for objects. In both cases, create a string list that you then pass to concat_with.

    Hint: In the Num case, use the provided json_string_of_float function.

Part 2: Processing JSON values

  1. Write a function take of type int * 'a list -> 'a list that takes an int called n and a list called xs and returns the first n elements of xs in the same order as xs. You may assume that n is non-negative and does not exceed the length of xs.

    Sample solution is about 5 lines.

    Write tests to convince your reader that your function is correct.

  2. Write a function firsts of type ('a * 'b) list -> 'a list that takes a list of pairs and returns a list of all the first components of the pairs in the same order as the argument list.

    Sample solution is about 4 lines.

    Write tests to convince your reader that your function is correct.

  3. Write a comment in your file after your definition of firsts answering the following questions. Suppose xs has type (int * int) list and let n be an integer between 0 and the length of xs (inclusive), and consider the expressions

    firsts (take (n, xs))

    and

    take (n, firsts xs)

    Either (1) write one sentence explaining in informal but precise English why these two expressions always evaluate to the same value; or (2) give example values of xs and n such that the two expressions evaluate to different values. Regardless of whether you decide option (1) or option (2) is correct, also write one sentence explaining which of the two expressions above might be faster to evaluate and why.

  4. Write a function assoc of type 'a * ('a * 'b) list -> 'b option that takes two arguments k and xs. It should return Some v1 if (k1, v1) is the pair in the list closest to the beginning of the list for which k and k1 are equal. If there is no such pair, it returns None.

    Sample solution is about 3 lines.

    Write tests to convince your reader that your function is correct.

    Note: There's a function with similar functionality in the OCaml standard library, but calling it requires features we haven't covered yet. Do not use that function or you will not receive credit.

  5. Write a function dot that takes a json (call it j) and a string (call it f) and returns a json option. If j is an object that has a field name equal to the string stored in f, then return Some v where v is the contents of that field. If j is not an object or does not contain a field with name equal to f, then return None.

    Sample solution is 4 short lines thanks to an earlier problem.

    Write tests to convince your reader that your function is correct.

  6. Write a function dots that takes a json called j and a string list called fs that represents an "access path", or in other words, a list of field names. The function dots returns a json option by recursively accessing the fields in fs, starting at the beginning of the list. If any of the field accesses occur on non-objects, or to fields that do not exist, return None. Otherwise, return Some v where v is the value of the field "pointed to" by the access path.

    Sample solution is about 7 lines.

    Write tests to convince your reader that your function is correct.

    Hint: Use recursion on fs plus your solution to the previous problem.

  7. Write a function one_fields that takes a json and returns a string list. If the argument is an object, then return a list holding all of its field names (not field contents). Else return the empty list. Use a tail-recursive, locally defined helper function. The list you return can be in any order, but it is probably easiest to have the results in reverse order from how they appear in the object, and this reverse order is fine/expected. But any order is fine.

    Sample solution is about 10 lines.

    Write tests to convince your reader that your function is correct.

  8. Write a function no_repeats that takes a string list and returns a bool that is true if and only if no string appears more than once in the input. Do not (!) use any explicit recursion. Rather, use the provided helper function dedup, which returns its argument with duplicates filtered out, together with the standard library function List.length to complete this problem in one line.

    Write tests to convince your reader that your function is correct.

  9. Write a function recursive_no_field_repeats that takes a json and returns a bool that is true if and only if no object anywhere "inside" (arbitrarily nested) the json argument has repeated field names.

    If the argument does not contain any objects, the function should return true.

    Note that it is not relevant whether different objects inside the input have field names in common.

    In addition to using some of your previous functions, you will want two locally defined helper functions for processing the elements of a JSON array and the contents of a JSON object's fields. By defining these helper function locally, rather than at the top level, they can call recursive_no_field_repeats in addition to calling themselves recursively.

    Sample solution is about 15 lines.

    Write tests to convince your reader that your function is correct.

  10. Write a function count_occurrences of type

    string list * exn -> (string * int) list

    If the string list argument is sorted (using OCaml's built-in comparison operator, <), then the function should return a list where each string is paired with the number of times it occurs. (The order of the output list does not matter.)

    If the list is not sorted, then raise the exn argument.

    Your implementation should make a single pass over the string list argument, primarily using a tail-recursive helper function. You will want the helper function to take a few arguments, including the "current" string and its "current" count.

    Sample solution is about 12 lines.

    Write tests to convince your reader that your function is correct.

  11. Write a function string_values_for_access_path of type

    (string list) * (json list) -> string list.

    (The parentheses in this type are not required, but we include them for clarity; the REPL will not print them.) For any object in the json list that has a field available via the given "access path" (string list) and has contents that are a JSON string (e.g., String "hi") put the contents string (e.g., "hi") in the output list. Order of the output list does not matter. The output should have duplicates where appropriate.

    Sample solution is 6 lines thanks to dots.

    Write tests to convince your reader that your function is correct.

  12. Write a function filter_access_path_value of type

    string list * string * json list -> json list.

    The output should be a subset of the third argument, containing exactly those elements of the input list that have a field available via the given access path, and that field's contents are a JSON string equal to the second argument.

    Sample solution uses dots and is less than 10 lines.

    Write tests to convince your reader that your function is correct.

  13. Some of the bus data uses latitude and longitude positions to describe the location of vehicles in real time. To narrow our focus onto a particular geographical area, we will use the record types rect and point, which represent a rectangle and a point, respectively. The types are defined in the starter code, but copied here for completeness.

    type rect = { min_latitude: float; max_latitude: float;
                  min_longitude: float; max_longitude: float }
    type point = { latitude: float; longitude: float }
    

    Write a function in_rect of type rect * point -> bool that determines whether a given point falls inside a given rectangle (inclusive).

    Solution is 2 lines and uses a lot of conjunction (&&).

    Write tests to convince your reader that your function is correct.

  14. Write a function point_of_json of type json -> point option. If the argument is a json object that contains fields named "latitude" and "longitude", both of which are json numbers, then point_of_json returns Some p where p is the point represented by these coordinates. Otherwise, it returns None.

    Solution is 5 lines and uses dot and nested patterns.

    Write tests to convince your reader that your function is correct.

  15. Write a function filter_access_path_in_rect of type

    string list * rect * json list -> json list.

    The output should be a subset of the third argument, containing exactly those elements of the input list that (1) have a field available via the given access path, (2) that field's contents are a JSON object that can be converted to a point via point_of_json, and (3) the resulting point is within the rectangle specified by the second argument.

    Sample solution is less than 15 lines.

    Write tests to convince your reader that your function is correct.

  16. After your definition of filter_access_path_in_rect, write a comment containing 1-3 sentences describing the similarities with filter_access_path_value. Can you think of any way to refactor these two function to use a common, more general function? (Do not actually do this refactoring.) On a scale from 1 to 10, how annoyed are you about having to write both of these functions on the same homework?

Part 3: Analyzing the data

Finally, we will use the functions you wrote in the part 2 to create some variable bindings to help us analyze the JSON data. We will analyze the complete subset of the data stored in complete_bus_positions_list, which you will need to uncomment in the provided code. All of the problems in part 3 have one-line solutions and use one or more functions defined in previous parts.

No tests are required for this part.

  1. Bind to the variable route_histogram a histogram (using the provided histogram_for_access_path) of the objects in complete_bus_positions_list based on the "route_num" field.

    Hint: Since this field is nested inside several objects, you need to look at the data to figure out the rest of the access path that comes before the "route_num" field.

  2. Bind to the variable top_three_routes a list containing the three most frequently appearing route numbers in route_histogram.

    Hint: Use take and another function from part 2.

  3. Bind to the variable buses_in_ud a list containing all records referring to buses in the U district. For the purposes of this problem, the U district is defined by the rectangle bound to the variable u_district in the provided code.

  4. Bind to the variable ud_route_histogram a histogram of the objects in buses_in_ud based on the "route_num" field, as in problem 21.

  5. Bind to the variable top_three_ud_routes a list containing the three most frequently appearing route numbers in ud_route_histogram, as in problem 22.

  6. Bind to the variable all_fourty_fours a list containing all records from complete_bus_positions_list that refer to a vehicle whose (suitably nested, as in problem 21) "route_num" field is "44".

Challenge problems

The challenge problems build a parser for JSON. The are described in hw2challenge.ml.

Summary

Evaluating a correct homework solution should generate these bindings or more general types in addition to the bindings from the provided code.

val make_silly_json : int -> json 
val concat_with : string * string list -> string 
val quote_string : string -> string 
val string_of_json : json -> string 
val take : int * 'a list -> 'a list 
val firsts : ('a * 'b) list -> 'a list 
val assoc : 'a * ('a * 'b) list -> 'b option 
val dot : json * string -> json option 
val dots : json * string list -> json option 
val one_fields : json -> string list 
val no_repeats : 'a list -> bool 
val recursive_no_field_repeats : json -> bool 
val count_occurrences : string list * exn -> (string * int) list 
val string_values_for_access_path : string list * json list -> string list 
val filter_access_path_value : string list * string * json list -> json list 
val in_rect : rect * point -> bool 
val point_of_json : json -> point option 
val filter_access_path_in_rect : string list * rect * json list -> json list
val route_histogram : (string * int) list
val top_three_routes : string list
val buses_in_ud : json list
val ud_route_histogram : (string * int) list
val top_three_ud_routes : string list
val all_fourty_fours : json list

Of course, generating these bindings does not guarantee that your solutions are correct.

Put your solutions to the problems in hw2.ml. Put your tests in hw2test.ml. If you do the challenge problems, put your solutions to those in hw2challenge.ml. Make sure that running both dune build and dune test work. Upload these files to Gradescope.