|
CSE Home | About Us | Search | Contact Info |
|
Homework #2out: Monday April 16th, 2012due: Thursday May 3rd, 2012 by 9:00pm. [ summary | part a | part b | part c | how to submit | grading ]
Homework #2 has several parts to it. In part A, you will read through our (somewhat complex) code for JSON parsing and complete it. In part B, you will complete our code for interacting with Twitter through their search API. In part C, you will write a simple search program that retrieves search results and creates a JavaScript-based word cloud from them for display in a browser. Here is the code you should download that you'll modify for this assignment. As a very first step, once you've untar'ed the code, take a look in the libhw1/ directory. We've provided you with our hw1 library (libhw1.a) in case yours has bugs in it. But, feel free to replace this file with your own libhw1.a if you want to build on top of your solution rather than ours.
Background on JSON. When programs need to communicate with each other over networks, they have to agree on a format for any data that they exchange with each other. JavaScript Object Notation (JSON) is a very simple data exchange format used by most public Web APIs, including Twitter. JSON is able to represent rich, nested structures that comprise strings, numbers, booleans, arrays, and maps.
Here is a simple example of some data encoded in JSON:
Diagrammatically, this JSON structure looks like this:
As you can see, a piece of JSON is described by a tree, where each tree element is one of several base JSON values. These base JSON values are:
Our JSON parsing implementation We have implemented most of a JSON parser for you. The goal of the JSON parser is three-fold. First, it takes a piece of text containing JSON-formatted data and parses though it, producing a JSON data structure similar to the tree-like diagram shown above. Second, it provides you with some functions that allow you to manipulate this JSON data structure. Third, it provides a function that generates JSON text as output, given a JSON data structure as input. Let's walk through the code. Take a look at JSONParser.h; this header file declares the public interface to the parser. The interface consists of a set of C++ class definitions. The first, JSONValue, is a purely abstract class that is never actually instantiated; it is a base class that is subclasses for each type of JSON value listed above (e.g., a json object, a json string, and so on.). This base class does declare a class method, called JSONParse; the method accepts a std::string as input and returns a pointer to a JSONValue class (i.e., to one of the subclasses, cast to the abstract base class) as a return value. You use this method to parse a piece of JSON. Next, you can see a virtual function called "dump"; each of the JSON subclasses implement this function. It's role is to produce a text representation of itself, and write that text representation to the provided stream. By traversing through a JSON data structure recursively, you can use this method to produce the text representation of an entire JSON data structure. Below this, you will see a set of subclasses of JSONValue, one for each JSON data type. For the more complex subclasses (Object and Array), the subclass defines some methods that allow you to access and manipulate the value, such as by appending a value to the end of a JSON array. All of the classes, including the simple ones, provide a Dump method for converting to a text representation. As well, all of the classes expose their state through an appropriately named instance variable, such as tree_ for JSONObject or array_ for JSONArray.
Next, take a look at JSONParser.cc. This contains the partial implementation of our Parser. We have implemented most of the finicky, ugly parts of JSON parsing, but we want you to complete two pieces of our implementation. First, you should complete the implementation of ParseArray(). (Step 1). This is the function that takes a string, consumes the array from the front of the string, and produces/returns a JSONArray structure that represents it. We suggest you base your implementation off of a very similar implementation we provided you for ParseObject(). Second, you should complete the implementation of JSONArray::Dump(), which is the method that produces a string representation of a JSONArray object and writes it out to the provided stream. (Step 2). Again, we suggest that you base your implementation off of the similar implementation of JSONObject::Dump(). Third, you should complete the implementation of JSONArray::operator[](), which is the method that overrides C++'s array access operators, and allows customers to ask for particular items within the array. You will need to do a few things:
What to do. You should follow these steps to do Part A of this assignment:
I'm guessing you have used Twitter: it's a web site where you can publish short (140 characters or less) status updates, subscribe to others' status updates, and search for updates based on their content, sender, and receivers. In addition to providing a user interface through the Web, Twitter has also provided the world with an application programming interface, or API. We will be taking advantage of the search API that allows our programs to communicate with Twitter over the Internet, posing queries and receiving back JSON-formatted search responses. The Twitter search API is REST-based; this is just a fancy way of saying that in order to request data, you just issue a Web request to a carefully formatted URL. So, for example, try visiting the following URL with your browser; it has been constructed to ask Twitter for any recent tweets mentioning the phrase "lance armstrong": http://search.twitter.com/search.json?q=%22lance%20armstrong%22 Instead of a human-viewable web page, you'll see JSON data that describes a set of search results. The JSON data isn't particularly friendly to read; try using the clipboard to copy all of that text and paste it into this JSON pretty-printing site: it should make it much more readable. Looking through the pretty-printed results, you can see that the results roughly have the structure shown in the diagram at the very top of this web page, with the exception of having more fields than we showed. The results are a single JSONObject (i.e., a single map); within the map, there is a key called "results" that contains an array of result JSONObjects. Each one of the result JSONObjects contains a set of interesting name/value pairs, including the name of the user that sent the tweet, the text of the tweet, and so on. In this part of the assignment, you will write some code that opens up a network connection to Twitter, issues a search request, and receives and parses the reply JSON data, using the JSONParse code that you finished in part A. We've split part B up into three major steps:
What do do. Let's go over each step in more detail. Step 1: opening a socket Look in your code directory, and you'll notice files called SocketClient.h and SocketClient.cc. Take a look at SocketClient.h; this is a very simple header file that declares two functions, one called DNSLookup and one called ConnectToServer. Next, take a look at SocketClient.cc: notice that the implementation of these functions is missing. Your job in Step 1 is to finish the implementation. Note that the implementation is quite similar to some of the code that we have covered in the sections and in the lecture! We have also provided a simple unit test that exercises these functions, so when you think you've finished, try running test_suite and see if you make it past the Test_SocketClient unit test; if so, you're successfully opening up a socket to a remote server! Step 2: issuing an HTTP transaction Now that you're able to open up a socket to a remote server, you need to implement the functionality to issue a web request against a web server and receive the reply. The protocol that you must use to do this is called HTTP (the hypertext transfer protocol). HTTP is quite complex, but fortunately, we'll be using a very simple subset of it in this assignment. Take a look at HTTPClient.h; in it, we've declared a single function called HTTPGet. A customer invokes this function to connect to a web server, issue a request for a URL to that server, and then read the response data. Now, take a look at HTTPClient.cc; notice that we have not given you the implementation. Your job is this step of Part B is to implement HTTPGet. Here are the rough steps you will need to follow:
Step 3. Issue a Twitter Search and Parse the JSON response data In Step 3, you'll complete the implementation of TwitterAPI.cc by using HTTPClient.h to issue an HTTP request against the Twitter search API, receive back the JSON response data, parse the JSON data, and then construct a class with the query response fields nicely parsed out. To issue a query against Twitter, all you need to do is use HTTPGet() from Step 2 to connect to hostname "search.twitter.com" and port 80, and pass in an appropriate document path. The document path you will ask for has the following URL: /search.json?rpp=100&q=encoded search queryGiven a search query passed to you by the customer, you must encode it using the URIEncode() routine from Utils.h to transform the query into text that is safe to embed within a URL. For example, if the customer wants to search for the phrase "lance armstrong", then the quotation marks and the space character need to be converted into safe versions of them using URIEncode(), and the resulting document path to ask for is: /search.json?rpp=100&q=%22lance%20armstrong%22So, using HTPGet(), you issue the request for this document, and you get back JSON-encoded search results. Your next step is to use JSONParse from part A to parse these results into a nested JSONValue structure. Finally, you have to iterate through this structure, pulling out key fields and building up a std::vector of Results. (A Result is a nested class within the TwitterSearch class, as you can see from the TwitterAPI.h header file.) The JSON that you receive back will be a more complex version of what was shown in the diagram near the top of this page in Part A. In particular, try clicking on this search URL to see that encoded JSON data in your browser: http://search.twitter.com/search.json?q=%22lance%20armstrong%22 It's hard to read, but if you use the tool to make it more readable provided near the top of this page, you'll notice that there is a top level object with a results field whose value is an array of result objects. Each object contains some string values. You will need to build up a Result structure for each result object, extracting out the fields. You should use the Get JSONObject::Get() method to fetch the "results" array from the top-level results, and then you can use the JSONArray's array access operator [ ] to access the individual results. Within each result, you use the JSONObject::Get() method to access the string fields, such as the "from_user" field. Now, take a look at TwitterAPI.cc; note that we have provided some of the code, but your job is to complete the implementation. Once you think you have finished, try running the unit tests and see if you make it past the TwitterAPI unit test.
Now that you have code that is able to interact with Twitter, we are going to implement a very simple user interface. You will complete a program that prompts the user for a query, reads the query from the user, then issues the query against twitter. Instead of printing the results out to the screen, the program will write a temporary HTML file with the results within it, then invoke the Firefox browser to display the results graphically. First, try running our solution binary (solution_binaries/twitter_shell). Note how it works: the user types a query and presses return, and the shell then works its magic. Your job is to replicate this behavior. Take a look at twitter_shell.cc; note that the implementation is partially missing. Fill it in the usual way: look for the STEP X comments, and follow the directions. Here's some detail on a few of steps:
When you're ready to turn in your assignment, do the following:
We will be basing your grade on several elements:
|
Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX [ comments to gribble ] |