CSE333 17au -- Homework #2

Out: Wednesday January 18, 2017
Due: Thursday January 26, 2017, 11:59 PM.

[ summary | assignment | how to submit | grading ]

Summary
In HW1 you completed the implementation of a generic linked list library. HW2 gives you the chance to design and implement a different generic container library, one providing 2-dimensional arrays. It is up to you to design the interface - the set of methods provided by your library. You will be using this library for the remainder of the homeworks, though, so the work of implementing methods you'll later need and don't implement now will be piled upon the work of those later assignments. At the same time, methods you implement now and never use will just be extra work.
Requirements
You'll implement your 2d array library and a disposable mainline that tests its functionality. Your mainline should be invoked in the same way and perform the same operations as our sample solution, as shown in the output below. This particular mainline uses the generic 2d array to store ints.
$ ./hw2 test.json
[1] Deserialize test.json
    		  deserialized 0
		  deserialized 1
		  deserialized 2
		  deserialized 3
    		  deserialized 4
		  deserialized 5
Array is 2 x 3
0 1 2 3 4 5

[2] Set [1][1] to 100
Array is 2 x 3
0 1 2 3 100 5

[3] swap [1][1] and [0][0]
Array is 2 x 3
100 1 2 3 0 5

[4] swap [1][1] and [10][10]
Array is 2 x 3
100 1 2 3 0 5

[5] serialize array to file json.out
    	      serialized 100
    	      serialized 1
	      serialized 2
	      serialized 3
	      serialized 0
	      serialized 5

[6] destroy array
    	    destroyed 100
    	    destroyed 1
	    destroyed 2
	    destroyed 3
	    destroyed 0
	    destroyed 5

Each line starting [n] is a test step performed by the mainline, as follows:

  1. The first step is to deserialize a 2d array from a serialized description contained in a file. The name of the file is given as the only command line argument. In this case, the file is test.json. The file contains a json representation of the 2d array. More on that in a later section. For now, it's enough to think of this as reading the contents of a file and creating the in-memory representation of a 2d array.

    Because the array library is generic (intended to store items of any type), the library code cannot know how to deserialize the individual data items. It therefore delegates that work to the client, that is, the coder writing the mainline that uses the 2d array library. In the output you see lines like deserialized 0 that are produced by the client code as it deserializes each element of the array.

    A crude dump of the array is performed at the end of this, and every, step.

  2. The mainline causes element [1][1] to be set to 100.

  3. The mainline causes the elements at [1][1] and [0][0] to be swapped.

  4. The mainline attempts to swap the elements at [1][1] and [10][10], which is out of bounds for this array. No code crashes.

  5. The mainline serializes the current 2d array to file json.out. (Again, more on json in a bit.) The client is responsible for serializing individual elements contained in the array since the library doesn't know their types. An output line is printed by the client code for each element it serializes.

  6. The 2d array is destroyed. Client code is invoked to destroy each element.
Library Design
What Code Goes Where?

Exactly.

It is difficult to make good decisions about what to implement in the library and what to leave out, forcing the client code to implement it. It's very hard to get that right, even for experienced programmers. There are no hard and fast rules about these decisions.

That said, it's normal for programmers with about the experience you probably have to have a hard time thinking about the distinction at all. You end up writing the code either way, no matter where it goes, so it doesn't seem to matter.

You should imagine, though, that your library code is going to be used by many, many clients, for many different applications. You want your library code to make their jobs easier - that's the whole point.

It's our intention that your experience with the linked list interface in HW1 will be a significant guide to your decisions about HW2.

One Pretty Firm Rule

Library code shouldn't be making decisions that affect the entire application. For example, it shouldn't execute exit, terminating the application, no matter what kind of failure it has detected. The decision to terminate should be left to the application writer (i.e., the client code). The library should probably have a way to indicate that it's very unhappy, but it should leave it at that.

Similarly, library code should avoid having side effects. One example is printing. Library code shouldn't print anything (unless specifically asked to do so by the client code). If it does, that can get in the way of many applications, which might not want that printing. That said, not being able to print can cause you a lot of work if you try to reflect errors to the client code. If your code can encounter more than one kind of error, you'd need to provide return codes with different values that identify the kind of error. Your return code is just a code, though, and usually can't express as much as a printed error message. Additionally, it's common for the natural implementation of a library routine to be as a function that returns some data, not a return code (e.g., m = max(a,b)). In this case, that you've designed a method that cannot indicate an error to the caller but have encountered an error, you might decide that while debugging your library that it will print. Our sample solution does.

Anticipate Common Client Mistakes

It is very frustrating to use libraries that simply crash when you make some mistake calling them. Even though the bug is in your client code, you'd much prefer that the library gave you some help finding it than simply crashing. For the library to do that, it needs code whose only purpose is to validate the invocations being made on it. For instance, you might test that some pointer handed to you isn't NULL before you try to use it, even though your documentation is perfectly clear that handing NULL to you is a client error. Making that test slows your code down, but it's a significant help to the client, as NULL pointer mistakes are common.

Some (few) libraries provide both debug and release versions. The former extensively validates client inputs, whereas the latter does no validation at all. The assert macro (man assert) is intended to help you write code that can be compiled into both debug and release versions.

Ok, Really, What Are the Detailed Specifications?
Your application must perform the sequence of operations shown in the example output above, for an arbitrary input file. We don't care about your application, though, just about your library.

The specifications for your library are up to you, with one exception. The exception is that we're defining the serialized representation of 2d arrays, as discussed next. We're doing that so that (a) we can provide initial array contents in a format that all your implementations can read, and (b) you will eventually be able to pass array values between your implementations.

You will be required to continue using the 2D array library you build in this homework in all succeeding homeworks. It is normal to want to extend the interface as you see what later homeworks require. It is also normal to find bugs in this project as you do later projects. Those things don't count against you. You should still try to complete this homework in a way that avoids having to debug or extend while trying to do some later homework, though.

Serialization / JSON
The in-memory representation of an object may have some complicated structure. A tree, for instance, relates a parent node to potentially many child nodes. Serialization is the process of taking an arbitrary in-memory structure and creating a one dimensional representation of it. Serialized representations can be put in files, for instance, and then later retrieved and deserialized to reconstruct the in-memory object. In this project we use JSON as the language to describe the serialized representation of a 2d array.

From www.json.org:

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

JSON can represent simple types: strings and numbers. It also can represent two complex types, arrays and maps (called objects because that's what they're called in javascript). String are denoted like this: "some string". Numbers are like this: 6. Arrays are denoted using square brackets: [1, "two", 3.0]; Maps/objects are denoted using curly braces: { "keyOne" : 1, "keyTwo": [0,1] }. Simple, eh?

In the example output above, the first step was to deserialize file test.json to produce the 2x3 array
012
345
Our specification for the serialization of a 2d array is as a single JSON object with members rows, columns, and data. The first two are integers giving the number of rows and columns in the array. The last is an array listing the values of the objects in the array in row major order. (That means by first traversing row 0 of the array, then row 1, etc.) The file contents for our sample array look like this:

{"rows": 2, "columns":3, "data":[0,1,2,3,4,5]}

Jansson
You can use any library you want to help you read and write JSON. The one I used, and the one we're distributing as part of this project, is Jansson.

Using Jansson to deserialize, you invoke one of its methods to read a file containing JSON. The result is a Jansson object that is the in-memory representation of the JSON object. You then use Jansson methods to fetch the values of fields of that object, and use those values to populate your own in-memory representation of a 2d array, the one that calls to your interface are going to manipulate. To serialize you do the reverse: you use Jansson methods to construct a Jansson object and populate it with values from your 2d array, and then call a Jansson method to write the JSON for that object to a file.

If you fetch hw2.tar.gz and expand it, it will produce two directories, hw2 and jansson. The hw2 directory contains a makefile that can build your source using the Jansson library in the jansson directory. (You will have to slightly edit the makefile to list the names of your source files in the place indicated.) Just say make to build. Say make run to run using valgrind.

That distribution was built on Linux. If you're using some other system, you may have to build the Jansson libraries on your system, or use some other JSON library that is easier to install on your system.

Please note that we test on attu, and your code has to run there for us. If you submit code that doesn't use the standard Jansson setup just described, you must submit everything required for us to run your code once we have expanded the tar file containing it. That means that if you use some library other than Jansson, you have to supply it (unless it's already installed on attu, which is unlikely).

Finding Memory Leaks: valgrind
valgrind is actually a suite of tools for finding memory and threading bugs. When invoked like this
    $ valgrind --leak-check=full <cmd>
  
it looks for memory leaks -- memory that has been allocated but not freed. (You must compile your code with the -g switch to get the most useful output from valgrind.)

You should use valgrind to make sure that you free all memory you allocate. The makefile shows how to invoke your code using valgrind for this purpose. Doing that on the sample solution results in this output when my program terminates:

  ==5694==
  ==5694== HEAP SUMMARY:
  ==5694==     in use at exit: 0 bytes in 0 blocks
  ==5694==   total heap usage: 41 allocs, 41 frees, 11,772 bytes allocated
  ==5694==
  ==5694== All heap blocks were freed -- no leaks are possible
  ==5694==
  ==5694== For counts of detected and suppressed errors, rerun with: -v
  ==5694== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    

Grading