Out: Friday, January 10, 2020
Due: Thursday, January 23, 2020 by 11:59 pm
You will finish our implementation of two C data structures: a doubly-linked list (Part A) and a chained hash table (Part B). You will also maintain a bug journal as you debug your project.
Please read through this entire document before beginning the assignment, and please start early! This assignment involves messy pointer manipulation and malloc/free puzzles, and these can cause arbitrarily awful bugs that take time and patience to find and fix.
All CSE 333 homeworks are only supported on attu and the CSE VM. We do not support building and running this assignment in any other work environments.
You are expected maintain a bug journal as you work on each homework this quarter. This is a lightweight way to help you focus on how you approach debugging and should not add much time, assuming that you fill it out as you encounter bugs. In fact, its intent is to save you time overall by improving your debugging skills!
In each homework, you will find a file in the directory called
bugjournal.md
with a skeleton for the prompts that you will find
below for you to fill out. You may treat this file as a standard text file.
For at least three bugs that you encounter during this homework, track your progress in solving them by responding to the following prompts:
Important: While we would prefer if you focused on "major" bugs, you may journal about smaller bugs (quickly fixed or without the use of a debugger) if you encountered fewer than three major bugs.
That's great! Instead of journaling your bugs, reflect on your programming process by describing two processes that you used to help you avoid bugs, writing a few sentences for each.
Example:
If you've programmed in Java, you're used to having a fairly rich library of elemental data structures upon which you can build, such as vectors and hash tables. In C, you don't have that luxury: the C standard library provides you with very little. In this assignment, you will add missing pieces of code in our implementation of a generic doubly-linked list.
At a high-level, a doubly-linked list looks like this:
Each node in a doubly-linked list has three fields: a payload, a
pointer to the previous element in the list (or NULL
if there is no
previous element), and a pointer to the next element in the list.
If the list is empty, there are no nodes. If the list has a single
element, both of its next and previous pointers are NULL
.
So, what makes implementing this in C tricky? Quite a few things:
Given all of these complications, our actual linked list data structure ends up looking like this:
Specifically, we define the following types and structures:
LinkedListNode
in the list. When a
customer requests that we add an element to the linked list, we
malloc a new LinkedListNode
to store the pointer to that element, do
surgery to splice the LinkedListNode
into the data structure,
and update our LinkedList
's metadata.
LLIterator
contains bookkeeping associated with an iterator.
In particular, it tracks the list that the iterator is associated with and
the node in the list that the iterator currently points to. Note that
there is a consistency problem here: if a customer updates a linked list
by removing a node, it's possible that some existing iterator becomes
inconsistent because it referenced the deleted node. So, we make our
customers promise that they will free any live iterators before
mutating the linked list. (Since we are generous, we do allow a customer
to keep an iterator if the mutation was done using that iterator.)
When a customer asks for a new iterator, we malloc an instance and return
a pointer to it to the customer.
You should follow these steps to do this assignment:
git pull
. (See
the CSE 333 Git Tutorial for some tips if
the pull
command fails because you have unstaged
changes or other problems.) After the pull
command
finishes you should see the following directories and files in your
repository:bash$ ls clint.py gtest hw0 hw1
Makefile
: a makefile you can use to compile the
assignment using the Linux command make all on attu and the VM.
LinkedList.h
: a header file that defines and documents
the API to the linked list. A customer of the linked list includes this
header file and uses the functions defined within in. Read through this
header file very carefully to understand how the linked list is expected
to behave.
LinkedList_priv.h
& LinkedList.c
:
LinkedList_priv.h
is a private header file included
by LinkedList.c
; it defines the structures we diagrammed
above. These implementation details are typically withheld from the
client by placing its contents directly in LinkedList.c
;
however, we have opted to place them in a "private .h" instead so that
our unittest code can verify the correctness of the linked list's
internals. LinkedList.c
contains the partially completed
implementation of our doubly-linked list. Your task will be to finish
the implementation. Take a minute and read through both files; note that
there are a bunch of places in LinkedList.c
that say
"STEP X:" these labels identify the missing pieces of the implementation
that you will finish.
example_program_ll.c
: this is a simple example of how a
customer might use the linked list; in it, you can see the customer
allocating a linked list, adding elements to it, creating an iterator,
using the iterator to navigate a bit, and then cleaning up.
test_linkedlist.cc
: this file contains unit tests that
we wrote to verify that the linked list implementation works correctly.
The unit tests are written to use the Google Test unit testing
framework, which has similarities to Java's JUnit testing framework. As
well, this test driver will assist the TA in grading your assignment: as
you add more pieces to the implementation, the test driver will make it
further through the unit tests, and it will print out a cumulative score
along the way. You don't need to understand what's in the test driver
for this assignment, though if you peek inside it, you might get hints
for what kinds of things you should be doing in your implementation!
solution_binaries
: in this directory, you'll find some
Linux executables, including example_program_ll
and
test_suite
. These binaries were compiled with a complete,
working version of LinkedList.c
; you can run them to explore
what should be displayed when your assignment is working!
make
on attu or the VM to verify that you can
build your own versions of example_program_ll
and
test_suite
. make
should print out a few things,
and you should end up with new binaries inside the hw1 directory.
LinkedList.c
, the binaries you just compiled won't work
correctly yet. Try running them, and note that
example_program_ll
halts with an assertion error or a segfault
and test_suite
prints out some information indicating failed
tests, and may crash before terminating.
LinkedList.c
. Go through LinkedList.c
, find each
comment that says "STEP X", and place working code there (please keep the
"STEP X" for your graders' sanity, though!). The initial steps are meant
to be relatively straightforward, and some of the later steps are trickier.
You will probably find it helpful to read through the code from top to
bottom to figure out what's going on. You will also probably find it
helpful to recompile frequently to see what compilation errors you've
introduced and need to fix. When compilation works again, try running the
test driver to see if you're closer to being finished.
static
)
helper functions in your implementation, and you should do that when it
improves modularity.Verify333
is used in many places in the
code to check for errors and terminate execution if something is wrong.
You might find it helpful to discover the function that is called when
this happens so you can place a debugger breakpoint there.
valgrind --leak-check=full ./solution_binaries/example_program_llNote that Valgrind prints out that no memory leaks were found. Similarly, try running the test driver under Valgrind:
valgrind --leak-check=full ./solution_binaries/test_suiteand note that Valgrind again indicates that no memory leaks were found.
example_program_ll
and test_suite
binaries, and
try running them under Valgrind. If you have no memory leaks and the
test_suite
runs the linked list tests to completion, you're
done with Part A!
A chained hash table is a data structure that consists of an array of buckets, with each bucket containing a linked list of elements. When a user inserts a key/value pair into the hash table, the hash table uses a hash function to map the key into one of the buckets, and then adds the key/value pair onto the linked list. (As an important corner case, if the key of the inserted key/value pair already exists in the hash table, our implementation of a hash table replaces the existing key/value pair with the new one, and returns the old key/value pair to the customer.)
Over time, as more and more elements are added to the hash table, the linked lists hanging off of each bucket will start to grow. As long as the number of elements is a small multiple of the number of buckets, lookup time is small: you hash the key to find the bucket, then iterate through the chain (linked list) hanging off the bucket until you find the key. As the number of elements gets longer, lookup gets less efficient, so our hash table includes logic to resize itself to maintain short chains.
As with the linked list in Part A, we've given you a partial implementation of a hash table. Our hash table implementation looks approximately like this:
Specifically, we defined the following types and structures:
LinkedList
s for each bucket), and return a pointer
to that malloc'ed structure to the customer.
HTIterator
points to a structure that contains bookkeeping
associated with an iterator. As before, it tracks the hash table that
the iterator is associated with as well as a linked list iterator. When
a customer asks for a new iterator, we malloc an HTIterator
and return a pointer to it.
You should follow these steps to do this assignment:
HashTable.c
contains the implementation you need to finish,
HashTable.h
contains the public interface to the hash table
and documents all of the functions & structures that customers see, and
HashTable_priv.h
contains some private, internal structures
that HashTable.c
uses.
HashTable.h
first to get a sense of what
the hash table interface semantics are. Then, take a look at
example_program_ht.c
; this is a program that uses the hash
table interface to insert/lookup/remove elements from a hash table, and
uses the iterator interface to iterate through the elements of the hash
table.
test_hashtable.cc
contains our Google Test
unittests for the hash table. Run this – on its own, and using
valgrind – to see how close you are to finishing your hash table
implementation.
HashTable.c
, find all of the missing pieces
(identified by "STEP X" comments, as before), and implement them.
solution_binaries
, we've provided you with
linux executables (i.e. example_program_ht
and the
same test_suite
) that were compiled with our complete, working
version of HashTable.c
. You can run them to explore what
should be displayed when your part B implementation is working and look at
the source code for examples of how to use the data structures.
You'll note that we provided a second Makefile called
Makefile.coverage
. You can use it to invoke the
gcov
code coverage generation tool. Figure how to (a) use it
to generate code coverage statistics for LinkedList.c
and
HashTable.c
, (b) note that the code coverage for HashTable is
worse than that for the LinkedList, and (c) write additional HashTable
unit tests to improve HashTable's code coverage.
The bonus task is simple, but we're (deliberately) providing next to no detailed instructions on how to do it – figuring out how is part of the bonus task!
Please make sure your additional unit tests don't change the scoring mechanism that we use, obviously (we'll be checking that). Place your additional unit tests in a separate file from the original test suite. That will make it easier for us to find and evaluate your tests.
When you are ready to turn in your assignment, you should follow exactly
the same procedures you used in hw0, except this time tag the repository
with hw1-final
(instead of hw0-final
). Remember
to clean up and commit and push all necessary files to your
repository before you add and push the tag.
After you have created and pushed the tag, be absolutely
sure to test everything ON ATTU OR THE VM by
creating a new clone of the repository in a separate, empty directory,
checkout the hw1-final
tag, and verify that everything works
as expected. Refer to the hw0 submission instructions for details and
follow those steps carefully.
If you fail to check your work and your project doesn't build properly when the same steps are done by the course staff to grade it, you may lose a huge amount of the possible credit for the assignment even if almost absolutely everything is actually correct.
We will be basing your grade on several elements:
test_linkedlist.cc
and test_hashtable.cc
. If
your code fails a test, we won't attempt to understand why: we're planning
on just including the number of points that the test drivers print out.
clint.py
tool to check your code for style issues. You also will find it useful
to refer to the Google C++ Style Guide (linked from the course web site).
Much of this guide applies equally well to C.