Out: Friday, January 6, 2023
Due: Thursday, January 19, 2023 by 11:59 pm PST
For Homework 1, you will complete our implementation of two C data structures: a doubly-linked list (Part A) and a chained hash table (Part B).
malloc
/free
puzzles, and these can cause
awful bugs that take time and patience to find and fix.
If you've programmed in Java, you're used to having a fairly rich library of elemental data structures upon which you can build, such as vectors and hash tables. In C, you don't have that luxury: the C standard library provides you with very little. In this assignment, you will add missing pieces of code in our implementation of a generic doubly-linked list.
At a high-level, a doubly-linked list looks like this:
Each node in a doubly-linked list has three fields: a payload, a
pointer to the previous element in the list (or NULL
if there is no previous element), and a pointer to the next element
in the list.
If the list is empty, there are no nodes.
If the list has a single element, both of its next and previous
pointers are NULL
.
So, what makes implementing this in C tricky? Quite a few things:
Given all of these complications, our actual linked list data structure ends up looking like this:
Specifically, we define the following types and structures:
LinkedListNode
into the data structure, and update our LinkedList
's
metadata.LLIterator
contains bookkeeping associated with an
iterator.
In particular, it tracks the list that the iterator is associated
with and the node in the list that the iterator currently points
to.
Note that there is a consistency problem here: if a customer
updates a linked list by removing a node, it's possible that some
existing iterator becomes inconsistent because it referenced the
deleted node.
So, we make our customers promise that they will free any live
iterators before mutating the linked list.
(Since we are generous, we do allow a customer to keep an
iterator if the mutation was done using that iterator.)
When a customer asks for a new iterator, we malloc an instance
and return a pointer to it to the customer.You should follow these steps to do this part of the assignment:
git pull
.
(See GitHub Docs
if the pull
command fails because you have unstaged
changes or other problems.)
After the pull
command finishes you should see at
least the following directories and files in your repository:
$ ls cpplint.py exercises gtest hw0 hw1
hw1
directory.
You'll see a number of files and subdirectories, including these
that are relevant to Part A:
make
on the CSE Linux machines.LinkedList.c
; it
defines the structures we diagrammed above.
These implementation details would typically be withheld from
the client by placing the contents of this header directly in
LinkedList.c
; however, we have opted to place
them in a "private .h" instead so that our unit test code can
verify the correctness of the linked list's internals.LinkedList_priv.h
and LinkedList.c
.example_program_ll
and test_suite
.
These binaries were compiled with a complete, working version
of LinkedList.c
; you can run them to explore
what should be displayed when your assignment is
working!make
on a CSE Linux machine to
verify that you can build your own versions of
example_program_ll
and test_suite
.
make
should print out a few things, and you should
end up with new binaries inside the hw1 directory.LinkedList.c
, the binaries you
just compiled won't work correctly yet.
Try running them, and note that example_program_ll
halts with an assertion error or a segfault and
test_suite
prints out some information indicating
failed tests, and may crash before terminating.LinkedList.c
.
Go through LinkedList.c
, find each comment that says
"STEP X", and place working code there (please keep the "STEP X"
comment for your graders' sanity so they can locate your code!).
The initial steps are meant to be relatively straightforward, and
some of the later steps are trickier.
You will probably find it helpful to read through the code from
top to bottom to figure out what's going on.
You will also probably find it helpful to recompile frequently to
see what compilation errors you've introduced and need to fix.
When compilation works again, try running the test driver to see
if you're closer to being finished.
static
– more info
here)
helper functions in your implementation, and you should do
that when it improves modularity.Verify333
is
used in many places in the code to check for errors and
terminate execution if something is wrong.
You might find it helpful to discover the function that is
called when this happens so you can place a debugger
breakpoint there.hw1
directory run the
following command:
$ valgrind --leak-check=full ./solution_binaries/example_program_llNote that we are runnning this on the solution binaries, so Valgrind prints out that no memory issues were found. Similarly, try running the solution test driver under Valgrind:
$ valgrind --leak-check=full ./solution_binaries/test_suiteand note that Valgrind again indicates that no memory issues were found.
example_program_ll
and
test_suite
binaries while in the
hw1
directory and run them under Valgrind.
For example:
$ valgrind --leak-check=full ./test_suiteIf you have no memory issues and the
test_suite
runs the linked list tests to completion, you're done with
Part A!A chained hash table is a data structure that consists of an array of buckets, with each bucket containing a linked list of elements. When a user inserts a key/value pair into the hash table, the hash table uses a hash function to map the key into one of the buckets, and then adds the key/value pair onto the linked list. There is an important corner case: if the key of the inserted key/value pair already exists in the hash table; our implementation of a hash table replaces the existing key/value pair with the new one and returns the old key/value pair to the customer.
Over time, as more and more elements are added to the hash table, the linked lists hanging off of each bucket will start to grow. As long as the number of elements in the hash table is a small multiple of the number of buckets, lookup time is fast: you hash the key to find the bucket, then iterate through the (short) chain (linked list) hanging off the bucket until you find the key. As the number of elements gets larger, lookups become less efficient, so our hash table includes logic to resize itself by increasing the number of buckets to maintain short chains.
As with the linked list in Part A, we've given you a partial implementation of a hash table. Our hash table implementation looks approximately like this:
Specifically, we defined the following types and structures:
LinkedLists
for each bucket), and return a pointer
to that malloc'ed structure to the customer.HTIterator
points to a structure that contains
bookkeeping associated with an iterator.
Similar to a linked list iterator, the hash table iterator keeps
track of the hash table the iterator is associated with and in
addition has a linked list iterator for iterating through the
bucket linked lists.
When a customer asks for a new iterator we malloc an
HTIterator
and return a pointer to it.You should follow these steps to do this part of the assignment:
HashTable.c
contains the
implementation you need to finish, HashTable.h
contains the public interface to the hash table and documents all
of the functions & structures that customers see, and
HashTable_priv.h
contains some private, internal
structures that HashTable.c
uses.HashTable.h
first to
get a sense of what the hash table interface semantics are.
Then, take a look at example_program_ht.c
; this is a
program that uses the hash table interface to
insert/lookup/remove elements from a hash table, and uses the
iterator interface to iterate through the elements of the hash
table.test_hashtable.cc
contains our Google Test unit tests for the hash table.
Run this — on its own, and using valgrind — to seer
how close you are to finishing your hash table implementation.HashTable.c
, find all
of the missing pieces (identified by "STEP X" comments, as
before), and implement them.solution_binaries
,
we've provided linux executables (i.e.,
example_program_ht
and the same
test_suite
) that were compiled with our complete,
working version of HashTable.c
.
You can run them to explore what should be displayed when your
Part B implementation is working and look at the source code for
examples of how to use the data structures.
You'll notice that we provided a second Makefile called
Makefile.coverage
.
You can use it to invoke the gcov
code coverage
generation tool.
Figure out how to (a) use it to generate code coverage statistics
for LinkedList.c
and HashTable.c
, (b)
note that the code coverage for HashTable is worse than that for
the LinkedList, and (c) write additional HashTable unit tests to
improve HashTable's code coverage.
If you do work on bonus task, you must also include a
hw1-bonus
tag in your repository.
While grading, we will use whichever commit has that tag to examine
the bonus, so it may be the same or a different commit from the one
that has hw1-final
.
The hw1-final
tagged commit must also still work
properly (i.e., pass all tests, no memory issues, etc.).
If you do not have a hw1-bonus
tag in your repository,
we will assume you did not choose to submit anything for the bonus
(which will not affect your grade in any way!).
The bonus task is simple, but we're deliberately providing next to no detailed instructions on how to do it — figuring out how is part of the bonus task!
Please make sure your additional unit tests don't change the scoring mechanism that we use, obviously (we'll be checking that). Place your additional unit tests in a separate file from the original test suite. That will make it easier for us to find and evaluate your tests.
As with hw0, you can compile your implementation by using the
make
command.
This will result in several output files, including an executable
called test_suite
.
After compiling your solution with make
, you can run
all of the tests for the homework by running:
$ ./test_suite
You can also run only specific tests by passing command-line
arguments into test_suite
.
This is extremely helpful for debugging specific parts of the
assignment, especially since test_suite
can be run
with these settings through valgrind
and
gdb
!
Some examples:
LinkedList
tests, enter:
$ ./test_suite --gtest_filter=Test_LinkedList.*
Push
and Pop
from
LinkedList
, enter:
$ ./test_suite --gtest_filter=Test_LinkedList.PushPop
You can specify which tests are run for any of the tests in the assignment — you just need to know the names of the tests! You can list them all out by running:
$ ./test_suite --gtest_list_tests