∂
Due: Tuesday, Oct 15th, 2024 by 10:00 pm
For Homework #1, you will finish our implementation of two C data structures: a doubly-linked list (Part A) and a chained hash table (Part B). You will gain experience and proficiency with C programming, particularly memory management, pointers, and linked data structures.
Please read through this entire document before beginning the assignment, and please start early! This assignment involves messy pointer manipulation and malloc/free puzzles, and these can cause arbitrarily awful bugs that take time and patience to find and fix.
All CSE 333 assignments, including this homework, are only supported on the current CSE Linux environment (attu, CSE lab workstations, and the 24su CSE Home Linux VM). We do not support building and running this assignment in any other work environments, including other versions of Linux.
If you've programmed in Java, you're used to having a fairly rich library of elemental data structures upon which you can build, such as vectors and hash tables. In C, you don't have that luxury: the C standard library provides you with very little. In this assignment, you will add missing pieces of code in our implementation of a generic doubly-linked list.
At a high-level, a doubly-linked list looks like this:
Each node in a doubly-linked list has three fields: a
payload, a pointer to the previous element in the list (or
NULL
if there is no previous element), and a
pointer to the next element in the list.
If the list is empty, there are no nodes. If the list has
a single element, both of its next and previous pointers
are NULL
.
So, what makes implementing this in C tricky? Quite a few things:
Given all of these complications, our actual linked list data structure ends up looking like this:
Specifically, we define the following types and structures:
LinkedListNode
into the data structure, and
update our LinkedList
's metadata.
LLIterator
contains bookkeeping associated with an iterator. In
particular, it tracks the list that the iterator is
associated with and the node in the list that
the iterator currently points to. Note that there is a
consistency problem here: if a customer updates a
linked list by removing a node, it's possible that
some existing iterator becomes inconsistent because it
referenced the deleted node. So, we make our
customers promise that they will free any live iterators
before mutating the linked list. (Since we are generous,
we do allow a customer to keep an iterator if the
mutation was done using that iterator.) When
a customer asks for a new iterator, we malloc an instance
and return a pointer to it to the customer.
You should follow these steps to do this part of the assignment:
git pull
; you may need to
resolve issues if you have unstaged changes or merge
issues (Github has a useful document about how to
resolve
merge conflicts if you need a reference).
make
on the CSE Linux machines.
LinkedList.c
; it defines the structures
we diagrammed above. These implementation details would
typically be withheld from the client by placing the
LinkedList.c
; however, we
have opted to place them in a "private .h"
instead so that our unit test code can verify the
correctness of the linked list's internals.
LinkedList_Priv.h
and LinkedList.c
example_program_ll
and
test_suite
. These binaries were compiled
with a complete, working version of
LinkedList.c
; you can run them to explore
what should be displayed when your assignment is
working!
make
on a CSE Linux machine to verify
that you can build your own versions of
example_program_ll
and test_suite
.
make
should print out a few things, and you
should end up with new binaries inside the hw1 directory.
LinkedList.c
, the binaries you just compiled
won't work correctly yet. Try running them, and note that
example_program_ll
halts with an assertion
error or a segfault and test_suite
prints out
some information indicating failed tests, and may crash
before terminating.
LinkedList.c
. Go through
LinkedList.c
, find each comment that says
"STEP X", and place working code there (please keep the
"STEP X" comment for your graders' sanity so they can
locate your code!). The initial
steps are meant to be relatively straightforward, and some
of the later steps are trickier. You will probably find
it helpful to read through the code from top to bottom to
figure out what's going on. You will also probably find it
helpful to recompile frequently to see what compilation
errors you've introduced and need to fix. When compilation
works again, try running the test driver to
see if you're closer to being finished.
static
) helper
functions in your implementation, and you should do
that when it improves modularity.
Verify333
is used in
many places in the code to check for errors and
terminate execution if something is wrong. You
might find it helpful to discover the function that
is called when this happens so you can place a
debugger breakpoint there.valgrind --leak-check=full ./solution_binaries/example_program_llNote that Valgrind prints out that no memory leaks were found. Similarly, try running the test driver under Valgrind:
valgrind --leak-check=full ./solution_binaries/test_suiteand note that Valgrind again indicates that no memory leaks were found.
example_program_ll
and
test_suite
binaries, and
try running them under Valgrind. If you have no memory
leaks and the test_suite
runs the linked
list tests to completion, you're done with Part A!
A chained hash table is a data structure that consists of an array of buckets, with each bucket containing a linked list of elements. When a user inserts a key/value pair into the hash table, the hash table uses a hash function to map the key into one of the buckets, and then adds the key/value pair onto the linked list. There is an important corner case: if the key of the inserted key/value pair already exists in the hash table; our implementation of a hash table replaces the existing key/value pair with the new one and returns the old key/value pair to the customer.
Over time, as more and more elements are added to the hash table, the linked lists hanging off of each bucket will start to grow. As long as the number of elements in the hash table is a small multiple of the number of buckets, lookup time is fast: you hash the key to find the bucket, then iterate through the (short) chain (linked list) hanging off the bucket until you find the key. As the number of elements gets larger, lookups become less efficient, so our hash table includes logic to resize itself by increasing the number of buckets to maintain short chains.
As with the linked list in Part A, we've given you a partial implementation of a hash table. Our hash table implementation looks approximately like this:
Specifically, we defined the following types and structures:
LinkedLists
for each bucket), and return
a pointer to that malloc'ed structure to the customer.
HTIterator
points to a structure
that contains bookkeeping associated with an iterator.
Similar to a linked list iterator, the hash table iterator
keeps track of the hash table the iterator is
associated with and in addition has a linked list iterator
for iterating through the bucket linked lists. When
a customer asks for a new iterator we malloc an
HTIterator
and return a pointer to it.
You should follow these steps to do this part of the assignment:
HashTable.c
contains the implementation you
need to finish, HashTable.h
contains the public
interface to the hash table and documents all of the
functions & structures that customers see, and
HashTable_priv.h
contains some private,
internal structures that HashTable.c
uses.
HashTable.h
first to get a
sense of what the hash table interface semantics are. Then,
take a look at example_program_ht.c
; this is a
program that uses the hash table interface to
insert/lookup/remove elements from a hash table, and
uses the iterator interface to iterate through the elements
of the hash table.
test_hashtable.cc
contains our
Google Test unit tests for the hash table. Run this -- on
its own, and using valgrind -- to see how close you
are to finishing your hash table implementation.
HashTable.c
, find all of the
missing pieces (identified by STEP X comments, as before),
and implement them.
solution_binaries
, we've
provided linux executables (i.e.
example_program_ht
and the same
test_suite
) that were compiled with our
complete, working version of HashTable.c
You
can run them to explore what should be displayed when
your part B implementation is working and look at the
source code for examples of how to use the data structures.
As with hw0, you compile your implementation
using the make
command. This will result in several
output files, including an executable called test_suite
.
You can run all of the tests in that suite with the command:
bash$ ./test_suite
You can also run only specific tests by passing command line
arguments into test_suite
. For example, to only
run the LinkedList
tests, you can type:
bash$ ./test_suite --gtest_filter=Test_LinkedList.*
Or you can run a single test, such as this one which verifies
the Push()
and Pop()
methods:
bash$ ./test_suite --gtest_filter=Test_LinkedList.PushPop
In general, you can specify which tests are run for any of the tests in the assignment; you just need to know the names of the tests, which can be obtained by running:
bash$ ./test_suite --gtest_list_tests
These settings can be helpful for debugging specific parts of the
assignment, especially since test_suite
can be run with
these settings through valgrind
and gdb
!
However, you should not debug your code using only the supplied
tests! The test setup and code are complex enough that it can be
hard to isolate problems effectively without spending excessive
amounts of time trying to reverse-engineer the details of the
test_suite
code.
An often effective strategy is to use the test_suite
program to identify parts of your code that seem to be misbehaving,
then write some small test programs of your own to isolate the
problems in a much simpler setting; it will be much easier to
debug/fix them there.
In addition to passing tests, your code should be high quality and readable. This includes several aspects:
static
) helper functions, be sure to provide good
comments that explain the function inputs, outputs, and
behavior. These comments can often be relatively brief as long
as they convey to the reader the information needed to
understand how to use the function and what it does when
executed.cpplint.py --clint
and Valgrind
)
to look for common coding bugs and fix reported issues
before submitting your code. Exception: if
cpplint
reports style problems in the supplied
starter code, you should leave that code as-is.When you are ready to turn in your assignment, you should follow
exactly the same procedures you used in hw0, except this time tag
the repository with hw1-final
instead
of hw0-final
. Remember to clean up and
commit and push all necessary files to your repository
before you add and push the tag.
After you have created and pushed the tag(s), be
absolutely sure to test everything
ON ATTU OR A LAB LINUX WORKSTATION OR THE CURRENT CSE LINUX VM
by creating a new
clone of the repository in a separate, empty directory, checkout
the hw1-final
tag, and verify that everything works
as expected.
If you discover any problems, you must delete this
new repository copy (clone) you've used for verification
and fix the problems in your original working repository.
Then make a new clone and check again to be sure the problems are
really fixed.
Refer to the hw0 submission instructions for details
and follow those steps carefully, including steps for deleting
a tag and then tagging a later commit if you need to make
some changes to the version you initially tagged.
If you fail to check your work and your project doesn't build properly when the same steps are done by the course staff to grade it, you may lose a huge amount of the possible credit for the assignment even if almost absolutely everything is actually correct.
We will be basing your grade on several elements: