PROJECT: Homework 6B: Memory Management System

Due: Friday, Dec 8, 2023, at 11:59pm

Before you start:
Submit Canvas comment specifying whether you wish to work with a partner

Do a git pull on cse374-materials to get the updated starter code (in HW6)

Assignment Goal

Homework 6 is a project that continues our exploration of procedural programming, memory management, and software tools. In particular in this assignment you will: implement and test a memory management package that has the same functionality as the standard library malloc and free functions. In Part A (last week) you built a bench tester to develop your testing skills. In Part B (this section) you will use this bench tester to evaluate a new memory management package. You will exercise your skills in pointer management, as well as use a test framework and Makefile.

Synopsis

In this assignment you will develop and benchmark a memory management package. Starter code will be provided for you to use, and is available through the CSE374-materials git repository. This includes a pre-compiled bench.o file to be used if you desire.

This portion of the project (Part B) is a large project and worth a total of 60 points. You will be evaluated on the correctness of your memory management functions, as well as your updated Makefile, and the contents of a Readme file. Notice that Part B is treated as a 'Project' homework, and is therefor NOT eligible to be dropped at the end of the quarter.

Team Work

You may work in a team for this assignment. If you were working with a team mate for Part A, please continue to work with that partner. You will continue to use your shared gitlab repository. Please notice that each participating team member will recieve the same grade at the completion of this project. In order to be considered a participating team member you must have commits on the gitlab repository.

GitLab

You should use your git repository on the CSE GitLab server for this assignment; you cannot use another repository elsewhere. (And, as is true of all assignments, your solution code should not be publicly available on any repository where it could be accessed by other students in the class this quarter or in the future.)

Requirements

The project consists of two main technical pieces: a memory management package, and a program to exercise it and report statistics. While starter code is provided to give you a jump on development, you are ultimately responsible for both portions of the code and for making them work together.

The code provided for Part B implements the headers (and therefore defines a datatype) as well as some of the functions described below. You will need to look through the code to determine what remains to be completed, and how to make it work with the existing functions. You may modify any of the supplied code, and at the minimum Makefile, getmem.c freemem.c. In order to test your memory system you may use your own working version of bench, or you may use the provided bench.o.

Memory Management

The memory management package should include a header file mem.h and C implementation files that specify and implement the following four functions.

Function	Description
`void* getmem(uintptr_t size)`	Return a pointer to a new block of storage with at least `size` bytes of memory. The pointer to the returned block should be aligned on an 16-byte boundary (i.e., its address should be a multiple of 16). The block may be somewhat larger than the size requested, if that is convenient for the memory allocation routines, but should not be significantly larger, which would waste space. The value `size` must be greater than 0. If `size` is not positive, or if for some reason `getmem` cannot satisfy the request, it should return `NULL`. Type `uintptr_t` is an unsigned integer type that can hold a pointer value (i.e., can be converted to or from a pointer of type `void *`, or other pointer type, exactly). It is defined in header `<inttypes.h>` (and also `<stdint.h>`). See discussion below.
`void freemem(void* p)`	Return the block of storage at location `p` to the pool of available free storage. The pointer value `p` must be one that was obtained as the result of a call to `getmem`. If `p` is `NULL`, then the call to `freemem` has no effect and returns immediately. If `p` has some value other than one returned by `getmem`, or if the block it points to has previously been released by another call to `freemem`, then the operation of `freemem` is undefined (i.e., `freemem` may behave in any manner it chooses, possibly causing the program to crash either immediately or later; it is under no obligation to detect or report such errors). An additional implementation requirement: When `freemem` returns a block of storage to the pool, if the block is physically located in memory adjacent to one or more other free blocks, then the free blocks involved should be combined into a single larger block, rather than adding the small blocks to the free list individually.
`void get_mem_stats( uintptr_t* total_size, uintptr_t* total_free, uintptr_t* n_free_blocks)`	Store statistics about the current state of the memory manager in the three integer variables whose addresses are given as arguments. The information stored should be as follows: `total_size`: total amount of storage in bytes acquired by the memory manager so far to use in satisfying allocation requests. (In other words, the total amount requested from the underlying system via malloc.) `total_free`: the total amount of storage in bytes that is currently stored on the free list, including any space occupied by header information or links in the free blocks. `n_free_blocks`: the total number of individual blocks currently stored on the free list. See the discussion below outlining the implementation of the memory manager for more details about these quantities.
`void print_heap(FILE * f)`	Print a formatted listing on file `f` showing the blocks on the free list. Each line of output should describe one free block and begin with two hexadecimal numbers (`0xdddddddd`, where `d` is a hexadecimal digit) giving the address and length of that block. You may include any additional information you wish on the line describing the free block, but each free block should be described on a single output line that begins with the block's address and length.

In addition, there should be a separate header file mem_impl.h and C implementation file for the following function, which is used internally in the memory manager implementation, but is not intended to be used by client code.

Function Description

Function	Description
`void check_heap()`	Check for possible problems with the free list data structure. When called this function should use `assert`s to verify that the free list has the following properties: Blocks are ordered with strictly increasing memory addresses Block sizes are positive numbers and no smaller than whatever minimum size you are using in your implementation Blocks do not overlap (the start + length of a block is not an address in the middle of a later block on the list) Blocks are not touching (the start + length of a block should not be the same address as the next block on the list since in that case the two blocks should have been combined into a single, larger block.) If no errors are detected, this function should return silently after performing these tests. If an error is detected, then an `assert` should fail and cause the program to terminate at that point. Calls to `check_heap` should be included in other functions to attempt to catch errors as soon as possible. In particular, include calls to `check_heap` at the beginning and end of functions `getmem` and `freemem`. Include additional calls to `check_heap` wherever else it makes sense.

void check_heap()

Check for possible problems with the free list data structure. When called this function should use asserts to verify that the free list has the following properties:

Blocks are ordered with strictly increasing memory addresses
Block sizes are positive numbers and no smaller than whatever minimum size you are using in your implementation
Blocks do not overlap (the start + length of a block is not an address in the middle of a later block on the list)
Blocks are not touching (the start + length of a block should not be the same address as the next block on the list since in that case the two blocks should have been combined into a single, larger block.)

If no errors are detected, this function should return silently after performing these tests. If an error is detected, then an assert should fail and cause the program to terminate at that point. Calls to check_heap should be included in other functions to attempt to catch errors as soon as possible. In particular, include calls to check_heap at the beginning and end of functions getmem and freemem. Include additional calls to check_heap wherever else it makes sense.

Dividing the Work

In this assignment we will break the memory system into three files: mem_utils.c contains the functions for evaluating the memory manager. getmem.c will implement getmem, and freemem.c will implement freemem. Helper functions should be placed in the module where they are used, or in mem_utils.c if they are used by both getmem and freemem. The memory system is declared in two header files: mem.h defines the user facing functions and can be included in the bench program. mem_impl.h defines the utility functions and is the 'back-end'.

In addition to these files you will use your bench.c code to test the memory management system.

Note: In a production memory system the entire module may contain one header (mem.h), and one implementation function (memory.c). The second header (mem_impl.h) would be unecessary because those declarations could be in memory.c. The code is divided up for convenience in assignment distribution. Your primary job will be to implement the memory functions, and add functionality to Makefile. Please read the comments in the provided code carefully.

Test and Benchmark

This section was completed in Part A of this project. If you are satisfied with your own version of bench.c you should certainly use that code for testing. You may also wish to use the provided bench.o to see if you notice any differences in performance.

You can use your bench program to test the memory system as you go. Once you think you have all the memory fucntions in working order, you might want to rerun bench after recompiling the code with -DNDEBUG to turn off the assert tests in check_heap to see how much faster the code runs without them. However, leave the check_heap tests on while developing and debugging your code since this will be a big help in catching errors.

Additional Requirements

Besides the software specifications above, you must meet the following requirements for this assignment.

You are required to use the gitlab repository created for you or your team.
You should create a Makefile with at least the following targets:
- bench (this should be the default target). Generate the bench executable program.
- test. Run the bench test program with default parameters. This should recompile the program first if needed to bring it up to date.
- clean. Remove any .o files, executable, emacs backup files (*~), and any other files generated as part of making the program, leaving only the original source files and any other files in the directory unrelated to the project.
You may add any additional targets that you wish for your convenience. There are some ideas for useful targets in the Implementation Suggestions section, below.
You should create a Readme file to accompany your work This file should give a brief summary of:
- Your name(s), and repository url (if used)
- A brief description of how your heap (free list) data structure is organized and the algorithms used to manage it.
- A summary of any additional features or improvements in your memory manager. If you did any additional optimization of your design, or additional exploration of memory parameters, include that information here.
- A summary of the results you observed on several runs of your bench program with your memory manager. This does not have to be extensive, but, you should compare to the performance you obtained using the memory.o file in Part A.
- A summary of any resources you consulted for information about memory management algorithms. Your code, of course, must be your own, but feel free to research and explore memory management topics.
Finally, your code should be of the usual high quality, with clean layout, good comments, and so forth. In particular, the comments describing the free list data structures should contain a complete but succinct description of this data so that someone can read these definitions and understand them without tracing the code that uses them. Use cpplint to check for possible style issues that may need correcting. (Note: cpplint may flag rand and ask you to use the multi-threaded version. For this reason we have included a version of cpplint in the hw6 folder that removes this flag.)

Repository Notes

You have a repository on gitlab you may use for this assignment on the CSE gitlab (https://gitlab.cs.washington.edu). If you are working with in a team a new repository will be created for which you will share access. You should use your experience with gitlab to access this resource.

Memory Management

The above sections describe what you need to do. This section gives some ideas about how to do it. We discuss this further in class, and you should take advantage of the Ed discussion page to trade questions, ideas, and suggestions.

The basic idea behind the memory manager is fairly simple. At the core, the getmem and freemem functions share a single data structure, the free list, which is just a linked-list of free memory blocks that are available to satisfy memory allocation requests. Each block on the free list starts with an uintptr_t integer that gives its size followed by a pointer to the next block on the free list. As each of these values requires 8 bytes, we require that all blocks be a multiple of 16 bytes in size, with all addresses multiples of 16 bytes. This helps ensure that dynamically allocated blocks are properly aligned.

When a block is requested from getmem, it should scan the free list looking for a block of storage that is at least as large as the amount requested, delete that block from the free list, and return a pointer to it to the caller. When freemem is called, it should return the given block to the free list, combining it with any adjacent free blocks if possible to create a single, larger block instead of several smaller ones.

The actual implementation needs to be a bit more clever than this. In particular, if getmem finds a block on the free list that is substantially larger than the storage requested, it should divide that block and return a pointer to a portion that is large enough to satisfy the request, leaving the remainder on the free list. But if the block is only a little bit larger than the requested size, then it doesn't make sense to split it and leave a tiny chunk on the free list that is unlikely to be useful in satisfying future requests. You can experiment with this threshold and see what number is large enough to prevent excessive fragmentation, without wasting too much space that could have been used to satisfy small requests.

What if no block on the free list is large enough to satisfy a getmem request? In that case, getmem needs to acquire a good-sized block of storage from the underlying system, add it to the free list, then split it up, yielding a block that will satisfy the request, and leaving the remainder on the free list. Since requests to the underlying system are (normally) relatively expensive, they should yield a reasonably large chunk of storage, say at least 4K or 8K or more, that is likely to be useful in satisfying several future getmem requests. Normally the same amount is acquired each time it is necessary to go to the underlying system for more memory. But watch out for really big getmem requests. If getmem is asked for, say, a 200K block, and no block currently on the free list is that large, it needs to get at least that much in a single request since the underlying system cannot be relied on to return adjacent blocks of storage on successive calls.

So what is "the underlying system"? For our purposes, we'll use the standard malloc function! Your memory manager should acquire large blocks of storage from malloc when it needs to add blocks to its free list. malloc normally guarantees that the storage it returns is aligned on 16-byte or larger boundaries on modern systems, so we won't worry about whether the block we get from malloc is properly aligned.

Notice that a request for a large block will happen the very first time getmem is called(!). When a program that uses getmem and freemem begins execution, the free list should be initially empty. The first time getmem is called, it should discover that the (empty) free list does not contain a block large enough for the request, so it will have to call the underlying system to acquire some storage to work with. If implemented cleanly, this will not be an additional "special case" in the code -- it's just the normal action taken by getmem when it needs to get new blocks for the free list!

What about freemem? When it is called, it is passed a pointer to a block of storage and it needs to add this storage to the free list, combining it with any immediately adjacent blocks that are already on the list. What freemem isn't told is how big the block is(!). In order for this to work, freemem somehow has to be able to find the size of the block. The usual way this is done is to have getmem actually allocate a block of memory that is a bit larger than the user's request, store the block size at the beginning of the block, and return to the caller a pointer to the storage that the caller can use, but which actually points a few bytes beyond the real start of the block. Then when freemem is called, it can take the pointer it is given, subtract the appropriate number of bytes to get the real start address of the block, and find the size of the block there. Our list nodes are set up to hold this size.

How is freemem going to find nearby blocks and decide whether it can combine a newly freed block with one(s) adjacent to it? There are various ways to do this (as usual), but a good basic strategy is for getmem and freemem to keep the blocks on the free list sorted in order of ascending memory address. The block addresses plus the sizes stored in the blocks can be used to determine where a new block should be placed in the free list and whether it is, in fact, adjacent to another one.

It could happen that a request to freemem would result in one of the underlying blocks obtained from the system (i.e., from malloc) becoming totally free, making it possible to return that block to the system. But this is difficult to detect and not worth the trouble in normal use, so you shouldn't deal with this possibility in your code.

For more information, in addition to Google and Wikipedia, an authoritative discussion is in Sec. 2.5, Dynamic Storage Allocation, in The Art of Computer Programming, Vol. I: Fundamental Algorithms, by Donald Knuth.

Implementation Suggestions

Here are a few ideas that you might find useful. Feel free to use or ignore them as you wish, although you do need to use the 64-bit pointer types correctly.

64-bit Pointers and ints

Your code should work on, and we will evaluate it on, the CSE Linux systems (cancun and the CSE virtual machine). These are 64-bit machines, which means pointers and addresses are 64-bit (8-byte) quantities. Your code will probably work on other 64-bit machines, and, if you're careful, might work on 32-bit machines if it is recompiled, although we won't test that.

One thing that is needed in several places is to treat pointer values as unsigned integers so we can do arithmetic to compute memory block addresses and sizes. We need to be able to cast 64-bit values between integer and pointer types without losing any information. Fortunately the library <inttypes.h> contains a number of types and macros that make the job easier (and fairly portable!). The main type we want to use is uintptr_t, which is a type that is guaranteed to be the right size to hold a pointer value so that we can treat it as an unsigned integer. A pointer value (void* or any other pointer type) can be cast to uintptr_t to create an integer value for arithmetic, and uintptr_t values can be cast to pointers when they hold integers that we want to treat as addresses. (There is also an intptr_t type that is a signed integer type of the right size to hold a pointer, but for our project it would be best to stick with unsigned values.)

You can print pointers and uintptr_t values with printf. Use format %p to print a pointer value, e.g., printf("%p\n", ptr);. For uintptr_t values, since these are stored as long, unsigned integers on our 64-bit systems, they can be printed as decimal numbers using the %lu format specifier: printf("%lu\n",uintvalue);. It turns out that <inttypes.h> defines string macros that make it possible to print values without knowing the actual size of the underlying type. The magic incantation to print an uintptr_t value ui is printf("%" PRIuPTR "\n", ui);. There are other formatting macros to do things like print signed integer pointer values as decimal numbers (PRIdPTR) or in hex (PRIxPTR). See a good C reference for details.

Developing and Testing

As with all projects, you should start (very) small and incrementally build up the final project. Here are some ideas:

You can implement getmem first by itself. Just have freemem return without doing anything. Get freemem working later.
Use small tests involving very few getmem/freemem requests when you are first testing the memory manager routines.
The print_heap function can be very helpful during debugging. Get it working early. Also, gdb can be very useful for exploring the free list (expecially gdb's x command) and for examining the operation of your code.
Write several small test programs whose effect on the heap you can predict by hand, then use the free list printout (above) and/or gdb to check that it really works as you expect.
Don't be shy about adding lots of targets to your Makefile to compile and run small test programs, or run the benchmark program with various argument values. If you find yourself typing the same command more than a few times to run a test, add it to your Makefile as the command for a target with a suitable name (e.g., testonlygetmem, testbigmemory, testmanysmallcalls, etc.).
The get_mem_stats function may be useful during debugging to see the effect on the free list of various patterns of getmem and freemem requests. Don't feel constrained to use it only to produce the required benchmark program reports.
Use check_heap() and other asserts in your program. These can be particularly useful while you are testing and debugging, especially to check that pointers are not NULL when they shouldn't be and that the heap data structures have not been corrupted. In particular, include calls to check_heap() at the beginning and end of getmem and freemem to verify that those functions don't introduce any obvious, checkable errors in the free list. Leave the asserts and check_heap() calls in your code even after things seem to be working. You can always put -DNDEBUG in a gcc command in some Makefile target to disable asserts if you want to run your code without them.
Note that valgrind is unlikely to be particularly helpful for this assignment. We are manipulating pointers in non-standard ways and valgrind will probably report many spurious problems that are not really errors given what the code needs to do.
cpplint.py is provided. You should use this version early to check your code for styles issues.

Submission / Turning-In

For this assignment, you will turn in the code via Gradescope as usual. You should be able to submit it directly from Gitlab. Please make sure that all the code associated with this assignment is in a folder called HW6B. If you are working as a team we will double check your gitlab commits to ensure that each team member is active.

You should submit a folder called HW6B containing a Readme, Makefile, getmem.c and freemem.c. If you modified mem.h, mem_impl.h, or mem_utils.c you should additionally include those. For testing we will use your Makefile to compile and run the code, using our own bench.o file.

PROJECT: Homework 6B: Memory Management System

Assignment Goal

Jump To:

Synopsis

Team Work

GitLab

Requirements

Memory Management

Dividing the Work

Test and Benchmark

Additional Requirements

Repository Notes

Memory Management

Implementation Suggestions

64-bit Pointers and ints

Developing and Testing

Submission / Turning-In