Homework: C Memory Implementation

Due: Monday, November 18, 2024, at 11:59pm

Jump To:

Goals
Synopsis
Set-Up
Provided Files
Your Task
Background Information
Assessment
Turn-in instructions

Assignment Goal

In this assignment you will implement and test a memory management package that produces the same functionality as the standard library malloc and free functions. In this process you will gain further practice with programming in C, manipulating pointers, and using a linked list data structure. You will exercise your skills using a benchmarking program (bench), using gdb, and using make.

Synopsis

In this assignment you will develop and benchmark a memory management package. Specifically, you will implement the back end for a getmem function that serves the job of malloc, and implement a freememfunction that serves the job of free. Starter code will be provided for you to use, and is available through the CSE374-materials git repository. This includes a version of bench.c that may be used as a bench tester for your solution.

You will be completing the functions declared in memory.c. This includes your own implementation of a freemem function, and the helper functions required to manage the memory allocated by getmem and de-allocated by freemem

Set-Up

Before you get started, ensure that your set-up is up-to-date and appropriate.

  1. You should do this assignment using cancun, including gcc
  2. Ensure that your repository is up-to-date and committed. (You can use git status to determine if there are outstanding changes, and git add and git commit if you are unsure.)
  3. Use git pull upstream main to pull the newest commit from the upstream repository. This will give you access to the hw7 folder containing the materials for this assignment. (upstream specifies that you want to pull from the upstream repository, and main specifies the branch. You may see a text editor open to allow you to edit the merge message. This will likely be Vim, so you can edit it and then save, or just accept the current text and exit using :q!.)

You will do this assignment in your new hw7 folder. You should get into the habit of committing and pushing code frequently.

Provided Files

Your hw7 folder contains the following files:

Your Task

Memory Management

The memory management package includes a header file (mem.h) and C implementation file (memory.c) that specify and implement the following functions.

Function Description
void* getmem(uintptr_t size)

Return a pointer to a new block of storage with at least size bytes of memory. The pointer to the returned block should be aligned on an 16-byte boundary (i.e., its address should be a multiple of 16). The block may be somewhat larger than the size requested, if that is convenient for the memory allocation routines, but should not be significantly larger, which would waste space. The value size must be greater than 0. If size is not positive, or if for some reason getmem cannot satisfy the request, it should return NULL. Type uintptr_t is an unsigned integer type that can hold a pointer value (i.e., can be converted to or from a pointer of type void *, or other pointer type, exactly). It is defined in header <inttypes.h> (and also <stdint.h>). See discussion below.

void freemem(void* p)

Return the block of storage at location p to the pool of available free storage. The pointer value p must be one that was obtained as the result of a call to getmem. If p is NULL, then the call to freemem has no effect and returns immediately. If p has some value other than one returned by getmem, or if the block it points to has previously been released by another call to freemem, then the operation of freemem is undefined (i.e., freemem may behave in any manner it chooses, possibly causing the program to crash either immediately or later; it is under no obligation to detect or report such errors).
An additional implementation requirement: When freemem returns a block of storage to the pool, if the block is physically located in memory adjacent to one or more other free blocks, then the free blocks involved should be combined into a single larger block, rather than adding the small blocks to the free list individually.

In order to get getmem and freemem to work, you will want to implement a number of helper functions. These functions have been declared in a mem_internal.h file. It is your job to implement these helper functions. You will be able to find them easily by searching for COMPLETEME in the module file. Each of these functions is prototyped, and documented with its specification within the header file.

You may create additional helper functions if you desire. Follow the standards to prototype them and document them. These functions would not be exposed to the user, so would be placed in the internal header file.

Code Quality Requirements

As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.

  1. Be sure to include appropriate function prototypes near the beginning of each source file for functions defined in that file whose declarations are not included in a header file. (This is done for you, unless you decide to create additional helper functions.)
  2. Comment sensibly, but not excessively. You should not use comments to repeat the obvious or explain how the C language works - assume that the reader knows C at least as well as you do. You should use comments to describe the use of variables and decision points within your code.
  3. Use appropriate names for variables and functions
  4. The provided starter code does use some global variables and macro defined constants for your convenience. You should not create any additional global variables, and you should use the defined constants.

As with the previous assignment, you should use the cpplint.py style checker to review your code. While this checker may flag a few things that you wish to leave as-is, most of the things it catches, including whitespace errors in the code, should be fixed. We will run this style checker on your code to check for any issues that should have been fixed. Please use the class discussion board if you have questions about any of clint's warnings and whether they can be ignored.

Background Information

Additional Provided Functions

In addition to the required work above, a number of utility functions have been declared and implemented for you. You may modify them if you wish. These include:

Function Description
void get_mem_stats(
  uintptr_t* total_size,
  uintptr_t* total_free,
  uintptr_t* n_free_blocks)

Places values at the specified locations corresponding to

  • total_size: total amount of memory requested from the underlying system via malloc.
  • total_free: the total amount of storage in bytes that is currently stored on the free list, including any space occupied by header information or links in the free blocks.
  • n_free_blocks: the total number of individual blocks currently stored on the free list.

void print_heap(FILE * f) Print a formatted listing on file f showing the blocks on the free list. Each line of output should describe one free block and begin with two hexadecimal numbers (0xdddddddd, where d is a hexadecimal digit) giving the address and length of that block.
void check_heap() Check for possible problems with the free list data structure. When called this function should use asserts to verify that the free list has the following properties:
  • Blocks are ordered with strictly increasing memory addresses
  • Block sizes are positive numbers and no smaller than whatever minimum size you are using in your implementation
  • Blocks do not overlap (the start + length of a block is not an address in the middle of a later block on the list)
  • Blocks are not touching (the start + length of a block should not be the same address as the next block on the list since in that case the two blocks should have been combined into a single, larger block.)
If no errors are detected, this function should return silently after performing these tests. If an error is detected, then an assert should fail and cause the program to terminate at that point. Calls to check_heap should be included in other functions to attempt to catch errors as soon as possible. In particular, include calls to check_heap at the beginning and end of functions getmem and freemem. Include additional calls to check_heap wherever else it makes sense.

Test and Benchmark

You have been provided with a C module called bench.c,. This modules provides a bench testing module which will exercise your memory implementation. You should take advantage of the input arguments to control your test runs.

You can use your bench program to test the memory system as you go. Once you think you have all the memory fucntions in working order, you might want to rerun bench after recompiling the code with -DNDEBUG to turn off the assert tests in check_heap to see how much faster the code runs without them. However, leave the check_heap tests on while developing and debugging your code since this will be a big help in catching errors.

Makefile

You have been provided with a Makefile. You should take a look at this, and notice that there are a few different ways to compile the resulting code. In addition, a testing target has been created for you. You are welcome to modify this file as you see fit.

Memory Management

The above sections describe what you need to do. This section gives some ideas about how to do it. We discuss this further in class, and you should take advantage of the Ed discussion page to trade questions, ideas, and suggestions.

The basic idea behind the memory manager is fairly simple. At the core, the getmem and freemem functions share a single data structure, the free list, which is just a linked-list of free memory blocks that are available to satisfy memory allocation requests. Each block on the free list starts with an uintptr_t integer that gives its size followed by a pointer to the next block on the free list. As each of these values requires 8 bytes, we require that all blocks be a multiple of 16 bytes in size, with all addresses multiples of 16 bytes. This helps ensure that dynamically allocated blocks are properly aligned.

When a block is requested from getmem, it should scan the free list looking for a block of storage that is at least as large as the amount requested, delete that block from the free list, and return a pointer to it to the caller. When freemem is called, it should return the given block to the free list, combining it with any adjacent free blocks if possible to create a single, larger block instead of several smaller ones.

The actual implementation needs to be a bit more clever than this. In particular, if getmem finds a block on the free list that is substantially larger than the storage requested, it should divide that block and return a pointer to a portion that is large enough to satisfy the request, leaving the remainder on the free list. But if the block is only a little bit larger than the requested size, then it doesn't make sense to split it and leave a tiny chunk on the free list that is unlikely to be useful in satisfying future requests. You can experiment with this threshold and see what number is large enough to prevent excessive fragmentation, without wasting too much space that could have been used to satisfy small requests.

What if no block on the free list is large enough to satisfy a getmem request? In that case, getmem needs to acquire a good-sized block of storage from the underlying system, add it to the free list, then split it up, yielding a block that will satisfy the request, and leaving the remainder on the free list. Since requests to the underlying system are (normally) relatively expensive, they should yield a reasonably large chunk of storage, say at least 4K or 8K or more, that is likely to be useful in satisfying several future getmem requests. Normally the same amount is acquired each time it is necessary to go to the underlying system for more memory. But watch out for really big getmem requests. If getmem is asked for, say, a 200K block, and no block currently on the free list is that large, it needs to get at least that much in a single request since the underlying system cannot be relied on to return adjacent blocks of storage on successive calls.

So what is "the underlying system"? For our purposes, we'll use the standard malloc function! Your memory manager should acquire large blocks of storage from malloc when it needs to add blocks to its free list. malloc normally guarantees that the storage it returns is aligned on 16-byte or larger boundaries on modern systems, so we won't worry about whether the block we get from malloc is properly aligned.

Notice that a request for a large block will happen the very first time getmem is called(!). When a program that uses getmem and freemem begins execution, the free list should be initially empty. The first time getmem is called, it should discover that the (empty) free list does not contain a block large enough for the request, so it will have to call the underlying system to acquire some storage to work with. If implemented cleanly, this will not be a "special case" in the code -- it's just the normal action taken by getmem when it needs to get new blocks for the free list!

What about freemem? When it is called, it is passed a pointer to a block of storage and it needs to add this storage to the free list, combining it with any immediately adjacent blocks that are already on the list. What freemem isn't told is how big the block is(!). In order for this to work, freemem somehow has to be able to find the size of the block. The usual way this is done is to have getmem actually allocate a block of memory that is a bit larger than the user's request, store the block size at the beginning of the block, and return to the caller a pointer to the storage that the caller can use, but which actually points a few bytes beyond the real start of the block. Then when freemem is called, it can take the pointer it is given, subtract the appropriate number of bytes to get the real start address of the block, and find the size of the block there. Our list nodes are set up to hold this size.

How is freemem going to find nearby blocks and decide whether it can combine a newly freed block with one(s) adjacent to it? There are various ways to do this (as usual), but a good basic strategy is for getmem and freemem to keep the blocks on the free list sorted in order of ascending memory address. The block addresses plus the sizes stored in the blocks can be used to determine where a new block should be placed in the free list and whether it is, in fact, adjacent to another one.

It could happen that a request to freemem would result in one of the underlying blocks obtained from the system (i.e., from malloc) becoming totally free, making it possible to return that block to the system. But this is difficult to detect and not worth the trouble in normal use, so you shouldn't deal with this possibility in your code.

For more information, in addition to Google and Wikipedia, an authoritative discussion is in Sec. 2.5, Dynamic Storage Allocation, in The Art of Computer Programming, Vol. I: Fundamental Algorithms, by Donald Knuth.

Implementation Suggestions

Here are a few ideas that you might find useful. Feel free to use or ignore them as you wish, although you do need to use the 64-bit pointer types correctly.

64-bit Pointers and ints

Your code should work on, and we will evaluate it on, the CSE Linux systems (cancun and the CSE virtual machine). These are 64-bit machines, which means pointers and addresses are 64-bit (8-byte) quantities. Your code will probably work on other 64-bit machines, and, if you're careful, might work on 32-bit machines if it is recompiled, although we won't test that.

One thing that is needed in several places is to treat pointer values as unsigned integers so we can do arithmetic to compute memory block addresses and sizes. We need to be able to cast 64-bit values between integer and pointer types without losing any information. Fortunately the library <inttypes.h> contains a number of types and macros that make the job easier (and fairly portable!). The main type we want to use is uintptr_t, which is a type that is guaranteed to be the right size to hold a pointer value so that we can treat it as an unsigned integer. A pointer value (void* or any other pointer type) can be cast to uintptr_t to create an integer value for arithmetic, and uintptr_t values can be cast to pointers when they hold integers that we want to treat as addresses. (There is also an intptr_t type that is a signed integer type of the right size to hold a pointer, but for our project it would be best to stick with unsigned values.)

You can print pointers and uintptr_t values with printf. Use format %p to print a pointer value, e.g., printf("%p\n", ptr);. For uintptr_t values, since these are stored as long, unsigned integers on our 64-bit systems, they can be printed as decimal numbers using the %lu format specifier: printf("%lu\n",uintvalue);. It turns out that <inttypes.h> defines string macros that make it possible to print values without knowing the actual size of the underlying type. The magic incantation to print an uintptr_t value ui is printf("%" PRIuPTR "\n", ui);. There are other formatting macros to do things like print signed integer pointer values as decimal numbers (PRIdPTR) or in hex (PRIxPTR). See a good C reference for details.

Developing and Testing

As with all projects, you should start (very) small and incrementally build up the final project. Here are some ideas:

Assessment

Your code will be primarily evaluated by an autograder. You are welcome to submit again in order to achieve a better score, but, you will be well served by using your bench tester to evaluate your code on your own first. We are expecting the specified helper functions to be implemented to spec, and details about these functions will be examined along with the overall memory allocation performance. There will be additional manual grading to look at code quality.

Turning In

You will submit this homework to the Gradescope HW7: C Memory Implementation, via Gitlab.

You will first update your Gitlab repository. Use git add and git commit to ensure that your updated t9_tests.c, and any required additional &ldquot;.txt&rdquot; files, are committed. This will be, at minimum, your memory.c.

These should be located in the hw7 folder, located at the top level of your repository. User git push to bring the origin/remote repository up-to-date.

Once you locate the Gitlab assignment you will tap the "GitLab" button on the bottom:

Picture of Gradescope interface highlighting the Gitlab button as opposed to the Upload button
Find your 374 repository in the list, and Submit Project.
Picture of Gradescope submission interface

Once you submit your code the autograder may take some time to run. You may resubmit for a higher grade, but your should always do as much testing as possible on your own platform before resubmitting.