To help you know whether or not your code is correct, we've provided you with a bunch of test code that examine the data structures and file content retrieved by your code, and compares them to the known-to-be-correct values. Once your code passes all of our tests, you're done! If your code fails some of our tests, you can look at the tests and the expected values they print out, to help you diagnose your errors.
There is a fair amount of code in the program, but you'll only need to write on the order of 80 new lines of code to finish the implementation. Of course, getting the right 80 lines isn't trivial. But, we've provided detailed step-by-step comments within the code itself telling you what you need to accomplish each step of the way.
There are two high-level phases to this project:
ext2 (the "second extended filesystem") has a data structure and
on-disk layout that is similar to FFS, which we covered in class.
Here's a picture describing ext2's on-disk layout:
At a different level, ext2 splits the disk into an initial boot block that the BIOS of the computer fetches and executes during the boot process, followed by a series of "block groups." A block group is a lot like a cylinder group that we discussed in class: it's a set of adjacent blocks. Blocks within the same block group tend to be spatially close to each other on the disk, and therefore quick to seek between. Blocks in different block groups might be very far apart. In ext2, each block group has a fixed size, selected at the time the filesystem is formatted.
Each block group is split into a set of sections:
If there are N block groups in the file system, there will be N group descriptors entries in the group descriptor table. Since a group descriptor has a known size (32 bytes), you can use the information in the superblock to calculate how many block groups there are, and therefore how many group descriptors there are, and therefore how many blocks the group descriptor table occupies.
Each block group in the file system contains a full copy of the group descriptor table.
The superblock tells you how many inodes there are in a block group, and given this, you can calculate how many blocks the inode table occupies.
For detailed information about what a directory file looks like in ext2, look at figure 13 on this external page.
There are two details you may encounter. First, when a file is deleted from a directory, it's possible that the directory file itself is not repacked to fill the gap, but rather a "hole" gets created in the directory file where the deleted file's directory entry used to live, and the linked list structure will jump over the hole. Second, the linked list structure is terminated with a directory entry containing inode number 0: this is an indicator that this is a terminating entry (kind of like a NULL terminator in a string), rather than a valid file in the directory.
Second, try compiling the program by cd'ing into ext2reader/, and running "make".
Finally, try running the ext2reader program. You'll need to pass an argument that is the name of a file containing a valid ext2 file system image. We've provided such an image for you in /cse451/projects/p3light/451_filesystem.ext2 on forkbomb.cs. So, to run the program, issue the command:
$ ./ext2reader /cse451/projects/p3light/451_filesystem.ext2
Note that the program will print out the results of our test code, and that the test code assumes you're running the program against the 451_filesystem.ext2 image. Because some of the implementation is missing, the program in its current, unfinished state fails catastrophically and quits with a floating point exception. As you get more of the implementation working, you'll get further through its execution, and see more of the tests and their results. Eventually, when you have your implementation fully working, it will print out how many total points your implementation has earned.
The 451_filesystem.ext2 file does contain a mountable file system image; if you were to write the contents of this file into a raw hard drive, you'd be able to mount it from your computer.
Even better, if you have root access on a linux machine, you can actually try this out without having a spare hard drive. Copy the file into that linux machine, and you can then use a loopback device and ask linux to treat the file as though it were a hard drive. Since you have root access to the 451 virtual machine used in project 1, try copying the file into that VM and running the following commands as root:
$ mkdir -p /mnt $ mount -o loop -t ext2 451_filesystem.ext2 $ ls -al /mnt/
(A programming note: our implementation makes some strong assumptions about how the compiler lays out C structs in memory. This isn't always safe to do, but we've verified that our implementation works on Linux and Mac OS X on x86 machines.)
Make sure you understand how and why read_superblock() works. Note that this implementation relies on some specific knowledge of the ext2 layout on disk: the first copy of the superblock is in a known location on disk (byte 1024), and has a known size (sizeof(struct os_superblock_t) bytes). Scan through superblock.h and get a feel for the kind of information that is stored in a real superblock. You'll need to use some of the fields in the superblock structure later in the project.
Read through ext2access.h, and get a sense of the things you'll need to extract from the superblock or calculate.
We've provided step-by-step instructions on what you need to do to finish implementing this function. As well, our test code will rigorously confirm that you've gotten everything right, once you have enough of the function implementation to make it past the floating point exception. Finally, we've provided some hints in calc_metadata in the form of variables that we think you'll want to use along the way.
So, finish the implementation, compile ext2reader, and try re-running and fixing it until our tests have verified that calc_metadata() is correct. For calibration, our implementation of the parts that are missing have about 12 lines of code.
A warning: we will be pretty mechanical in our grading: we'll give you the grades that the test program says you should get, but we'll deduct points if the compiler generates any compiler warnings. (Note there are two compiler warnings produced when you compile the code we've given you; this is because some of the implementation is still missing. You should find that you fix these warnings as a side-effect of working through the project.)
You'll need to write clean code and correct code to get full marks!
You'll first need to allocate a single buffer that's big enough to store all entries in the blockgroup descriptor table, and then read the entire table from disk into the buffer in one fell swoop. Doing this relies on the fact that the on-disk layout of the descriptor table is the same as the in-memory layout of an array of "struct os_blockgroup_descriptor_t" structures.
The "struct os_blockgroup_descriptor_t" structure is defined in ext2reader/inc/blockgroup_descriptor.h.
As a hint, our implementation of this function has about 7 lines of code in total, including some "asserts" that we put in there as sanity checks. As before, compile and run ext2reader until you pass all of our tests for this function.
The hard part in implementing this function is figuring out where the inode with a given inode number actually lives on disk. We've provided a lot of hints on how to figure this out...just follow the steps we've laid out. Here's a potential stumbling block you should avoid falling prey too: the first inode in the filesystem has inode number 1, not inode number 0. (Bleah!)
"struct os_inode_t" is defined in ext2reader/inc/inode.h. You'll want to familiarize yourself with it.
As before, compile and run ext2reader until you pass all of our tests for this function. As calibration, our implementation has about 14 lines of code.
This code is pretty tricky, but it's useful to get a sense for what it's doing and what's involved with getting it right.
We've finished most of the implementation of this function, but have left you one piece: figuring out the appropriate index into the indirection blocks for very large offsets into a big file that require using a triple indirection block (and hence also use a double, and single, indirection block) on the way to the direct block. Look for the comment "FINISH THE CODE IN THIS IF CLAUSE" to see where you need to complete the function.
Getting this last piece right is tricky, but the good news is that if you can't get it right, the rest of the project will work anyway, since the file system image we gave you isn't big enough to require a triple indirection block for any of its files. (Our test code will still test that you handle triple indirection blocks correctly, though!)
As calibration, our implementation of the missing piece has about 6 lines of code.
file_blockread() uses the offsets and indexes calculated by calculated_offsets() to do the actual reading. The only real complication is that ext2 supports the notion of "holes" in files: ranges of a file that logically exists containing zeroes, but which don't have any physical data blocks allocated to store the zeroes.
Read through the implementation to get a sense of how it works, and make sure you understand the interface to this function -- i.e., how to invoke it, what the arguments mean, and what gets returned by the function on success or failure.
As before, compile and run ext2reader until you pass all of our tests for this function. As calibration, our implementation has about 22 lines of code.
/foo/bar/baz
and split it into the following two pieces:
foo and /bar/baz
It allocates space for the top-most component, and returns a pointer to it through the "next_component" argument. It updates the path in place to re-write it to be the remaining components (i.e., converts /foo/bar/baz into /bar/baz in place).
You'll make use of this function in your implementation of scan_dir, so make sure you understand how it works. Reading the test code for this function (in testcode.c's check_pop_dir_component()) is a good way to confirm your understanding is correct.
You'll need to understand some details about the linked list structure of directory entries within a directory file to implement this function correctly. We've provided enough information in part A of this web page, as well as in the comments inside scan_dir() itself.
As before, compile and run ext2reader until you pass all of our tests for this function. As calibration, our implementation has about 15-20 lines of code.
You're almost done now...
If you want a challenge, delete our implementation of path_read(), and figure out how to implement it yourself...this is strictly optional, of course!
Then, create .tar.gz file containing the completed ext2reader/ directory; for example, run the following command to create the .tar.gz file:
$ tar -cvzf ext2reader.tar.gz ./ext2reader/
Use the turnin program on attu.cs to turn in your ext2reader.tar.gz file, under the project name "project3light".
As a reminder, we will be mechanical about our grading for this assignment. We'll go primarily by how may test code points your code accumulates, but we will deduct points for any compiler warnings that are emitted during compilation. (If your code doesn't compile and run, you might not earn any points at all!!)