Project 4 - File Systems
Administrivia: For this project, you will work with the same partners you
had in project 3.
Out: Wednesday, March 1
Due: Friday, March 10 at 6:00 pm
Assignment Goals
-
To understand the problems that file system implementations must solve, and the
range of approaches that might be taken
-
To practice design (in this case, of file systems)
Background
Last year, this was a programming assignment, in which you modified the skeleton of a file system, compiled it with the Linux 2.4 kernel, and tested it in VMware. Because of the department-wide change from Fedora Core 2 to Fedora Core 4, and the accompanying change from gcc 3 to gcc 4, the 2.4 kernel no longer compiles on department machines. Unfortunately,
the filesystem skeleton was created from 2.4 code and designed to integrate with the 2.4 kernel, and would need to be substantially restructured to work with the 2.6 kernel (which compiles fine on our machines, as you saw in project 1).
Since actual implementation is not an option this quarter, this project is now a pencil-and-paper design exercise, delving into the details of what needs to be changed. On one hand, this is easier, since you don't need to test it; on the other hand, it's harder, since you can't test it.
Overview
The starting point for this assignment is a simplified file system, cse451fs (in /cse451/projects/cse451fs.tar.gz),
the design of which imposes strict limits on both the number of files that can
be stored and the maximum size of any one file. In particular, no matter
how big a disk you might have, this file system can hold only about 8,000
distinct files, no file can be larger than 13KB, and file names cannot be
longer than 30 characters. These restrictions result from the choice of
on-disk data structures used to find files and the data blocks of a given file,
that is, the superblock and inode representations.
Your assignment is to design modifications to the file system
that would achieve the following:
- Increase the maximum size of files
- Allow for longer file names
Design how you would implement these file system modifications: how
you will represent your directories on disk, how file data is indexed,
etc. There can still be a limit on any of these properties, but your
improvement needs to be more than simply altering a program constant.
You must support a maximum file size of at least 256KB,
which can be achieved with a single-indirect strategy as discussed in lecture.
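To see where a figure like 256KB can come from, here is a minimal arithmetic sketch. The block size (1 KB) and on-disk pointer size (4 bytes) are assumptions made for illustration, and single_indirect_bytes() is a hypothetical helper, not part of cse451fs; check the real values in the cse451fs headers before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of file data reachable through ONE single-indirect block.
 * An indirect block holds nothing but block pointers, so it can name
 * (block_size / ptr_size) data blocks, each block_size bytes long.
 */
uint64_t single_indirect_bytes(uint32_t block_size, uint32_t ptr_size)
{
    /* Pointers must tile the block exactly. */
    assert(ptr_size > 0 && block_size % ptr_size == 0);

    uint64_t ptrs_per_block = block_size / ptr_size;
    return ptrs_per_block * block_size;
}
```

With the assumed values, single_indirect_bytes(1024, 4) is 256 pointers times 1024 bytes, or 262144 bytes (256KB), before counting any direct blocks that remain in the inode.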
Once you have accomplished these two tasks, you should tackle a third task:
- Increase the maximum size of files further
In particular, design a triple-indirect approach, as discussed in lecture,
and describe its implementation.
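One way to reason about a multi-level scheme is to ask, for a given logical block number within a file, how many levels of indirection are needed and which slot to follow at each level. The sketch below does only that arithmetic. It assumes 256 pointers per indirect block and a hypothetical inode layout with 10 direct pointers followed by one single-, one double-, and one triple-indirect pointer; none of these names or constants come from cse451fs.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_BLOCK 256u /* assumption: 1 KB blocks, 4-byte pointers */
#define N_DIRECT       10u  /* hypothetical count of direct pointers in the inode */

struct block_path {
    int      depth;   /* 0 = direct, 1..3 = levels of indirection */
    uint32_t idx[3];  /* depth 0: idx[0] is the direct slot;
                         otherwise idx[0..depth-1] are the slots to follow,
                         outermost indirect block first */
};

/* Fill *p for logical block lbn. Returns 0 on success, -1 if lbn is too large. */
int map_block(uint64_t lbn, struct block_path *p)
{
    const uint64_t n = PTRS_PER_BLOCK;
    uint64_t span = 1;

    assert(p);
    if (lbn < N_DIRECT) {
        p->depth = 0;
        p->idx[0] = (uint32_t)lbn;
        return 0;
    }
    lbn -= N_DIRECT;

    for (int depth = 1; depth <= 3; depth++) {
        span *= n;                 /* a tree of this depth covers n^depth blocks */
        if (lbn < span) {
            p->depth = depth;
            /* Peel off base-n digits, innermost level first. */
            for (int i = depth - 1; i >= 0; i--) {
                p->idx[i] = (uint32_t)(lbn % n);
                lbn /= n;
            }
            return 0;
        }
        lbn -= span;               /* skip past the blocks this tree covers */
    }
    return -1;                     /* beyond what the triple-indirect tree reaches */
}
```

Under these assumptions the largest file has 10 + 256 + 256^2 + 256^3 blocks, roughly 16 GB with 1 KB blocks. A real design also has to say who allocates the intermediate blocks, when they are freed, and whether the file-size field in the inode is wide enough to record such a size.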
You're not actually implementing anything, but you still need to be intimately familiar with the code. You ought to identify all of the places that the code would need to actually change if you were to implement your modifications.
What we want
At the highest level, what we want is clear evidence that you understand file systems in general and the cse451fs filesystem in particular. Concretely, we want a discussion of how you'd accomplish
the extensions we're asking for. For each extension, you should say at a high level what needs to change and why, then go into the specifics of what needs to change -- where, in which source file.
You are producing a detailed design document. It's prose, not code.
(We will evaluate the clarity of your reasoning and of your
presentation, as well as the quality of your solution.)
It is fine to include code segments in your document -- you can
include an original function for context if needed, and add
clear pseudocode for what needs to happen in what order;
you can add lines of code like what you would put in if actually
implementing the change; but include clear comments and a design
description. We need to be able to read your document and
assess correctness, without doing hand-simulation!
It is not acceptable to simply write pseudocode
in the form of "loose-syntax C".
Note that unlike "design documents" that you may have
done in other projects and other courses, we don't just want the top few issues -- we want all of the issues.
Every time you'd change something of any significance
in the code in a real
implementation, we want to hear about it in the writeup.
Details
- The filesystem is in /cse451/projects/cse451fs.tar.gz on forkbomb and is also here. A description of it is
here (necessary reading!).
-
A description of the ext2 file system and vfs (Virtual File
System) is
here (strongly recommended).
-
While real file systems are very concerned with performance, in your
design you can largely ignore it. That is, do not
worry much about how fast it would be if you implemented it.
-
A description of how dynamically loaded modules are handled in Linux is
here (not required reading but may answer odd questions that
arise).
Hints/Starting Points
-
Large Files:
-
An important function for creating and accessing the blocks of a file (that you
will almost certainly need to modify) is get_block() in super.c.
-
Look at cse451_truncate() in file.c in order to handle file
truncations (e.g., deleting a file uses this).
-
Long File Names:
-
All the functions that you will need to modify for your kernel module are in dir.c.
-
Start with cse451_add_entry() and cse451_readdir() to create
and read directory entries with long file names.
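As one illustration of what "representing directories on disk" can mean for long names, the sketch below shows a variable-length directory record loosely modeled on the ext2 approach of storing a record length next to each name. It is only one of several possible designs. The struct, its field names and widths, the 255-character limit, and dirent_rec_len() are all hypothetical and do not come from cse451fs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NAME_LEN 255u /* hypothetical limit; fits in a one-byte length field */

/* Hypothetical on-disk record: a small fixed header, then the name bytes. */
struct long_dirent {
    uint16_t inode;     /* inode number; 0 could mark an unused record (assumption) */
    uint16_t rec_len;   /* total size of this record in bytes, padding included */
    uint8_t  name_len;  /* bytes of name actually stored (no terminating NUL) */
    char     name[];    /* name bytes follow the header directly */
};

/*
 * Bytes needed to store a name of name_len bytes: header plus name,
 * rounded up to a multiple of 4 so the next record's header stays aligned.
 */
size_t dirent_rec_len(size_t name_len)
{
    assert(name_len > 0 && name_len <= MAX_NAME_LEN);

    size_t raw = offsetof(struct long_dirent, name) + name_len;
    return (raw + 3u) & ~(size_t)3u;
}
```

Because records now differ in size, code that walks a directory has to step from one entry to the next by rec_len rather than by a fixed stride, and code that adds an entry has to search for a gap large enough for the new name. Those are the kinds of ripples a design document would need to trace through dir.c.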
-
General:
-
Be very aware of the "ripples" created by changing one place in your file system which affect other elements of the fs. This is where the inability to test will be unpleasant - there won't be errors to tell you what you've missed.
Writeup file
Please turn in a file called writeup that includes the design document described above, and answers to the following:
-
What other approaches did you consider and reject in your design? (This should be no more than a paragraph or maybe two.)
-
What concurrency-related issues does a file system have to deal with?
You probably didn't deal with any of them directly when designing your
extensions, but what did you notice when looking at the rest of the code?
-
What would you have to do to make it efficient, or where are the main efficiency hurdles?
Aim to fit this report in something like 5 pages, not more than 10. Use whatever format you want, so long as it's searchable. So text, HTML, Word are all fine, but PDF or photos of whiteboard scribbles, not so much. Whatever it is, hand it in using turnin to project4.