Project 4 - File Systems
Administrivia: For this project, you will work with the same partners you
had in project 3.
Out: Friday, May 20
Due: Friday, June 3, 11:59pm
Assignment Goals
- To understand the problems that file system implementations must solve, and the
range of approaches that might be taken
- To practice design (in this case of file systems)
- To experience working in a "more sophisticated" environment: complex software,
a variety of tools, and teams of programmers
- To gain further experience with concurrent software
Overview
The starting point for this assignment is a simplified file system, cse451fs,
the design of which imposes strict limits on both the number of files that can
be stored and the maximum size of any one file. In particular, no matter
how big a disk you might have, this file system can hold only about 8,000
distinct files, no file can be larger than 13KB, and file names cannot be
longer than 30 characters. These restrictions result from the choice of
on-disk data structures used to find files and the data blocks of a given file,
that is, the superblock and inode representations.
Here are the major steps involved in this assignment:
-
Make sure that you, individually, understand the mechanical aspects of the
"development environment" you'll be working in. These include how to
build the file system code, how to configure a ramdisk device
to host your file system, and how to run and test your file
system. A description of these mechanical aspects can be found
here.
-
Of the three limitations cited above, you will need to improve the following
two:
- Increase the maximum size of files
- Allow for longer file names
Design how you want to implement these file system modifications on disk: how
you will represent your directories on disk, how file data is indexed,
etc. There can still be a limit on any of these properties, but your
improvement needs to be more than simply altering a program constant.
For full credit, you must support a maximum file size of at least 268KB,
which can be achieved with a single-indirect strategy as discussed in lecture.
If you encounter any inexplicable behavior while testing your file system, keep in
mind that it might be caused by a limitation of the testing program rather than by
your file system. For example, ls only supports file names of up to 256 characters,
so if you create a longer name with your file system, ls will not show it
properly. In general, a good sanity check is to make sure your tests work on the
regular file systems on spinlock/coredump before running them on your cse451fs.
If you still have time left over, you may make additional improvements for up
to 10% extra credit on this project. Some ideas:
- Implement double and/or triple indirect blocks to further increase the max file size
- Make the max file size arbitrarily large
- Increase the number of distinct files that the file system can handle
- Make the file system more efficient in terms of space usage or in terms
of CPU usage.
-
Alter the skeleton code (/cse451/projects/cse451fs.tar.gz) to
implement your file system. There are two major components to this.
One is that the user level program mkfs.cse451fs must be changed to
initialize the raw disk device with a valid, empty file system using your new
on-disk data structures. The other is to change the file system source (fsSource/)
itself.
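As a rough check on the 268KB target mentioned in the steps above, the arithmetic can be sketched in C. The geometry here is an assumption that the handout does not state: 1KB blocks, 4-byte block pointers, and 13 pointer slots in the inode (which would account for the 13KB limit). Under that assumption, converting one slot into a single-indirect pointer gives 12 + 256 = 268 data blocks. The names BLOCK_SIZE, PTR_SIZE, and max_file_bytes() are hypothetical; check cse451fs.h for the real constants.

```c
/* Assumed geometry; these values are NOT taken from the skeleton code. */
enum {
    BLOCK_SIZE     = 1024,                  /* assumption: bytes per block       */
    PTR_SIZE       = 4,                     /* assumption: bytes per block ptr   */
    PTRS_PER_BLOCK = BLOCK_SIZE / PTR_SIZE  /* pointers held by 1 indirect block */
};

/*
 * Largest file, in bytes, reachable from an inode that has n_direct
 * direct pointers and n_indirect single-indirect pointers.
 */
unsigned long max_file_bytes(unsigned long n_direct, unsigned long n_indirect)
{
    unsigned long data_blocks = n_direct + n_indirect * PTRS_PER_BLOCK;

    return data_blocks * BLOCK_SIZE;
}
```

With these assumed values, max_file_bytes(13, 0) is 13KB and max_file_bytes(12, 1) is 268KB, which matches the two figures quoted in this handout.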
Details
-
While real file systems are very concerned with performance, in your
implementation you can largely ignore it. That is, do not
spend a great deal of effort to produce a faster implementation. (For one
thing, because we're running on virtual machines on top of Windows, it's
unlikely you'd be able to measure much difference.)
-
A description of the skeleton version of the cse451fs file system is
here. (necessary reading)
-
A description of the ext2 file system and vfs (Virtual File System) is
here. (strongly recommended)
-
A description of how dynamically loaded modules are handled in Linux is
here. (not required reading but may answer odd questions that
arise)
Hints/Starting Points
-
Large Files:
-
An important function for creating and accessing the blocks of a file (that you
will almost certainly need to modify) is get_block() in super.c.
-
Look at cse451_truncate() in file.c in order to handle file
truncations (e.g., deleting a file uses this). You don't need to worry
about this until you are sure that you can create and access your files.
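The core decision a block-mapping routine such as get_block() has to make is where a given logical block of a file lives. The sketch below is a user-space illustration only, not the skeleton's code or the kernel's interface: N_DIRECT, PTRS_PER_BLOCK, struct block_path, and map_logical_block() are all hypothetical names, and the sizes assume 12 direct pointers plus one single-indirect block of 256 pointers.

```c
#define N_DIRECT       12   /* assumption: direct pointers kept in the inode */
#define PTRS_PER_BLOCK 256  /* assumption: 1KB block / 4-byte pointers       */

enum map_kind { MAP_DIRECT, MAP_INDIRECT, MAP_TOO_BIG };

struct block_path {
    enum map_kind kind;  /* which structure holds the pointer            */
    unsigned int  slot;  /* index into the inode's direct array, or into */
                         /* the indirect block, depending on kind        */
};

/* Decide where the pointer for logical block number `logical` is stored. */
struct block_path map_logical_block(unsigned long logical)
{
    struct block_path p;

    if (logical < N_DIRECT) {
        p.kind = MAP_DIRECT;
        p.slot = (unsigned int)logical;
    } else if (logical - N_DIRECT < PTRS_PER_BLOCK) {
        p.kind = MAP_INDIRECT;
        p.slot = (unsigned int)(logical - N_DIRECT);
    } else {
        p.kind = MAP_TOO_BIG;  /* beyond what single-indirect can address */
        p.slot = 0;
    }
    return p;
}
```

In a real implementation the MAP_INDIRECT case would also need to allocate the indirect block the first time it is used, and truncation would need to free it; the sketch leaves both out.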
-
Long File Names:
-
All the functions that you will need to modify for your kernel module are in dir.c.
-
Start with cse451_add_entry() and cse451_readdir() to create
and read directory entries with long file names. If the listing from ls
does not look correct, check how the directory structure looks on disk (using hexdump
or xxd).
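One possible on-disk layout for long names is a variable-length directory record, similar in spirit to what ext2 does. The following is a user-space sketch under stated assumptions, not the skeleton's format: the 5-byte header (2-byte inode number, 2-byte record length, 1-byte name length), the 4-byte padding, and the names entry_size(), append_entry(), and find_entry() are all hypothetical. Fields are stored in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_BYTES 5u  /* 2 (inode) + 2 (rec_len) + 1 (name_len) */

/* Bytes occupied by a record whose name has name_len characters,
 * rounded up to a multiple of 4. */
size_t entry_size(size_t name_len)
{
    return (HDR_BYTES + name_len + 3u) & ~(size_t)3u;
}

/* Append a record at offset *used in the block. Returns 0 on success,
 * -1 if the name is empty, longer than 255 bytes, or does not fit. */
int append_entry(uint8_t *block, size_t block_size, size_t *used,
                 uint16_t ino, const char *name)
{
    size_t len = strlen(name);
    size_t need = entry_size(len);
    uint16_t rec_len;

    if (len == 0 || len > 255 || *used > block_size ||
        need > block_size - *used)
        return -1;

    rec_len = (uint16_t)need;
    memset(block + *used, 0, need);           /* zero the padding bytes  */
    memcpy(block + *used, &ino, 2);
    memcpy(block + *used + 2, &rec_len, 2);
    block[*used + 4] = (uint8_t)len;
    memcpy(block + *used + HDR_BYTES, name, len);  /* no trailing NUL    */
    *used += need;
    return 0;
}

/* Walk the records in the first `used` bytes of the block and return the
 * inode number stored for `name`, or 0 if the name is not present. */
uint16_t find_entry(const uint8_t *block, size_t used, const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + HDR_BYTES <= used) {
        uint16_t ino, rec_len;
        uint8_t name_len = block[off + 4];

        memcpy(&ino, block + off, 2);
        memcpy(&rec_len, block + off + 2, 2);
        if (rec_len < HDR_BYTES || rec_len > used - off)
            break;                            /* malformed record: stop  */
        if (ino != 0 && name_len == want &&
            HDR_BYTES + (size_t)name_len <= rec_len &&
            memcmp(block + off + HDR_BYTES, name, want) == 0)
            return ino;
        off += rec_len;                       /* jump to the next record */
    }
    return 0;
}
```

Storing an explicit record length is what lets a reader skip over entries of different sizes. A fixed-size entry with a larger name array is simpler to walk but wastes space on short names; which trade-off to make is a design decision for your writeup.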
-
General:
-
If you modify any of the data structures in cse451fs.h, you will
probably need to modify mkfs.cse451fs in addition to the kernel module
sources. Otherwise you probably can leave mkfs alone.
-
You might find the following UNIX utilities useful in your testing: dd, df, du, diff, head/tail.
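In addition to the utilities above, a small test program can check that large files survive a write and read-back. The sketch below is one way to do that in portable C; roundtrip_check() is a hypothetical helper, not part of the skeleton. It writes a repeating byte pattern whose period (251) does not divide the block size, so a block that ends up at the wrong offset is likely to be detected on read-back.

```c
#include <stdio.h>

/* Write nbytes of a known pattern to path, read the file back, and
 * compare. Returns 0 if the contents match exactly, -1 otherwise. */
int roundtrip_check(const char *path, size_t nbytes)
{
    FILE *f;
    size_t i;
    int c;
    int ok = 1;

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    for (i = 0; i < nbytes; i++) {
        if (fputc((int)(i % 251), f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    if (fclose(f) != 0)          /* buffered write errors surface here */
        return -1;

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    i = 0;
    while ((c = fgetc(f)) != EOF) {
        if (i >= nbytes || c != (int)(i % 251)) {
            ok = 0;              /* extra data or a wrong byte */
            break;
        }
        i++;
    }
    fclose(f);
    return (ok && i == nbytes) ? 0 : -1;
}
```

Following the sanity-check advice earlier in this handout, it is worth running such a helper against a file on a regular file system first, and only then against a path on your mounted cse451fs, with sizes just below and just above each limit you care about (for example 13KB and 268KB).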
Writeup file
Please turn in a file called writeup.txt or writeup.htm that addresses the following:
-
Describe the design for your file system modifications. Include enough details so that we
can understand your code based on this description. This part might also
include a discussion of other approaches you considered but rejected.
-
What concurrency-related issues does a file system have to deal with?
You probably didn't deal with any of them directly when implementing your
extensions, but what did you notice when looking at the rest of the code?
-
What methodology did you follow in order to test your file system (for
functionality)?
-
Does your implementation work? If not, what parts work and what parts don't?
How would you fix it if you had more time?
-
What do you like best about your design? What do you like least about
it? How would you improve your design?
- If you did any extra credit, describe what you did. Make sure to describe how
you implemented it, and how it solves the problem you set out to fix.
Aim to keep this report to about 2 pages or less.
The file you submit may be plain (ASCII) text or HTML, and should
contain the names of all the people who worked on the project
as well as your group name.
Turnin
You will be turning in 6 files:
- a tar.gz file containing all of your modified sources
- Your compiled mkfs.cse451fs program
- Your compiled fs module, cse451fs.o
- Your ramdisk module, rd.o, that you used for testing
- An image file, bzImage, that can be used to boot a kernel supporting your FS with VMware.
- A single write-up file, writeup.txt (or writeup.htm)
To create the source archive file, do a "make dist" from your top level project directory.
Submit with the turnin(1L) program under project name project4 by the deadline
(11:59pm on the due date). Note: turnin will not work on coredump/spinlock,
so you'll need to use attu.