Exercise: big files

In this assignment you’ll increase the maximum size of an xv6 file. Currently xv6 files are limited to 140 sectors, or 71,680 bytes (in this xv6, a block is one 512-byte sector). This limit comes from the fact that an xv6 inode contains 12 “direct” block numbers and one “single indirect” block number, which refers to a block that holds up to 128 more block numbers, for a total of 12+128=140. You’ll change the xv6 file system code to support a “double indirect” block in each inode, containing 128 addresses of single indirect blocks, each of which can contain up to 128 addresses of data blocks. Because one direct block has to give way to make room for it, a file will then be able to hold up to 11+128+128×128 = 16,523 sectors (about 8 megabytes).

Preliminaries

First, grab the source code:

git clone -b big https://github.com/xiw/xv6.git

Or, if you already have an xv6 checkout:

git pull; git checkout big

Start xv6 and run big. It creates as big a file as xv6 allows and reports the resulting size. For now, it should report 140 sectors.

What to look at

The format of an on-disk inode is defined by struct dinode in fs.h. You’re particularly interested in NDIRECT, NINDIRECT, MAXFILE, and the addrs[] element of struct dinode.

The code that finds a file’s data on disk is in bmap() in fs.c. Have a look at it and make sure you understand what it’s doing. bmap() is called both when reading a file and when writing one. When writing, bmap() allocates new blocks as needed to hold file content, and also allocates an indirect block when one is needed to hold block addresses.

bmap() deals with two kinds of block numbers:

  • The bn argument is a “logical block”—a block number relative to the start of the file.
  • The block numbers in ip->addrs[], and the argument to bread(), are disk block numbers.

You can view bmap() as mapping a file’s logical block numbers into disk block numbers.

Your job

Modify bmap() so that it implements a double indirect block, in addition to direct blocks and a single indirect block. To make room for the new double indirect block, you’ll have to reduce the number of direct blocks from 12 to 11; you’re not allowed to change the size of an on-disk inode. The first 11 elements of ip->addrs[] (indices 0 through 10) should be direct blocks; the 12th (ip->addrs[11]) should be a single indirect block, just like the current one; the 13th (ip->addrs[12]) should be your new double indirect block.

You don’t have to modify xv6 to handle deletion of files with double indirect blocks.

If all goes well, big will now report that it can write 16,523 sectors. It will take a few dozen seconds to finish.

If you change the definition of NDIRECT, you’ll probably have to change the size of addrs[] in struct inode in file.h. Make sure that struct inode and struct dinode have the same number of elements in their addrs[] arrays.

You should allocate indirect blocks and double indirect blocks only as needed, like the original bmap().

Hints:

  • Don’t forget to brelse() each block that you bread().
  • If your file system gets into a bad state, perhaps after a crash, you may need to delete fs.img (do this from your host OS, not xv6).

Make sure you understand why adding a double indirect block increases the maximum file size by 16,384 blocks (really 16,383, since you have to reduce the number of direct blocks by one).
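The arithmetic, with 128 block addresses per indirect block:

```latex
\begin{aligned}
\text{MAXFILE}_{\text{old}} &= 12 + 128 = 140 \\
\text{MAXFILE}_{\text{new}} &= 11 + 128 + 128^2 = 11 + 128 + 16384 = 16523 \\
\text{MAXFILE}_{\text{new}} - \text{MAXFILE}_{\text{old}} &= 128^2 - 1 = 16383
\end{aligned}
```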

What would the maximum size of an xv6 file be if you further added a triple indirect block (and removed one direct block)?