In class you learned about the concept of virtual memory - where each running program does not reference memory using the actual addresses the processor passes to the memory circuitry, but rather uses "virtual" addresses that are translated into real or "physical" ones before they are used. The CPU and OS do the translation using a set of page tables to map small blocks or pages of virtual addresses to the actual memory locations.
The new homework assignment explores another use of virtual addressing - having virtual addresses refer to data that isn't normally stored in main memory at all, a technique sometimes called memory mapping. In particular, you will use virtual addressing to memory-map small blocks from a large file. That way, you can read and write data to the file by simply doing memory loads and stores, rather than having to wait for the hard disk to do the read and write operations directly.
For example, the freedb-tiny.dat file you'll use
in homework 6 is organized into 64-byte binary records.  Suppose you want
to read every 20th record from this file, modify each record, and then
write them back into the file in a different order from when you read them.  
You could use Linux's read() system call to read one record at a time 
into memory, but then you will have to suffer the overhead of calling
read() once for every single record you want, as well as
lseek() to skip the 19 records following each record you 
want. [1]
A similar combination of lseek() and write() would be
needed for every single write-back of a record.
Instead, you can just read the whole file into memory with one 
read() call, modify and reorder every 20th record 
in memory, and then when you're about to exit, write back the whole
file with one write() call.  This causes a lot less overhead 
from calls into Linux, saving a measurable amount of time.  The savings 
would be far greater still if the records were far larger, say about 1 MB 
in size.  Then each record would likely be in a different block on disk,
so that writing the modified records in a non-sequential order 
would force the disk to "seek" across its platters for the next block to write, 
which takes far more time than simply moving the write head to the
next block in sequence.  In essence, by using memory mapping to avoid these
repeated, costly disk operations, we are using main memory as a cache for
the hard disk.  Unlike the caches we have seen in class, this cache is managed
by the programmer, but it is a cache nonetheless.
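To make the "one read(), one write()" approach concrete, here is a minimal sketch. It assumes the 64-byte records described above. The function name mark_records_in_memory() and the "modification" it performs (overwriting the first byte of each selected record) are made up for illustration, and the reordering step is left out to keep the sketch short.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* record size stated in these notes */

/* Read exactly len bytes, retrying after short reads.  Returns 0 on success. */
static int read_fully(int fd, unsigned char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0)
            return -1;          /* error or unexpected end of file */
        done += (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes, retrying after short writes.  Returns 0 on success. */
static int write_fully(int fd, const unsigned char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: load the whole file, set the first byte of every
 * stride-th record (starting with record 0) to mark, and write the whole
 * file back.  Returns the number of records modified, or -1 on error. */
long mark_records_in_memory(const char *path, size_t stride, unsigned char mark)
{
    if (stride == 0)
        return -1;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    unsigned char *buf = malloc(size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long modified = -1;
    if (read_fully(fd, buf, size) == 0) {
        modified = 0;
        /* All of the per-record work happens in memory: no system calls here. */
        for (size_t off = 0; off + RECORD_SIZE <= size; off += stride * RECORD_SIZE) {
            buf[off] = mark;
            modified++;
        }
        /* Rewind and write everything back in one pass. */
        if (lseek(fd, 0, SEEK_SET) < 0 || write_fully(fd, buf, size) < 0)
            modified = -1;
    }

    free(buf);
    close(fd);
    return modified;
}
```

A call such as mark_records_in_memory("freedb-tiny.dat", 20, 0xFF) would touch records 0, 20, 40, and so on. The loops inside read_fully() and write_fully() are there only because read() and write() are allowed to transfer fewer bytes than requested; in the common case each runs once.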
Relevant book reading:
Section 9.8 of your textbook (pages 807-812) briefly discusses memory mapping.
It examines the first motivation for memory mapping - loading programs' 
instructions into memory from a program executable file.  It also talks about
the Linux mmap() system call, which memory-maps files
in a similar way to that described above, but also provides conveniences
like not actually reading in the whole file, but only reading in blocks
of the file when the corresponding memory addresses are read.  
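For comparison, here is a rough sketch of the same kind of record update done through mmap(). It again assumes 64-byte records; mark_records_mapped() is a hypothetical name, and overwriting the first byte of each selected record stands in for whatever modification you actually need. This is not the homework's required design, just an illustration of treating file contents as ordinary memory.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* record size stated in these notes */

/* Hypothetical helper: map the file into memory, set the first byte of every
 * stride-th record (starting with record 0) to mark, and flush the changes.
 * Returns the number of records modified, or -1 on error. */
long mark_records_mapped(const char *path, size_t stride, unsigned char mark)
{
    if (stride == 0)
        return -1;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    /* MAP_SHARED means stores through the mapping reach the file itself. */
    unsigned char *data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    long modified = 0;
    /* Plain memory stores; pages are brought in only when they are touched. */
    for (size_t off = 0; off + RECORD_SIZE <= size; off += stride * RECORD_SIZE) {
        data[off] = mark;
        modified++;
    }

    /* Ask the kernel to write dirty pages back before unmapping. */
    int ok = (msync(data, size, MS_SYNC) == 0);
    if (munmap(data, size) < 0)
        ok = 0;
    return ok ? modified : -1;
}
```

Notice that there is no buffer to allocate and no explicit read() or write(): the loop body is just an array store, and the kernel decides when each page of the file is actually read or written.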
To compile the hw6file program for homework 6, you'll use the 
make command.  Make was designed to compile complex
programs quickly, by only compiling those source code files that changed since
the last compilation.  It uses a file called makefile to guide
the compile and build process.
Before talking about the syntax of a makefile, let's review
how Linux turns a set of C source code (.c) files into a
program you can run with ./command.  
We discussed the compilation process a few weeks back in lecture, 
but to jog your memory, here is a simplified special case:
You write a C source file, /a/main.c.  It uses
functions declared in several header files.  One such header file is
/a/program.h , which you wrote earlier, and request
in main.c with #include "program.h" .  Another
is the standard I/O library header, /usr/include/stdio.h ,
requested with #include <stdio.h> .
You run gcc main.c -o program to create
an executable program /a/program from main.c.
This actually does several things:
	gcc runs the C preprocessor, cpp, which
	interprets and then removes all lines in main.c that
	start with #.  Your two #include directives
	cause cpp to insert the contents of /a/program.h
	and /usr/include/stdio.h into your program.  These files
	contain the declarations (names, argument types, and return types)
	of the functions you use, but not their bodies.
	
	gcc then runs GCC's C compiler cc1 to
	compile main.c, turning it into an assembly language file
	main.s (not kept).  Functions that are declared but not used
	in the pre-processed main.c are quietly dropped; this may
	include functions declared in /usr/include/stdio.h or
	/a/program.h which you didn't use.  Functions that are 
	declared and used in main.c, but whose bodies are not
	provided, are included as textual labels in main.s,
	with special assembler commands to mark them as referring to instructions
	outside main.s.
	
	gcc then runs the GNU assembler as to turn
	main.s into the compiled, machine-language object file
	main.o.  As in main.s, functions used but not
	defined in main.o are marked as "external," to be defined
	by other object code.
	
	gcc runs the Linux linker ld to
	turn main.o into the executable file program.
	Ld does this by linking main.o to
	the C library, stored in the shared object file 
	/lib/libc.so.6 [2].  
	This is done by
	copying the contents of main.o to program,
	and then storing the path to libc.so.6 in program
	so that when the program calls functions that are not stored in program,
	Linux knows to look in /lib/libc.so.6 for the code of these functions.
	By doing this, program can access C library functions like
	printf(), declared in the header file stdio.h,
	without having to include all the code of the C library, wasting space
	on extraneous and/or duplicated functions.  
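
The split between declarations and bodies in the steps above can be sketched in a single file. The names record_offset() and format_offset() are invented for this illustration and are not part of the homework; the contents attributed to program.h and main.c are likewise hypothetical.

```c
#include <stdio.h>  /* declares snprintf(); its body lives in the C library */

/* What a header like program.h might contain: a declaration only.
 * This is all the compiler needs in order to compile a call. */
int record_offset(int index);

/* What main.c might contain: code that calls the declared function.
 * In the object file, record_offset and snprintf would both be recorded
 * as external names for the linker to resolve. */
int format_offset(char *out, size_t cap, int index)
{
    return snprintf(out, cap, "record %d starts at byte %d",
                    index, record_offset(index));
}

/* The body.  In a real project this would sit in a different .c file and be
 * joined to main.o by the linker; it is placed here only so that the sketch
 * compiles on its own. */
int record_offset(int index)
{
    return index * 64;  /* 64-byte records, as in the homework data file */
}
```

The call to record_offset() compiles even though its body appears later, just as calls to C library functions compile with only the declarations from stdio.h in view.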
	
Not all of this process has to be done at once.  In fact, 
when you write makefile files, you will often want 
to break up this process into pieces, because otherwise you will lose
the compilation speedup benefit of make.  For example, you
might split linking from the rest of program building, so that changing
a single source code file doesn't force you to recompile and relink your
whole program.
When you do homework 6, you'll need to extend this base compilation process
by linking in libraries other than the C library libc.so.6 .
For example, suppose your program calls functions from the ALSA Linux sound
library, called libasound.so .  For this, you would add
the option -l asound (conventionally written -lasound)
to your gcc command line.  This tells the linker to search
a set of system-defined directories [3], 
including /lib and /usr/lib , 
for any shared object file called libasound.so
or libasound.so.N for some number N, and link that into your program,
so calls on ALSA functions are directed to libasound.so's code.
make for homework 6
As noted above, you compile hw6file by running make:
$ make
cc -g -Wall   -c -o artistCDs.o artistCDs.c
cc -g -Wall   -c -o listArtists.o listArtists.c
cc -g -Wall   -c -o main.o main.c
cc -g -Wall   -c -o printTree.o printTree.c
cc -g -Wall   -c -o RecordStore.o RecordStore.c
cc -g -Wall   -c -o FileFileReader.o FileFileReader.c
gcc artistCDs.o listArtists.o main.o printTree.o RecordStore.o FileFileReader.o -l readline -o hw6file
$ 
Make works by running a series of Linux commands; each command is 
shown before it is executed.
To delete hw6file and other compilation cruft, possibly in 
preparation for starting over, you can say make clean:
$ make clean
rm -f *.o *~ hw6file
$ 
clean is a make target - a subcommand
you can give to make to build only part of your program.
(In this case, you are not actually "building" anything, but the concept
is the same.)  make by itself builds the default target (below).
makefile
Make uses the following file, called makefile,
to guide compilation of hw6file:
CFLAGS=-g -Wall
BASEOBJS=artistCDs.o listArtists.o main.o printTree.o RecordStore.o
FILEOBJS=$(BASEOBJS) FileFileReader.o

file: $(FILEOBJS)
	gcc $(FILEOBJS) -o hw6file

clean:
	rm -f *.o *~ hw6file
The first three lines define symbols. Those symbols are "expanded" in commands issued later in the file, using the $ operator. For example, the line
FILEOBJS=$(BASEOBJS) FileFileReader.o
is expanded to
FILEOBJS=artistCDs.o listArtists.o main.o printTree.o RecordStore.o FileFileReader.o
because of the previous symbol definition
BASEOBJS=artistCDs.o listArtists.o main.o printTree.o RecordStore.o
In this way, makefile symbol definitions are much like the
C preprocessor's #define directives, which also define names
that get textually expanded wherever they appear later in the source file.
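If the comparison with #define is hazy, here is a small C analogue of the two symbol definitions. The macro names simply mirror the makefile's symbols, and file_objects() is a made-up function used to expose the expanded text; nothing like this appears in the homework code.

```c
/* Macro analogue of the makefile symbols (names chosen to match them). */
#define BASEOBJS "artistCDs.o listArtists.o main.o printTree.o RecordStore.o"

/* BASEOBJS is replaced textually, then the compiler joins the two adjacent
 * string literals into one, much as make builds FILEOBJS from BASEOBJS. */
#define FILEOBJS BASEOBJS " FileFileReader.o"

/* Hypothetical accessor so the expanded text can be inspected at run time. */
const char *file_objects(void)
{
    return FILEOBJS;
}
```

Printing the result of file_objects() would show the same list of six object files that make substitutes for $(FILEOBJS).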
The next non-whitespace line is:
file: $(FILEOBJS)
After symbol expansion, this line says that to build the target
file, make
must first build artistCDs.o, listArtists.o, and so on.  
Make already knows
how to build .o files from .c files, so it looks for .c
files with those names to compile.  In the original makefile,
one .c source file is missing, so make fails.
However, if make were to succeed, it would leave all the object
files in the directory, so that if only one .c file were later modified,
only the corresponding .o file would have to be recompiled.
Following the line beginning file: is the Linux command
to execute to build target file, preceded by a single tab stop
(and only a tab stop, not a series of spaces):
gcc $(FILEOBJS) -o hw6file
(When gcc is given a series of object files as input rather 
than source files, it knows that all you want is to link the object files together,
so it doesn't bother with compiling or assembling.)
In general, each makefile target can be followed by any number
of Linux commands to "build" that target.  Every command line must begin
with a single tab stop.  Not 4 spaces, not 8 spaces, but what you get when
you type [Tab] in a simple, dumb editor (i.e. not Vim or Emacs).
Luckily, when it comes to makefiles, the smartest editors
work just like dumb ones.
clean
Because file
is the first target in the makefile,
it's the default target - the one we said make
tries to build if you just say make
without an explicit target argument.
To build a specific target, you specify its name after make;
so for example, make clean 
builds the clean target,
deleting files created by the build process.
The clean target doesn't have any listed prerequisites 
(nothing following the colon), so it can be "built" at any time, without
having to build any other targets.
Way back in the distant past, I talked briefly about using the debugger GDB to inspect your C programs. That section's notes contain most of what you might need, but here are some topics I neglected to cover then. I didn't get time to cover these topics today either, but here they are for your reference:
I said before that when you start a program under GDB with 
the start command, the debugger sets a breakpoint at the
first line of code of main(), so execution is halted just 
before your code starts running.  You can set your own breakpoints
with the break command.  To set a breakpoint at
the current point of execution, do this:
(gdb) break
To set a breakpoint when you enter the main() function, 
give the name of main() to break:
(gdb) break main
This represents the first source line in the body of main(),
after the prologue.
To set a breakpoint at line 50 of main.c, do this:
(gdb) break 'main.c:50
Note the leading apostrophe before main.c, which is not
matched by another close quote.  You can omit the source file name if
you are executing in the same file as the one you want to break in.
(This same syntax for specifying source lines also works with list.)
If you set a breakpoint in the wrong place, you can delete it. To do this, you need to know the ID number for the breakpoint, which is shown when you set it:
(gdb) break main
Breakpoint 1 at 0x100000f18: file crash.c, line 9.
You can also see all breakpoints with ID numbers with info break:
(gdb) info break
Num Type           Disp Enb Address            What
1   breakpoint     keep y   0x0000000100000f18 in main at crash.c:9
2   breakpoint     keep y   0x0000000100000f21 in main at crash.c:10
Once you have the breakpoint ID, you can delete the breakpoint with delete:
(gdb) delete 1
(gdb) info break
Num Type           Disp Enb Address            What
2   breakpoint     keep y   0x0000000100000f21 in main at crash.c:10
(gdb)
You can also temporarily disable breakpoints without having GDB forget about them altogether, then enable them when you want them again:
(gdb) disable 2
(gdb) info break
Num Type           Disp Enb Address            What
2   breakpoint     keep n   0x0000000100000f21 in main at crash.c:10
(gdb) enable 2
(gdb) info break
Num Type           Disp Enb Address            What
2   breakpoint     keep y   0x0000000100000f21 in main at crash.c:10
(gdb)
Here is a lecture tutorial on GDB from the University of Maryland. UW's CSE 303 also spent some time discussing GDB. If you feel up to it, you could also read the actual GDB manual.
[1] ... read() 19 times while throwing
all of the results away, but this would cause a lot of wasted memory copies
into your address space.
[2] ... libc and inserting it into
the executable program, is called static linking.
On desktop Linux, static linking is rarely used for large libraries like the 
C library, but on embedded platforms where dynamic linking is difficult
or impossible, static linking is favored.
[3] ... -L (big-L, not little-l) 
option to gcc .  In that case, you'll probably also want
to change the path the preprocessor searches for header files (since you'll be
including the headers for the extra libraries you want); this is
the -I (big letter Eye) option to gcc .