CSE 374, Lecture 16: Make

More tools

So far we've learned about a number of tools for writing C programs, all of them essential for developing real software:

gdb
the compiler
valgrind

Today we'll learn about one more: make.

Motivation

Programmers spend a lot of time "building" software. In this case, "building" refers to the process of taking the source code and producing an executable. While our programs so far have been small, in larger programs "building" becomes more complicated.

Let's take a sample program (check the lecture notes for today) composed of three source files. We could compile the program with the following gcc command:

    $ gcc -Wall -std=c11 -g -o talk main.c shout.c speak.c

Every time you want to compile your program, you'll need to run this same command.

If you retype the command every single time, you'll be doing a lot of typing: "SHAME!"
If you use bash's up-arrow or history: "SHAME!" (what happens when you log out?)
If you add an alias or bash script: "smart!"

Additionally, on larger projects, you can't or don't want to have one big (set of) command(s) that recompiles everything every time you change anything. That can take HOURS in a large piece of software!

Dependency tree

When we are running a gcc command to compile our program, gcc actually runs in stages:

1) Each .c file is compiled into a .o file (preprocessor -> compiler -> assembler). The .o file depends on the .c file (obviously) and all included header files.
2) Once all .o files are built, the executable is constructed by the linker, which "links" the .o files together.

For our talk/shout/speak example, we can visualize the dependencies in the form of a tree, like this:

                          talk
                           |
            -----------------------------------
           |                     |             |
        shout.o               speak.o        main.o
           |                     |             |
       ---------------------   -------       ----
      |         |           | |       |     |    |
    shout.c  shout.h      speak.h  speak.c  |  main.c
                |            |              |
                 ---------------------------

shout.o depends on shout.c, shout.h, and speak.h
speak.o depends on speak.c and speak.h
main.o depends on main.c, shout.h, and speak.h
talk depends on shout.o, speak.o, and main.o

(Note that we've ignored the standard library header files for cleanliness of the diagram, but those would also be included in reality)

Using this dependency tree, what do we actually need to recompile if speak.c changes? We can look in the tree and see that speak.o depends on speak.c, so it will need to be recompiled, and consequently talk will need to be recompiled, since it depends on speak.o. Therefore we say that these dependencies are "recursive". Notice that changes in header files might cause more significant recompilation: everything will need to be rebuilt if speak.h changes, since it is used by all of the .o files.

How would we know that "speak.c has changed since we last compiled"? We can compare the timestamps of speak.c and speak.o! If speak.c is OLDER than speak.o, then speak.o is up to date with respect to speak.c, but if speak.c is NEWER than speak.o, then we recompile. The same check applies to every other dependency of speak.o, such as speak.h. This is also recursive: if we want to build the "talk" executable, then we can look at the ages of its dependencies: shout.o, speak.o, and main.o. But additionally we need to dive into the dependencies of each of those .o files to see whether any of them is newer than the .o file itself.
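The timestamp comparison can be tried out directly in the shell. This is only a minimal sketch, not how make is implemented: the function name needs_rebuild is made up for this example, and it relies on the -nt ("newer than") file test, which bash provides but which is not guaranteed in every sh.

```shell
#!/bin/bash
# needs_rebuild TARGET DEP...   (hypothetical helper, not part of make)
# Exit status 0 (true) if TARGET is missing or any DEP is newer than TARGET.
needs_rebuild() {
    local target=$1 dep
    shift
    [ -e "$target" ] || return 0          # no target yet: it must be built
    for dep in "$@"; do
        if [ "$dep" -nt "$target" ]; then
            return 0                      # this dependency changed after the target was built
        fi
    done
    return 1                              # no dependency is newer than the target
}

# speak.o depends on speak.c and speak.h (from the dependency tree above).
if needs_rebuild speak.o speak.c speak.h; then
    echo "speak.o is out of date: recompile speak.c"
else
    echo "speak.o is up to date"
fi
```

Note that the header speak.h is checked in exactly the same way as speak.c, which is why editing a header can trigger recompilation too.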

Finally, how can we represent this dependency tree in terms a computer can understand? We consider each parent-and-children grouping as a triple:

    TARGET DEPENDENCIES COMMAND

The target is the parent, the dependencies are the children, and the command is a shell command that creates the target from the dependencies. This is exactly how the program "make" works.

make + Makefile

We can use the program "make" - which is essentially a scripting environment with dependency analysis - to register our dependency tree and build our program efficiently.

To use make, we write the dependency tree triples in a particular format and save them in a file called "Makefile".

    talk: shout.o speak.o main.o
            gcc -Wall -std=c11 -g -o talk shout.o speak.o main.o

    shout.o: shout.c shout.h speak.h
            gcc -Wall -std=c11 -g -c shout.c

    speak.o: speak.c speak.h
            gcc -Wall -std=c11 -g -c speak.c

    main.o: main.c shout.h speak.h
            gcc -Wall -std=c11 -g -c main.c

The format is the following:

    TARGET: DEPENDENCIES
            COMMAND

Conventionally, we name this file "Makefile", although it isn't required. You can give make a different file with the "-f" option; if no "-f" option is provided, make looks in the current directory for a file called "makefile" or "Makefile".
The colon after the target name is required.
Lines with commands must start with a TAB and NOT SPACES. This is the opposite of everything we've told you to do so far; sorry about that, but make is weird.

To use the Makefile with the make program, you just run make on the command line and tell it what target to build:

    $ make talk

If you don't provide a target to build, make will pick the first one in the file:

    $ make  # uses "talk" since it is first

When you run make, it takes the target that was specified and processes it:

For each dependency of the target, recursively process it.
If the target does not exist yet, or the timestamp of any dependency is NEWER than the target's timestamp, then execute the shell command.
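To make the two steps concrete, here is a dry-run sketch of that traversal written as a bash script for the talk example. It is a simplification under stated assumptions, not make's real implementation: the function names deps_of, command_for and build are invented for this illustration, the rules are hard-coded instead of being read from a Makefile, and the commands are only printed, never run.

```shell
#!/bin/bash
# Dry-run sketch of make's traversal for the talk example.
# deps_of, command_for and build are hypothetical names used only here.

# Print the dependencies of a target (nothing for plain source files).
deps_of() {
    case $1 in
        talk)    echo "shout.o speak.o main.o" ;;
        shout.o) echo "shout.c shout.h speak.h" ;;
        speak.o) echo "speak.c speak.h" ;;
        main.o)  echo "main.c shout.h speak.h" ;;
    esac
}

# Print the command that creates a target from its dependencies.
command_for() {
    case $1 in
        talk) echo "gcc -Wall -std=c11 -g -o talk shout.o speak.o main.o" ;;
        *.o)  echo "gcc -Wall -std=c11 -g -c ${1%.o}.c" ;;
    esac
}

# build TARGET: exit status 0 if TARGET had to be remade, 1 if it was up to date.
build() {
    local target=$1 dep stale=no
    for dep in $(deps_of "$target"); do
        # Step 1: recursively process each dependency first.
        if build "$dep"; then
            stale=yes                     # the dependency itself was just remade
        elif [ ! -e "$target" ] || [ "$dep" -nt "$target" ]; then
            stale=yes                     # target missing, or dependency is newer
        fi
    done
    # Step 2: if anything was newer, print the command (dry run only).
    if [ "$stale" = yes ]; then
        command_for "$target"
        return 0
    fi
    return 1
}
```

Calling "build talk" in a directory where only speak.c has been edited since the last build prints the compile command for speak.c followed by the link command for talk, and nothing else. Real make additionally runs the commands, which updates the timestamps for the next invocation.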

Variables

We're still doing a lot of typing and copy/pasting in our Makefile. However, we can use variables in Makefiles (like we did in shell scripts). One common way to use variables is to make the compiler and the flags passed to the compiler configurable:

    CC = gcc
    CFLAGS = -Wall

    talk: shout.o speak.o main.o
            $(CC) $(CFLAGS) -o talk shout.o speak.o main.o

    shout.o: shout.c shout.h speak.h
            $(CC) $(CFLAGS) -c shout.c

    speak.o: speak.c speak.h
            $(CC) $(CFLAGS) -c speak.c

    main.o: main.c shout.h speak.h
            $(CC) $(CFLAGS) -c main.c

Why do this?

It's now easy to change things once and affect many commands.
We can change variables on the command line, which overrides the definitions in the file (for example "make CFLAGS=-g").
It becomes easy to reuse most of a Makefile on new projects.
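The command-line override is easy to try with make's dry-run flag "-n", which prints the commands without running them. The script below is a sketch under a few assumptions: it works in a scratch directory, uses a cut-down Makefile with a single rule, and the source files are empty stand-ins (they are never compiled because of "-n").

```shell
#!/bin/bash
# Dry-run demonstration of overriding a Makefile variable on the command line.
# Everything happens in a scratch directory; the source files are empty stand-ins.
cd "$(mktemp -d)" || exit 1
touch speak.c speak.h

# %b turns the \t escape into the real TAB that recipe lines require.
printf '%b\n' \
    'CC = gcc' \
    'CFLAGS = -Wall' \
    'speak.o: speak.c speak.h' \
    '\t$(CC) $(CFLAGS) -c speak.c' > Makefile

make -n                 # prints: gcc -Wall -c speak.c
make -n CFLAGS=-g       # prints: gcc -g -c speak.c
make -n CC=clang        # prints: clang -Wall -c speak.c
```

Notice that "CFLAGS=-g" replaces the value from the file rather than adding to it, so -Wall is no longer passed in the second run.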

It's also common to use variables to hold lists of filenames:

    OBJFILES = shout.o speak.o main.o
    talk: $(OBJFILES)
            gcc -o talk $(OBJFILES)

Finally, just as there are special shell script variables, there are special variables in Makefiles:

    $@ - use in the command to refer to the target
    $^ - use in the command to refer to all of the dependencies of the target
    $< - use in the command to refer to the left-most dependency of the target
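As an illustration, the talk Makefile could be rewritten with these special variables roughly as follows. This is one possible sketch, assuming GNU make; it is written as a bash script that generates the Makefile in a scratch directory (with empty stand-in source files) and asks make for a dry run, so we can see what each variable expands to.

```shell
#!/bin/bash
# Dry run of the talk Makefile rewritten with the special variables.
# Runs in a scratch directory with empty stand-in source files.
cd "$(mktemp -d)" || exit 1
touch main.c shout.c speak.c shout.h speak.h

# Single quotes keep $@, $^ and $< away from the shell;
# %b turns each \t into the TAB that must start a recipe line.
printf '%b\n' \
    'CC = gcc' \
    'CFLAGS = -Wall -std=c11 -g' \
    'talk: shout.o speak.o main.o' \
    '\t$(CC) $(CFLAGS) -o $@ $^' \
    'shout.o: shout.c shout.h speak.h' \
    '\t$(CC) $(CFLAGS) -c $<' \
    'speak.o: speak.c speak.h' \
    '\t$(CC) $(CFLAGS) -c $<' \
    'main.o: main.c shout.h speak.h' \
    '\t$(CC) $(CFLAGS) -c $<' > Makefile

# Prints the four fully expanded commands without running them:
#   gcc -Wall -std=c11 -g -c shout.c
#   gcc -Wall -std=c11 -g -c speak.c
#   gcc -Wall -std=c11 -g -c main.c
#   gcc -Wall -std=c11 -g -o talk shout.o speak.o main.o
make -n talk
```

In each compile command $< expanded to just the .c file because it is listed first among the dependencies, so with this style the source file has to come before the headers.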

Be careful! With variables and special characters, your Makefiles can get really complicated and unreadable, which is bad. Use them judiciously - prefer a readable Makefile to one that is completely unreadable but highly optimized.

Phony targets

While most of the targets in a Makefile are composed of full triples, we can actually define targets that have no dependencies or targets that have no command.

One example of the former is the commonplace "clean" target. The clean target is a convention: it removes any generated files (e.g. .o files) as well as the final executable, so that we can "start over" with just the source:

    clean:
            rm -f *.o talk *~

The "clean" target doesn't have any dependencies. As long as no file named "clean" exists, make considers the target out of date every time, so running make with this target just runs the shell command:

    $ make clean
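One catch: if a file called "clean" ever does appear in the directory, make sees an existing target with no dependencies and decides there is nothing to do. The usual remedy is to declare the target with the special ".PHONY" target, which tells make that the name is not a file. The script below is a small sketch of both situations, run in a scratch directory with dry runs only; the "-f" flag on rm is used here so that rm does not complain when there is nothing to delete.

```shell
#!/bin/bash
# Shows why "clean" is usually declared phony. Runs in a scratch directory.
cd "$(mktemp -d)" || exit 1

# %b turns the \t escape into the TAB that must start a recipe line.
printf '%b\n' \
    'clean:' \
    '\trm -f *.o talk *~' > Makefile

touch clean        # a stray file that happens to share the target's name
make -n clean      # make reports that clean is up to date; rm would not run

# Declaring the target phony tells make it is not a file.
printf '%b\n' '.PHONY: clean' >> Makefile
make -n clean      # prints: rm -f *.o talk *~
```

The same declaration is commonly used for other targets that do not name a real file, such as "all".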

An example of a target with no commands is the conventional "all" target:

    all: talk

The "all" target has no command! Just dependencies. While this isn't very useful for our little toy example here, in complex programs where there is more than one final executable, the "all" target is used to build all executables in a single call to make.

Auto-generating dependencies

So far, we are still listing dependencies manually - we had to analyze the dependency tree and figure out which .o files depend on which source files. This is problematic because if you make a mistake (e.g. you forget to list a header file), make will skip a recompilation it should have done, and you can introduce subtle bugs into your program.

Make can't solve this problem for us: it has no understanding of the actual logic of C dependency trees. All it knows is which targets have which dependencies, and which command to run when refreshing the target. However, different languages and tools have different ways to solve this problem. Check out the "-M" and "-MM" gcc options to have gcc help you determine dependencies. This command is sometimes run as part of a "depend" target in a Makefile. But in any event, auto-generated dependency graphs are beyond the scope of this class.

Summary

make is a tool that combines scripting with dependency analysis to avoid unnecessary recompilation.
make isn't language-specific - it uses file timestamps for dependency analysis and shell commands to do the work, so it isn't tied to any particular language or compiler.
Makefiles have a way of starting simple and ending up unreadable. Try to keep them clean.
Common conventions like "make clean" and "make all" help developers keep their projects clean and usable.