The second made it possible to live with the first - we could figure out at assembly time what address an operand was at, say, simply by pretending to load the entire program into memory and seeing where the operand ended up.
In a real system, a program isn't the only thing in memory - the OS is there, plus probably many other programs that are running concurrently. So, real programs must be relocatable - able to run correctly no matter what address range in memory they end up being loaded into.
The problem: How can we make that happen?
Solution overview: A cooperation among the assembler, linker, and loader, plus some help from the ISA.
We'll use an example in Cebollita. Real systems have more options, and so are more complicated, but the basic ideas are the same.
Here's what happens. (The file names in this image are links...)
Compiling transforms a program written in C to a functionally equivalent assembler program. Names in the C program just become names in the assembler program. (Compare extfunc.c and extfunc.s, for instance.)
Assembling transforms the symbolic instructions into hex. In Cebollita, all names are left unresolved, however. (Have a look at any of the branch instructions in main.s -- they don't have targets in main.o.)
As well as containing the hex for the instructions, the .o ("object") files contain two important tables mapping names to addresses. The first, called SymTab in Cebollita, gives the address of each symbol defined in the file that produced it (e.g., main.s) - that is, it gives the offset within that file. (In fact, each file produces two segments: text contains instructions, and data contains data. The value of a symbol is its offset within the segment in which it is allocated - that's what SymTab shows.)
The other table, RefTab, is a list of all the places where symbols were used. Those places need to be "fixed up" by having appropriate addresses inserted into them sometime before execution can occur. For example, in main.o the RefTab says that there is a use of symbol localfunc at offset 144 (decimal, and in the text segment, although that information isn't actually displayed). That's offset 0x90, which in main.o contains:
00000090 0x0c000000: jal 0x0
extfunc will be loaded. (Why not figure that out now? At assembly time all that is available is main.s. Because of separate compilation/assembly, not to mention libraries, the assembler doesn't have enough information.)
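To make the two tables concrete, here is a minimal sketch of what a .o file holds. This is not Cebollita's actual file layout: the ObjectFile class, its field names, the 37-word size of the text segment, and the offset given for localfunc are all assumptions made for illustration. Only the reference at offset 144 and the placeholder instruction 0x0c000000 come from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """Hypothetical in-memory picture of a .o file (text segment only)."""
    name: str
    text: list = field(default_factory=list)    # 32-bit instruction words
    symtab: dict = field(default_factory=dict)  # symbol -> byte offset where it is defined
    reftab: list = field(default_factory=list)  # (symbol, byte offset of the instruction using it)

# main.o, sized (by assumption) just large enough to hold offset 0x90.
main_o = ObjectFile(name="main.o", text=[0] * 37)

# The unresolved instruction from the example: jal with a target of 0.
main_o.text[144 // 4] = 0x0C000000

# SymTab entry: where localfunc is defined (offset 0 is an assumption).
main_o.symtab["localfunc"] = 0

# RefTab entry: the instruction at offset 144 uses localfunc and needs fixing up.
main_o.reftab.append(("localfunc", 144))
```

Note that the RefTab entry records only where the fix-up is needed, not what value goes there. That value can't be known until the final position of localfunc is decided.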
Next comes the linker. It is provided with all the .o files needed to create the executable. It (a) decides on an order in which to place the .o's, then (b) using the SymTab's in each .o, computes new offsets for each symbol within this aggregation of the .o's, and then (c) using the RefTab's goes back and fixes up all the instructions that were waiting to have references resolved.
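The three linker steps can be sketched in a few lines. This is a simplification under stated assumptions, not Cebollita's linker: the link function and the dictionary layout are hypothetical, only text segments are handled, and every reference is assumed to be a jal-style instruction whose low 26 bits hold a word address. The sizes used below (9 words for extfunc.o, 37 for main.o) and the position of localfunc at the start of main.o are also assumptions, chosen to be consistent with the offsets in the running example.

```python
def link(objects):
    """Combine object files (dicts with 'text', 'symtab', 'reftab') into one image."""
    # (a) Order: here, simply the order in which the objects were given.
    # (b) Compute each object's base offset and each symbol's new offset.
    bases, symbols, base = [], {}, 0
    for obj in objects:
        bases.append(base)
        for name, offset in obj["symtab"].items():
            symbols[name] = base + offset
        base += 4 * len(obj["text"])

    # Concatenate the text segments into one image.
    image = [word for obj in objects for word in obj["text"]]

    # (c) Walk every RefTab and patch the 26-bit target field in place.
    for obj, obj_base in zip(objects, bases):
        for name, offset in obj["reftab"]:
            index = (obj_base + offset) // 4
            target_word = symbols[name] // 4
            image[index] = (image[index] & 0xFC000000) | (target_word & 0x03FFFFFF)
    return image, symbols

extfunc_o = {"text": [0] * 9, "symtab": {"extfunc": 0}, "reftab": []}
main_text = [0] * 37
main_text[0x90 // 4] = 0x0C000000  # the unresolved jal
main_o = {"text": main_text, "symtab": {"localfunc": 0}, "reftab": [("localfunc", 0x90)]}

image, symbols = link([extfunc_o, main_o])
print(hex(symbols["localfunc"]), hex(image[0xB4 // 4]))
```

Because extfunc.o is placed first, everything in main.o shifts by the size of extfunc.o's text, and both the symbol's address and the location of the instruction to patch have to be adjusted by that amount.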
In a.out, extfunc.o has been loaded ahead of main.o. The instruction that is at offset 0x90 in main.o is now at offset 0xb4. The instruction there, after having been fixed (also called "patched") by the linker, is:
000000b4 0x0c000009: jal 0x9
localfunc now lives.
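It can help to pull the patched word apart by hand. The sketch below assumes Cebollita uses the standard MIPS J-format encoding for jal (a 6-bit opcode followed by a 26-bit word address); the decode_jal helper is hypothetical. A real MIPS machine also combines the target with the upper bits of the PC, which is ignored here.

```python
def decode_jal(word):
    """Split a J-format instruction into (opcode, word target, byte target)."""
    opcode = word >> 26               # top 6 bits
    target_word = word & 0x03FFFFFF   # low 26 bits: a word address
    return opcode, target_word, target_word * 4

before = decode_jal(0x0C000000)  # as found in main.o
after = decode_jal(0x0C000009)   # as found in a.out

# How far main.o's code moved when extfunc.o was placed ahead of it.
shift = 0xB4 - 0x90
print(before, after, hex(shift))
```

The patched target, word 9, is byte address 0x24, which is exactly the distance the instruction itself moved (0xb4 - 0x90).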
In Cebollita, all instructions are patched when the linker completes. That's because, as with SSI-x, it assumes the a.out file will be loaded into memory starting at address 0. If that weren't the case (as in the diagram above), some instructions would still need patching - the jal above, for instance, couldn't be resolved by the linker, because the actual address in memory of localfunc wouldn't be known until load time.
The figure above shows this case.
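For the case where the program is not loaded at address 0, a loader would have to finish the job. The sketch below shows one way that could work, and is not something Cebollita does: the relocate function, the list of offsets still needing adjustment, and the load address 0x1000 are all hypothetical. It again assumes jal-style instructions with a 26-bit word target.

```python
def relocate(image, jal_offsets, load_base):
    """Return a copy of image with each listed jal target shifted by load_base."""
    patched = list(image)
    for offset in jal_offsets:
        index = offset // 4
        # The linker resolved the target as if the program started at address 0,
        # so add the real starting address (in words) to it.
        target_word = (patched[index] & 0x03FFFFFF) + load_base // 4
        patched[index] = (patched[index] & 0xFC000000) | (target_word & 0x03FFFFFF)
    return patched

# An image the size of the a.out in the example, with the linked jal at 0xb4.
image = [0] * 46
image[0xB4 // 4] = 0x0C000009

loaded = relocate(image, jal_offsets=[0xB4], load_base=0x1000)
print(hex(loaded[0xB4 // 4]))
```

The executable file would need to carry the list of offsets along with it, which is one reason real file formats are more elaborate than the one used here.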
Real systems must satisfy a wide range of demands, though, and are more complicated. Here's a description of a real .o and .exe file format, Microsoft's Common Object File Format, in case you want to see all the details. (The keyword in Unix systems is "ELF"; I haven't found a clear description of that format, though.)
zahorjan@cs.washington.edu