As Dr. Zahorjan just barely touched on yesterday, the memory address space of a single process consists of several regions. From lowest addresses to highest, these are:
static
locals)
new
'd objects) - starts just above
statically allocated data, grows to higher addresses
Here is a vaguely pictorial view of the memory layout:
0x0000_0000 |--------------| | Instructions | | |--------------| | | Literals | | |--------------| | | Globals | | |--------------| | | | | | Heap | | | | | |--------------| | | | | | | | | | | | | | | V | | | | | | | | | | | | ^ | | | | | | | | | | | | | | |--------------| | | | | | Stack | V | | | | 0xFFFF_FFFF |--------------|
Dr. Zahorjan said yesterday that the stack contains local variables for each function call. I asked what was significant about local variables. We established that every procedure call must have separate space in memory for each local variable, because
I noted that this was fairly elementary stuff, what you learn in the first few weeks of CSE 142, but that you might not have thought of it explicitly before.
Building on this, we identified several data that must be kept for each call of a C function or Java method:
x+1
, are naturally
private to the procedure call that computes them.
All these data form the activation record, or AR, for the procedure call. Every active procedure call must have its own AR somewhere in memory.
In C and Java, each program (actually each thread) has a chain of active procedure calls, only one of which is actually executing code at the moment - the others are suspended waiting for a callee to return. The procedures must logically return in the opposite order from which they were called.
I said that this is naturally modeled by a stack, where the running procedure's
AR is on the top, its caller's AR is next from the top, and so on.
The "stack" in memory is intended to replicate this logical structure -
each procedure's AR is represented by a stack frame, with the
running procedure call's AR at the top of the stack. Though stack frames implement
ARs, they are not the same - as we'll see in the factorial
example, an x86 procedure's argument values are in its caller's stack frame,
even though they belong to the AR of the callee.
We maintain two pointers into the stack: the stack pointer or SP, which points to the last memory word pushed on the stack (that is, just below the top of the stack), and the frame pointer or FP, which points to the first word pushed in the current call's stack frame. The FP and SP bound the running procedure's stack frame - by using addresses relative to the FP or SP, each procedure's code can address any data on the stack frame without having to encode absolute, constant addresses.
When we call a function, we "push" a new frame onto the stack by subtracting from SP, saving FP onto the stack, and then updating FP to point to the bottom word of the new frame (just above the old SP, which pointed to the top word of the caller frame). Returning from a function requires us to "pop" the callee's frame by adding to SP, and then restoring the FP to the saved value stored on the stack.
Not all of a procedure call's AR is on the stack. As you saw with the Z86, processors have a small set of registers, very fast but very small memories. Many data must be stored in registers or fixed memory locations, including the program counter (PC), the condition codes, and the FP and SP. In addition, several processors, including the Z86, only allow ordinary computation to be done on registers; while the full x86 partially supports computation in memory, register-register operations are also common.
On most processors, all procedure calls share the same register set. Hence, the register values used by a procedure call are also part of the AR; when a procedure call is made, registers shared by the caller and callee must be saved onto the stack and restored back when the callee returns. The caller and callee divide this responsibility by adopting a split-register-set convention:
When one procedure calls another, both caller and callee must agree on several things - where the callee's arguments and return value are placed, the ways the caller and callee are allowed to use each other's stack frames, which registers are caller-saved and callee-saved, and so on. While the processor provides some support for these matters, in general they must be decided as a matter of programming convention. This calling convention must be agreed between the processor architects, the OS authors, the compiler authors, and the authors of previously compiled libraries. If there is disagreement, your program might fail to work because one designer assumed something was safe while another designer decided that such an action would be wrong or unsafe.
The x86 provides several instructions that are specialized to
procedure call and stack manipulation, but even so there are many
calling conventions on the x86. The one below
is the cdecl
convention used by gcc on Linux.
On the x86, the stack pointer and frame pointer are stored in registers.
The SP register is called %esp
(extended stack pointer),
while the FP register is called %ebp
(extended base pointer).
The "extended" comes from the fact that the x86 used to have a 16-bit
address space, and was extended to have a 32-bit one.
Both %esp
and %ebp
are callee-saved - because the
callee is always going to overwrite them, making them callee-saved reduces the
code size by not having to generate code to restore them at every call.
However, you don't need to explicitly save %esp
;
as shown below, the stack frame is laid out so that the top of the
previous stack frame is always a fixed offset from %ebp
,
so you can compute the old %esp
instead of storing it.
The 6 other x86 integer registers also have special names, rather than numbers. 3 of them are caller-saved:
%eax
%edx
%ecx
The other 3 are callee-saved:
%ebx
%esi
%edi
All of these are used for computation. %eax
is also used to store
the return value when a procedure returns; the other registers have special
purposes only in some relatively rare cases (such as multiplication and
division instructions).
On the x86, each call's argument values and return address are stored on the stack. A callee's arguments are stored at the top of its caller's stack frame, with the first argument at the top, the second argument just below, and so on.
Even though it too is supplied by the caller, the return address for a call is actually stored at the bottom of the callee's frame, just above the first argument to the callee. This odd split has mostly to do with the behavior of the special x86 instructions for procedure call. The second-from-bottom word of a callee's stack frame is generally the saved FP of its caller.
This word description gives rise to the following diagram of the stack frame layout, as used by the GCC C compiler on x86 Linux [1]:
direction |---------------------| ^ of stack | Space for arguments | <== %esp (SP) | growth | to called routines | | ^ |---------------------| | | | Local variables, | | | | temporary values | | | |---------------------| | Current | | Other callee-saved | | (callee's) | | registers | | stack | |---------------------| | frame | | Saved %ebp (FP) | <== %ebp (FP) | | |---------------------| | | | Return address | | | |---------------------|---------------------------- | | Argument 1 passed | | | | to callee | | | |---------------------| | | | Argument 2 passed | | | | to callee | | | |---------------------| | | | More arguments | | | | . | | Previous | | . | | (caller's) | | . | | frame | | . | | | | . | | | | Argument N | | | |---------------------| | | | Caller-saved | | | | registers | | | |---------------------| | | | Rest of stack | | | | | | | | | | | | | | | | | | | V | | | |---------------------| V
Some things to note about this stack layout:
cdecl
calling convention uses the above order, others use the opposite order.
In brief, this is how gcc on Linux for x86 implements procedure call:
CALL
instruction, the caller pushes the return
address on the stack and then jumps to the callee.
%ebp
onto the stack.
%ebp
to point to
where the old FP is located, establishing the bottom of the stack frame.
%eax
.
LEAVE
, the callee pops off its stack frame up to
and including the saved FP, while also restoring the saved FP into
%ebp
.
RET
, the callee pops the return address off
the stack and jumps back to it.
%eax
to memory or another register.
We will see an example of most of these steps in the factorial example below.
To study the calling sequence more closely, we'll use the following recursive factorial routine, which defines the factorial of all non-positive integers to be 1:
### Compiled from factorial_test.c , using ### gcc -Wall -Werror -Wextra -S factorial_test.c ### This hand-annotated version of the resulting assembly code comes from ### factorial_test.annotated.s . ### ### int factorial (int n); ### n ==> 8(%ebp) - (%ebp) is old %ebp, 4(%ebp) is return address, so ### 8(%ebp) is first and only argument ### Return value ==> %eax (the register) ### Temp. value t1 ==> %eax ### Temp. value t2 ==> -12(%ebp) (location on stack) ### N.B. %eax is overloaded as both return value and temp. value factorial: ## [prologue] pushl %ebp # Push old frame pointer %ebp on stack movl %esp, %ebp # %ebp := %esp (currently pointing to old %ebp) subl $40, %esp # Allocate 40-byte stack frame (no push/pop for us) ## if (n <= 0) { ## return (1); ## } cmpl $0, 8(%ebp) # Compute n-0 (== n) - don't store result, # but set condition codes jg fact_positive # if (n > 0) goto fact_positive movl $1, %eax # result := 1 jmp fact_end # goto fact_end fact_positive: ## else { ## int recursive_result = factorial(n-1); ## return (n * recursive_result); ## } movl 8(%ebp), %eax # t1 (%eax) := n subl $1, %eax # t1 -= 1 movl %eax, (%esp) # copy t1 to last word in our stack frame call factorial # push return addr on stack, goto recursive call movl %eax, -12(%ebp) # t2 (-12(%ebp)) := recursive call's result movl 8(%ebp), %eax # result := n imull -12(%ebp), %eax # result *= t2 (truncate upper word of product) fact_end: ## [epilogue] leave # pop stack frame (%esp := %ebp), pop and restore old %ebp ret # pop return address, return there (to caller) .size factorial, .-factorial
Let's consider the sequence of instructions executed when factorial(n)
makes a recursive call on factorial(n-1)
.
The call itself is implemented in the two-instruction sequence in
the fact_positive
code block:
movl %eax, (%esp) # copy t1 to last word in our stack frame call factorial # push return addr on stack, goto recursive call
Previously, we computed the argument to the recursive call, n-1
,
and stored it into register %eax
. Now, we push this argument
onto the stack right at the top of our stack frame. (We don't need to
decrease %esp
to push the value, because we have already
allocated a stack frame big enough to store it.) Ordinarily, we'd also
store the values of the caller-saved registers as well, %eax
,
%edx
, and %ecx
, but this is not necessary in this case -
we don't use %edx
or %ecx
, and we never need to use
the current value of %eax
again (notice that the source code
annotation doesn't use n-1
again).
With the recursive call argument loaded onto the stack, we use CALL
to make the recursive call. CALL
does three things:
%esp
by 1 word length (4 bytes), pushing space
for an additional word on the stack.
CALL
) onto the top of the stack frame, which is the same
word we just allocated.
factorial
)
we just left.
Before it can do any work, the factorial
procedure needs
to set up its stack frame. As discussed above, the first step in doing this
is to push the old frame pointer on the stack, so we can safely update the
frame pointer to point to the new frame. This is done with the instruction
pushl %ebp # Push old frame pointer %ebp on stack
Like CALL
, the PUSHL
instruction does multiple things:
%esp
by 1 word length.
%ebp
)
into the newly pushed stack word.
After the old FP is saved, the SP now points to the old FP. We want to establish that point as the bottom of our stack frame, so we set FP equal to SP:
movl %esp, %ebp # %ebp := %esp (currently pointing to old %ebp)
Unlike the Z86's four types of move operation, the x86 has only one move instruction that accepts various types of register, memory, and immediate operands.
Finally, we push space for the rest of the stack frame onto the stack, by decreasing the SP by the frame size:
subl $40, %esp # Allocate 40-byte stack frame (no push/pop for us)
Alternatively, we can just use PUSHL
to allocate stack space
when we need it, and its inverse instruction POPL
to free up
unused stack space. This is not the approach taken by gcc, however.
Once factorial
has finished its work and written its result
into the %eax
return value register, it is ready to pop off
its stack frame and return to its caller. This is done with a pair of
instructions. The first is
leave # pop stack frame (%esp := %ebp), pop and restore old %ebp
LEAVE
is a complex instruction, perhaps more so than even CALL
.
It does three things:
%esp
equal to %ebp
, so that only the saved
%ebp
and the return address remain.
0(%esp)
,
which is the saved %ebp
, and stores it into the register %ebp
.
%ebp
off the stack, by increasing %esp
by a word length.
Once LEAVE
is done, the only word in the stack frame is the return address;
%esp
contains the address of the return address in memory.
The last instruction in factorial
is
ret # pop return address, return there (to caller)
RET
is, roughly, the opposite of CALL
:
%esp
by a word.
Now that we have returned from the recursive call factorial(n-1)
,
our call factorial(n)
needs to restore the register set and stack
frame before resuming execution.
Because factorial(n)
doesn't save any caller-saved registers
before recursively calling itself, it doesn't need to restore those
registers' values. It does need to save the recursive result somewhere,
as it will overwrite the %eax
register where the recursive result
is currently stored. For this, it
simply flushes %eax
into the local variable section
of its stack frame:
movl %eax, -12(%ebp) # t2 (-12(%ebp)) := recursive call's result
References: