gdb
commands (set and use breakpoints, print register values, etc.).
This assignment involves applying a series of buffer overflow attacks on an executable file called bufbomb
(for some reason, the textbook authors have a penchant for pyrotechnics).
You will gain firsthand experience with one of the methods commonly used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of this form of security weakness so that you can avoid it when you write system code.
wget https://courses.cs.washington.edu/courses/cse351/23wi/files/labs/lab3.tar.gz
Running tar xvf lab3.tar.gz
will extract the lab files to a directory called lab3
with the following files:
bufbomb - The executable you will attack
bufbomb.c - The C code used to compile bufbomb (You don't need to compile it)
lab3synthesis.txt - For your synthesis question responses
Makefile - For testing your exploits prior to submission
makecookie - Generates a "cookie" based on some string (which will be your username)
sendstring - A utility to help convert between string formats
Linux (and UNIX machines in general) use a different line ending from Windows and traditional MacOS in text files. The reason for this difference is historical: early printers needed more time to move the print head back to the beginning of the next line than to print a single character, so someone introduced the idea of separate line feed \n
and carriage return \r
characters.
Windows uses \r\n pairs; Linux and other Unix-like systems use a single \n.
In this lab, it is important that your lines end with line feed (\n
), not any of the alternative line endings. If you are working on the VM or attu or even another Linux (or Unix-like) system, this will probably not be a problem, but if you are working across systems, check your line endings. You can also use the Unix tool dos2unix
to convert the line endings from Windows to Unix line endings.
A cookie is a string of eight bytes (or 16 hexadecimal digits) that is (with high probability) unique to you. You can generate your cookie with the makecookie
program, giving your UWNetID as the argument (note: you must use your UWNetID; CSE students should NOT use their CSEID, since for some people these two IDs are different):
$ ./makecookie your_UWNetID
0x5e57e63274f39587
As an example, if your UW email address is thecookiemonster42@uw.edu
, you would run ./makecookie thecookiemonster42
.
While you are doing this, you might as well prepare the first file you need to test your code: UW_ID.txt
$ echo your_UWNetID > UW_ID.txt
Again, if your UW email address is thecookiemonster42@uw.edu
, you would run echo thecookiemonster42 > UW_ID.txt
. This will generate a text file containing your UWNetID followed by a single new line. You could also use a text editor, but you have to be careful about line endings, so the text editor approach is discouraged. Using the echo
command mentioned above ensures that your file format is consistent with what the autograder expects.
In most of the attacks in this lab, your objective will be to make your cookie show up in places where it ordinarily would not.
bufbomb program
The bufbomb
program reads a string from standard input with the function getbuf()
:
unsigned long long getbuf() {
    char buf[36];
    volatile char* variable_length;
    int i;
    unsigned long long val = (unsigned long long)Gets(buf);
    variable_length = alloca((val % 40) < 36 ? 36 : val % 40);
    for(i = 0; i < 36; i++) {
        variable_length[i] = buf[i];
    }
    return val % 40;
}
Don't worry about what's going on with variable_length
and val
and alloca
for now; all you need to know is that getbuf()
calls the function Gets
and returns some arbitrary value.
The function Gets
is similar to the standard C library function gets
—it reads a string from standard input (terminated by '\n
') and stores it (along with a null terminator) at the specified destination. In the above code, the destination is an array buf
having sufficient space for 36 characters.
Neither Gets
nor gets
has any way to determine whether there is enough space at the destination to store the entire string. Instead, they simply copy the entire string, possibly overrunning the bounds of the storage allocated at the destination.
If the string typed by the user to getbuf
is no more than 36 characters long, it is clear that getbuf
will return some value less than 0x28, as shown by the following execution example:
$ ./bufbomb
Type string: howdy doody
Dud: getbuf returned 0x20
It's possible that the value returned might differ for you, since the returned value is derived from the location on the stack that Gets
is writing to. For the same reason, the returned value may also differ depending on whether you run the bomb inside or outside of gdb.
Typically an error occurs if we type a longer string:
$ ./bufbomb
Type string: This string is too long and it starts overwriting things.
Ouch!: You caused a segmentation fault!
As the error message indicates, overrunning the buffer typically causes the program state (e.g., the return addresses and other data structures that were stored on the stack) to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed bufbomb
so that it does more interesting things. These are called exploit strings.
bufbomb
must be run with the -u your_UWNetID
flag, which operates the bomb for the indicated UWNetID. (We will feed bufbomb your UWNetID with the -u
flag when grading your solutions.) bufbomb
determines the cookie you will be using based on this flag value, just as the program makecookie
does. Some of the key stack addresses you will need to use depend on your cookie.
Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printing characters. The program sendstring
will help you generate these raw strings. sendstring
takes as input a hex-formatted string and prints the raw string to standard output. In a hex-formatted string, each byte value is represented by two hex digits. Byte values are separated by spaces. For example, the string "012345"
could be entered in hex format as 30 31 32 33 34 35
. (The ASCII code for decimal digit 2
is 0x32
. Run man ascii
for a full table.) Non-hex digit characters are ignored, including the blanks in the example shown.
If you generate a hex-formatted exploit string in a file named exploit.txt
, you can send it to bufbomb
through a couple of pipes (see CSE391 slides on piping and i/o redirection if you are unfamiliar with Unix pipes that take the output of one program and direct it as input to another program):
$ cat exploit.txt | ./sendstring | ./bufbomb -u your_UWNetID
Or you can store the raw bytes in a file and use I/O redirection to supply it to bufbomb
:
$ ./sendstring < exploit.txt > exploit.bytes
$ ./bufbomb -u your_UWNetID < exploit.bytes
With the above method, when running bufbomb
from within gdb
, you can pass in the exploit string as follows:
$ gdb ./bufbomb
(gdb) run -u your_UWNetID < exploit.bytes
You can also test all your exploits by running make test
. See Submission for more instructions.
One important point: your exploit string must not contain byte value 0x0A
at any intermediate position, since this is the ASCII code for newline ('\n
'). When Gets
encounters this byte, it will assume you intended to terminate the string. sendstring
will warn you if it encounters this byte value.
When using gdb
, you may find it useful to save a series of gdb
commands to a text file and then use the -x commands.txt
flag. This saves you the trouble of retyping the commands every time you run gdb
. You can read more about the -x
flag in gdb
's man
page.
(You won't write assembly code for Level 0 and 1. You may wish to come back and read this section later after finishing these levels.)
Using gcc
as an assembler and objdump
as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose we write a file example.s
containing the following assembly code:
# Example of hand-generated assembly code
movq $0x1234abcd,%rax # Move 0x1234abcd to %rax
pushq $0x401080 # Push 0x401080 onto the stack
retq # Return
The code can contain a mixture of instructions and data. Anything to the right of a '#
' character is a comment.
We can now assemble and disassemble this file:
$ gcc -c example.s
$ objdump -d example.o > example.d
The generated file example.d
contains the following lines:
0: 48 c7 c0 cd ab 34 12 mov $0x1234abcd,%rax
7: 68 80 10 40 00 pushq $0x401080
c: c3 retq
Each line shows a single instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the ':
' character indicate the byte codes for the instruction. Thus, we can see that the instruction pushq $0x401080
has a hex-formatted byte code of 68 80 10 40 00
.
If we read off the 4 bytes starting at address 8 we get 80 10 40 00
. This is a byte-reversed version of the data word 0x00401080
. This byte reversal represents the proper way to supply the bytes as a string, since a little-endian machine lists the least significant byte first.
Finally, we can read off the byte sequence for our code (omitting the final 0
's) as:
48 c7 c0 cd ab 34 12 68 80 10 40 00 c3
There are three functions that you must exploit for this lab. The exploits increase in difficulty. For those of you looking for a challenge, there is a fourth function you can exploit for extra credit.
The function getbuf
is called within bufbomb
by a function test
:
void test()
{
    volatile unsigned long long val;
    volatile unsigned long long local = 0xdeadbeef;
    char* variable_length;
    entry_check(3); /* Make sure entered this function properly */
    val = getbuf();
    if (val <= 40) {
        variable_length = alloca(val);
    }
    entry_check(3);
    /* Check for corrupted stack */
    if (local != 0xdeadbeef) {
        printf("Sabotaged!: the stack has been corrupted\n");
    }
    else if (val == cookie) {
        printf("Boom!: getbuf returned 0x%llx\n", val);
        if (local != 0xdeadbeef) {
            printf("Sabotaged!: the stack has been corrupted\n");
        }
        validate(3);
    }
    else {
        printf("Dud: getbuf returned 0x%llx\n", val);
    }
}
When getbuf
executes its return statement, the program ordinarily resumes execution within function test
. Within the file bufbomb
, there is a function smoke
:
void smoke()
{
    entry_check(0); /* Make sure entered this function properly */
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}
Your task is to get bufbomb
to execute the code for smoke
when getbuf
executes its return statement, rather than returning to test
. You can do this by supplying an exploit string that overwrites the stored return address in the stack frame for getbuf
with the address of the first instruction in smoke
. Note that your exploit string may also corrupt other parts of the stack state, but this will not cause a problem, because smoke
causes the program to exit directly.
When supplied with the correct exploit string, you should see the following output:
Smoke!: You called smoke()
All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of bufbomb
.
Be careful about byte ordering.
You might want to use gdb
to step the program through the last few instructions of getbuf
to make sure it is doing the right thing. You can also print out the data in the stack to see the change.
The placement of buf
within the stack frame for getbuf
depends on which version of gcc
was used to compile bufbomb
. You will need to pad the beginning of your exploit string with the proper number of bytes to overwrite the return pointer. The values of these bytes can be arbitrary.
Check the line endings in your smoke.txt with od -c smoke.txt
or hexdump -C smoke.txt
.
Within the file bufbomb
there is also a function fizz
:
void fizz(int arg1, char arg2, long arg3,
          char* arg4, short arg5, short arg6, unsigned long long val)
{
    entry_check(1); /* Make sure entered this function properly */
    if (val == cookie)
    {
        printf("Fizz!: You called fizz(0x%llx)\n", val);
        validate(1);
    }
    else
    {
        printf("Misfire: You called fizz(0x%llx)\n", val);
    }
    exit(0);
}
Similar to Level 0, your task is to get bufbomb
to execute the code for fizz()
rather than returning to test
. In this case, however, you must make it appear to fizz
as if you have passed your cookie as its argument. You can do this by encoding your cookie in the appropriate place within your exploit string.
When supplied with the correct exploit string, you should see the following output:
Fizz!: You called fizz(<your cookie value>)
Note that in x86-64, the first six arguments are passed in registers and any additional arguments are passed on the stack. Your exploit string needs to write to the appropriate place within the stack.
You can use gdb
to get the information you need to construct your exploit string. Set a breakpoint within getbuf
and run to this breakpoint. Determine parameters such as the address of the buffer buf
.
A much more sophisticated form of buffer attack involves supplying a string that encodes actual machine instructions. The exploit string then overwrites the return pointer with the starting address of these instructions. When the function being attacked (in this case getbuf
) executes its ret
instruction, the program will start executing the instructions on the stack rather than returning. With this form of attack, you can get the program to do almost anything. The code you place on the stack is called the exploit code. This style of attack is tricky, though, because you must get machine code onto the stack and set the return pointer to the start of this code.
Within the file bufbomb
there is a function bang
:
unsigned long long global_value = 0;
void bang(unsigned long long val)
{
    entry_check(2); /* Make sure entered this function properly */
    if (global_value == cookie)
    {
        printf("Bang!: You set global_value to 0x%llx\n", global_value);
        validate(2);
    }
    else
    {
        printf("Misfire: global_value = 0x%llx\n", global_value);
    }
    exit(0);
}
Similar to Levels 0 and 1, your task is to get bufbomb
to execute the code for bang
rather than returning to test
. Before this, however, you must set global variable global_value
to your cookie. Your exploit code should set global_value
, push the address of bang
on the stack, and then execute a retq
instruction to cause a jump to the code for bang
.
When supplied with the correct exploit string, you should see the following output:
Bang!: You set global_value to <your cookie value>
You can use gdb
to get the information you need to construct your exploit string. Set a breakpoint within getbuf
and run to this breakpoint. Determine parameters such as the address of global_value
and the address of the buffer buf
.
Determining the byte encoding of instruction sequences by hand is tedious and prone to errors. You can let tools do all of the work by writing an assembly code file containing the instructions and data you want to put on the stack. Assemble this file with gcc
and disassemble it with objdump
. You should be able to get the exact byte sequence that you will type at the prompt. (A brief example of how to do this is included in the Generating Byte Codes section above.)
Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie. Make sure your exploit string works on attu
or your VM, and make sure you include your UWNetID on the command line to bufbomb
.
Watch your use of address modes when writing assembly code. Note that movq $0x4, %rax
moves the value 0x0000000000000004
into register %rax
; whereas movq 0x4, %rax
moves the value at memory location 0x0000000000000004
into %rax
. Because that memory location is usually undefined, the second instruction will cause a segmentation fault!
The movq
instruction cannot directly move an 8-byte immediate (e.g. $0x0123456789ABCDEF
) to a memory location. To move an 8-byte immediate to a memory location, you must first move it to a temporary location, like a register, then move it from the temporary location to the memory address.
Do not attempt to use either a jmp
or a call
instruction to jump to the code for bang
. These instructions use PC-relative addressing, which is very tricky to set up correctly. Instead, push an address on the stack and use the retq
instruction.
If you keep getting Segmentation fault
, make sure you are running your exploit within gdb
.
Our preceding attacks have all caused the program to jump to the code for some other function, which then causes the program to exit. As a result, it was acceptable to use exploit strings that corrupt the stack, overwriting the saved value of register %rbp
and the return pointer.
The most sophisticated form of buffer overflow attack causes the program to execute some exploit code that patches up the stack and makes the program return to the original calling function (test
in this case). The calling function is oblivious to the attack. This style of attack is tricky, though, since you must: (1) get machine code onto the stack, (2) set the return pointer to the start of this code, and (3) undo the corruptions made to the stack state.
Look back at the test
function from Level 0 (Smoke). Your job for this level is to supply an exploit string that will cause getbuf
to return your cookie back to test
, rather than the value it would normally return. You can see in the code for test
that this will cause the program to go "Boom!
". Your exploit code should set your cookie as the return value, restore any corrupted state, push the correct return location on the stack, and execute a ret
instruction to really return to test
.
When supplied with the correct exploit string, you should see the following output:
Boom!: getbuf returned <your cookie value>
In order to overwrite the return pointer, you must also overwrite the saved value of %rbp
. However, it is important that this value is correctly restored before you return to test
. You can do this by either (1) making sure that your exploit string contains the correct value of the saved %rbp
in the correct position, so that it never gets corrupted, or (2) restoring the correct value as part of your exploit code. You'll see that the code for test
has some explicit tests to check for a corrupted stack.
You can use gdb
to get the information you need to construct your exploit string. Set a breakpoint within getbuf
and run to this breakpoint. Determine parameters such as the saved return address and the saved value of %rbp
.
Again, let tools such as gcc
and objdump
do all of the work of generating a byte encoding of the instructions.
Keep in mind that your exploit string depends on your machine, your compiler, and even your cookie. Again, make sure your exploit string works on attu
or the VM, and make sure you include your UWNetID on the command line to bufbomb
.
Reflect on what you have accomplished. You caused a program to execute machine code of your own design. You have done so in a sufficiently stealthy way that the program did not realize that anything was amiss.
execve
is a system call that replaces the currently running program with another program, which inherits all the open file descriptors. What are the limitations of the exploits you have performed so far? How could calling execve
allow you to circumvent this limitation? If you have time, try writing an additional exploit that uses execve
and another program to print a message.
Start with a fresh copy of lab0.c
again. Make sure you are compiling with the command below (do not reuse the same executable from previous labs).
Go to part_2
and change the second argument to the first call to fill_array
so that you see the message "Segmentation fault" when you run part 2:
$ wget https://courses.cs.washington.edu/courses/cse351/23wi/files/labs/lab0.c
$ gcc -g -std=c99 -fomit-frame-pointer -o lab0 lab0.c
$ ./lab0 2
*** LAB 0 PART 2 ***
...
Segmentation fault
Examine the contents of memory in GDB to figure out what happened and answer the following questions:
In your own words, explain the cause of this specific segmentation fault. What value gets corrupted, and why does that cause a segmentation fault? Which assembly instruction causes the segmentation fault to occur at the moment it is executed? (Please be specific: give the name of the instruction as well as the name of the function where it is found.) [3 pt]
It turns out that you can figure out when you will get a segfault in part_2
just by looking at the assembly code! There are a few instructions that contribute to determining the limit on the second argument to fill_array
. Name two of the most relevant instructions in part_2
, including their addresses in the form "<function+#>
" as you see in GDB. What is the purpose of each of these instructions? What is the minimum length needed to cause a segmentation fault? (Please briefly explain the calculation you did to find the minimum length.) [4 pt]
Someone claims that creating array
on the heap would remove the possibility of segmentation faults. Do you agree? Briefly explain why or why not. [2 pt]
Please follow the formatting specified here. Our grading scripts won't be nice if you don't name the files like we've asked or if you include additional text in any of the files.
You should submit the following files:
UW_ID.txt
lab3synthesis.txt
smoke.txt
fizz.txt
bang.txt
UW_ID.txt
should contain your UW netid (not CSE netid if it is different). Please generate it using the command echo your_UWNetID > UW_ID.txt
and replace your_UWNetID with your netid.
lab3synthesis.txt
should contain your answers to the synthesis questions. The other three files correspond to the different exploits and should only contain the hex-formatted exploit string. Note that they should have the data that is sent to sendstring
, not the data produced by sendstring
.
Before submitting your exploits, you can check them by placing them in the same directory as bufbomb
and running make test
. This will output a summary of your exploits (the Makefile looks for all the files ending with .txt
and sends the contents of each to bufbomb
, one by one) and whether they succeed.
Submit your FIVE completed files listed above to the "Lab 3" assignment on Gradescope.
If you completed the bonus question, submit your completed dynamite.txt
and your UW_ID.txt
files to the "Lab 3 Extra Credit" assignment on Gradescope.
After submitting, please wait until the autograder is done running and double-check that you passed the "File Check" and "Compilation and Execution Issues" tests. If either test returns a score of -1, be sure to read the output and fix any problems before resubmitting. Failure to do so will result in a programming score of 0 for the lab.
It is fine to submit multiple times up until the deadline; we will only grade your last submission. NOTE that if you do resubmit, you MUST resubmit ALL files.