CSE 374, Lecture 10: Scope in C

So far

What we've covered over the last two lectures (go back and review if these are still confusing for you):

A model of memory and the address space
Introduction to "pointers"
Arrays
Strings
How to use the standard library with #include
How to define constants with #define
printf/scanf
Declaration vs definition
- "Forward declarations" are often put in header files (.h), which are then pulled into the .c file with #include

Booleans

Although C has many familiar types from Java, it was NOT designed with an actual "boolean" true/false type (C99 and later provide bool through the <stdbool.h> header, but it is just a small integer type). Conditions work on pseudo-booleans instead. If the type of a variable is a number or character, then 0 will be "false" and anything else will be "true". If the type of a variable is a pointer, then NULL will be "false" and any other pointer will be "true". NULL is a special value for pointers that means "no pointer".

File structure

Typical structure of a C file, from top to bottom:

Includes for standard library files
Includes for other header files (e.g., yours, in the same directory)
Defined constants
Global variables (example: static int daysofmonth[] = { 31, 28, ... };)
Any forward declarations
Function definitions

Compiling files

We've seen that we use the "gcc" program to compile our C programs like so:

    $ gcc -o echo echo.c

Only the .c files go on the command line; header files such as echo.h are pulled in by the #include lines inside the .c files. There are also a few options that we highly suggest you use with gcc:

    $ gcc -Wall -std=c11 -g -o echo echo.c

"-Wall" enables a broad set of compiler warnings about suspicious code.
"-std=c11" tells the compiler to use the C11 version of the C standard (some additional features).
"-g" adds extra information to the executable for use when debugging (next lecture!)

Variables and Scope

Let's pivot a bit.

"Scope" refers to the lifetime of a variable and where it can be used. Every variable starts life with "allocation" (when it is assigned a location in memory) and ends life with "deallocation" (when it is no longer assigned a location). Different types of variables have different rules about when they can be used.

                            Allocated       Deallocated              Scope
    Global variables
       int x = 5;           before main     after main               whole program
       int main(...) {}
    Static global vars
       static int x = 5;    before main     after main               source file
       int main(...) {}
    Static local vars
       void foo() {         before main     after main               function
         static int x = 5;
       }
    Local variables
       void foo() {         when reached    closing curly brace "}"  within enclosing curly braces
         int x = 5;
       }

What does "static" really mean outside of these two specific global and local variable types? Unfortunately the name is applied inconsistently :( So this is mostly a matter of memorizing how it behaves in each context.

Side note: allocation != initialization. We learned that Java initializes variables with 0-equivalents whenever they are allocated, but C doesn't do this for local variables. If a local variable is allocated but nothing is stored there, then it holds whatever bits were left in that memory before, and reading it is a bug. (Global and static variables are the exception: C sets them to 0 if you give no initializer.)

Variables and left-values vs right-values

What exactly IS a variable? We've been imprecise about this so far, but a "variable" is simply a LABEL that we use to refer to a particular location in memory. When the program is compiled, all variable names are erased and replaced with their actual memory locations. So the name of this label is just for us to use while programming.

Given this, what's the difference between a variable and a pointer?

A variable is a name for a location in memory; each variable has a value, which is what is actually stored at that location in memory.
A pointer is a type of variable; a variable of type pointer has a location (where the pointer itself is stored in memory) and it has a value (which is an address of another piece of data).

When we are assigning to a variable, there is something to the left of the equals sign and something to the right of the equals sign. Any reference to a variable on the left of the equals sign is referred to as an "l-value" or "left-expression"; any reference to a variable on the right-hand side is called an "r-value" or a "right-expression."

         x = x + 1
         ^   ^
    l-value  r-value

In the intro courses, we're pretty sloppy about the difference between an l-value and an r-value, but in C it is useful to understand the difference between them, as it relates to addresses and locations in memory as well as variables and pointers. There are three "laws" or rules that you should remember about l-values and r-values:

Law #1: Left-expressions get evaluated to locations (addresses).
Law #2: Right-expressions get evaluated to values.
Law #3: Values include numbers and pointers (addresses).

The key difference is the "rule" for variables:

As a left-expression, a variable is a location ("the label").
As a right-expression, a variable gets evaluated to its location's contents ("the value").
Most things do not make sense as left-expressions (e.g., "9 = x;" is gibberish).

Example:

        int x = 1;        // Stores the VALUE 1 at a LOCATION which has the LABEL x.
        x = 2;            // Stores the VALUE 2 at the LOCATION x.
        int* xPtr = &x;   // Stores the VALUE of the address of x at a LOCATION which has the label xPtr.
        *xPtr = 3;        // Stores the VALUE 3 at the LOCATION indicated by the address stored in xPtr.
        int** xx = &(&x); // This doesn't work. The r-value needs to resolve to a value. &x does indeed
                          //   represent a value (the address of x), but &(&x) refers to the address of
                          //   the address of x - which doesn't make sense since the address of x is just
                          //   a number and is not actually stored in any variable.

Argument scope

Function arguments are very similar to local variables; their storage and scope follow the same rules. Unlike local variables, arguments are initialized by the CALLING FUNCTION, which copies the values into the function's variables. Since arguments are copies, assigning directly to an argument has no effect on the caller; the variable has been copied!

However, if an argument is a variable of a pointer type, then the value that was copied into the function is an address. As long as we have an address, we can follow that reference ("dereference the pointer" with the * operator) to modify the space pointed-to by that argument pointer. This can have an effect on the caller's world.

I put together a series of mystery exercises to work through and understand scope. Try them out.

Dangling pointers

Once we understand variable scope, we've uncovered a new class of problems called "dangling pointers". If you can store an address in a pointer, what if you store a pointer to a location in memory that goes out of scope (for example, because it is a local variable)?

    int* f(int x) {
      int *p;
      if (x) {
        int y = 3;
        p = &y; /* ok */
      } /* ok, but p now "dangling", points to y which went out of scope */
      /* y = 4 does not compile, y not in scope */
      *p = 7; /* could CRASH but probably not */
      return p; /* uh-oh, but no crash yet */
    }
    void g(int *p) { *p = 123; }
    void h() {
      g(f(7)); /* HOPEFULLY YOU CRASH (but maybe not) */
    }