CSE413 Notes for Friday, 1/26/24

I began by asking whether anyone had heard about the concept of nullable types. Nobody raised their hand. I mentioned that I had heard an interesting talk by Anders Hejlsberg, the designer of the C# programming language, in which he described Microsoft's decision to include nullable types in C# 2.0.

I said that in C# 2.0, you can declare a variant of int called int? that is a nullable version of int. I asked what that might look like and someone said it sounds like an int that can be set to null. That's exactly right. In version 2.0 of C# you can say things like:

        int? x;
        x = null;
Why would you want to do that? Anders said that Microsoft was doing this to allow C# code to interoperate more easily with SQL databases, where a field can hold NULL instead of a value. Anyone who uses Excel should be familiar with this concept. There is a difference between leaving a data cell blank and filling in a value like 0. I often have to yell at the TAs not to enter a 0 unless a student earned an actual score of 0. Otherwise our computations for values like minimum, median, and average will give misleading results.

So then I asked people if this reminded them of anything. Does Microsoft's "int?" correspond to anything in OCaml? Someone said it's an int option, which is right. OCaml's option type allows us to create nullable types. Of course, then you have to use either pattern matching or Option.get to pull a value back out of an option, which is a bit of a pain. Microsoft has decided to have the compiler do some of this work for you: an ordinary int is implicitly wrapped into an int? (the equivalent of applying the Some constructor), and operators like + are "lifted" so that they work directly on nullable values. Getting a plain int back out still requires an explicit cast or the Value property, which is the equivalent of Option.get.
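To make the Excel point concrete, here is a minimal OCaml sketch, assuming scores are stored as a list of int options in which None marks a blank cell. The function names entered and average are illustrative, not from lecture. Pattern matching does the unwrapping, and blank entries are skipped rather than treated as 0.

```ocaml
(* None means "no score entered", which is not the same as Some 0,
   an actual score of zero. *)

(* Collect the scores that were actually entered, using pattern
   matching to unwrap each option. *)
let rec entered scores =
  match scores with
  | [] -> []
  | None :: rest -> entered rest
  | Some score :: rest -> score :: entered rest

(* Average of the entered scores, or None if no score was entered. *)
let average scores =
  match entered scores with
  | [] -> None
  | xs ->
      let total = List.fold_left ( + ) 0 xs in
      Some (float_of_int total /. float_of_int (List.length xs))

let () =
  let scores = [Some 90; None; Some 70; None] in
  match average scores with
  | None -> print_endline "no scores entered"
  | Some avg -> Printf.printf "average = %.1f\n" avg
```

With these scores the sketch prints average = 80.0; filling the two blanks with 0 instead would drag the average down to 40.0.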

One of the things I like about OCaml is that it is usually possible to write clean, simple code without having to redesign the language or to build such support into the compiler. The option type is easy to define in terms of standard OCaml:

        type 'a option = None | Some of 'a
The get function is also easy to write using standard OCaml:

        let get(opt) = 
            match opt with
             | None -> invalid_arg("option is None")
             | Some value -> value
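Here is a short usage sketch of get, assuming the built-in option type (which has the same shape as the definition above). The add_opt helper is hypothetical; it imitates the way C# lifts an operator like + to nullable values, producing null when either operand is null.

```ocaml
(* get, as defined in the notes; here it works on the built-in option type. *)
let get opt =
  match opt with
  | None -> invalid_arg "option is None"
  | Some value -> value

(* Hypothetical helper: + "lifted" to options, roughly what C# does for
   + on two int? values (the result is null if either operand is null). *)
let add_opt a b =
  match a, b with
  | Some m, Some n -> Some (m + n)
  | _ -> None

let () =
  let x = Some 3 in   (* like assigning 3 to an int?: the value is wrapped *)
  let y = None in     (* like assigning null to an int? *)
  print_endline (string_of_int (get x));
  assert (add_opt x (Some 4) = Some 7);
  assert (add_opt x y = None);
  (* Unwrapping None fails, much as reading .Value of a null int? throws in C#. *)
  (try ignore (get y) with Invalid_argument msg -> print_endline msg)
```

Running it prints 3 and then option is None.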
Then I started a new topic: lexical scope versus dynamic scope. This is one example of a number of related topics that contrast the static properties of a program with its dynamic properties. The terms compile time and run time capture the same distinction: static properties are the ones that a program like a compiler can deduce ahead of time, while dynamic properties become apparent only when the program actually executes.

Lexical scope is a static property, which is why it is sometimes referred to as static scope (e.g., in the Wikipedia entry about scope). Lexical scope will be familiar because Java uses it. Consider, for example, the following program:

        public class Test {
            public static int x = 3;

            public static void one() {
                x *= 2;
                System.out.println(x);
            }

            public static void two() {
                int x = 5;
                one();
                System.out.println(x);
            }

            public static void main(String[] args) {
                one();
                two();
                int x = 2;
                one();
                System.out.println(x);
            }
        }
In Java, every set of curly braces introduces a new lexical scope (a new region of the program known as a block). In the program above, there is an outer scope for the overall class and inner scopes for each of the three methods:

         class Test
        +------------------+
        |                  |
        |  method one      |
        | +--------------+ |
        | |              | |
        | +--------------+ |
        |                  |
        |  method two      |
        | +--------------+ |
        | |              | |
        | +--------------+ |
        |                  |
        |  method main     |
        | +--------------+ |
        | |              | |
        | +--------------+ |
        +------------------+
We want to pay attention to the identifier "x" as it is used in each of these scopes. There is a global variable x declared in the outer scope. All three methods refer to x and two of the three declare a local x:

         class Test
        +------------------+
        | global int x     |
        |                  |
        |  method one      |
        | +--------------+ |
        | | refers to x  | |
        | +--------------+ |
        |                  |
        |  method two      |
        | +--------------+ |
        | | local int x  | |
        | | refers to x  | |
        | +--------------+ |
        |                  |
        |  method main     |
        | +--------------+ |
        | | local int x  | |
        | | refers to x  | |
        | +--------------+ |
        +------------------+
In both method two and method main, the local definition of x is the one that is used inside that method. This is actually the same in both lexical scope and dynamic scope. The big question has to do with the x in method one. It is not defined in method one, so which x is used? The answer is familiar to all of us. The reference to x in method one is a reference to the global variable x.

Because all of the manipulations in method one are on the global variable x, we know that it doubles from 3 to 6 to 12 to 24 and we know that the other methods refer to their local variables. As a result, people were able to easily tell me that the output produced by the program would be 6, 12, 5, 24, 2.
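OCaml is also lexically scoped, so the same program can be sketched in OCaml, with ref cells standing in for Java's mutable variables. This is an illustrative translation rather than code from lecture; the emit helper and the output list are hypothetical additions that record each printed value so the sequence can be checked.

```ocaml
(* Values "printed" so far, most recent first (kept so they can be checked). *)
let output : int list ref = ref []

let emit n =
  output := n :: !output;
  print_endline (string_of_int n)

(* The global x: a top-level mutable cell, like the static field in Test. *)
let x = ref 3

(* With lexical scope, this x always means the top-level x above. *)
let one () =
  x := !x * 2;
  emit !x

let two () =
  let x = ref 5 in   (* local x; not visible inside one *)
  one ();
  emit !x

let main () =
  one ();
  two ();
  let x = ref 2 in   (* local x; also not visible inside one *)
  one ();
  emit !x

let () = main ()
```

Running it prints 6, 12, 5, 24, 2, one per line, matching the Java version: the x inside one is resolved once, from the program text, to the top-level x.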

I pointed out that in this example the scopes aren't very deeply nested, but the scopes can actually be fairly deeply nested because we can have a situation like a for loop inside a while loop inside an if/else inside a method inside a class.

I said that I wanted to think about a hypothetical language that we might call "Dynamic Java," which uses dynamic instead of lexical scope. Let's consider how the same program would execute in Dynamic Java. A dynamic scope would be opened when the Test class is first accessed. Most people don't realize that something like this happens in Java, but you can see it clearly if you add a static initializer to the class. In Java, you can add code like the following, which is executed once, when the class is first initialized:

        public class Test {
            static {
                System.out.println("in static initializer");
            }

            ...
        }
So when we access this class, we get a dynamic scope for the class itself that has the global variable x inside it:

         class Test
        +-------------------------+
        | global int x            |
        |                         |
        | ...                     |
        +-------------------------+
So far this is the same as in the lexical scope case. But now we have to consider the sequence of methods that are called to determine which dynamic scopes are created. We start by calling method main, which means we'll have a scope for that call:

         class Test
        +-------------------------+
        | global int x            |
        |                         |
        |  method main            |
        | +---------------------+ |
        | |                     | |
        | +---------------------+ |
        +-------------------------+
Method main makes three calls: to method one, then to method two, then to method one again. Each call introduces a new scope, so we end up with three inner scopes:

         class Test
        +-------------------------+
        | global int x            |
        |                         |
        |  method main            |
        | +---------------------+ |
        | |  method one         | |
        | | +-------------+     | |
        | | |             |     | |
        | | +-------------+     | |
        | |                     | |
        | |  method two         | |
        | | +-------------+     | |
        | | |             |     | |
        | | +-------------+     | |
        | |                     | |
        | |  method one         | |
        | | +-------------+     | |
        | | |             |     | |
        | | +-------------+     | |
        | +---------------------+ |
        +-------------------------+
This is very different from the lexical scope case. With lexical scope, these were all at the same outer level, and there was only one scope for method one. But even this picture isn't complete. Remember that method two calls method one, which means that there is an inner scope produced by that call:

         class Test
        +-------------------------+
        | global int x            |
        |                         |
        |  method main            |
        | +---------------------+ |
        | |  method one         | |
        | | +-------------+     | |
        | | |             |     | |
        | | +-------------+     | |
        | |                     | |
        | |  method two         | |
        | | +-----------------+ | |
        | | |                 | | |
        | | |  method one     | | |
        | | | +-------------+ | | |
        | | | |             | | | |
        | | | +-------------+ | | |
        | | +-----------------+ | |
        | |                     | |
        | |  method one         | |
        | | +-------------+     | |
        | | |             |     | |
        | | +-------------+     | |
        | +---------------------+ |
        +-------------------------+
Now consider what happens when we include information about variable declarations and references:

         class Test
        +-------------------------+
        | global int x            |
        |                         |
        |  method main            |
        | +---------------------+ |
        | |  method one         | |
        | | +-------------+     | |
        | | | refers to x |     | |
        | | +-------------+     | |
        | |                     | |
        | |  method two         | |
        | | +-----------------+ | |
        | | | local int x     | | |
        | | |                 | | |
        | | |  method one     | | |
        | | | +-------------+ | | |
        | | | | refers to x | | | |
        | | | +-------------+ | | |
        | | | refers to x     | | |
        | | +-----------------+ | |
        | |                     | |
        | | local int x         | |
        | |                     | |
        | |  method one         | |
        | | +-------------+     | |
        | | | refers to x |     | |
        | | +-------------+     | |
        | |                     | |
        | | refers to x         | |
        | +---------------------+ |
        +-------------------------+
The key thing to pay attention to is the interpretation of the variable x in method one. On the first call to method one, the only x we will have seen is the global one, so this call doubles the global variable to 6 and prints it out. But on the second call to method one, we see the x that is local to method two. So we double it from 5 to 10 and print it out both in method one and in method two. Then we return to main and a local variable x is introduced. This local variable is the one that method one finds on the third call to the method, so it doubles it from 2 to 4. This value is then reported by main. So in Dynamic Java this program would produce the output 6, 10, 10, 4, 4.

Someone then asked about the relationship between dynamic scope and the call stack. I said that was an excellent way to think about this issue. It's easy to get the impression that dynamic scope would be difficult to implement. In fact, it's very easy to implement if you are writing an interpreter because with dynamic scope you search for the most recently allocated version of a variable on the call stack. If the current method has allocated such a variable, you use it. If not, you see if the method that called this one has allocated such a variable and so on.
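The call-stack view translates directly into a tiny interpreter. The following is a minimal OCaml sketch, assuming each call pushes a frame (an association list from variable names to mutable cells) and that the bottom frame holds the class-level x. The names push_frame, pop_frame, declare, lookup, and emit are hypothetical. Because lookup searches from the most recent frame downward, running the Dynamic Java version of Test this way prints 6, 10, 10, 4, 4.

```ocaml
(* A frame maps variable names to mutable cells. *)
type frame = (string * int ref) list ref

(* The call stack, most recent frame first; the bottom frame holds
   the class-level (global) x. *)
let stack : frame list ref = ref [ref [("x", ref 3)]]

let push_frame () = stack := ref [] :: !stack
let pop_frame () = stack := List.tl !stack

(* Declare a variable in the current (topmost) frame. *)
let declare name value =
  match !stack with
  | top :: _ -> top := (name, ref value) :: !top
  | [] -> failwith "empty call stack"

(* Dynamic scope lookup: check the current frame, then its caller's,
   and so on down the stack. *)
let rec lookup name frames =
  match frames with
  | [] -> failwith ("unbound variable " ^ name)
  | f :: rest ->
      (match List.assoc_opt name !f with
       | Some cell -> cell
       | None -> lookup name rest)

let output : int list ref = ref []
let emit n = output := n :: !output; print_endline (string_of_int n)

let one () =
  push_frame ();
  let x = lookup "x" !stack in   (* no local x here, so search the callers *)
  x := !x * 2;
  emit !x;
  pop_frame ()

let two () =
  push_frame ();
  declare "x" 5;
  one ();
  emit !(lookup "x" !stack);
  pop_frame ()

let main () =
  push_frame ();
  one ();
  two ();
  declare "x" 2;
  one ();
  emit !(lookup "x" !stack);
  pop_frame ()

let () = main ()
```

Only the first call to one touches the global x, so it ends at 6; the later calls double the x owned by two and then the x owned by main.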

I said that dynamic scope isn't used very often because programmers generally find it confusing. The early languages that used it tended to be interpreted rather than compiled. There is, however, a notable exception that is still widely used. Nobody seemed to know what it was, so I mentioned that it's something we teach in a 300-level class. Still nobody seemed to know, so I mentioned that shell scripts use dynamic scope. We considered the following short example:

        #!/bin/bash

        x="hello"

        foo()
        {
            echo "$x"
        }

        bar()
        {
            local x="foo"   # local is a bash feature, not part of POSIX sh
            foo
        }

        foo
        bar
        echo "$x"
This script produces the following output:

        hello
        foo
        hello
On the first call to function foo, we echo the global variable x. The second call to foo happens from inside bar, so we echo the variable x that is local to bar. Just to prove that nothing funny is going on, I printed x after the call to bar at the end to show that the global variable is still set to "hello".

Then I showed two quick examples in OCaml. I asked people what this code would produce as a result:

        let x = 3;;
        let f(n) = x * n;;
        f(8);;
        let x = 5;;
        f(8);;
Several people correctly predicted that both calls to f(8) produce the result 24. So the function keeps using the binding of x to 3 that was in effect when f was defined, even after a new binding of x to 5 is introduced. I then asked about a fairly complex example using let:

        let y = 2;;
        let f(n) =
            let x =
                let n = 3
                in 10 * (n + y)
            and y = 100 * n
            in x + y + n;;

        f(10);;
Inside function f, when we go to compute the expression 10 * (n + y), we have to figure out which n and y to use. We find a local definition for n in that innermost scope, so we use that. There is also a definition for y in the containing let, but it is introduced with "and," and the bindings in a let ... and ... cannot see one another: each right-hand side is evaluated in the enclosing environment. So in this case, the y that is found is the y from the global environment, and we compute the answer as 10 * (3 + 2), or 50.

We then compute the value of y to be 100 * n. In this case, we don't use the n that was used to compute x because that is an inner scope (you don't look inside a scope, only outside to outer scopes). So we find the parameter being passed to the function, which is 10 in this case. We use that and compute y to be 1000.

The final expression we have to evaluate is x + y + n. We find local definitions for x and y (50 and 1000), and n refers to the value passed as a parameter. Notice that it is not a conflict to have n redefined to be 3 in an inner scope. At this level we see just the parameter n. So when the function is called with f(10), the result is 50 + 1000 + 10, or 1060.
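One way to double-check this analysis is to rename every binding so that no two share a name. The sketch below does that by hand; the new names (y_global, n_param, n_inner, x_local, y_local) are purely illustrative. The comments note why each reference resolves the way it does.

```ocaml
(* The example from the notes with every binding renamed to a distinct
   name, so each use shows which definition it refers to. *)
let y_global = 2                      (* was: let y = 2 *)

let f n_param =                       (* was: let f(n) = *)
  let x_local =
    (let n_inner = 3 in               (* inner n shadows the parameter here *)
     10 * (n_inner + y_global))       (* y_local is not visible here *)
  and y_local = 100 * n_param in      (* n_inner is out of scope here *)
  x_local + y_local + n_param

let () = Printf.printf "f(10) = %d\n" (f 10)
```

It prints f(10) = 1060. Calling f 6 instead gives 50 + 600 + 6 = 656, since only y and the final n depend on the parameter.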


Stuart Reges
Last modified: Wed Feb 14 17:18:45 PST 2024