CSE341 Notes for Wednesday, 4/15/09

We began by looking at another example of higher-order functions and currying. On homework assignment 1 there was a problem to write a function that would return the number of negatives in a list. That's a little too close to a problem on the current homework, so I said that we'd write a function to return the number of zeros in a list.

The function filter is very helpful here. We just need a function to determine if a value is equal to 0. This is a good place to use an anonymous function:

        fn x => (x = 0)
We can use that to filter a list:

        filter(fn x => (x = 0), lst)
We then return the length of this list as the answer:

        fun numZeros(lst) = length(filter(fn x => (x = 0), lst))
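As a rough analogue in C++, the other language used in these notes, here is a sketch of numZeros that keeps the same two steps: filter with an anonymous predicate, then take the length of the result. It assumes the list is a std::vector<int>. It is an illustration of the idea, not a translation of any course code.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Rough C++ analogue of the ML numZeros: filter, then take the length.
std::size_t numZeros(const std::vector<int>& lst) {
    std::vector<int> zeros;
    // The lambda plays the role of the anonymous function fn x => (x = 0).
    std::copy_if(lst.begin(), lst.end(), std::back_inserter(zeros),
                 [](int x) { return x == 0; });
    // zeros.size() plays the role of length.
    return zeros.size();
}
```

In idiomatic C++ you would more likely call std::count_if with the same lambda, which counts the matches without building the intermediate list.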
Then we explored how to use currying to replace the anonymous function with an equivalent expression. What is the underlying function that we're applying? The operator equals. If you just refer to:

        op=
you get a noncurried version of the function. We can pass this as an argument to the curry function that I've included in utility.sml to produce a curried version of the operator:

        curry op=
Now we need to partially instantiate the function. The problem is that our test begins with x:

        x = 0
But it doesn't have to begin with x, because this is equivalent:

        0 = x
This is now fairly easy to partially instantiate:

        curry op= 0
When we typed the expression above into the ML interpreter, it responded with this:

        val it = fn : int -> bool
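To make currying concrete outside of ML, here is a minimal C++ sketch of a curry helper. It is only an assumption about how such a helper could look, not the code in utility.sml: curry takes a two-argument function and returns a one-argument function that returns another one-argument function. std::equal_to<> stands in for op=, and supplying 0 first mirrors the 0 = x trick.

```cpp
#include <functional>

// Hypothetical C++ curry (not the course's utility.sml code):
// turns f(x, y) into a chain curry(f)(x)(y).
template <typename F>
auto curry(F f) {
    return [f](auto x) {
        return [f, x](auto y) { return f(x, y); };
    };
}

// std::equal_to<> is a noncurried equality test, much like op=.
// Partially instantiating it with 0 yields a predicate, like curry op= 0.
const auto isZero = curry(std::equal_to<>{})(0);
```

Just as in ML, isZero is a function from an int to a bool; isZero(0) is true and isZero(7) is false.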
Just as we would hope, the expression evaluates to a function that takes an int as an argument and that returns a boolean value (whether or not the int is equal to 0). I mentioned that in writing ML code, it is useful to type these little code snippets into the interpreter to check your solution. If you rely on always typing in the full version of a function, you might have trouble locating syntax errors or bugs. It's better to test it in pieces.

Using this expression, we were able to define a new version of the function:

        fun numZeros2(lst) = length(filter(curry op= 0, lst))
Then I asked how we could use currying to eliminate the function definition and replace it with a val declaration. The problem is that we have two different functions that we want to apply: length and filter. Whenever you have two or more functions to apply, you know you're going to need the composition operator. So the basic form of our answer is going to be:

        val numZeros3 = length o (call on filter)
In other words, at the highest level what we're doing is to compose a call on length with a call on filter. But how do we rewrite the call on filter? In the code above, we are using the noncurried version of filter. I have included in utility.sml curried versions of the higher-order functions called map2, filter2 and reduce2. The general form of a call on filter2 would be:

        filter2 (predicate function) (list)
In our case, we are trying to eliminate the list parameter so that we can write this using a val declaration rather than a standard function declaration. In other words, we want a partially instantiated call on this function where we supply the predicate but not the list. We have to use filter2 instead of filter and we have to be careful to use parentheses to indicate the grouping of the expression that returns our predicate function:

        filter2 (curry op= 0)
Ullman has good examples in the section on curried functions in chapter 5 about how to properly use parentheses in an expression like this. Without the parentheses, ML tries to think of this as:

        (filter2 curry) op= 0
And that generates an error because the curry function is not a predicate. Putting the filter2 expression into our original expression, we get our third version of the function definition:

        val numZeros3 = length o (filter2 (curry op= 0))
In this case we actually don't need parentheses around the call on filter2, but it's generally easier to include some extra parentheses than to have to learn all of the subtleties of ML precedence rules.
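Continuing the C++ analogy, here is a sketch of the whole point-free definition. The names curry, filter2, compose, and length are hypothetical C++ stand-ins for the ML functions and operator used above (compose plays the role of o), and for simplicity filter2 only works on std::vector<int>.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: curry(f)(x)(y) == f(x, y).
template <typename F>
auto curry(F f) {
    return [f](auto x) { return [f, x](auto y) { return f(x, y); }; };
}

// Hypothetical curried filter in the spirit of filter2: supply the
// predicate now and get back a function still waiting for the list.
template <typename Pred>
auto filter2(Pred pred) {
    return [pred](const std::vector<int>& lst) {
        std::vector<int> result;
        for (int x : lst)
            if (pred(x)) result.push_back(x);
        return result;
    };
}

// Function composition, like ML's o: compose(f, g)(x) == f(g(x)).
template <typename F, typename G>
auto compose(F f, G g) {
    return [f, g](const auto& x) { return f(g(x)); };
}

// length as a function object so that it can be passed to compose.
const auto length = [](const std::vector<int>& lst) { return lst.size(); };

// Point-free definition: the list parameter is never named.
const auto numZeros3 = compose(length, filter2(curry(std::equal_to<>{})(0)));
```

Calling numZeros3(std::vector<int>{0, 4, 0, -1, 0}) returns 3. As in the ML version, numZeros3 is built entirely by composing and partially applying other functions.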

Then I discussed another programming language concept that is important to understand: the concept of type safety. The concept of type safety has generally replaced the older terminology about a language being strongly typed.

Type safety is a concept that is usually described in terms of its opposite. We talk about the concept of type errors and say that a language is type safe if it doesn't allow any type errors to occur. The poster child for type errors is C and its close relative C++, so it's easiest to beat up on C and C++ in talking about how not to achieve type safety.

I mentioned that Corky Cartwright, who maintains the DrScheme program we'll be using later in the quarter and is a fan of functional programming, once described type safety to me by saying that a given set of bits stored on the computer should never be interpreted in two different ways.

In C and C++, for example, you can freely cast variables from one type to another. For example, you might define an array of 4 characters as follows:

        char text[4];
        text[0] = 'f';
        text[1] = 'u';
        text[2] = 'n';
        text[3] = '\0';
Each character takes up one byte of memory in the computer (8 bits), so the overall array takes up 4 bytes in memory. That also happens to be the amount of space that an int takes up in most implementations of C and C++, so you can ask the language to reinterpret those 4 bytes as an int rather than as an array of 4 characters:

        int n = (int) text;
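This code works in C++ (at least on a 32-bit machine, where a pointer fits in an int). It assigned the variable n the value -1079956272. Strictly speaking, because the array name decays to a pointer, this cast reinterprets the array's address rather than its four characters; writing *(int *) text would reinterpret the characters themselves. Either way, what makes this so bad in C++ is that the compiler doesn't do anything to convert the data from one form to another. It simply lets you pretend that those bits represent an int.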
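If you want to see what the four characters' bytes mean as an int, a sketch like the following does it in a way C++ actually defines: std::memcpy copies the bytes into a 32-bit integer without any conversion. The function name bytesAsInt is made up for illustration, and the result depends on byte order. On a little-endian machine such as x86, the bytes 'f', 'u', 'n', '\0' (0x66, 0x75, 0x6e, 0x00) come out as 0x006e7566, which is 7239014.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the raw bytes of a 4-character array as a
// 32-bit unsigned integer. memcpy copies bits; nothing is converted.
std::uint32_t bytesAsInt(const char (&text)[4]) {
    std::uint32_t n = 0;
    static_assert(sizeof n == sizeof text, "expects a 4-byte integer");
    std::memcpy(&n, text, sizeof n);
    return n;  // value depends on the machine's byte order
}
```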

In a type-safe language like Java, casting is limited to casts that make sense. You can cast an int to a double or a double to an int, but in that case an actual conversion takes place (the bits in one form are converted to appropriate bits of the other form). You aren't allowed to cast a String to an int or to otherwise reinterpret bits without conversion.
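The difference between a conversion and a reinterpretation is easy to see with a double. In this sketch (the function names are illustrative), convert performs the kind of cast Java allows, turning 3.75 into the int 3, while reinterpretBits relabels the same 64 bits as an integer, which Java does not allow. Assuming IEEE 754 doubles, the bits of 3.75 read as an integer are 0x400E000000000000, a number that has nothing to do with 3.75.

```cpp
#include <cstdint>
#include <cstring>

// A conversion, like Java's (int) cast on a double: the value is
// translated (truncated toward zero), producing the bits of an int.
int convert(double d) { return static_cast<int>(d); }

// A reinterpretation, which Java forbids: the same 64 bits are simply
// relabeled as an unsigned integer with no translation of the value.
std::uint64_t reinterpretBits(double d) {
    std::uint64_t bits = 0;
    static_assert(sizeof bits == sizeof d, "expects a 64-bit double");
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```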

We saw that you can make this even worse by using the "address of" operator in C and C++ to reinterpret data. Using our variable called text, we can cast its address to an int pointer and then use the pointer to double the int:

        int* p = (int*) &text;
        *p *= 2;
The string couldn't even be displayed after we did this operation. Obviously it makes no sense to manipulate these four bytes as text in one part of the program and to manipulate them as an int in another part of the program.

Probably the most egregious error occurs in C and C++ when you access uninitialized variables. For example, if you construct an array using C's malloc function or C++'s new operator, you are simply handed a chunk of memory that isn't initialized in any way. That memory may previously have been used for some other data, so it holds a set of "dirty" bits that are now going to be reinterpreted as being of another type. Java avoids this by initializing all arrays and objects when they are constructed.
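C++ can give you initialized memory too, but you have to ask for it. In this sketch (the helper names are made up), the empty parentheses in new int[n]() value-initialize every element to 0, and the std::vector size constructor does the same. A plain new int[n] or malloc would hand back uninitialized memory instead.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: the trailing () value-initializes every int to 0,
// much as Java initializes every element of a new array.
std::unique_ptr<int[]> makeZeroedArray(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());
}

// Hypothetical helper: std::vector's size constructor also
// value-initializes its elements, so they all start at 0.
std::vector<int> makeZeroedVector(std::size_t n) {
    return std::vector<int>(n);
}
```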

In fact, this behavior of Java broke the performance of many merge sort implementations included in standard textbooks. They had been translated from C++ textbooks, where the code worked. They relied on allocating many large arrays and expected each allocation to take O(1) time. In Java that wasn't true, because Java initializes every array element to guarantee type safety, so allocating an array of length n takes O(n) time.
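One common way to avoid paying for initialization over and over is to allocate a single scratch buffer up front and reuse it for every merge. The sketch below shows that approach in C++; it is not a reconstruction of the textbook code. mergeSort allocates (and, being a std::vector, zero-initializes) one buffer of length n, so initialization costs O(n) once instead of at every level of the recursion. The names mergeSort, sortRange, and mergeRuns are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Merge the sorted runs a[lo, mid) and a[mid, hi), using buf as scratch.
static void mergeRuns(std::vector<int>& a, std::vector<int>& buf,
                      std::size_t lo, std::size_t mid, std::size_t hi) {
    std::size_t i = lo, j = mid, k = lo;
    // Take from the left run on ties, which keeps the sort stable.
    while (i < mid && j < hi) buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) buf[k++] = a[i++];
    while (j < hi) buf[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = buf[k];
}

// Recursively sort a[lo, hi), sharing the same scratch buffer throughout.
static void sortRange(std::vector<int>& a, std::vector<int>& buf,
                      std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    sortRange(a, buf, lo, mid);
    sortRange(a, buf, mid, hi);
    mergeRuns(a, buf, lo, mid, hi);
}

// The only allocation: one buffer, initialized once.
void mergeSort(std::vector<int>& a) {
    std::vector<int> buf(a.size());
    sortRange(a, buf, 0, a.size());
}
```

The same one-buffer design carries over directly to Java, where it keeps the cost of array initialization from being repeated at every level.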

Another place that this comes up is with local variables. When you make two different method calls in a row:

        f1();
        f2();
Any language has to allocate the local variables for f1 while it executes, deallocate them when it's done, and then do the same for f2. This is generally done with a stack: you allocate space on the stack for f1's local variables while it executes, pop the stack, and then do the same for f2. But that can have a curious side effect. In a language like C or C++, if you fail to initialize your local variables in f2, then they have whatever values were left over from f1. In general, these will be garbage values because the types won't match. In a type-safe language like Java, it is a compile-time error to examine the value of a local variable without first initializing it.

I showed the following C++ program that manages to pass values from one function to another through the use of local variables:

        #include <iostream>
        using namespace std;
        
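        void f1() {
          int x;
          double y;
          x = 15;      // these values stay behind on the stack when f1 returns
          y = 38.9;
        }
        
        void f2() {
          int a;       // never initialized: reading a and b is undefined behavior,
          double b;    // and here they happened to pick up what f1 left behind
          cout << a << endl;
          cout << b << endl;
        }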
        
        int main() {
          f1();
          f2();
          return 0;
        }
This program printed the values 15 and 38.9 from f2 even though they were initialized in f1. Of course, this only works because we declared the local variables in the exact same order (an int followed by a double). If you switch the order in f2, then you get something very different.

One final example is that C and C++ don't check the bounds of simple arrays. You might have an array that is declared to have 100 elements, but C and C++ allow you to ask for element 5000 or element -250. The program simply computes a location in memory and interprets the bits it finds there as being of the array's element type. That's one of the reasons that C and C++ programs so often crash with a "segmentation fault": the language allows you to make potentially dangerous references to random parts of memory.
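When you want the checking that simple arrays lack, C++'s std::vector offers it through the at member function, which throws std::out_of_range for a bad index instead of reading arbitrary memory. readChecked is a hypothetical helper for illustration. An index of -250 converted to std::size_t becomes a huge positive number, so it is rejected too.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns false instead of touching memory outside
// the vector, because vector::at checks the index before reading.
bool readChecked(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```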

These are examples of ways in which C and C++ are not type safe. You should not be able to refer to variables that were never initialized, you should not be able to arbitrarily reinterpret bits as a different type, and you shouldn't be able to reach into random parts of memory and treat what you find there as any kind of data you like.

These concerns became far more important when the World Wide Web came along, because all of a sudden we wanted to include applets in our web pages. To convince users to run an arbitrary bit of code on their computers, you have to give them some kind of assurance that your program is well behaved. That was possible with Java because the designers of the language took type safety very seriously. They were able to make a convincing case that Java programs ran in a "sandbox" that severely limited the potential damage they could do. It's almost impossible to give similar guarantees about code written in languages like C and C++.

It was interesting to see Microsoft deal with this issue in the 1990s. At first they experimented with Java, but when they started making changes to the libraries, Sun sued them. Instead, Microsoft designed a new language called C# that ended up looking a lot like Java. C# can make similar guarantees of type safety with one notable exception: C# has a special keyword "unsafe" that can be attached to methods and classes. This allows C# programs to work with legacy code written in unsafe languages, but programmers are encouraged to avoid unsafe code whenever possible.


Stuart Reges
Last modified: Sat Apr 25 17:36:15 PDT 2009