CSE503: Software Engineering
Lecture 21 (February 26, 1999)

David Notkin

v     Lackwit (and LCLint intro, if time)

      Examples of approaches that check partial specifications and that exploit type information

v     Lackwit (OCallahan & Jackson)

      Code-oriented tool that exploits type inference

      Answers queries about C programs

        e.g., locate all potential assignments to this field

      Accounts for aliasing, calls through function pointers, type casts

      Efficient

        e.g., answers queries about a Linux kernel (157KLOC) in under 10 minutes on a PC

v     Placement

      Lexical tools are very general, but are often imprecise because they have no knowledge of the underlying programming language

      Syntactic tools have some knowledge of the language, are harder to implement, but can give more precise answers

      Semantic tools have deeper knowledge of the language, but generally dont scale, dont work on real languages and are hard to implement

      The goal of Lackwit, thus, is to use some semantic basis, in a scalable way, on a real language (C)

v     It is a static tool

      Can work on incomplete programs

      Make assumptions about missing code, or supply stubs

v     Sample queries

      Which integer variables contain file handles?

      Can pointer foo in function bar be passed to free()? If so, what paths in the call graph are involved?

      Field f of variable v has an incorrect value; where in the source might it have changed?

      Which functions modify the cur_veh field of map_manager_global?

v     Lackwit analysis

      Approximate (may return false positives)

      Conservative (will not return false negatives) under some conditions

        Cs type system has holes

        Lackwit makes assumptions similar to those made by programmers (e.g., no out-of-bounds memory accesses)

        Lackwit is unsound only for programs that dont satisfy these assumptions

v     Query commonalities

      There are a huge number of names for storage locations

        local and global variables; procedure parameters; for records, etc., the sub-components

      Values flow from location to location, which can be associated with many different names

      Archetypal query: Which other names identify locations to which a value could flow to or from a location with this given name?

      Answers can be given textually or graphically

v     An example query

      Query about the cur_veh field of map_manager_global

      Shaded ovals are functions extracting fields from the global

      Unshaded ovals pass pointers to the structure but dont manipulate it

      Edges between ovals are calls

      Rectangles are globals

      Edges to rectangles are variable accesses

      Claim

        This graph shows which functions would have to be checked when changing the invariants of the current vehicle object

        Requires semantics, since many of the relationships are induced by aliasing over pointers

v     Underlying technique

      Use type inference, allowing type information to be exploited to reduce information about values flowing to locations (and thus names)

      But what to do in programming languages without rich type systems?

      Trivial example

        DollarAmt getSalary(EmployeeNum e)

      Relatively standard declaration

      Allows us to determine that there is no way for the value of e to flow to the result of the function

        That is, e can (and surely would) affect what DollarAmt gets returned, but e itself (nor any computation that is based on e) will be able to flow to the return value

      Because they have different types

      Consider an alternative

        int getSalary(int e)

      Another, perhaps more common, way to declare the same function

      This doesnt allow the direct inference that es value doesnt flow to the function return

        Because they have the same type

v     But maybe one could analyze the program to determine this anyway; Lackwit does so by using a type inference mechanism for precision

v     Lackwits type system ignores the C type declarations

      Computes new types in a richer type system

      Incomplete type information

      Cover example from the paper

      void* return1st(void* x, void* y) {
return x; }

      (a refb, b) f a refb

      The type variable a indicates that the type of the contents of the pointer x is unconstrained

      But it must be the same as the type of the contents of the return value

      Increases the set of queries that Lackwit can answer with precision

v     Polymorphism

char* ptr1;
struct timeval* ptr2;
char** ptr3;

return1st(ptr1,ptr2);
return1st(ptr2,ptr3);


      Both calls match the previous function declaration

           This is solved (basically) by giving return1st a richer type and instantiating it at every call site

             (c refb, d) d c refb

        (e refa, f) c e refa

v     Type stuff

      Modified form of Hindley-Milner algorithm W

      Efforts made to handle

      Mutable types

      Recursive types

      Null pointers

      Uninitialized data

      Type casts

      Declaration order

v     Morphin case study

      Robot control program of about 17KLOC

      Vehicle object contains two queue objects

      Client was investigating combining these two queues into one

      Queried each queue object to discover operations performed and their contexts

      The two graphs each contained 171 nodes

      But each graph had only five nodes highlighted as accessor nodes

      Example

        These five matches helped identify code to be changed

        grep would have returned false matches and missed matches when parameters were passed to functions

        Context-sensitivity needed to distinguish the two queue objects because both are passed as arguments to the same queue functions

v     Recap

      Helps find relationships among variables in a C program

      Exploits type inference to understand values flowing to locations and thus names

      Approximate, although safe under many (most?) conditions

      Reasonably efficient

      Although I didnt show the numbers, they are now better than reported in the ICSE paper