Examples of approaches that check partial specifications and that exploit type information
Lackwit (O’Callahan & Jackson)
Code-oriented tool that exploits type inference
Answers queries about C programs
e.g., "locate all potential assignments to this field"
Accounts for aliasing, calls through function pointers, type casts
Efficient
e.g., answers queries about a Linux kernel (157KLOC) in under 10 minutes on a PC
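To see why a query such as "locate all potential assignments to this field" needs more than a text search, consider the minimal C sketch below. The struct and function names are hypothetical and are not taken from the Lackwit paper. Every function can write the `speed` field, but only the first does so in a way that a lexical search for `speed =` would find.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, for illustration only. */
struct vehicle { int speed; };

/* Direct assignment: a lexical search for "speed =" finds this one. */
void set_direct(struct vehicle *v, int s) { v->speed = s; }

/* Assignment through an alias of the field's address. */
void set_via_alias(struct vehicle *v, int s) {
    int *p = &v->speed;
    *p = s;
}

/* Assignment after a type cast: a pointer to a struct, suitably
   converted, points to its first member, which here is speed. */
void set_via_cast(void *opaque, int s) {
    int *p = (int *)opaque;
    *p = s;
}

/* Assignment reached only through a function pointer. */
typedef void (*setter_fn)(struct vehicle *, int);

void apply(setter_fn f, struct vehicle *v, int s) {
    assert(f != NULL);
    f(v, s);
}
```

A call such as `apply(set_via_alias, &v, 4)` changes `v.speed` without the field name appearing anywhere near the call site, which is the kind of relationship the slide says Lackwit accounts for.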
Placement
Lexical tools are very general, but are often imprecise because they have no knowledge of the underlying programming language
Syntactic tools have some knowledge of the language, are harder to implement, but can give more precise answers
Semantic tools have deeper knowledge of the language, but generally don’t scale, don’t work on real languages and are hard to implement
The goal of Lackwit, thus, is to use some semantic basis, in a scalable way, on a real language (C)
It is a static tool
Can work on incomplete programs
Make assumptions about missing code, or supply stubs
Sample queries
Which integer variables contain file handles?
Can pointer foo in function bar be passed to free()? If so, what paths in the call graph are involved?
Field f of variable v has an incorrect value; where in the source might it have changed?
Which functions modify the cur_veh field of map_manager_global?
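The free() query can be made concrete with a small C sketch. The helper names `release` and `cleanup` are hypothetical, and only `foo` and `bar` echo the slide. Here `foo` is never passed to free() by name, so answering the query means following the value through the call graph.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers; neither name comes from the paper. */
static void release(void *p) { free(p); }         /* only direct call to free() */
static void cleanup(char *buf) { release(buf); }  /* forwards its argument */

/* foo reaches free() along the call path
   bar -> cleanup -> release -> free. */
size_t bar(const char *text) {
    assert(text != NULL);
    size_t n = strlen(text);
    char *foo = malloc(n + 1);
    if (foo == NULL) {
        return 0;
    }
    memcpy(foo, text, n + 1);   /* copy including the terminating NUL */
    size_t len = strlen(foo);
    cleanup(foo);               /* foo is freed two calls further down */
    return len;
}
```

The expected answer to the query is therefore "yes", with the path running through `cleanup` and `release`.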
Lackwit analysis
Approximate (may return false positives)
Conservative (will not return false negatives) under some conditions
C’s type system has holes
Lackwit makes assumptions similar to those made by programmers (e.g., "no out-of-bounds memory accesses")
Lackwit is unsound only for programs that don’t satisfy these assumptions
Query commonalities
There are a huge number of names for storage locations
local and global variables; procedure parameters; and, for records and similar aggregates, their sub-components
Values flow from location to location, which can be associated with many different names
Archetypal query: Which other names identify locations that a value could flow to, or from, the location with this given name?
Answers can be given textually or graphically
An example query
Query about the cur_veh field of map_manager_global
Shaded ovals are functions extracting fields from the global
Unshaded ovals pass pointers to the structure but don’t manipulate it
Edges between ovals are calls
Rectangles are globals
Edges to rectangles are variable accesses
Claim
This graph shows which functions would have to be checked when changing the invariants of the current vehicle object
Requires semantics, since many of the relationships are induced by aliasing over pointers
Underlying technique
Use type inference, allowing type information to be exploited to deduce information about values flowing to locations (and thus names)
But what to do in programming languages without rich type systems?
Trivial example
DollarAmt getSalary(EmployeeNum e)
Relatively standard declaration
Allows us to determine that there is no way for the value of e to flow to the result of the function
That is, e can (and surely would) affect what DollarAmt gets returned, but neither e itself nor any computation based on e can flow to the return value
Because they have different types
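A minimal C sketch of this declaration follows. It assumes that `EmployeeNum` and `DollarAmt` are distinct types, modeled here as single-field structs, because a plain `typedef int` would not make them distinct in C. The salary table and its values are invented for illustration.

```c
#include <assert.h>

/* Assumption: distinct nominal types, modeled as one-field structs. */
typedef struct { int id; } EmployeeNum;
typedef struct { long cents; } DollarAmt;

enum { NUM_EMPLOYEES = 3 };

/* Hypothetical data, for illustration only. */
static const long salary_cents[NUM_EMPLOYEES] = { 5000000L, 6200000L, 7100000L };

DollarAmt getSalary(EmployeeNum e) {
    assert(e.id >= 0 && e.id < NUM_EMPLOYEES);
    /* e selects which entry is read, but its value is not copied into d. */
    DollarAmt d = { salary_cents[e.id] };
    /* return e;   would be rejected by the compiler: incompatible types */
    return d;
}
```

The commented-out `return e;` is the point: the declared types alone rule out a direct flow from the parameter to the result.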
Consider an alternative
int getSalary(int e)
Another, perhaps more common, way to declare the same function
This doesn’t allow the direct inference that e’s value doesn’t flow to the function return
Because they have the same type
But maybe one could analyze the program to determine this anyway; Lackwit does so by using a type inference mechanism for precision
Lackwit’s type system ignores the C type declarations
Computes new types in a richer type system
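The idea of computing new types can be sketched with a union-find over variables: each variable starts with its own tag, and every assignment merges the tags of its two sides. This is a deliberately simplified, monomorphic illustration and not Lackwit's actual algorithm; the variable ids and the assumed function body `int s = table[e]; return s;` are hypothetical.

```c
#include <assert.h>

/* Hypothetical variables for:
   int getSalary(int e) { int s = table[e]; return s; } */
enum var { VAR_E, VAR_TABLE_ELEM, VAR_S, VAR_RESULT, NUM_VARS };

static int parent[NUM_VARS];

/* Give every variable its own fresh tag. */
void tags_init(void) {
    for (int i = 0; i < NUM_VARS; i++) {
        parent[i] = i;
    }
}

/* Find the representative tag of a variable (with path halving). */
int tag_of(int v) {
    assert(v >= 0 && v < NUM_VARS);
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* An assignment between a and b forces them to share a tag. */
void unify(int a, int b) {
    int ra = tag_of(a);
    int rb = tag_of(b);
    parent[ra] = rb;
}

int same_tag(int a, int b) { return tag_of(a) == tag_of(b); }

void analyze_getSalary(void) {
    tags_init();
    unify(VAR_S, VAR_TABLE_ELEM);  /* s = table[e]; the index e is not unified */
    unify(VAR_RESULT, VAR_S);      /* return s; */
}
```

After `analyze_getSalary()`, `VAR_E` keeps a tag of its own even though every variable was declared `int`, so the sketch recovers the distinction that the `EmployeeNum`/`DollarAmt` declaration stated explicitly.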
Incomplete type information
Cover example from the paper
void* return1st(void* x, void* y) { return x; }
$(a\ \mathrm{ref}\ \beta,\ b) \rightarrow^{\phi} a\ \mathrm{ref}\ \beta$
The type variable a indicates that the type of the contents of the pointer x is unconstrained
But it must be the same as the type of the contents of the return value
Increases the set of queries that Lackwit can answer with precision
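One way to read the precision claim is through two call sites that use `return1st` at different pointer types. The sketch below is an illustration under that reading; the wrapper names `first_int` and `first_str` are hypothetical. Because the contents type is a variable, each call site can instantiate it separately, so the `int` contents and the `char` contents need not be merged.

```c
#include <assert.h>
#include <stddef.h>

void *return1st(void *x, void *y) {
    (void)y;   /* y is unused, so its type stays unconstrained */
    return x;
}

/* Call site 1: the contents type is instantiated with int. */
int *first_int(int *p, int *q) {
    assert(p != NULL);
    return return1st(p, q);
}

/* Call site 2: the contents type is instantiated with char. */
char *first_str(char *p, char *q) {
    assert(p != NULL);
    return return1st(p, q);
}
```

In both wrappers the result aliases the first argument and never the second, which matches the shared type variable for `x` and the return value.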