v
Lackwit (and LCLint intro, if time)
Ø Examples of
approaches that check partial specifications and that exploit type information
v Lackwit (O’Callahan & Jackson)
Ø Code-oriented tool that exploits type
inference
Ø Answers queries about C programs
§
e.g., “locate
all potential assignments to this field”
Ø Accounts for aliasing, calls through
function pointers, type casts
Ø Efficient
§
e.g., answers
queries about a Linux kernel (157KLOC) in under 10 minutes on a PC
v Placement
Ø Lexical tools are very general, but are
often imprecise because they have no knowledge of the underlying programming
language
Ø Syntactic tools have some knowledge of the
language, are harder to implement, but can give more precise answers
Ø Semantic tools have deeper knowledge of the
language, but generally don’t scale, don’t work on real languages and are hard
to implement
Ø The goal of Lackwit, thus, is to use some
semantic basis, in a scalable way, on a real language (C)
v It is a static tool
Ø Can work on incomplete programs
Ø Make assumptions about missing code, or
supply stubs
v Sample queries
Ø Which integer variables contain file
handles?
Ø Can pointer foo in function bar be passed to free()?
If so, what paths in the call graph are involved?
Ø Field f
of variable v has an
incorrect value; where in the source might it have changed?
Ø Which functions modify the cur_veh field of map_manager_global?
v Lackwit analysis
Ø Approximate (may return false positives)
Ø Conservative (will not return false
negatives) under some conditions
§
C’s type
system has holes
§
Lackwit makes
assumptions similar to those made by programmers (e.g., “no out-of-bounds
memory accesses”)
§
Lackwit is
unsound only for programs that don’t satisfy these assumptions
v Query commonalities
Ø There are a huge number of names for
storage locations
§
local and
global variables; procedure parameters; for records, etc., the sub-components
Ø Values flow from location to location,
which can be associated with many different names
Ø Archetypal
query: Which other names
identify locations to which a value could flow to or from a location with this
given name?
Ø Answers can be given textually or
graphically
v An example query
Ø Query about the cur_veh field of
map_manager_global
Ø Shaded ovals are functions extracting
fields from the global
Ø Unshaded ovals pass pointers to the structure
but don’t manipulate it
Ø Edges between ovals are calls
Ø Rectangles are globals
Ø Edges to rectangles are variable accesses
Ø Claim
§
This graph
shows which functions would have to be checked when changing the invariants of
the current vehicle object
§
Requires
semantics, since many of the relationships are induced by aliasing over
pointers
v Underlying technique
Ø Use type inference, allowing type
information to be exploited to reduce information about values flowing to
locations (and thus names)
Ø But what to do in programming languages
without rich type systems?
Ø Trivial example
§
DollarAmt getSalary(EmployeeNum e)
Ø Relatively standard declaration
Ø Allows us to determine that there is no way
for the value of e to flow
to the result of the function
§
That is, e
can (and surely would) affect what DollarAmt gets returned, but e itself (nor
any computation that is based on e) will be able to flow to the return value
Ø Because they have different types
Ø Consider an alternative
§
int getSalary(int e)
Ø Another, perhaps more common, way to
declare the same function
Ø This doesn’t allow the direct inference
that e’s value doesn’t flow to the function return
§
Because they
have the same type
v But maybe one could analyze the program to
determine this anyway; Lackwit does so by using a type inference mechanism for
precision
v Lackwit’s type system ignores the C type
declarations
Ø Computes new types in a richer type system
Ø Incomplete type information
Ø Cover example from the paper
Ø void*
return1st(void* x, void* y) {
return x; }
Ø (a
refb,
b) ®f a
refb
Ø The type variable a indicates that the type of the contents of the pointer x is
unconstrained
Ø But it must be the same as the type of the
contents of the return value
Ø Increases the set of queries that Lackwit
can answer with precision
v Polymorphism
char*
ptr1;
struct timeval* ptr2;
char** ptr3;
…
return1st(ptr1,ptr2);
return1st(ptr2,ptr3);
Ø Both calls match the previous function
declaration
Ø
This is
solved (basically) by giving return1st a richer type and instantiating it at
every call site
§
(c refb,
d) ®d c
refb
§
(e refa,
f) ®c e
refa
v Type stuff
Ø Modified form of Hindley-Milner algorithm
“W”
Ø Efforts made to handle
Ø Mutable types
Ø Recursive types
Ø Null pointers
Ø Uninitialized data
Ø Type casts
Ø Declaration order
v Morphin case study
Ø Robot control program of about 17KLOC
Ø Vehicle object contains two queue objects
Ø Client was investigating combining these
two queues into one
Ø Queried each queue object to discover
operations performed and their contexts
Ø The two graphs each contained 171 nodes
Ø But each graph had only five nodes highlighted
as “accessor” nodes
Ø Example
§
These five
matches helped identify code to be changed
§
grep would
have returned false matches and missed matches when parameters were passed to
functions
§
Context-sensitivity
needed to distinguish the two queue objects because both are passed as arguments
to the same queue functions
v Recap
Ø Helps find relationships among variables in
a C program
Ø Exploits type inference to understand values
flowing to locations and thus names
Ø Approximate, although safe under many
(most?) conditions
Ø Reasonably efficient
Ø Although I didn’t show the numbers, they
are now better than reported in the ICSE paper