Pointer analysis
Requirements
You will implement a pointer analysis for Whirlwind. Your analysis
should keep track of the following: pointers to variables, pointers to
dynamically allocated memory, pointers to structures, pointers to
fields of structures and pointers to arrays. You analysis should also
perform lhs indirection elimination using the
whirlwind framework. In particular, you will transform statements of
the form *x := e
into y := e if it is know that x can only point to y.
The information that your analysis will compute at each program point
is a map from
locations to sets of locations. A location is an abstract piece of
memory. A dataflow fact [l ->
{l1, l2, l3}] says that the content of location l may contain a pointer to l1, l2 or l3. There are several
kinds of locations:
- Var(x) represents the memory location of variable x.
- AllocSummary(L) is a summary node: it represents all the memory
locations that have been allocated at allocation site L.
- Field(l, f) represents field f of location l. So, for example,
Field(Var(x), f) represents the location of field f of variable x.
As another example, Field(AllocSummary(L), f) summarizes all the fields
f of structures allocated at allocation site L.
- ArrayElemSummary(l) represents the elements of the array at
location l. So for example, ArrayElemSummary(Var(x)) represents the
elements of the array variable x. As another example,
Field(ArrayElemSummary(Var(x)),
f) represents all the fields f of the array elements of x (this assumes
that x is an array of structures that contain a field f).
An example
The following example shows a simple program and the information your
analysis should compute. Before looking at the example, there are a few
things that you need to keep in
mind about WIL:
- In WIL, the structure selection operator (the dot operator)
operates on pointers to structures, not on structures: given a pointer p to a structure, p.f returns a pointer to the
field f of the structure
pointed to by p. Thus,
what in C you would write as x.f,
in WIL you would write as *((&x).f).
Notice how in the example below pf3
points to field f3 of gs.
- In WIL, the array selection operator (the bracket operator)
operates on pointers to arrays: given a pointer p to an array, p[i] returns a pointer to the
ith array element of a.
Thus, if a is an array,
what in C you would write as a[i],
in WIL you would write as *((&a)[i]).
Notice how in the example below, ppint
points to the 5th element of array a.
![](pointer_analysis_example.jpg)
Skeleton code
You will start out with the following skeleton
code. This code provides a lot of the framework for
implementing pointer analysis. In particular, it provides a PtsToAnalysis object and the
definition of the analysis domain. This includes a Location hierarchy, a class
that represents sets of locations, and a class that defines the pointer
information. Your task is to fill in the implementation of the methods
that have a TODO in their body.
You will find it useful to look at the files cse.cecil and
constant-prop-cfg.cecil in
the whirlwind code base for examples of complete optimizations written
over the CFG. You can also see how transformations are performed by
looking at the *try_to_simplify methods in those files.
The nodes to look at
In order to implement pts_to_targets
and pts_to_analyze
in the skeleton code you will need to consider various WIL nodes. The
file ir-node.cecil
contains a definition of all the IR nodes in WIL. In particular, the
following nodes should be useful for your pointer analysis:
- StmtNode: this abstract
node represents statements. Various sub-classes are listed below.
- AssnOrInitNode: this
abstract node represents an assignment. You can get the lhs and rhs of
the assignment by sending the lhs and rhs message to the AssnOrInitNode
object. There are various sub-classes of AssnOrInitNode, which are
listed below. In general, you should be able to write code that handles
all of the sub-classes by just handling the AssnOrInitNode case.
- StoreNode: this
abstract node (which is a sub-class of AssnOrInitNode) represents an
assignment through a pointer. So the assignment *x := y is a StoreNode, whose lhs field is the IDExprNode for
x and whose rhs
field
is the IDExprNode for y.
- VarAssnNode: this
abstract node (which is a sub-class of AssnOrInitNode) represents an
assignment to a variable. Sub-classes of this node include
initializing assignments (InitVarAssnNode), such as x :== y, and regular
assignments such as x := y VarAssnNode.
- VarDeclInitNode: this
node (which is a sub-class of AssnOrInitNode) represents an
initializing declaration of a variable, for example decl x :== y.
- ValueNode: this abstract
node represents a value. The rhs of an assignment is always a
ValueNode. ValueNode is a super class of all of the following nodes.
- LHSNode: this abstract
node represents an L-value, which is an expression that can appear on
the lhs of an assignment.
- IDExprNode: this node
represents evaluating a reference to an identifier. So, in the
assignment x := y, the
expression y is an
IDExprNode. It turns out that x
here is also an IDExprNode,
since VarAssnNode also uses IDExprNode to represent the lhs variable
being assigned to. This, however, is somewhat of a misuse of IDExprNode
since IDExprNode represents evaluating an identifier and returning its
value, whereas the lhs variable in an assignment is not being
evaluated, it is being assigned to.
- FxnCallNode: this node
represents a function call.
- DerefNode: this node
represents a dereference of a pointer. So, in the assignment x := *y, the expression *y is a DerefNode, whose base
field is an IDExprNode.
- RecordAccessNode: this
node represents the . operator. So, in the assignment x:= y.f, the expression y.f is a RecordAccesNode, whose
base field is an IDExprNode, and whose id field is the string "f".
- ArrayAccessNode: this
node represents the [] operator. So, in the assignment x := y[5], the expression y[5] is an ArrayAccessNode,
whose base field is an IDExprNode.
- AllocNode: this
represents a dynamic memory allocation. So, in the assignment x := new S, the expression new S is an AllocNode.
- VarAddrNode: this node
represents taking the address of a variable. So,
in the assignment x := &y,
the expression &y is
a VarAddrNode whose target
field is an IDExprNode.
- NullLiteralNode: this
node represents the value NULL.
If you are ever confused by what a node is, you can always use print_string to see it's string
representation, and basic_print_string
to see what kind of object it is. So, for example, if have an
IDExprNode that represents a reference to variable y, then if you call print_string on it, you will
get "y", and if you call basic_print_string,
you will get the some string that tells you, among other things, that
you're looking at an IDExprNode.