Pointer analysis

Requirements

You will implement a pointer analysis for Whirlwind. Your analysis should keep track of the following: pointers to variables, pointers to dynamically allocated memory, pointers to structures, pointers to fields of structures and pointers to arrays. You analysis should also perform lhs indirection elimination using the whirlwind framework. In particular, you will transform statements of the form *x := e into y := e if it is know that x can only point to y.

The information that your analysis will compute at each program point is a map from locations to sets of locations. A location is an abstract piece of memory. A dataflow fact [l -> {l1, l2, l3}] says that the content of location l may contain a pointer to l1, l2 or l3. There are several kinds of locations:

Var(x) represents the memory location of variable x.
AllocSummary(L) is a summary node: it represents all the memory locations that have been allocated at allocation site L.
Field(l, f) represents field f of location l. So, for example, Field(Var(x), f) represents the location of field f of variable x. As another example, Field(AllocSummary(L), f) summarizes all the fields f of structures allocated at allocation site L.
ArrayElemSummary(l) represents the elements of the array at location l. So for example, ArrayElemSummary(Var(x)) represents the elements of the array variable x. As another example, Field(ArrayElemSummary(Var(x)), f) represents all the fields f of the array elements of x (this assumes that x is an array of structures that contain a field f).

An example

The following example shows a simple program and the information your analysis should compute. Before looking at the example, there are a few things that you need to keep in mind about WIL:

In WIL, the structure selection operator (the dot operator) operates on pointers to structures, not on structures: given a pointer p to a structure, p.f returns a pointer to the field f of the structure pointed to by p. Thus, what in C you would write as x.f, in WIL you would write as *((&x).f). Notice how in the example below pf3 points to field f3 of gs.
In WIL, the array selection operator (the bracket operator) operates on pointers to arrays: given a pointer p to an array, p[i] returns a pointer to the ith array element of a. Thus, if a is an array, what in C you would write as a[i], in WIL you would write as *((&a)[i]). Notice how in the example below, ppint points to the 5th element of array a.

Skeleton code

You will start out with the following skeleton code. This code provides a lot of the framework for implementing pointer analysis. In particular, it provides a PtsToAnalysis object and the definition of the analysis domain. This includes a Location hierarchy, a class that represents sets of locations, and a class that defines the pointer information. Your task is to fill in the implementation of the methods that have a TODO in their body.

You will find it useful to look at the files cse.cecil and constant-prop-cfg.cecil in the whirlwind code base for examples of complete optimizations written over the CFG. You can also see how transformations are performed by looking at the *try_to_simplify methods in those files.

The nodes to look at

In order to implement pts_to_targets and pts_to_analyze in the skeleton code you will need to consider various WIL nodes. The file ir-node.cecil contains a definition of all the IR nodes in WIL. In particular, the following nodes should be useful for your pointer analysis:

StmtNode: this abstract node represents statements. Various sub-classes are listed below.

AssnOrInitNode: this abstract node represents an assignment. You can get the lhs and rhs of the assignment by sending the lhs and rhs message to the AssnOrInitNode object. There are various sub-classes of AssnOrInitNode, which are listed below. In general, you should be able to write code that handles all of the sub-classes by just handling the AssnOrInitNode case.
StoreNode: this abstract node (which is a sub-class of AssnOrInitNode) represents an assignment through a pointer. So the assignment *x := y is a StoreNode, whose lhs field is the IDExprNode for x and whose rhs field is the IDExprNode for y.
VarAssnNode: this abstract node (which is a sub-class of AssnOrInitNode) represents an assignment to a variable. Sub-classes of this node include initializing assignments (InitVarAssnNode), such as x :== y, and regular assignments such as x := y VarAssnNode.
VarDeclInitNode: this node (which is a sub-class of AssnOrInitNode) represents an initializing declaration of a variable, for example decl x :== y.

ValueNode: this abstract node represents a value. The rhs of an assignment is always a ValueNode. ValueNode is a super class of all of the following nodes.

LHSNode: this abstract node represents an L-value, which is an expression that can appear on the lhs of an assignment.
IDExprNode: this node represents evaluating a reference to an identifier. So, in the assignment x := y, the expression y is an IDExprNode. It turns out that x here is also an IDExprNode, since VarAssnNode also uses IDExprNode to represent the lhs variable being assigned to. This, however, is somewhat of a misuse of IDExprNode since IDExprNode represents evaluating an identifier and returning its value, whereas the lhs variable in an assignment is not being evaluated, it is being assigned to.
FxnCallNode: this node represents a function call.
DerefNode: this node represents a dereference of a pointer. So, in the assignment x := *y, the expression *y is a DerefNode, whose base field is an IDExprNode.
RecordAccessNode: this node represents the . operator. So, in the assignment x:= y.f, the expression y.f is a RecordAccesNode, whose base field is an IDExprNode, and whose id field is the string "f".
ArrayAccessNode: this node represents the [] operator. So, in the assignment x := y[5], the expression y[5] is an ArrayAccessNode, whose base field is an IDExprNode.
AllocNode: this represents a dynamic memory allocation. So, in the assignment x := new S, the expression new S is an AllocNode.
VarAddrNode: this node represents taking the address of a variable. So, in the assignment x := &y, the expression &y is a VarAddrNode whose target field is an IDExprNode.
NullLiteralNode: this node represents the value NULL.

If you are ever confused by what a node is, you can always use print_string to see it's string representation, and basic_print_string to see what kind of object it is. So, for example, if have an IDExprNode that represents a reference to variable y, then if you call print_string on it, you will get "y", and if you call basic_print_string, you will get the some string that tells you, among other things, that you're looking at an IDExprNode.