Program slicing is an approach to selecting semantically related statements from a program [Weiser]
In particular, a slice of a program with respect to a program point is a projection of the program that includes only the parts of the program that might affect the values of the variables used at that point
The slice consists of a set of statements that are usually not contiguous
Basic ideas
If you need to perform a software engineering task, selecting a slice will reduce the size of the code base that you need to consider
Debugging was the first task considered
Weiser even performed some basic user studies
Claims have been made about how slicing might aid program understanding, maintenance, testing, differencing, specialization, reuse and merging
Example
read(n)
i := 1;
sum := 0;
product := 1;
while i <= n do begin
sum := sum + i;
product := product * i;
i := i + 1;
end;
write(sum);
write(product);
read(n)
i := 1;
product := 1;
while i <= n do begin
product := product * i;
i := i + 1;
end;
write(product);
In this example, the second program is a slice of the first, slicing on the
product variable at the write statement.
For Weiser, a slice was a reduced, executable program obtained by removing statements from a program
The new program had to share parts of the behavior of the original
Weiser computed slices using a dataflow algorithm, given a program point (criterion)
Using data flow and control dependences, iteratively add sets of relevant statements until a fixpoint is reached
Ottenstein & Ottenstein defined an alternative approach that clearly dominates
Build a program dependence graph (PDG) representing a program
Select node(s) that identify the slicing criterion
The slice for that criterion is the reachable nodes in the PDG
PDG for the example (at the bottom of the file)
Thick lines are control dependences
Thin lines are (data) flow dependences
True PDGs are somewhat more complicated than this
Vertices in the graph represent (a) assignment states and (b) predicates in the program
Edges represent control and data flow dependences
Control dependences always start at a predicate (or the entry node, a special node representing the beginning of the computation)
They are labeled with a boolean
Intuitively, node w is control dependent on node v if the predicate of node v evaluates to the label on the edge from v to w
That is, what happens at w controls whether or not v executes
An assignment statement followed immediately by another assignment statement have no control dependence between them, since the second one always executes when the first one does
This is roughly similar to the notion of a basic block
Data dependences represent the possible flow of values through the program
(Roughly) there is a data dependence (edge) from node v to node w if v includes an assignment to some variable x, and then w includes a use of (that specific) x.
These can be separated into (at least) loop-independent and loop-carried dependences, which roughly distinguish whether the relationship is across iterations of a loop or not
Def-order dependences can also be used; these aren't needed for all analyses, but ensure that only equivalent programs have isomorphic PDGs.
Procedures
What happens when you have procedures and still want to slice?
Weiser extended his dataflow algorithm to interprocedural slicing
The PDG approach also extends to procedures
But interprocedural PDGs are a bit hairy (Horwitz, Reps, Binkley used SDGs)
Representing conventional parameter passing is not straightforward
In the Reps/Horwitz paper is an example of an SDG
program
P := 3.14;
rad := 3;
if DEBUG then rad :=4 fi;
call Mult3(P,rad,rad,area);
call Mult3(2,P,rad,circ);
output(area);
output(circ);
end;
procedure Mult3(op1,op2,op3,result)
result := 0p1*op2*op3;
end;
Essentially a linked collection of PDGs (one per procedure), with linking vertices for calling protocols
Value-result parameter passing is used; it's much harder to handle by-reference mechanisms
If you slice an SDG-like structure using standard reachability, you can get a very large slice
Specifically, if you slice from inside a procedure, with reachability you trace back to all possible calls sites of the procedure; since calls and returns are represented by separate edges, it's roughly like taking a return to a call site that didn't make the call
There are a number of solutions to this, include the summary edges imposed in SDGs
Technical issues
How to slice in the face of unstructured control flow?
Must slices be executable?
What about slicing in the face of pointers?
What about those pesky preprocessor statements?
Dynamic slicing
These algorithms have all been static
They work for all possible inputs
There is also work in dynamic slicing, trying to find slices that satisfy some execution streams over sets of inputs
Korel and Laski characterize dynamic slices in terms of a trajectory that captures the execution history of a program in terms of a sequence of statements and control predicates
(Potential) applications of slicing
Debugging
Program differencing
Semantic versions of diff
Program integration
Merging versions together
Testing
Slicing can be used to define more rigorous testing criterion than a conventional data flow testing criterion
Recap
Cool idea
Cool supporting technology
Difficult on practical programs
May
be coming closer to feasible after almost 20 years of research