CSE503: Software Engineering
Lecture 20 (February 24, 1999)

David Notkin

v     Slicing

      Program slicing is an approach to selecting semantically related statements from a program [Weiser]

      In particular, a slice of a program with respect to a program point is a projection of the program that includes only the parts of the program that might affect the values of the variables used at that point

      The slice consists of a set of statements that are usually not contiguous

v     Basic ideas

      If you need to perform a software engineering task, selecting a slice will reduce the size of the code base that you need to consider

      Debugging was the first task considered

      Weiser even performed some basic user studies

      Claims have been made about how slicing might aid program understanding, maintenance, testing, differencing, specialization, reuse and merging

v     Example

i := 1;
sum := 0;
product := 1;
while i <= n do begin
sum := sum + i;
product := product * i;
i := i + 1;

i := 1;

product := 1;
while i <= n do begin
product := product * i;
i := i + 1;


      In this example, the second program is a slice of the first, slicing on the product variable at the write statement.

v     For Weiser, a slice was a reduced, executable program obtained by removing statements from a program

      The new program had to share parts of the behavior of the original

      Weiser computed slices using a dataflow algorithm, given a program point (criterion)

        Using data flow and control dependences, iteratively add sets of relevant statements until a fixpoint is reached

v     Ottenstein & Ottenstein defined an alternative approach that clearly dominates

      Build a program dependence graph (PDG) representing a program

      Select node(s) that identify the slicing criterion

      The slice for that criterion is the reachable nodes in the PDG

      PDG for the example (at the bottom of the file)

        Thick lines are control dependences

        Thin lines are (data) flow dependences

v     True PDGs are somewhat more complicated than this

      Vertices in the graph represent (a) assignment states and (b) predicates in the program

      Edges represent control and data flow dependences

      Control dependences always start at a predicate (or the entry node, a special node representing the beginning of the computation)

        They are labeled with a boolean

        Intuitively, node w is control dependent on node v if the predicate of node v evaluates to the label on the edge from v to w

        That is, what happens at w controls whether or not v executes

        An assignment statement followed immediately by another assignment statement have no control dependence between them, since the second one always executes when the first one does

        This is roughly similar to the notion of a basic block

      Data dependences represent the possible flow of values through the program

        (Roughly) there is a data dependence (edge) from node v to node w if v includes an assignment to some variable x, and then w includes a use of (that specific) x.

        These can be separated into (at least) loop-independent and loop-carried dependences, which roughly distinguish whether the relationship is across iterations of a loop or not

      Def-order dependences can also be used; these aren't needed for all analyses, but ensure that only equivalent programs have isomorphic PDGs.

v     Procedures

      What happens when you have procedures and still want to slice?

      Weiser extended his dataflow algorithm to interprocedural slicing

      The PDG approach also extends to procedures

      But interprocedural PDGs are a bit hairy (Horwitz, Reps, Binkley used SDGs)

        Representing conventional parameter passing is not straightforward

      In the Reps/Horwitz paper is an example of an SDG

P := 3.14;
rad := 3;
if DEBUG then rad :=4 fi;
call Mult3(P,rad,rad,area);
call Mult3(2,P,rad,circ);

procedure Mult3(op1,op2,op3,result)
result := 0p1*op2*op3;

v     Essentially a linked collection of PDGs (one per procedure), with linking vertices for calling protocols

      Value-result parameter passing is used; it's much harder to handle by-reference mechanisms

      If you slice an SDG-like structure using standard reachability, you can get a very large slice

        Specifically, if you slice from inside a procedure, with reachability you trace back to all possible calls sites of the procedure; since calls and returns are represented by separate edges, it's roughly like taking a return to a call site that didn't make the call

        There are a number of solutions to this, include the summary edges imposed in SDGs

v     Technical issues

      How to slice in the face of unstructured control flow?

      Must slices be executable?

      What about slicing in the face of pointers?

      What about those pesky preprocessor statements?

v     Dynamic slicing

      These algorithms have all been static

      They work for all possible inputs

      There is also work in dynamic slicing, trying to find slices that satisfy some execution streams over sets of inputs

        Korel and Laski characterize dynamic slices in terms of a trajectory that captures the execution history of a program in terms of a sequence of statements and control predicates

v     (Potential) applications of slicing


      Program differencing

      Semantic versions of diff

      Program integration

      Merging versions together


        Slicing can be used to define more rigorous testing criterion than a conventional data flow testing criterion

v     Recap

      Cool idea

      Cool supporting technology

      Difficult on practical programs

      May be coming closer to feasible after almost 20 years of research

      Little data on the size of slices

      Will it be more than a cool idea?

        My guess? No (but I wouldnt bet the farm on it)