David Notkin, University of Washington, Department of Computer Science & Engineering, Winter 2002
What information did you need?
What information was available?
What tools produced the information?
Did you think about other pertinent tools?
How accurate was the information?
Any false information? Any missing true information?
How did you view and use the information?
Can you imagine other useful tools?
Reasoning about a maintenance task is often done in terms of a model of the source code
Smaller than the source, more focused than the source
Such a source model captures one or more relations found in the system’s artifacts
We’ve talked about many possible relations: calls, uses, registers-in, names, #includes, etc.
Source models are extracted using tools (a small extraction sketch follows below)
Any source model can be extracted in multiple ways
That is, more than one tool can produce a given kind of source model
The tools are sometimes off-the-shelf, sometimes hand-crafted, sometimes customized
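As a concrete (if simplistic) illustration, here is a minimal sketch of an extractor for one such relation, assuming a C-like code base under a single directory tree; the file-extension filter and the regular expression are assumptions of the sketch, not part of the lecture.

import os, re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')

def include_relation(root="."):
    """Approximate source model: set of (including file, included file) pairs."""
    relation = set()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".c", ".h")):   # assumption: only C sources matter
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as f:
                for line in f:
                    m = INCLUDE_RE.match(line)
                    if m:
                        relation.add((path, m.group(1)))
    return relation

Even this “easy” relation is only approximately right as written: for instance, it ignores conditional compilation, a problem that comes up again below.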
It would be best if every source model extracted were perfect
All entries are true and no true entries are omitted
For some source models, this is possible
Inheritance, defined functions, #include structure, etc.
For some source models, achieving the ideal may be difficult in practice
Ex: the computational time required is prohibitive in practice
For many other interesting source models, this is not possible
Ideal call graphs, for example, are uncomputable
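A tiny, hypothetical fragment (in Python rather than C, purely for illustration) shows why the ideal is out of reach: which function dispatch calls depends on its runtime argument, so no static extractor can always decide which edge belongs in the exact call graph.

def foo():
    return "foo"

def bar():
    return "bar"

def dispatch(flag):
    # The callee is chosen at run time, from data a static tool cannot
    # in general predict; deciding it exactly is as hard as deciding
    # arbitrary program behavior.
    f = foo if flag else bar
    return f()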
These include all true information and maybe some false information, too
Frequently used in compiler optimization, parallelization, programming language type inference, etc.
Ex: never omit a call that could actually be made, or else a compiler may translate improperly
Ex: never misidentify the type of an expression in a statically typed programming language
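For the dispatch fragment above, a conservative extractor must report every callee that might occur; the literal below is just the shape of such an answer, accepting that one of the two edges may never happen in any actual run.

# Conservative call-graph entries for dispatch(): include everything
# that might be called, at the cost of possible false entries.
conservative_calls = {
    ("dispatch", "foo"),   # occurs when flag is true
    ("dispatch", "bar"),   # occurs when flag is false
}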
These include only truth but may omit some true information
Often come from dynamic extraction
Ex: white-box code coverage in testing
Indicating which statements have been executed by the selected test cases
Other statements may be executable with other test cases
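A minimal sketch of this kind of dynamic extraction, assuming the code under test is Python: it records which lines run while one selected test executes, using sys.settrace. Anything it omits may still be executable under other inputs, which is exactly what makes the model optimistic.

import sys

def covered_lines(test, *args):
    """Return the set of (file, line number) pairs executed by one test run."""
    executed = set()
    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        test(*args)
    finally:
        sys.settrace(None)
    return executed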
May include some false information and may omit some true information
These source models can still be useful for maintenance tasks
Especially useful when a human engineer is using the source model, since humans deal well with approximation
It’s “just like the web!”
It turns out that many tools produce approximate source models
Source model extractors can work
statically, directly on the system’s artifacts, or
dynamically, on the execution of the system, or
a combination of both
Ex: a call graph can be extracted statically by analyzing the system’s source code, or dynamically by profiling the system’s execution
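A sketch of both styles for Python code, under the assumption that a (caller, callee name) pair is a good-enough call-graph entry: the static half reads the program text with the ast module and misses indirect calls, while the dynamic half watches one execution with sys.setprofile and misses everything that run did not exercise.

import ast
import sys

def static_calls(source):
    """(enclosing function, called name) pairs found in the program text."""
    pairs = set()
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                # Only direct calls to simple names; indirect calls are missed.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    pairs.add((fn.name, node.func.id))
    return pairs

def dynamic_calls(entry, *args):
    """(caller, callee) pairs observed during one execution."""
    pairs = set()
    def profiler(frame, event, arg):
        if event == "call" and frame.f_back is not None:
            pairs.add((frame.f_back.f_code.co_name, frame.f_code.co_name))
    sys.setprofile(profiler)
    try:
        entry(*args)
    finally:
        sys.setprofile(None)
    return pairs

The two answers rarely agree exactly, which is one reason iterating and combining models (next) is so common.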
Usually, the engineer must iterate to get a source model that is “good enough” for the assigned task
Often done by inspecting extracted source models and refining extraction tools
May add and combine source models, too
Given a software system, rename a given variable throughout the system
Ex: angle should become diffraction
Probably in preparation for a larger task
Semantics must be preserved
This is a task that is done infrequently
Without it, the software structure degrades more and more
Our preferred source model for the task would be a list of lines (probably organized by file) that reference the variable angle
A static extraction tool makes the most sense
Dynamic references aren’t especially pertinent for this task
Let’s start with grep, the most likely tool for extracting the desired source model
The most obvious thing to do is to search for the old identifier in all of the system’s files
grep angle *
It’s hard to determine which files to search
Multiple and recursive directory structures
Many types of files
Object code? Documentation? (ASCII vs. non-ASCII?) Files generated by other programs (such as yacc)? Makefiles?
Conditional compilation? Other problems?
Care must be taken to avoid false negatives arising from files that are missed in the search (see the sketch below)
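A sketch of the file-selection step, assuming we can get away with skipping a few directory names and extensions; deciding which directories, extensions, and generated files to exclude is exactly the judgment call described above, so the skip lists here are illustrative, not a recommendation.

import os

SKIP_DIRS = {"CVS", "obj"}                      # assumed names of uninteresting directories
SKIP_EXTS = {".o", ".a", ".so", ".ps", ".pdf"}  # assumed extensions for non-source files

def files_to_search(root):
    """Yield the paths that are plausibly worth searching."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]   # prune the walk
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in SKIP_EXTS:
                yield os.path.join(dirpath, name)

Every name pruned here is a potential false negative if the guess is wrong.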
grep angle [system’s files]
There are likely to be a number of spurious matches
…triangle…, …quadrangle…
/* I could strangle this programmer! */
/* Supports the small planetary rovers presented by Angle & Brooks (IROS ’90) */
printf("Now play the Star Spangled Banner");
Be careful about using agrep!
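A slightly better lexical extractor is sketched below: a case-sensitive, word-boundary match rules out triangle, quadrangle, strangle, and “Spangled,” but it still cannot tell the variable angle from a comment or string that happens to use the word, or from a different angle declared in some unrelated scope, so the model stays approximate.

import re

ANGLE_RE = re.compile(r'\bangle\b')   # word boundaries cut most accidental substrings

def reference_model(paths):
    """Approximate source model: (file, line number, line text) triples."""
    for path in paths:
        with open(path, errors="ignore") as f:
            for lineno, line in enumerate(f, 1):
                if ANGLE_RE.search(line):
                    yield (path, lineno, line.rstrip("\n"))

Fed the output of files_to_search above, this is roughly the grep pipeline, with the same fundamental limitations.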
Some languages allow identifiers to be split across line boundaries
Cobol, Fortran, PL/I, etc.
This leads to potential false negatives
Preprocessing can hurt, too
#define deflection angle
...
deflection = sin(theta);
After the macro is defined, the assignment through deflection is really an assignment to angle, but a search for angle never sees that line
It is also important to check, before applying the change, that the new variable name (diffraction) is not in conflict anywhere in the program
The problems in searching apply here, too
Nested scopes introduce additional complications
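A rough sketch of the conflict check, again lexical: it flags every file in which the new name already appears as a whole word, and it deliberately says nothing about nested scopes, which is where a textually clean rename can still capture or shadow a name.

import re

def conflicting_files(paths, new_name="diffraction"):
    """Files in which the proposed new name already occurs as a word."""
    pattern = re.compile(r'\b%s\b' % re.escape(new_name))
    hits = []
    for path in paths:
        with open(path, errors="ignore") as f:
            if any(pattern.search(line) for line in f):
                hits.append(path)
    return hits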
In this case, grep is a lexical tool but the renaming task is a semantic one
There is a mismatch with syntactic tools, too
Mismatches like this are common and not at all unreasonable
But they do introduce added obligations on the maintenance engineer
Must be especially careful in extracting and then using the approximate source model
Even after you have extracted a source model that identifies all (or most) of the lines that need to be changed, you have to change them
Global replacement of strings is at best dangerous
Manually walking through each site is time-consuming, tedious, and error-prone
After extracting a good source model by iterating, the engineer can apply the renaming to the identified lines of code
However, since the source model is approximate, regression testing (and/or other testing regimens) should be applied
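One way to split the difference between a blind global replacement and a fully manual walk-through, sketched under the assumption that the reviewed source model is a set of (file, line number) pairs: apply a word-boundary substitution only on those lines, and then lean on the regression tests, because the model may still be wrong in both directions.

import re

def apply_rename(model, old="angle", new="diffraction"):
    """Rename old -> new only on the lines the reviewed source model identified."""
    pattern = re.compile(r'\b%s\b' % re.escape(old))
    by_file = {}
    for path, lineno in model:
        by_file.setdefault(path, set()).add(lineno)
    for path, linenos in by_file.items():
        with open(path, errors="ignore") as f:   # sketch: assumes plain text files
            lines = f.readlines()
        for n in linenos:
            lines[n - 1] = pattern.sub(new, lines[n - 1])
        with open(path, "w") as f:
            f.writelines(lines)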
Griswold developed an approach to meaning-preserving restructuring
Make a local change
The tool finds global, compensating changes that ensure that the meaning of the program is preserved
What does it mean for two programs to have the same meaning?
If it cannot find these, it aborts the local change
Swap the order of formal parameters
It’s neither a local change nor a syntactic change (see the fragment below)
It requires semantic knowledge about the programming language
Griswold uses a variant of the sequence-congruence theorem [Yang] for equivalence
Based on PDGs (program dependence graphs)
It’s an O(1) tool
The user touches only one place
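To see why the swap is global rather than local, consider the hypothetical fragment below (Python for brevity): the one-line edit to move’s formal parameters silently changes the meaning of every positional call unless each call site is compensated, and finding those compensating edits is exactly the tool’s job.

def move(dx, dy):
    # before the swap: callers pass (dx, dy) positionally
    return (dx, 2 * dy)

move(3, 4)   # fine today: dx=3, dy=4

# After the "local" edit to  def move(dy, dx): ...  this same call would
# quietly bind dy=3, dx=4; it must be rewritten as move(4, 3) to preserve
# the program's meaning.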
The actual tool and approach have limited power
Can help translate one of Parnas’ KWIC decompositions to the other
Too limited to be useful in practice
PDGs are limiting
Big and expensive to manipulate
Difficult to handle in the face of multiple files, etc.
May encourage systematic restructuring in some cases
Some related work specifically in OO by Opdyke and Johnson
Question: How do you find appropriate restructurings?
Star diagrams [Griswold et al.]
Meaning-preserving restructuring isn’t going to work on a large scale
But sometimes significant restructuring is still desirable
Instead, provide a tool (star diagrams) to
record restructuring plans
hide unnecessary details
Some modest studies on programs of 20-70 KLOC
The root (far left) represents all the instances of the variable to be encapsulated
The children of a node represent the operations and declarations directly referencing that variable
Stacked nodes indicate that two or more pieces of code correspond to (perhaps) the same computation
The children in the last level (parallelograms) represent the functions that contain these computations
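A small sketch of the data behind such a display, assuming an earlier extraction step has already produced, for the variable of interest, an (operation, containing function) pair per reference: grouping references by operation, and each operation by the functions containing it, gives roughly the root-to-parallelogram structure described above, with stacking corresponding to an operation that appears more than once.

def star_diagram(references):
    """references: iterable of (operation, containing function) pairs
    for one variable; returns {operation: set of containing functions}."""
    diagram = {}
    for operation, function in references:
        diagram.setdefault(operation, set()).add(function)
    return diagram

# e.g. star_diagram([("angle = ...", "rotate"), ("sin(angle)", "project"),
#                    ("angle = ...", "reset")])
# -> {"angle = ...": {"rotate", "reset"}, "sin(angle)": {"project"}}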
Compared small teams of programmers on small programs
Used a variety of techniques, including videotape
Compared to vi/grep/etc.
Nothing conclusive, but some interesting observations, including
The teams with the star diagram tools adopted simpler strategies for handling completeness and consistency
Star diagrams may not be “the” answer
But I like the idea that they encourage people
To think clearly about a maintenance task, reducing the chances of an ad hoc approach
They help track mundane aspects of the task, freeing the programmer to work on more complex issues
To focus on the source code