CSE 584 Software Engineering Fall 1998,
Reading 4: Tools
Lackwitt: A program Understanding Tool based on Type Inference.
The Lackwitt tool is introduced fill the niche of program discovery and understanding. The entire scheme hinges on type inference. Type inference works by introducing a new type system in which representation information is encoded within the types. The richer and more expressive type system enables the computation of representation sharing. I think one of the very attractive things about Lackwitt is that it is highly automated. This is because the type inference procedure is fully automatic. I especially liked the fact that this tool has been used to analyze a project of about 17K lines of code to provide information useful enough for migration. It would be interesting to see how the tool will fare with a project of about 1/2 million lines of code. This tends to be a reasonable measure of performance for an industrial type tool. One of the common problems with this type of tool is the lack of ease of use regarding how the actual results are gathered from the tool. Generally the problem is that you get too much information out of the tool and it is hard to know what part is the real information you are interested in and what part is simply redundant. This problem is compounded with graphical displays, which tend to become so over populated that human inspection is practically impossible. This problem is addressed in Lackwitt by only showing nodes of interest in the results graph. As for how much detail there should be in the node itself, the authors hope to add an interactive feature that would allow the user to specify the level of detail that they would want to see. Overall I like the ideas behind the tool and it is the kind of tool that I would not mind investing about a day to try.
Dynamically Discovering Likely Program Invariants to Support Program Evolution
The subject matter discussed addresses the growing need of program migration assistance tools. In my experience, more than half of a program's development is spent in fixing bugs. I think the primary reason for this imbalance in true development versus program correction is that a lot of the corrective measures themselves introduce more bugs in the program. This is caused by among other things the inability to easily identify and specify what should not change during bug fixing. The method outlined is based on the discovery of invariants within a program. In order to 'spy' on the program, there is need for instrumentation. Generally the problem with instrumentation is deciding how much or where to instrument within the program. It does seem that the choice of procedure entry and exit points and loop heads made by the authors, would at least provide the minimal degree of instrumentation points. For a real working tool, I would like to see a higher degree of flexibility where I could say apply different levels of instrumentation to different modules or classes. One of the drawbacks of this method is that it requires the use of a test suite to exercise the instrumentation points. For one it is not clear how easy it would be to come up with representative test cases and secondly it suffers from the common problem of all dynamic analysis in that it would only exercise code paths that are exposed by the given set of inputs and not any more. However, as the authors point out, the method can be used in conjunction with other forms of static analysis in order to understand invariants' implications. Again as in other tools, presentation of results' information is critical and the authors do address it well and make good suggestions for summarizing and trimming information in a working version of the tool. It would be great to see these ideas developed into a more comprehensive and concrete prototype encapsulating all the ideas presented into a tool that can be applied to large-scale code bases.
Static Detection of Dynamic Memory Errors
The paper deals with the extension of LCLint to uncover memory management related errors in programs. I was not so interested in any discussions on detection of memory leaks because there has been a lot of work done in this area and it pretty well understood. I was more drawn to problems like misuse of null pointers and dangerous aliasing. The authors use annotations to make assumptions about function interfaces, variables and types. The constraints necessary to satisfy these assumptions are checked at compile-time revealing any violations of the constraints thus exposing potential bugs. This method has great advantages in that it is built on a tool that has been used with great success on large programs. This means that the learning curve for users who are already familiar with LCLint would not be steep at all. However adding annotations is an iterative process where with each iteration, LCLint detects some anomalies, annotations are added or discovered bugs are fixed and LCLint is run again to propagate the new annotations. I think this could be cumbersome for a large program. The authors agree that the method would benefit from work done to automate the annotation process. It also appears that the method is still lacking in the presentation of results. There is need to manage the amount of spurious messages and it is not clear how to determine what messages are indeed indications of bugs and which ones are redundant. And again since the method employs static analysis, there will be bugs that are not uncovered simply because they are dynamic in nature and would need a more dynamic approach. Something that is mentioned in closing that could be a great idea is the adding of annotations while a program is being developed. This approach might be the most practical for a method of this nature and might very well be a worth investment.
The use of Program Dependence Graphs in Software Engineering
The program dependence graph (pdg) is a language independent program representation. It can be a great tool in providing program understanding, determining differences in program versions and creation of new programs by joining pieces of old programs. Overall the concepts seem to be generally more complex and involved than was the case for all the other three tools/methods. There was the general pattern of doing something simple for one-procedure programs and then having to do something much more involved for multiple procedure programs. Well seeing as there are very few one-procedure programs in the real world which would need a tool for understanding, then the tool is most complex where it is most needed. The authors seem to tackle quite a few topics in one paper and honestly I kept getting lost between whether I was slicing or differencing or integrating. Apparently the work has been implemented in a tool but we are not told about the performance or ease of use of the tool only that even more additional techniques are implemented in the tool. My guess is that the tool is very experimental and very difficult to use because it tries to do too much. That said, the program dependence graph appears to be a powerful construct that could be used in a lot of ways to ease software program understanding and evolution