Report on Static Analysis Tools by Shaheeda Parveen, CSE 584

The Use of Program Dependence Graphs in Software Engineering

This paper describes the program dependence graph and discusses how program dependence graphs, together with operations such as program slicing, can form the basis for powerful programming tools that aid in (i) understanding what an existing program does and how it works, (ii) understanding the differences between several versions of a program, and (iii) integrating different programs into one program. The paper categorizes these problems into three classes: (i) slicing problems, if the problem involves one program; (ii) differencing problems, if it involves two programs; and (iii) integration problems, if it involves three or more programs. Most of the solutions proposed in the paper depend on static analysis of the programs.

The central representation is the dependence graph, and two kinds are presented with examples: (i) program dependence graphs, which represent single-procedure programs, and (ii) system dependence graphs, which are used to analyze programs with multiple procedures and functions. The paper also defines a variation called program representation graphs, which combine features of program dependence graphs and static single assignment form. Algorithms that solve slicing problems for single-procedure and multi-procedure programs are discussed with examples. The single-procedure algorithm computes forward and backward slices by traversing the program dependence graph; it has a drawback in that it does not account for calling context. The multi-procedure algorithm uses system dependence graphs, correctly handles the calling-context problem, and involves two passes.
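As a minimal illustration of what a backward slice captures, consider the small single-procedure C program below (the program is my own, not one from the paper). The comments mark which statements belong to the backward slice with respect to the final use of sum; the statements involving prod lie outside that slice and could be removed without changing the value printed for sum.

    #include <stdio.h>

    int main(void)
    {
        int i, sum = 0, prod = 1;   /* sum and i are in the slice; prod is not */

        for (i = 1; i <= 10; i++) { /* in the slice: the loop controls the definitions of sum */
            sum  = sum + i;         /* in the slice: defines the value used at the criterion */
            prod = prod * i;        /* not in the slice: sum never depends on prod */
        }

        printf("%d\n", prod);       /* not in the slice */
        printf("%d\n", sum);        /* slicing criterion: the use of sum here */
        return 0;
    }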

Techniques for program differencing of single-procedure and multi-procedure programs are presented that solve three problems: (1) finding the differences between two versions of a program when a correspondence between the versions is given; (2) finding the differences when the correspondence is not given; and (3) given two programs, finding the components whose behavior has changed between them. The paper also presents techniques for program integration of single-procedure and multi-procedure programs in two scenarios: (1) when a correspondence among the components of the base program and the variants being integrated is given, and (2) when it is not. Although the algorithms discussed in this section are semantics-based program integration techniques, they do not support a full-fledged programming language. Nonetheless, these techniques are a step ahead of text-based program integration algorithms, and they provide a theoretical foundation for building a semantics-based program integration tool.
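To make the notion of "changed behavior" concrete, the following invented example shows the kind of difference a slice-based differencing tool would flag: both versions compute total identically, but the dependences feeding the printed average differ, so only that output has potentially changed behavior. The example is mine, not the paper's.

    #include <stdio.h>

    /* Old and new versions of the same tiny procedure. */
    void report_old(const int *a, int n)
    {
        int i, total = 0;
        for (i = 0; i < n; i++)
            total += a[i];
        printf("total=%d avg=%d\n", total, total / (n + 1)); /* old divisor */
    }

    void report_new(const int *a, int n)
    {
        int i, total = 0;
        for (i = 0; i < n; i++)
            total += a[i];
        printf("total=%d avg=%d\n", total, total / n);       /* corrected divisor */
    }

    int main(void)
    {
        int a[] = {2, 4, 6};
        report_old(a, 3);   /* total=12 avg=3 */
        report_new(a, 3);   /* total=12 avg=4: only the average's behavior changed */
        return 0;
    }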

One of the challenges we in the industry face as we approach a product's release date is verifying that a particular bug fix has not caused side effects elsewhere in the product. Most testers automate their test cases and run all of them on the version that contains the fix, but this is not foolproof, since a scenario may not be covered by the test suite. A tool that determines the semantic or behavioral differences between two versions of a product could minimize surprises just before shipping and might help improve quality. Although the paper demonstrates the techniques on multi-procedure programs, we still do not know how well they scale when applied to commercial products, and their performance and usability remain open questions. Even if somebody built such tools and made them popular in the commercial world, would people switch from relying on their automated tests to verify a version?

 

Dynamically Discovering Likely Program Invariants to Support Program Evolution

This paper describes techniques for dynamically discovering invariants, along with an instrumentation and inference engine that implements them. The paper also reports the results of applying this engine to a set of programs from Gries's work and another set from the Siemens suite. The basic approach is to instrument the source program to trace the variables of interest, run the instrumented program over a set of test cases, and infer invariants over both the instrumented variables and derived variables that are not manifest in the original program. The invariant detector analyzes the output of the instrumented program and lists the invariants detected at each instrumented program point. In addition to the trace data produced by the test cases, the engine is also given derived variables; these allow it to infer invariants that are not hard-coded into its list. Since both variable derivation and invariant inference benefit from access to previously computed invariants, derived variables are introduced in stages, so that invariants over the existing variables have been computed before further variables are derived. The paper also demonstrates, using the Replace program from the Siemens suite, that the detected invariants can help a programmer evolve a program that contains no explicitly stated invariants.
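As a rough sketch of what the instrumentation step amounts to, the hand-instrumented C function below dumps the variables of interest at two program points on every execution; an offline inference engine reading such traces from many test runs could then report likely invariants such as 0 <= i <= n at the loop head. The function and the trace format are my own invention for illustration and are not the paper's actual instrumentation output.

    #include <stdio.h>

    int array_sum(const int *a, int n)
    {
        int i, s = 0;
        for (i = 0; i < n; i++) {
            /* program point "array_sum:loop": record the variables of interest */
            fprintf(stderr, "array_sum:loop i=%d n=%d s=%d\n", i, n, s);
            s += a[i];
        }
        /* program point "array_sum:exit" */
        fprintf(stderr, "array_sum:exit n=%d s=%d\n", n, s);
        return s;
    }

    int main(void)
    {
        int a[] = {3, 1, 4, 1, 5};
        printf("%d\n", array_sum(a, 5));   /* prints 14 */
        return 0;
    }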

Some scalability tests are performed and the results are reported along three measurements: the effect of the number of variables on invariant detection time, the effect of test suite size on invariant detection time, and the effect of test suite size on which invariants are detected. These measurements are made by instrumenting and running the Siemens Replace program on 500 test cases randomly selected from the 5442 supplied with the program. The most critical factor in invariant detection time turns out to be the number of variables over which invariants are checked; test suite size has a less significant effect on detection runtime and less effect on invariant precision. On a program with a large data set and a large number of detected invariants, these techniques might prove unusable simply because of the quantity of output. To ease this problem, the authors developed a tool with a user interface that lets users filter and fine-tune the data presented to them.

The techniques discussed in this paper have enough drawbacks to discourage individual product groups from attempting to instrument their code themselves. They seem better suited as a foundation on which someone could build a commercial tool or an integrated environment that solves the problems discussed in the paper. Some of the significant drawbacks come from the instrumentation technique itself; as the paper notes, instrumentation is conceptually simple but requires care in practice.

Static Detection of Dynamic Memory Errors

This paper briefly outlines some common bugs that routinely find their way into code and shows how LCLint, a widely used checking tool, fails to detect some of them. It then extends LCLint to detect a broad class of important errors, including misuses of null pointers, failures to allocate and deallocate memory, uses of undefined or deallocated storage, and dangerous or unexpected aliasing.

The paper recommends annotations as a means of stating explicitly what would otherwise remain unstated assumptions, and proposes using them to detect most of the errors mentioned above.

Annotations can be used in LCL specifications or placed directly in source code as syntactic comments. The annotation scheme presented in the paper effectively addresses the following problems: (i) dereferencing null pointers, (ii) intentionally returning undefined storage from a function, and (iii) memory allocation and deallocation errors. The scheme is analyzed and applied to an example to demonstrate that annotations can be added to an existing program to improve its documentation and maintainability, detecting errors in the process.
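For illustration, here is a small function (my own example, not one from the paper) carrying two genuine LCLint annotations: /*@null@*/ states that the result may be NULL, so callers are obliged to check it before dereferencing, and /*@only@*/ states that the caller receives the sole reference to the returned storage and must eventually free it. Because the annotations are comments, the code still compiles with an ordinary C compiler.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*@null@*/ /*@only@*/ char *copy_string(const char *s)
    {
        char *t = (char *) malloc(strlen(s) + 1);
        if (t == NULL)
            return NULL;    /* permitted: the result is annotated null */
        strcpy(t, s);
        return t;           /* ownership of t transfers to the caller  */
    }

    int main(void)
    {
        char *t = copy_string("hello");
        if (t != NULL) {    /* the null annotation obliges this check  */
            printf("%s\n", t);
            free(t);        /* the only annotation obliges this free   */
        }
        return 0;
    }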

I agree that code annotated correctly according to the specifications is likely to be more maintainable and of higher quality. If the code is annotated at the time of development, the annotations are likely to be accurate, since the person writing the code knows exactly what the different parameters or global variables could be and how they could be used; but this depends on the discipline of the programmer. On the other hand, if annotations are added to existing code some time after it was written, whether by the same programmer or by another, they run the risk of being imprecise. In conclusion, while annotating code is a great concept, it still suffers from some of the problems we face in practice every day.

 

Lackwit: A Program Understanding Tool Based on Type Inference

This paper introduces a tool called Lackwit that applies type inference analyses to C programs for program understanding tasks. The tool uses a new type system in which representation information is encoded in the types; it expresses representation constraints as type constraints and solves them by type inference. The type inference algorithm computes new types for the variables and textual expressions of a program based on their actual usage. Given a variable or expression of interest, its inferred type can be examined to draw conclusions about the variable's usage and behavior, and by finding other occurrences of the same type it is possible to track value propagation and aliasing relationships across the whole program.
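The following invented C fragment illustrates the idea behind such representation types: ages and ids share the declared type int *, but their values never flow into one another, so an inference in the style of Lackwit would give them distinct representation types, whereas p aliases ages and would share its type, so a query on ages would also surface the uses made through p.

    #include <stdlib.h>

    int main(void)
    {
        int *ages = malloc(10 * sizeof *ages);
        int *ids  = malloc(10 * sizeof *ids);
        int *p    = ages;       /* alias of ages: same inferred representation type */

        if (ages == NULL || ids == NULL)
            return 1;

        p[0]   = 42;            /* flows into the "ages" representation             */
        ids[0] = 10007;         /* unrelated: a separate representation type        */

        free(ages);
        free(ids);
        return 0;
    }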

The result of the type inference phase is a mapping from source variables and functions to type signatures in the new type system. The paper describes ways to extract interesting information from this mapping and display it graphically in a form that has a clear relationship to the source program; one of the basic ways is to produce a graph summarizing the information about a single component of a variable.

The paper compares Lackwit with other tools known for analyzing source code for program understanding along three features: semantic depth, global display, and scalability, and shows that Lackwit is the only tool that has all three. It then demonstrates the tool's functionality with an example. The tool still suffers from some drawbacks, such as scalability limits and a poor query interface.

In conclusion, tools that analyze code for program understanding, or that determine the differences between versions of a program, are still not widely used. For these tools to become popular, they should be offered as a suite integrated into a development environment. One problem common to all the static analysis tools is the amount of data presented to the user; some tools provide sophisticated filtering, but it is not very intuitive. An easy and universal way of presenting the analyzed data still needs to be developed.