Radu Bacioiu, CSE 584, fall 1998. Assigned Readings - Evolution.
While reading the papers in this package I couldn't help going back to the question asked in class (Oct. 27th) - "Why isn't the use of tools more widespread in the industry?" I have to admit that the answer that seemed to prevail in class - namely "lack of demand" - surprised me. Personally I think the problem stems from the fact that programmers have little faith in finding an easy-to-use tool that scales well and is reliable. This reminds me of a Dilbert cartoon where a little devil comes to Dilbert holding a software CD and saying: "Here's a suite of applications that work together like one." To which Dilbert, without even turning his head from his computer, says: "Go away." That's exactly my feeling when I hear about yet another groundbreaking tool. You wouldn't expect somebody who works in Microsoft's "Office" group to tell this joke, but then you have to recognize that a lot more effort and resources have been put into building "suites" of applications than into building Software Engineering tools.
To repeat what I stated above, a tool should be at least:
1) Easy to use - the time invested in learning the tool should be significantly lower than the time you would expect to spend doing the same job by hand.
2) Scalable - the complexity of the tool's output should grow at most linearly with the overall size of the code.
3) Reliable - it doesn't "miss" things like obfuscated pointer references to an object, etc.
I was pleasantly surprised to find grep mentioned in a few of the papers in this package as a reference tool. Grep has a proven record of being widely used by programmers everywhere and the authors of the respective papers implicitly recognize its reference status. If you think about it, grep does abide by the three rules above. Is my set of rules complete?
Given that programmers have, at one time or another, seen tools that fell grossly short of their expectations, one can't condemn them too harshly for being skeptical about Software Engineering tools. From this point of view one can consider the problem as being "lack of demand," but in my opinion the lack of demand is just a side effect of the lack of "good" tools (in the sense of obeying at least the rules above) on the market.
A while ago I had to do some maintenance work which consisted of taking some code, isolating the communication layer and replacing it with services provided by a newly developed OLE component. Documentation was practically nonexistent and the owner of the code was working in a different group (we actually got the code from a different group). A good tool that would have allowed me to understand the structure of the old code (perhaps a tool similar to the one described in the RIGI paper) would probably have saved me one to two months in a project that lasted about seven months.
One last (unrelated?) comment I have is that complex modern software products have to deal with threads, synchronization, event-driven (implicit) invocation, etc. However, the tools described in research papers usually boast of analyzing batch-processing programs. Well, great! But how much of this research applies to real-world code? Do threads and implicit invocation make the software qualitatively different from the software attacked by research papers? My opinion is that yes, they do.
1. Software Aging - D.L. Parnas
The author states that software products have become complex enough to make long-term thinking (beyond the immediate release of a product) a must for most successful commercial applications. He does a good job of pointing out the (apparently contradictory) sources of aging: failure to modify a program (adapting it to current needs) and modifying it (unwisely). However, designing for change is easier said than done. I have seen code that was designed for change, but too often the change comes where the designer didn't expect it. What you end up with is code that is generic enough to be slow but not generic enough to be easily changed - the worst of both worlds. The role of documentation and code reviews is also considered. Parnas claims that the reason code reviews aren't popular is that "Many programmers regard programming as an art and resent the idea that anyone could or should review the work that they have done." However, in my experience code reviews are rare merely because they tend to be time-consuming, not only for the reviewee but also for the reviewer.
Some of the comments in this paper hit fairly close to home - in both a good and a bad way. For instance, I have been in the position where restructuring a fragment of my code would have been the right path to take (see section 7.5: Major surgery - restructuring) but I shied away from it - and now I regret it. On the bright side, I am in the habit of adding comments to tricky/poorly commented code after I spend some time deciphering it (see section 7.2: Retroactive documentation). This way another developer who happens to come across the same code will (hopefully) spend less time trying to understand what's going on.
Regarding the feasibility of formally (legally?) defining the Software Engineering profession, I'm fairly skeptical at this point. Formal knowledge about problems in Software Engineering does help a developer in his work (or so I hope by taking this class), but to go as far as requiring a license (Parnas calls it "industry's seal of quality") to work in the software industry might be taking regulations too far.
2. Automated Support for Encapsulating Abstract Data Types - R.W. Bowdidge and W.G. Griswold
The idea in this paper seems very appealing - a tool that takes a piece of code and gives you an image of its structure: Wow! However, on closer inspection one finds serious problems with respect to the three "principles" I stated in my little introduction.
Here's a quote from the paper that gives me a funny feeling about the scalability of the tool: "[...] presenting the text of the program in a window, allowing the tool user to select a portion of a program and apply [...]." If only source code were that small, localized and simple!
Then there is the question of reliability - more specifically, if I want to take this tool and apply it to my newly assigned C project, would it miss weird pointer references that modify some data? My fears are confirmed in an inconspicuous footnote, which states that "In the C programming language, however, coercion and pointer arithmetic can disguise an alias."
The Star Diagram described in this paper also requires the user to know something about the structure of the program, namely exactly what code is relevant to a specific data structure. This is not an exploration tool (like the RIGI tool or the Reflexion Model) but rather a re-modularization tool. But if a programmer understands the source well enough to tell exactly where each piece of code resides, then his job is already half done!
3. Reengineering with Reflexion Models: A Case Study - G.C. Murphy and D. Notkin
Not to suck up to David, but I think the tool described in this paper is the most down-to-earth of the "Software Evolution" set of readings. In contrast to the Star Diagram it tries to achieve a more attainable goal - gaining a high-level understanding of the code - with the most interaction from the user. The RIGI tool comes close to these goals but it is more ambitious in that it tries to rigorously define the interfaces between the various modules in the code.
The process presented in this paper is iterative. The user starts from a high-level model, a source model (the paper is stingy with details about how the source model is derived) and a map that associates code with the modules in the high-level model. From these elements a reflexion model is computed. The user evaluates the reflexion model by comparing it to the actual code. Where discrepancies are observed, the user either modifies the map to account for an unexpected interaction in the reflexion model or gains knowledge about a real interaction in the code he didn't know about - thus the tool serves its purpose.
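To make the computation step concrete, here is a minimal sketch of how a reflexion model might be computed. It is my own illustration, not the authors' implementation: the function name, the module names and the call edges are all hypothetical, the map is a plain name-to-module dictionary (the real tool is more flexible than that), and calls within a single module are simply ignored. The terms convergence, divergence and absence are the ones used for reflexion models.

```python
def compute_reflexion_model(high_level_edges, source_edges, mapping):
    """Compare predicted module interactions with those found in the source.

    high_level_edges: set of (module, module) pairs the engineer expects.
    source_edges:     set of (caller, callee) pairs extracted from the code.
    mapping:          dict from source entity name to high-level module.
    All three inputs are hypothetical stand-ins for the paper's inputs.
    """
    lifted = set()
    for caller, callee in source_edges:
        src = mapping.get(caller)
        dst = mapping.get(callee)
        if src is None or dst is None:
            continue  # entity not covered by the map yet
        if src != dst:  # simplification: ignore calls inside one module
            lifted.add((src, dst))

    return {
        # predicted by the engineer and present in the code
        "convergences": high_level_edges & lifted,
        # present in the code but not predicted
        "divergences": lifted - high_level_edges,
        # predicted but not found in the code
        "absences": high_level_edges - lifted,
    }


# Hypothetical example data, invented for illustration only.
high_level = {("UI", "Core"), ("Core", "Storage"), ("Storage", "Log")}
source = {
    ("draw_menu", "compute"),
    ("compute", "save_file"),
    ("draw_menu", "save_file"),  # the UI bypasses Core: a surprise
    ("compute", "helper"),       # stays inside Core, so it is ignored
}
mapping = {
    "draw_menu": "UI",
    "compute": "Core",
    "helper": "Core",
    "save_file": "Storage",
}

model = compute_reflexion_model(high_level, source, mapping)
for kind in ("convergences", "divergences", "absences"):
    print(kind, sorted(model[kind]))
```

In this made-up example the divergence (UI calling Storage directly) is the kind of discrepancy that sends the user back either to the map or to the code.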
One thing that came to mind when reading this paper was that if "the engineer" had come up with a more complex high-level model (comprising, say, 100 modules instead of 13) then the work involved in applying the Reflexion Model would probably have been much harder. While the definition of the map can be incrementally refined, starting with the wrong high-level model could make a big difference in the outcome of the modeling process.
Another important point made in this paper is that "pretty pictures" aren't always the most useful way to convey information to the human user. This goes against the implicit or explicit (but nonetheless unfounded) assumption in other papers that "a text-based restructuring tool sometimes presents information inappropriate for the task and obscures information that is required." (Bowdidge and Griswold) The "unwritten" assumption here is that a graphical presentation of the data will always be better than a textual one. When I think of the tools that work (my smart editor, grep) and the tools that don't (you name it) the balance favors the textual presentation of data.
4. A Reverse Engineering Approach to Subsystem Structure Identification - H.A. Mueller, M.A. Orgun, S.R. Tilley and J.S. Uhl
The RIGI system described in this paper is a source exploration tool. During the process of extracting the structure from the code, the user has a significant role in pruning out the "omnipresent nodes" and establishing the right thresholds necessary to achieve the "right" partition into modules. The authors recognize the inherent limitations of any automated tool and try to give the user enough control to steer the modularization process in the right direction. In contrast to the Star Diagram tool, RIGI doesn't make code changes but only helps the user to extract the structure of a software application. The authors highlight common pitfalls and suggest how a user might work around them. Debug code, error reporting code and common library functions will all appear everywhere (all are omnipresent nodes); the user should know to hide the non-structural code (debug code and error reporting code) while keeping the rest (library functions) around. Grouping functions by name can help if the code was written using consistent naming conventions but it can be misleading otherwise.
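The pruning of omnipresent nodes can be pictured with a short sketch. This is only an illustration under my own assumptions, not RIGI's actual algorithm: I assume a node is flagged when its number of distinct callers reaches a user-chosen threshold, and all function names and call edges below are hypothetical.

```python
from collections import defaultdict


def fan_in(edges):
    """Count the distinct callers of every callee in a call graph."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return {node: len(found) for node, found in callers.items()}


def omnipresent_nodes(edges, threshold):
    """Flag nodes called by at least `threshold` distinct callers.

    The fan-in criterion is an assumption made for this sketch.
    """
    return {node for node, count in fan_in(edges).items() if count >= threshold}


def hide_nodes(edges, hidden):
    """Drop every edge that touches a node the user chose to hide."""
    return {(a, b) for a, b in edges if a not in hidden and b not in hidden}


# Hypothetical call graph, invented for illustration only.
edges = {
    ("parse", "log_error"), ("eval", "log_error"), ("emit", "log_error"),
    ("parse", "strlen"), ("eval", "strlen"), ("emit", "strlen"),
    ("parse", "eval"), ("eval", "emit"),
}

candidates = omnipresent_nodes(edges, threshold=3)
# The tool can only propose candidates; the user decides that error
# reporting is non-structural while the library function stays visible.
pruned = hide_nodes(edges, hidden={"log_error"})

print(sorted(candidates))
print(sorted(pruned))
```

The point of splitting detection from hiding is that the threshold only nominates candidates; deciding which of them are structural remains the user's call.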
The software analyzed by the authors in this paper using the RIGI tool is a structured piece of software having more or less a tree-like hierarchy. Is this a requirement for RIGI to work properly? What problems should one expect if the tool is applied to spaghetti code? The authors don't comment on this aspect (should we call it versatility?) of their tool.