Copyright (c) 1997 by the University of Washington. All rights reserved. This is a proprietary and confidential document. Under no circumstances may this document be used, modified, copied, distributed, or sold without the express written permission of the copyright holder.

Optimizing Programs Using Etch

As a general binary-rewriting facility, Etch can manipulate binary executables for purposes of instrumentation or optimization. Optimization may involve changing the organization of a program, e.g., to move code in order to improve memory system behavior or to rearrange instructions for a specific processor pipeline. Some optimizations may require several passes: an instrumentation pass to observe program behavior, and a modification pass to optimize based on that observation.

Two optimizations have been implemented using Etch:

  • Procedure layout. In this optimization, Etch is used with the xprof tool to record a profile for a training run of the program. The profile is processed to generate a weighted call graph (WCG), from which an optimized procedure layout is computed. The optimized procedure layout is applied to the original executable using the optional layout specification to Etch.

  • Fluff removal. This optimization takes as input an exact profile of a training run of the application, again generated using xprof. The profile is processed to identify sequences of instructions that are rarely or never executed. These instruction sequences are moved out of the normal program flow, and the resulting layout is again applied using the optional layout specification to Etch.

The post-processing of profile information for optimization uses a number of simple commands, which are described in greater detail in the following section on Optimization Tools.
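The fluff-removal idea described above can be illustrated with a minimal sketch. It assumes the profile has already been reduced to a list of (block, execution count) pairs in original program order; that representation, the function name `split_fluff`, and the `threshold` parameter are hypothetical and are not part of Etch or xprof.

```python
def split_fluff(block_counts, threshold=0):
    """Partition blocks into hot and cold, keeping original order in each.

    block_counts: list of (block_id, execution_count) in program order
                  (an assumed, simplified stand-in for an exact profile).
    threshold:    blocks executed at most this many times count as fluff
                  (hypothetical tuning knob).
    Returns (new_layout, cold_blocks).
    """
    hot = [block for block, count in block_counts if count > threshold]
    cold = [block for block, count in block_counts if count <= threshold]
    # Cold blocks are moved out of the normal flow by placing them last.
    return hot + cold, cold


if __name__ == "__main__":
    profile = [("entry", 10), ("error", 0), ("loop", 500), ("rare", 1)]
    layout, cold = split_fluff(profile, threshold=1)
    print(layout)
    print(cold)
```

Keeping the relative order within each group is a deliberate simplification; a real rewriter must also patch branches so that the moved sequences remain reachable.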

    Optimization Steps

    You can access Etch optimizations through the actions "Profile for Code Layout", "Optimize for Code Layout", and "Measure Optimized Performance" in the Visual Etch user interface. These actions profile and optimize an executable and the DLLs it uses, and provide a simple procedure for comparing the performance of the optimized application to that of the original. Below is the step-by-step procedure for optimization using Visual Etch. This example uses perl5 as the application, which is run using a training script called test.pl.

    1. Select an application. [perl5\perl.exe]

    2. Select a script to run the application. [perl5\weblint\test.pl]

    3. Run the DLLwatch tool to ensure that Visual Etch knows about all the DLLs used by the application.

    4. Click on "Modules" to open the Modules window.

    5. Check "Etch" in the Modules window for each of the modules you want to optimize. [perl.exe, perl100.dll, any others you want to optimize]

    6. In the main Etch window, select action "Profile for Code Layout".

    7. Click on "Execute action" to run the profile.

    8. After profile data has been collected, use the "Optimize for Code Layout" action to generate an optimized executable.

    9. On a Pentium or Pentium Pro, use the "Measure Optimized Performance" action to compare the performance of the original and optimized executable using hardware performance counters.

    Note that most of these steps are the same as for any other Visual Etch tool. Only steps 6, 8, and 9 are specific to optimization.

    The procedure above trains and tests on the same input. You can train and test on different inputs by changing the input script or command line arguments before using the "Measure Optimized Performance" action. Training with multiple runs of the program is possible, but requires that you use the command line interface for optimization.

    Running Etch Optimization From the Command Line

    The optimization scenario above uses three tools ("Actions") that are available through the Visual Etch user interface. In general, we expect Visual Etch to be the easiest way to optimize. If necessary, however, you can optimize without Visual Etch by explicitly invoking several optimization tools, as well as Etch itself, on the command line. This section documents the use of these command line optimization tools.

    xprof (etch tool)

    The modules xprof-inst.dll and xprof-rt.dll implement the Etch profiling tool for optimization. When xprof is used to instrument and profile an application module AMODULE.XYZ (EXE or DLL), the results are recorded in four files:

    xprof (command)

    Xprof is a utility for processing profiles generated with the Etch xprof tool. Invoke it as:

            xprof -x AMODULE.XYZ xprof-option
    

    Xprof provides four options for profile processing:

    xp2edge

    Like the xprof command, xp2edge processes output files from the xprof profiling tool. Xp2edge reads xprof profiles and generates a file containing weighted-call-graph edges, to be used as input to ppcfg. It takes two arguments: the original name of the profiled module and the file to which the call graph edges are to be written:

            xp2edge AMODULE.XYZ AMODULE.XYZ.cfg
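
To make the notion of weighted-call-graph edges concrete, here is a small sketch that aggregates call records into edge weights. The input shape (caller, callee, count) and the function name `weighted_call_graph` are assumptions made for illustration; the actual xprof profile format and the xp2edge output format are not described here.

```python
from collections import Counter


def weighted_call_graph(call_records):
    """Aggregate call records into weighted call graph edges.

    call_records: iterable of (caller, callee, count) tuples, an assumed
                  in-memory stand-in for profile data.
    Returns a dict mapping (caller, callee) to the total call count.
    """
    weights = Counter()
    for caller, callee, count in call_records:
        # Repeated records for the same call pair accumulate into one edge.
        weights[(caller, callee)] += count
    return dict(weights)


if __name__ == "__main__":
    records = [("main", "parse", 3), ("parse", "lex", 40), ("main", "parse", 2)]
    for (caller, callee), weight in weighted_call_graph(records).items():
        print(caller, callee, weight)
```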
    

    prc2edge

    This command takes a file of procedure boundaries generated by Etch (using the -Bp switch) and converts it into a set of unit-weight edges, to be used as input to ppcfg. Two edges are created for each procedure: one call from address 0x1 to the beginning of the procedure, and one call from address 0x1 to the end of the procedure. These edges are used to define the beginnings and endings of procedures, using the heuristic that the targets of calls are the beginnings of procedures.
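
The conversion described above can be sketched as follows. The sketch assumes procedure boundaries are already available as (start, end) address pairs and represents each edge as a (source, target, weight) tuple; the file formats read and written by prc2edge, and whether the end address is inclusive, are not specified here, and the function name `boundaries_to_edges` is hypothetical.

```python
# Per the description above, every synthetic edge originates at address 0x1.
SENTINEL_CALLER = 0x1


def boundaries_to_edges(procedures):
    """Turn procedure boundaries into unit-weight call edges.

    procedures: iterable of (start_address, end_address) pairs
                (assumed in-memory form of the Etch boundary file).
    Returns a list of (source, target, weight) tuples, two per procedure.
    """
    edges = []
    for start, end in procedures:
        edges.append((SENTINEL_CALLER, start, 1))  # marks the beginning
        edges.append((SENTINEL_CALLER, end, 1))    # marks the end
    return edges


if __name__ == "__main__":
    for source, target, weight in boundaries_to_edges([(0x1000, 0x10FF)]):
        print(hex(source), hex(target), weight)
```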

    ppcfg

    This tool implements the Pettis and Hansen procedure layout algorithm.

            ppcfg outputfile edgefile ...
    

    The first argument is the output file to which the optimized layout is written. It is followed by one or more files specifying control flow graph edges. To use profile information from multiple training runs, pass the edge files from all runs as input to ppcfg; note that edge weights from different runs are not summed together. Procedure boundaries from Etch should be converted using prc2edge and included as input files to ppcfg.
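
For readers unfamiliar with this style of layout, the following is a simplified sketch of greedy chain merging in the spirit of the Pettis and Hansen procedure ordering. It is not the ppcfg implementation. It assumes the weighted call graph is given as a dict of (caller, callee) weights, treats A-to-B and B-to-A calls as one undirected edge, and uses hypothetical function names throughout.

```python
from itertools import product


def _join_chains(a, b, pair_weight):
    """Concatenate chains a and b, trying both orientations of each."""
    # Heaviest original edge that crosses between the two chains.
    x, y = max(product(a, b), key=lambda xy: pair_weight.get(frozenset(xy), 0))
    candidates = [a + b, a + b[::-1], a[::-1] + b, a[::-1] + b[::-1]]
    # Prefer the arrangement that places x and y closest together.
    return min(candidates, key=lambda c: abs(c.index(x) - c.index(y)))


def layout_procedures(call_weights):
    """Return a procedure order from {(caller, callee): weight}."""
    # Treat the call graph as undirected: A->B and B->A share one edge.
    pair_weight = {}
    for (caller, callee), weight in call_weights.items():
        if caller != callee:  # self-calls do not influence placement
            key = frozenset((caller, callee))
            pair_weight[key] = pair_weight.get(key, 0) + weight

    # Every procedure starts as its own chain, keyed by a chain id.
    chains = {p: [p] for pair in call_weights for p in pair}
    graph = dict(pair_weight)  # edges between chain ids

    while graph:
        # Pick the heaviest remaining edge; break ties by name.
        key = min(graph, key=lambda k: (-graph[k], sorted(k)))
        keep, drop = sorted(key)
        chains[keep] = _join_chains(chains[keep], chains.pop(drop), pair_weight)
        del graph[key]
        # Redirect edges that touched the dropped chain, merging parallels.
        for old in [k for k in graph if drop in k]:
            (other,) = old - {drop}
            merged = frozenset((keep, other))
            graph[merged] = graph.get(merged, 0) + graph.pop(old)

    # Unconnected chains are emitted in name order (an arbitrary choice).
    return [p for cid in sorted(chains) for p in chains[cid]]


if __name__ == "__main__":
    weights = {("main", "parse"): 100, ("parse", "lex"): 90,
               ("main", "report"): 5, ("report", "fmt"): 50}
    print(layout_procedures(weights))
```

The sketch repeatedly merges the two chains joined by the heaviest remaining edge, so procedures that call each other often end up adjacent in the final order.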