Copyright (c) 1997 by the University of Washington. All rights reserved.
This is a proprietary and confidential document. Under no circumstances
may this document be used, modified, copied, distributed, or sold
without the express written permission of the copyright holder.
Optimizing Programs Using Etch
As a general binary-rewriting facility, Etch can manipulate
binary executables for purposes of instrumentation or optimization.
Optimization may involve changing the organization of a program,
e.g., to move code in order to improve memory system behavior
or to rearrange instructions for a specific processor pipeline.
Some optimizations require more than one pass: for example, an instrumentation
pass that observes program behavior, followed by a modification pass that
optimizes the program based on those observations.
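Two optimizations have been implemented using Etch:
- Procedure layout, which reorders procedures using the Pettis and Hansen
algorithm to improve memory system behavior.
- Fluff removal, which moves code that was never executed during profiling
to the end of the text segment.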
Both optimizations post-process profile information using a number of
simple commands, which are described in detail in the section Running
Etch Optimization From the Command Line.
Optimization Steps
You can access Etch optimizations through the Visual Etch actions
"Profile for Code Layout", "Optimize for Code Layout", and
"Measure Optimized Performance". These actions profile and optimize an
executable and the DLLs it uses, and provide a simple way to compare the
performance of the optimized application with that of the original.
The step-by-step procedure for optimizing with Visual Etch follows. This
example uses perl5 as the application, run with a training script called
test.pl.
1. Select an application. [perl5\perl.exe]
2. Select a script to run the application. [perl5\weblint\test.pl]
3. Run the DLLwatch tool to ensure that Visual Etch knows about all the
DLLs used by the application.
4. Click "Modules" to open the Modules window.
5. In the Modules window, check "Etch" for each module you want to
optimize. [perl.exe, perl100.dll, and any others you want to optimize]
6. In the main Etch window, select the action "Profile for Code Layout".
7. Click "Execute action" to run the profile.
8. After the profile data has been collected, use the
"Optimize for Code Layout" action to generate an optimized executable.
9. On a Pentium or Pentium Pro, use the "Measure Optimized Performance"
action to compare the performance of the original and optimized
executables using the hardware performance counters.
Note that most of these steps are the same as for any other Visual Etch
tool; only steps 6, 8, and 9 are specific to optimization.
The procedure above trains and tests on the same input. To train and
test on different inputs, change the input script or command-line
arguments before using the "Measure Optimized Performance" action.
Training with multiple runs of the program is possible, but requires the
command-line interface for optimization.
Running Etch Optimization From the Command Line
The optimization scenario above uses three tools ("Actions") that are
available through the Visual Etch user interface. We expect this to be
the easiest way to optimize in most cases. If necessary, however, you
can optimize without Visual Etch by invoking Etch and several
optimization tools explicitly on the command line. This section
documents those command-line optimization tools.
xprof (Etch tool)
The modules xprof-inst.dll and xprof-rt.dll implement
the Etch profiling tool for optimization. When xprof is used to instrument
and profile an application module AMODULE.XYZ (EXE or DLL), the results
are recorded in four files:
- AMODULE.XYZ.xpi - This file is created during instrumentation and
contains information from a static analysis of AMODULE.XYZ. For each
instruction, the file records (1) the instruction address, (2) the type
of the instruction (call, branch, etc.), and (3) the instruction target,
which is meaningful only for control-transfer instructions such as calls
and branches.
- xprof.xoi - This file is created during instrumentation. It maps the
name of each etched module to a unique integer, counting from 0, that is
suitable for use as an index into an array of modules. For each module of
an application etched with xprof, the file contains a single line with the
module index, the number of instructions in the module, and the module
name (without directory components), separated by white space. When xprof
instruments a .exe file, xprof.xoi is truncated, so after instrumentation
it contains a single entry for the .exe file. When xprof instruments a
DLL, a new line is appended to the file. The management of xprof.xoi
relies on several implicit assumptions:
  - Only one application per directory is etched.
  - Each application has exactly one module with a .exe extension.
  - The .exe module of an application is etched first.
  - Each DLL used by an application has a unique name.
  Note: The xprof.xoi module database may not be maintained correctly if
  you select and deselect modules across multiple cached optimization runs.
- AMODULE.XYZ.xpp - This file is created when the application is run. It
contains the profile for the module: the number of times each instruction
was executed, stored as one 32-bit counter per instruction.
- AMODULE.XYZ.xpe - This file is created when the application is run. It
contains control-flow information for control-transfer instructions whose
targets cannot be resolved statically. The first line of the file gives
the number of edges it contains; each control-flow graph edge is then
stored in ASCII as:
source-pc target-pc weight instruction-type
This format allows the file to be used directly as input to ppcfg.
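Both xprof.xoi and the .xpe file are simple line-oriented ASCII files. The Python sketch below reads them, assuming (this is not stated in the descriptions above) that .xpe addresses are written in hexadecimal, weights in decimal, and that the instruction type is a single token. The function and type names are hypothetical and are not part of Etch.

```python
# Hypothetical readers for xprof.xoi and AMODULE.XYZ.xpe (not part of Etch).
# Assumption: .xpe addresses are hex (0x prefix optional), weights are decimal.
from typing import NamedTuple


class Module(NamedTuple):
    index: int             # position in the module array
    num_instructions: int  # number of instructions in the module
    name: str              # module file name, without directory components


class Edge(NamedTuple):
    source_pc: int
    target_pc: int
    weight: int
    instr_type: str        # spelling of type names is an assumption


def read_xoi(text: str) -> list[Module]:
    """Parse one 'index ninstr name' line per etched module."""
    modules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, ninstr, name = line.split(maxsplit=2)
        modules.append(Module(int(index), int(ninstr), name.strip()))
    return modules


def read_xpe(text: str) -> list[Edge]:
    """Parse an edge-count line followed by 'src dst weight type' lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    expected = int(lines[0])
    edges = []
    for line in lines[1:]:
        src, dst, weight, kind = line.split(maxsplit=3)
        # int(..., 16) accepts both "401000" and "0x401000".
        edges.append(Edge(int(src, 16), int(dst, 16), int(weight), kind.strip()))
    if len(edges) != expected:
        raise ValueError(f"header says {expected} edges, found {len(edges)}")
    return edges
```

For example, read_xpe("1\n0x401000 0x402000 17 icall\n") returns a single Edge whose weight is 17; the sample text and the "icall" type token are made up for illustration.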
xprof (command)
Xprof is a utility for processing profiles generated with the Etch xprof
tool. It should be used as:
xprof -x AMODULE.XYZ xprof-option
Xprof provides four options for profile processing:
- -s - Statistics. Provide summary statistics about the profile.
- -p - Profile. Print an ASCII version of the profile to standard
output. Consecutive instructions with the same counts are coalesced into a
single block. Any data lying between such instructions is included in the
block.
- -c - Coverage. Print coverage information from a profile. This
is similar to the -p option except that only ranges with profile counts
of zero are printed.
- -d < in-layout > out-layout - Dead code. Read a code layout
specification in the format used by Etch from standard input, move ranges
of instructions that were never executed (profile count of zero) to the
end of the text segment, and write the resulting layout to standard
output. This option post-processes a procedure layout, implementing the
fluff removal optimization.
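The range-oriented options can be illustrated with a small Python sketch. It assumes the profile has already been loaded as a list of (address, count) pairs sorted by address, one per instruction, and represents a layout as a list of inclusive (first, last) instruction-address ranges; Etch's real layout format is not described here, the function names are hypothetical, and data bytes between instructions are not modeled.

```python
# Hypothetical illustration of xprof's -p, -c, and -d options (not xprof's code).
# profile: list of (address, count), one entry per instruction, sorted by address.
from itertools import groupby


def coalesce(profile):
    """-p: merge runs of consecutive instructions with equal counts.
    Returns (first_address, last_address, count) blocks."""
    blocks = []
    for count, run in groupby(profile, key=lambda entry: entry[1]):
        addrs = [addr for addr, _ in run]
        blocks.append((addrs[0], addrs[-1], count))
    return blocks


def coverage(profile):
    """-c: like coalesce, but keep only ranges that were never executed."""
    return [block for block in coalesce(profile) if block[2] == 0]


def move_dead_to_end(layout, profile):
    """-d: split each layout range into executed and unexecuted pieces, keep
    the executed pieces in layout order, and append all unexecuted pieces
    (also in layout order) after them."""
    hot, cold = [], []
    for first, last in layout:
        inside = [(a, c) for a, c in profile if first <= a <= last]
        for executed, run in groupby(inside, key=lambda entry: entry[1] > 0):
            addrs = [addr for addr, _ in run]
            (hot if executed else cold).append((addrs[0], addrs[-1]))
    return hot + cold
```

Keeping the executed pieces in their original order means the procedure ordering chosen by the layout step is preserved; only never-executed code is pushed out of the way.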
xp2edge
Like the xprof command, xp2edge processes output files from the xprof
profiling tool. It reads xprof profiles and generates a file of weighted
call-graph edges for use as input to ppcfg. It takes two arguments: the
original name of the profiled module, and the file to which the call-graph
edges are written:
xp2edge AMODULE.XYZ AMODULE.XYZ.cfg
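One plausible way to derive weighted call edges from xprof output is to combine the static information in the .xpi file with the counts in the .xpp file, so that each call with a statically known target contributes an edge weighted by the call instruction's execution count. The Python sketch below illustrates that idea only; it is not necessarily how xp2edge works. It assumes both files have already been parsed into Python values, and the "call" type name and hex address formatting are assumptions. Output uses the .xpe layout (a count line, then one edge per line), which the .xpe description says ppcfg accepts.

```python
# Hypothetical sketch of deriving call-graph edges (not xp2edge's actual code).
# instructions: (pc, kind, target) rows, as recorded in a .xpi file.
# counts: pc -> execution count, as recorded in a .xpp file.


def static_call_edges(instructions, counts):
    """One edge per executed call with a statically known target."""
    edges = []
    for pc, kind, target in instructions:
        if kind != "call" or target is None:
            continue  # only direct calls give a static call-graph edge
        weight = counts.get(pc, 0)
        if weight > 0:  # never-executed calls contribute no weight
            edges.append((pc, target, weight, kind))
    return edges


def format_edges(edges):
    """Edge count on the first line, then 'source-pc target-pc weight type'."""
    lines = [str(len(edges))]
    lines += [f"{src:#x} {dst:#x} {weight} {kind}" for src, dst, weight, kind in edges]
    return "\n".join(lines) + "\n"
```

Indirect calls are skipped here because their targets are unknown statically; xprof records those separately in the .xpe file.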
prc2edge
This command takes a file of procedure boundaries generated by Etch
(using the -Bp switch) and converts it into a set of unit-weight edges for
use as input to ppcfg. Two edges are created for each procedure: a call
from address 0x1 to the beginning of the procedure, and a call from
address 0x1 to the end of the procedure. These edges define the beginnings
and ends of procedures, relying on the heuristic that the targets of calls
are the beginnings of procedures.
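The edge construction described above can be sketched directly. This Python version assumes the -Bp output has already been parsed into (start, end) address pairs (its file format is not described here), uses a hypothetical instruction-type token "call", and writes the same count-then-edges layout as the .xpe file.

```python
# Hypothetical sketch of prc2edge's edge construction (not the real tool).
# procedures: list of (start_address, end_address) pairs from Etch's -Bp output.

PSEUDO_CALLER = 0x1  # fake call site used for every boundary edge


def procedure_edges(procedures, kind="call"):
    """Two unit-weight pseudo-calls per procedure: to its start and to its end."""
    edges = []
    for start, end in procedures:
        edges.append((PSEUDO_CALLER, start, 1, kind))
        edges.append((PSEUDO_CALLER, end, 1, kind))
    return edges


def format_edges(edges):
    """Edge count on the first line, then 'source-pc target-pc weight type'."""
    lines = [str(len(edges))]
    lines += [f"{src:#x} {dst:#x} {weight} {t}" for src, dst, weight, t in edges]
    return "\n".join(lines) + "\n"
```

Because every edge has weight 1, these edges mark procedure boundaries without noticeably changing the profile-derived weights.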
ppcfg
This tool implements the Pettis and Hansen procedure layout algorithm.
ppcfg outputfile edgefile ...
The first argument is the file to which the optimized layout is written;
it is followed by one or more files of control-flow graph edges. To use
profile information from multiple training runs, give all of the
profiling data to ppcfg as input; note, however, that edge weights from
different runs are not summed. Procedure boundaries from Etch should be
converted with prc2edge and included among the input files to ppcfg.
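For readers unfamiliar with Pettis and Hansen's "closest is best" procedure placement, the Python sketch below shows its core greedy step in simplified form. It assumes edges have already been mapped to procedures (ppcfg uses the prc2edge boundaries for this), treats calls as undirected, sums duplicate edges within a single graph, measures distance in procedures rather than bytes, and does not model how ppcfg combines multiple runs. It illustrates the technique and is not ppcfg's implementation.

```python
# Simplified Pettis-Hansen "closest is best" procedure ordering (illustration only).
from collections import defaultdict
from itertools import combinations, product


def pettis_hansen(edges):
    """edges: iterable of (caller, callee, weight) between procedures.
    Returns the procedures in layout order."""
    pair_weight = defaultdict(int)  # undirected weight per procedure pair
    procs = {}                      # insertion-ordered set of procedures
    for caller, callee, weight in edges:
        procs.setdefault(caller, None)
        procs.setdefault(callee, None)
        if caller != callee:        # recursion does not affect placement
            pair_weight[frozenset((caller, callee))] += weight

    def w(u, v):
        return pair_weight.get(frozenset((u, v)), 0)

    chains = [[p] for p in procs]
    while True:
        # Find the pair of chains joined by the heaviest combined edge weight.
        best = None
        for i, j in combinations(range(len(chains)), 2):
            total = sum(w(u, v) for u in chains[i] for v in chains[j])
            if total > 0 and (best is None or total > best[0]):
                best = (total, i, j)
        if best is None:
            break
        _, i, j = best
        a, b = chains[i], chains[j]
        # Orient the merge so that the heaviest single edge between the two
        # chains joins procedures placed as close together as possible.
        u, v = max(product(a, b), key=lambda uv: w(*uv))
        options = [a + b, a + b[::-1], a[::-1] + b, a[::-1] + b[::-1]]
        merged = min(options, key=lambda c: abs(c.index(u) - c.index(v)))
        chains = [merged] + [c for k, c in enumerate(chains) if k not in (i, j)]
    return [p for chain in chains for p in chain]
```

With the made-up edges main to parse (100), parse to lex (80), main to report (5), and main to unused (0), the sketch returns lex, parse, main, report, unused: heavily connected procedures end up adjacent, and procedures with no weighted edges are placed after all merged chains.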