Copyright (c) 1997 by the University of Washington. All rights reserved. This is a proprietary and confidential document. Under no circumstances may this document be used, modified, copied, distributed, or sold without the express written permission of the copyright holder.

Etching Applications from the Command Line

Contents

1. Introduction
2. An Example of Using the Etch Command Line Interface
3. Etch Command Line Tools

1. Introduction

Etch can be used to transform executables in two ways. Normally, the user starts Visual Etch, specifies various parameters through the menus provided, and then clicks the Visual Etch "Execute Action" button to run the experiment. Selecting "Execute Action" causes Visual Etch and its helper scripts to invoke Etch on the modules selected, and then to execute the instrumented executable. As an alternative, you can run Etch directly at the Windows command line. Invoking Etch directly in this way requires that you understand more about the Etch process and the steps required to instrument an executable and its associated DLLs. This section describes the Etch command line interface, the steps required to etch an executable, and the individual tools required to use the command line interface.

2. An Example of Using the Etch Command Line Interface

As a step-by-step example of running Etch directly, we will show you how to etch an application, MS Word for Windows, using the command line interface. In this example, we use Etch to instrument the application with the instruction counting tool, "countiref." We first show how to etch the winword.exe executable itself, and then show how to etch some of the DLLs it uses in addition to the executable.

This example introduces the tools required to use Etch from the command line; we provide a more detailed description of those commands and their options in Section 3.

2.1 The Environment

Before you can use Etch from the command line, you must initialize your environment. First, update your path to include the directory where the Etch executable and runtime DLLs are kept. Assuming that you have installed Etch in c:\etch, you would add the bin directory to your path as follows:

X:\> set path=%path%;c:\etch\bin

Next you need to update your path to include the directory where the Etch tool DLLs are kept:

X:\> set path=%path%;c:\etch\apps\bin

With your path initialized as above, you are now ready to use Etch from the command line.
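
If you drive Etch from a script rather than from an interactive prompt, the same path setup can be done programmatically. The following Python sketch is illustrative only: the function name extend_path is hypothetical, and c:\etch is the same assumed install location used above.

```python
def extend_path(path_value, new_dirs, sep=";"):
    """Return path_value with each of new_dirs appended, skipping duplicates.

    Entries are compared case-insensitively, as Windows treats paths.
    """
    entries = [e for e in path_value.split(sep) if e]
    seen = {e.lower() for e in entries}
    for d in new_dirs:
        if d.lower() not in seen:
            entries.append(d)
            seen.add(d.lower())
    return sep.join(entries)


# Assumed install location, as in the text above.
ETCH_DIRS = [r"c:\etch\bin", r"c:\etch\apps\bin"]

if __name__ == "__main__":
    # Sample starting path; a real script would read the PATH variable instead.
    print(extend_path(r"C:\WINNT\system32", ETCH_DIRS))
```

Skipping directories that are already present keeps the path from growing each time the setup is repeated in the same session.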

2.2 Instrumenting The Executable

Instrumenting only the executable with Etch is straightforward. You simply need to tell Etch the base name of the executable, the tool that you want to use, and the location of the executable. The general format of the command is:

X:\> etch -app <basename> -t <toolname> <executable>

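where <basename> is the base name of the executable, <toolname> is the name of the tool that you want to use, and <executable> is the location of the executable.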

In our example, the basename is "winword" and the toolname is "countiref." We will assume that the executable is in the current directory as "winword.exe"; in this case we would specify the command:

X:\> etch -app winword -t countiref winword.exe

Executing this command will produce a new executable, winword-countiref.exe, in the current directory. This executable is a version of winword.exe instrumented to count instructions using the countiref tool. Etch found this tool by looking for countiref-inst.dll in your path (which you updated to include the Etch tool directory, as described above).

When run, winword-countiref.exe will produce an output file named countiref.output in the current directory. This file is the output of the tool, and will show the number of instructions executed by the Word application. Try running winword-countiref.exe, and then exit after Word finishes loading. Then invoke "type countiref.output" to show the number of instructions Word just executed.
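
The command format and naming convention in this section can be captured in a few lines. The Python sketch below builds the argument list for the command shown above and guesses the default output name. The helper names are hypothetical, and the <basename>-<toolname>.exe pattern is inferred from the winword-countiref.exe example rather than from a stated rule.

```python
import ntpath  # Windows-style path handling that works on any platform


def etch_tool_command(app_tag, tool, executable):
    """Build the argument list for: etch -app <basename> -t <toolname> <executable>."""
    return ["etch", "-app", app_tag, "-t", tool, executable]


def default_output_name(tool, executable):
    """Guess the default output name, e.g. winword.exe + countiref -> winword-countiref.exe.

    Assumption: the name is formed from the input file's base name and the tool name.
    """
    stem, ext = ntpath.splitext(ntpath.basename(executable))
    return f"{stem}-{tool}{ext}"


if __name__ == "__main__":
    print(" ".join(etch_tool_command("winword", "countiref", "winword.exe")))
    print(default_output_name("countiref", "winword.exe"))
```

Returning the command as a list rather than a single string makes it easy to pass to a process launcher without worrying about quoting.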

2.3 The Executable and DLLs

Due to the dependencies and interactions among the etched executable, etched DLLs, and unetched DLLs, etching executables and their DLLs requires seven steps: (1) determining the complete set of DLLs used, (2) deciding which DLLs to etch, (3) etching the executable and chosen DLLs, (4) patching calls to LoadLibrary, (5) patching import tables, (6) creating an etch.config file, and (7) running the etched application. We will use Word again as our example application to illustrate this process; a perl script that performs the steps below can be found in util\etchscripts\etchword.pl in the Etch source tree.

  1. First we must determine the complete set of DLLs that the application uses. An application can use a DLL in two ways, either by statically linking to the DLL or by explicitly (dynamically) loading the DLL during runtime. Etch provides a utility, called DLLwatch, for discovering this set of DLLs. For Word, we would use DLLwatch as follows:

    X:\> dllwatch winword.exe

    This will run winword in the context of DLLwatch. Exercise Word in the way that you intend to use it for measurement. When you exit Word, the DLLwatch tool will generate a file, dllwatch.output, that shows all of the DLLs loaded by Word.

  2. We must decide which DLLs to etch. For the sake of this example, we will etch gdi32.dll and pscrptui.dll in addition to the executable, winword.exe.

  3. Next we etch the executable and the chosen DLLs. To do this, we use Etch on each module independently, much like we did when etching the executable alone. For our example, we execute the following commands:

    X:\> etch -app winword -t countiref -wrap Etch etchwrap.dll kernel32.dll -outbase 0x300000 -o winword-etch.exe winword.exe

    X:\> etch -app winword -t countiref -wrap Etch etchwrap.dll kernel32.dll -outbase 0x300000 -o gdi32-etch.dll %windir%\system32\gdi32.dll

    X:\> etch -app winword -t countiref -wrap Etch etchwrap.dll kernel32.dll -outbase 0x300000 -o pscrptui-etch.dll %windir%\system32\pscrptui.dll

    These commands will create three new modules in the current directory, winword-etch.exe, gdi32-etch.dll, and pscrptui-etch.dll, respectively.

    The command line for etching these modules is rather complex, so we'll go over the arguments individually and discuss what each of them does.

    • We've already seen the "-app winword" argument, which specifies an identifying tag for this application, and the "-t countiref" argument, which specifies which tool to use. Note that we need to specify the same application tag, "winword," and the same tool, "countiref," with each command.

    • The "-wrap Etch etchwrap.dll kernel32.dll" argument is used to wrap specific calls into kernel32.dll with calls customized by Etch. Literally, the arguments tell Etch to find all procedures proc exported by kernel32.dll such that Etchproc is also exported by etchwrap.dll, and to replace calls to the procedures proc in the binary being transformed with calls to Etchproc. By doing so, Etch is able to wrap calls to LoadLibrary, GetModuleHandle, and GetModuleFileName so that the Etch versions in etchwrap.dll can operate on the etched or patched versions of modules instead of their originals.

    • The "-outbase 0x300000" argument specifies that the etched modules should be loaded at virtual address 0x300000. We do this so that etched system DLLs do not crowd out unetched system DLLs (like user32.dll) from their original load addresses. Although the etched modules all have the same load address, the runtime loader will take care of relocating conflicting modules when the program starts up. We could also have taken care to specify nonconflicting load addresses if we had wanted to prevent relocating the modules at startup.

    • The "-o outputname" argument specifies the name of the output file.

  4. The previous step etched the modules in Word that we wanted to instrument. We must now patch all of the remaining modules in Word that were found by DLLwatch (and listed in dllwatch.output). To do this, we use the same wrapping mechanism that we used during etching, and for the same reasons - to ensure that the application always uses etched or patched versions of DLLs instead of the original versions. We use Etch with the same "-wrap Etch etchwrap.dll kernel32.dll" command line argument as above. We also use the "-outbase 0x300000" argument to change the load address of patched modules, and the "-o outputname" to specify the name of the patched modules. Putting all of this together for our example, we would invoke the following:

    X:\> etch -wrap Etch etchwrap.dll kernel32.dll -outbase 0x300000 -o advapi32-patch.dll %windir%\system32\advapi32.dll

    X:\> etch -wrap Etch etchwrap.dll kernel32.dll -outbase 0x300000 -o comctl32-patch.dll %windir%\system32\comctl32.dll

    X:\> etch -wrap Etch etchwrap.dll kernel32.dll -outbase 0x300000 -o mpr-patch.dll %windir%\system32\mpr.dll

    X:\> etch -wrap Etch etchwrap.dll kernel32.dll -o user32-patch.dll %windir%\system32\user32.dll

    X:\> ...

    We repeat this step for each of the DLLs that Word uses, both statically and dynamically, except for kernel32.dll and ntdll.dll. These two DLLs are special, and should not need to be patched.

    Note: We do not relocate user32.dll using the "-outbase" argument (see the last command above); the kernel assumes that user32.dll will be loaded at its predefined load address, so the application will fail to run if user32.dll is relocated.

  5. Next we must patch the import tables of every module used by the application. We patch import tables so that the modules load the etched and patched versions instead of the originals. To patch the import tables, we use the dllwalk "-map" command. The map command requires you to have a file listing the modules that you have etched and the new names that you have given the etched versions. For our example, we'll create a file called "winword.dmf" with the following entries:
    	 winword.exe:winword-etch.exe
    	 gdi32.dll:gdi32-etch.dll
    	 pscrptui.dll:pscrptui-etch.dll
    
    Now that we have this file, we can use dllwalk:

    X:\> dllwalk -map winword.dmf winword-etch.exe
    X:\> dllwalk -map winword.dmf gdi32-etch.dll
    X:\> dllwalk -map winword.dmf pscrptui-etch.dll
    X:\> dllwalk -map winword.dmf advapi32-patch.dll
    X:\> dllwalk -map winword.dmf comctl32-patch.dll
    X:\> dllwalk -map winword.dmf mpr-patch.dll
    X:\> ...

    We repeat the use of dllwalk on every module used by Word. Note, however, that, as with the patching step, we do not map the imports of kernel32.dll and ntdll.dll.

  6. We must create the file etch.config. This file contains a line for each module used by the application. For modules that are etched or patched, the line contains: the full path of the original module, followed by "->", followed by the name of the transformed module. For modules that are neither etched nor patched, the line contains only the name of the module. Also, any line that begins with a '#' is considered a comment and ignored. For this example, the etch.config file would appear as follows:

    # etched modules
    C:\MSOFFICE\WINWORD\WINWORD.EXE->WINWORD-ETCH.EXE
    C:\WINNT\SYSTEM32\GDI32.DLL->GDI32-ETCH.DLL
    C:\WINNT\SYSTEM32\PSCRPTUI.DLL->PSCRPTUI-ETCH.DLL
    # patched modules
    C:\WINNT\SYSTEM32\ADVAPI32.DLL->ADVAPI32-PATCH.DLL
    C:\WINNT\SYSTEM32\COMCTL32.DLL->COMCTL32-PATCH.DLL
     .
     .
     .
    # untouched modules
    KERNEL32.DLL
    NTDLL.DLL
    
  7. Finally, we are ready to run the etched application. To run Word with the executable winword.exe and the DLLs gdi32.dll and pscrptui.dll instrumented to count instructions, we simply need to run winword-etch.exe:

    X:\> winword-etch.exe

    When you exit from Word, the countiref tool will create a file "countiref.output" that shows the total number of instructions executed, as well as a breakdown among the etched modules.
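
Steps 3 through 6 are mechanical once the module lists are known, so they lend themselves to scripting. The following Python sketch generates the etch and dllwalk command lines, the map file entries, and the etch.config lines for a given set of modules. It is not the etchword.pl script mentioned above; the function names are hypothetical, the module lists are assumed to be supplied by hand (for example, after reading dllwatch.output), and the "-etch" and "-patch" suffixes simply follow the naming used in this example.

```python
import ntpath  # Windows-style path handling that works on any platform

WRAP = ["-wrap", "Etch", "etchwrap.dll", "kernel32.dll"]
UNTOUCHED = {"kernel32.dll", "ntdll.dll"}  # never etched, patched, or mapped
NO_RELOCATE = {"user32.dll"}               # must keep its predefined load address


def out_name(module, suffix):
    """winword.exe + 'etch' -> winword-etch.exe (naming used in this example)."""
    stem, ext = ntpath.splitext(ntpath.basename(module))
    return f"{stem}-{suffix}{ext}"


def transform_command(module, suffix, app=None, tool=None, outbase="0x300000"):
    """Build one etch command; pass app and tool to instrument, omit them to patch only."""
    name = ntpath.basename(module).lower()
    if name in UNTOUCHED:
        raise ValueError(f"{name} should not be etched or patched")
    cmd = ["etch"]
    if app is not None:
        cmd += ["-app", app]
    if tool is not None:
        cmd += ["-t", tool]
    cmd += WRAP
    if name not in NO_RELOCATE:
        cmd += ["-outbase", outbase]
    return cmd + ["-o", out_name(module, suffix), module]


def plan(app, tool, etch_modules, patch_modules):
    """Return (commands, map_lines, config_lines) covering steps 3 through 6."""
    commands, map_lines, config = [], [], ["# etched modules"]
    for m in etch_modules:                      # step 3: etch
        commands.append(transform_command(m, "etch", app, tool))
        map_lines.append(f"{ntpath.basename(m)}:{out_name(m, 'etch')}")
        config.append(f"{m}->{out_name(m, 'etch')}")
    config.append("# patched modules")
    for m in patch_modules:                     # step 4: patch
        commands.append(transform_command(m, "patch"))
        config.append(f"{m}->{out_name(m, 'patch')}")
    outputs = [out_name(m, "etch") for m in etch_modules]
    outputs += [out_name(m, "patch") for m in patch_modules]
    for o in outputs:                           # step 5: patch import tables
        commands.append(["dllwalk", "-map", f"{app}.dmf", o])
    config.append("# untouched modules")        # step 6: remaining etch.config lines
    config += sorted(UNTOUCHED)
    return commands, map_lines, config


if __name__ == "__main__":
    cmds, dmf, cfg = plan(
        "winword", "countiref",
        [r"C:\MSOFFICE\WINWORD\winword.exe", r"C:\WINNT\system32\gdi32.dll"],
        [r"C:\WINNT\system32\advapi32.dll", r"C:\WINNT\system32\user32.dll"],
    )
    for c in cmds:
        print(" ".join(c))
    print("\n".join(dmf))
    print("\n".join(cfg))
```

The sketch only prints the commands; it does not run them. The two special cases from the walkthrough are encoded as data: user32.dll is never given an -outbase argument, and kernel32.dll and ntdll.dll are rejected as inputs.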

3. Etch Command Line Tools

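Etch is composed of a number of separate programs that can be invoked from the command line with a variety of command line flags, as demonstrated in the example above. This section describes those programs: etch, the binary-rewriting engine; cvdump, which extracts debugging information; dllwalk, which determines implicitly-loaded DLLs; dllwatch, which determines all loaded DLLs; monitor, which monitors DLL usage and references to DLLs; and readetchstamp, which reads the stamp in an etched output file.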

3.1 The Etch Binary-Rewriting Engine (etch)

Etch is the core engine for reading and rewriting a binary. As noted above, it is easier to let Visual Etch figure it all out for you. However, for the advanced user, or for the curious, here are the basic options supported from the Etch command line.

Note that it is possible to invoke Etch on a binary without applying a tool. For example, it is possible to create a new binary with a different image base (-outbase), or with some DLL calls replaced (-wrap), without applying any instrumentation. (In the current implementation of Etch, any invocation will produce a new import table and new export table, even if there were no changes to the import or export information.)

Instrumentation

There are two ways of invoking Etch to apply a specific instrumentation tool to a module. The first is to name the tool with the -t flag:

etch [FLAGS] -t tool target

The -t, or -i and -r, flags specify the instrumentation and runtime DLLs to use in rewriting the binary. When the '-t tool' flag is used, the instrumentation and runtime DLL names are assumed to be "tool-inst.dll" and "tool-rt.dll". Etch expects to find the instrumentation and runtime DLLs in the directory specified in the TOOL_ENV environment variable.

The -app flag specifies a tag to identify the application being rewritten. The same tag should be specified for the executable and all the DLLs etched as part of a single application. (The -app flag can be omitted if instrumentDll does not export InstrumentProgram.)

An Example

For example,

etch -t cache hello.exe

will produce hello-cache.exe.

Alternatively, you can use

etch [FLAGS] -i instrumentDll -r runtimeDll target

This is identical to the previous invocation, except that instead of specifying a toolname, the user explicitly specifies the names of the two DLLs that make up the tool. If a full path is not specified, Etch will check the current directory as well as the %ETCHTOOLS% directory.

For example, these two invocations of Etch are equivalent:

etch -i cache/cache-inst -r cache/cache-rt -o hello-cache.exe hello.exe
etch -t cache hello.exe
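
The relationship between the two forms is a simple name expansion. The Python sketch below rewrites a "-t tool" argument as the corresponding "-i" and "-r" pair using the "tool-inst.dll" and "tool-rt.dll" convention described above. The function names are hypothetical, and the sketch does not add a directory prefix or a default "-o" argument, both of which appear in the explicit form of the example above.

```python
def tool_dll_names(tool):
    """Expand a '-t tool' name into the (instrumentation, runtime) DLL names."""
    return f"{tool}-inst.dll", f"{tool}-rt.dll"


def expand_tool_flag(argv):
    """Rewrite ['-t', tool] in an etch argument list as ['-i', inst, '-r', rt]."""
    out, i = [], 0
    while i < len(argv):
        if argv[i] == "-t" and i + 1 < len(argv):
            inst, rt = tool_dll_names(argv[i + 1])
            out += ["-i", inst, "-r", rt]
            i += 2  # skip the flag and its argument
        else:
            out.append(argv[i])
            i += 1
    return out


if __name__ == "__main__":
    print(" ".join(expand_tool_flag(["etch", "-t", "cache", "hello.exe"])))
```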

Miscellaneous Options

-o output
specify output filename
-outbase (hex)address
specify the image base of the resulting binary. (default: same as the input binary)
-stamp arg
No effect other than to augment the stamp and Etch command line recorded in the output file. (The utility program 'readetchstamp' can be used to retrieve this information.)
-log file
Write statistics and error messages to file.
-x
Turn off checks for compiler-specific constructs during code discovery.
-Ur
Keep going even if there are no relocations in the binary. Etch requires that the binary have relocation information to produce correct binaries. Setting this flag would make Etch produce bad binaries, but it is useful for tools that use Etch only to analyze the binary (e.g., a disassembler).
-p
Generates a map between the new PC and old PC that can be used by the tool runtime to translate addresses using the EtchOldTargetToNew and EtchNewTargetToOld API calls.

Optimization (code layout)

-l layout
use instruction layout in the file 'layout'.

The layout is an ASCII file describing the order in which the output binary should be written. Each line in the file is a whitespace-separated pair of hex numbers representing a base and limit of PCs in the uninstrumented program to be written. In addition, the special tokens BEGIN and END can be used in place of numbers to represent the beginning and end of the section. (At present code layout is only supported for programs containing a single section.)

Note that the range 'x y' represents PCs from [x .. y-1], NOT [x .. y].

For example, to interchange two procedures of length 0x100 at addresses 0x1000 and 0x2000, the following layout file would suffice:

  BEGIN 	0x1000
  0x2000	0x2100
  0x1100	0x2000
  0x1000	0x1100
  0x2100	END

Typically code layout files are generated automatically as part of the code layout optimization tool.
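
To make the half-open range convention concrete, the following Python sketch generates the layout for interchanging two equal-length procedures and checks that a list of ranges covers the section exactly once. The function names are hypothetical and this is not the code layout optimization tool itself; it only reproduces the format described above.

```python
def swap_layout(a_start, b_start, length):
    """Ranges that interchange two equal-length procedures, where a precedes b."""
    a_end, b_end = a_start + length, b_start + length
    if a_end > b_start:
        raise ValueError("procedures overlap or are out of order")
    ranges = [("BEGIN", a_start), (b_start, b_end)]
    if a_end < b_start:
        ranges.append((a_end, b_start))  # code between the two procedures
    ranges += [(a_start, a_end), (b_end, "END")]
    return ranges


def format_layout(ranges):
    """Render ranges as layout file text: one 'base<TAB>limit' pair per line."""
    def fmt(value):
        return value if isinstance(value, str) else hex(value)
    return "\n".join(f"{fmt(lo)}\t{fmt(hi)}" for lo, hi in ranges)


def tiles_section(ranges):
    """True if the half-open ranges [base, limit) cover BEGIN..END with no gap or overlap."""
    if list(ranges) == [("BEGIN", "END")]:
        return True
    first = [hi for lo, hi in ranges if lo == "BEGIN"]
    last = [lo for lo, hi in ranges if hi == "END"]
    if len(first) != 1 or len(last) != 1:
        return False
    numeric = sorted((lo, hi) for lo, hi in ranges if lo != "BEGIN" and hi != "END")
    cursor = first[0]
    for lo, hi in numeric:
        if lo != cursor or hi <= lo:
            return False
        cursor = hi  # the limit is exclusive, so the next base must equal it
    return cursor == last[0]


if __name__ == "__main__":
    layout = swap_layout(0x1000, 0x2000, 0x100)
    print(format_layout(layout))
    print(tiles_section(layout))
```

Because each limit is exclusive, adjacent ranges share a boundary value: the limit of one range is the base of the next in address order, which is exactly what tiles_section checks.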

Using debugging information

Files compiled with debugging information (CodeView format) can be inspected with the -d option.
-d debug
use debugging information in the file 'debug'.

If the cvdump utility has been used to generate debugging information, then this flag can be used to specify the name of the file containing the resulting information.

Generating additional information

Some Etch options let you generate some additional information about the structure of a program.
-Bp file
dump procedure boundaries into the specified file
-Bb file
dump basic block boundaries into the specified file
-Bu file
dump unknown address ranges into the specified file

Context save protocol

These two flags control how careful Etch is about preserving state between instructions when inserting calls to the tool runtime. The default is to not save floating point state, and to assume that data above the top of the stack is dead.
-Cf
save floating point registers on procedure call
-Cs
preserve 48 bytes above stack on procedure call

Code discovery

These flags control the heuristics used during code discovery. By default, code discovery is aggressive; in practice, this rarely results in data being misidentified as code. These options indicate that less aggressive heuristics should be used.

-O0
Use only the entry point as the root set for code discovery.
-O1
Use the entry point, exports, and relocations as root sets for code discovery. For exports and relocations, use heuristics to determine whether they refer to code or data.
-O2
Same as -O1, but also iteratively traverse the remaining unknown byte sequences and use heuristics to determine whether they refer to code or data.

Replacing DLL Calls

-wrap PREFIX WrapperDll TargetDll
For each function 'xyz' such that xyz is imported by target from TargetDll, and PREFIXxyz is exported by WrapperDll, replace all calls in target to xyz with calls to PREFIXxyz.
This is used internally by Etch to replace calls to the KERNEL32.DLL functions LoadLibrary and GetModuleHandle with calls to the Etch runtime library.
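
The matching rule for -wrap amounts to a filtered rename. The Python sketch below expresses it over plain lists of names. The function name is hypothetical, and the import and export lists in the demonstration are made up for illustration; this document does not list the actual exports of etchwrap.dll.

```python
def wrap_replacements(prefix, imports_from_target, wrapper_exports):
    """Map each imported function xyz to PREFIXxyz when the wrapper DLL exports PREFIXxyz.

    Imports with no matching wrapper export are left out, meaning they are not replaced.
    """
    exported = set(wrapper_exports)
    return {
        name: prefix + name
        for name in imports_from_target
        if prefix + name in exported
    }


if __name__ == "__main__":
    # Illustrative names only; not taken from the real DLLs.
    imports = ["LoadLibraryA", "GetModuleHandleA", "CreateFileA"]
    exports = ["EtchLoadLibraryA", "EtchGetModuleHandleA"]
    print(wrap_replacements("Etch", imports, exports))
```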

Help and version information:

etch -h
etch -?
print command line help and exit
etch -v
print version information and exit

3.2 Extracting Debugging Information (cvdump)

The cvdump tool is used to extract CodeView debugging information from a win32 executable or DLL. The user must run cvdump on each component: once on the .exe file, and once on each DLL that the user plans to instrument. As with Etch, normally there is no reason to run cvdump from the command line: Visual Etch has all the machinery to invoke cvdump correctly.

Each executable or DLL must have been compiled to include the CodeView debugging information, and the only revision of CodeView debug info that cvdump understands is NB09. Cvdump will not attempt to extract debugging information if the revision number is not NB09 or NB11. NB09 can be generated by Microsoft Visual C++ versions 2.0 and 4.0. The user must give the /pdb:none option to the Visual C++ linker in order to get debug information included in the .exe or .dll file; otherwise, Visual C++ puts the debug info into a .pdb file, and cvdump cannot read .pdb files.

The cvdump tool is invoked by:

cvdump -p hw.exe > cv.out 

After cvdump is run on each component, the outputs from all components must be combined. The combined outputs must then be postprocessed by the cvfilter perl script, before the output is given to Etch, using the -d option. For example, assuming that we had two components on which we had run cvdump, we would execute the statements:

type cv1.out cv2.out | perl cvfilter.pl > cv.new 
etch -t mytool -d cv.new hw.exe 

Note: Some compilers split up procedures into multiple non-contiguous segments. In this case, cvdump will label the segment containing the procedure entry point with the procedure name and label other segments with the procedure name followed by a semicolon ";".

3.3 Determining Implicitly-Loaded DLLs (dllwalk)

The dllwalk utility is used to identify the set of implicitly loaded DLLs used by an application or a DLL, and also to modify the import table of a single DLL to import etched or patched DLLs instead of the originals.

dllwalk -list executable
Print the list of all static DLLs that the executable will load when it runs. This list is based solely on the import tables of the executable and the DLLs it imports (and the DLLs they import, etc.)

dllwalk -map mapfile executable
Update the import table of <executable> to reflect the mapping of original DLL names to etched or patched DLL names in mapfile. mapfile is an ASCII file containing one <original name>:<new name> pair per line. For example, if the file x.dmf contained the line "SHELL32.DLL:SHELL32-etch.DLL", then running

dllwalk -map x.dmf y.dll

would change all references to SHELL32.DLL in y.dll to SHELL32-etch.DLL.

dllwalk -map will apply the changes only to the executable or DLL listed on the command line, not to any of the DLLs that it imports.

dllwalk -imagebase module
Print the image base of the specified module to stdout.

dllwalk -batch list batchfile
For each module listed in batchfile, print the list of all static DLLs that the module will load when it runs. This is the equivalent of running "dllwalk -list" on each module in the batchfile separately. Modules should be listed using their absolute paths.

dllwalk -batch imagebase batchfile
Print the image base for each module listed in batchfile. This is the equivalent of running "dllwalk -imagebase" on each module in the batchfile separately. Modules should be listed using their absolute paths.
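
The mapfile format used by "dllwalk -map" is simple enough to parse in a few lines. The Python sketch below reads <original name>:<new name> pairs and applies them to a list of imported DLL names, which is the effect the -map option has on an import table. The function names are hypothetical, and matching names without regard to case is an assumption of this sketch, not documented dllwalk behavior.

```python
def parse_map(text):
    """Parse '<original name>:<new name>' lines into a dict keyed by lowercased original."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        original, sep, new = line.partition(":")
        if not sep or not original.strip() or not new.strip():
            raise ValueError(f"bad map line: {line!r}")
        mapping[original.strip().lower()] = new.strip()
    return mapping


def remap_imports(imports, mapping):
    """Return the import names with mapped DLLs replaced; others are unchanged."""
    return [mapping.get(name.lower(), name) for name in imports]


if __name__ == "__main__":
    mapping = parse_map("SHELL32.DLL:SHELL32-etch.DLL\n")
    print(remap_imports(["SHELL32.DLL", "KERNEL32.DLL"], mapping))
```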

3.4 Determining All Loaded DLLs (dllwatch)

The DLLwatch utility is used to identify the set of all DLLs loaded by an application. This set includes implicitly-loaded DLLs listed in the import tables of the application modules, as well as the explicitly-loaded DLLs loaded by the application during runtime.

dllwatch application
Execute application and have dllwatch keep track of all DLLs it loads. A list of DLLs loaded is placed in the file dllwatch.output in the current directory. Typically, there is one "unknown" DLL listed in dllwatch.output; this DLL is actually ntdll.dll, which does not identify itself properly.

3.5 Monitoring DLL usage and references to DLLs (monitor)

The monitor utility runs executable-name and reports the DLLs it loads, as well as any instructions executed in the original (unetched) text segment of etched modules. In addition to writing some information to standard output and standard error, it summarizes the resulting information in the file monitor.log.

Usage is

  monitor executable-name

3.6 Reading the stamp in an etched output file (readetchstamp)

The readetchstamp utility simply reads the version, time stamp, and command line from an etched executable and prints them to standard output.

Usage is

  readetchstamp executable-name