An Etch tool works in two phases: the instrumentation phase and the execution phase. In the instrumentation phase, Etch reads in a Win32 binary (either an executable or a dynamic-link library, DLL) and modifies it according to the instrumentation code supplied by the tool creator. The instrumentation code defines (exports) a set of callback procedures, whose interfaces are given in this document. The Etch engine calls these procedures at specific points as it processes the binary, and at each callback it passes information about the input binary to the instrumentation routines. Based on that information, the callback routines can query Etch for further information or instruct Etch as to what modifications should be made to the binary. In the execution phase, the user runs the binary that was produced by the instrumentation phase.
Building a new tool with Etch therefore requires (1) an instrumentation DLL "<toolname>-inst.dll" and, optionally, (2) a runtime DLL "<toolname>-rt.dll". The instrumentation DLL must contain any callback routines that the tool creator wants Etch to invoke. The instrumentation routines may cause Etch to insert procedure calls into the executable, in which case these procedures must be implemented in the runtime DLL.
We provide a header file, "etch-api.h", to assist programmers in building instrumentation callbacks. This header file gives the tool creator access to a set of support routines for obtaining information about the binary and for rewriting the executable.
Every routine to which the instrumentation code inserts calls must be implemented in the runtime DLL, "<toolname>-rt.dll". A common task for a runtime routine is to measure some hardware event (such as cache misses or CPU cycles) over a particular code region. We provide a programming interface to the Pentium hardware performance counters that tool writers can use to measure such events.
Since Etch drives the instrumentation process, the programmer implementing the instrumentation code needs to understand how and when Etch will invoke the callbacks in the instrumentation DLL. The order is as follows:
    InstrumentInit()
    For each module that is etched as part of the total program:
        For each procedure in the module:
            InstrumentProcedure(Before)
            For each basic block in the procedure:
                InstrumentBasicBlock(Before)
                For each instruction in the basic block:
                    InstrumentInstruction(Before)
                    InstrumentInstruction(After)
                InstrumentBasicBlock(After)
            InstrumentProcedure(After)
        InstrumentModule(Before)
        InstrumentModule(After)
    InstrumentProgram(Before)
    InstrumentProgram(After)
    InstrumentCleanup()

Etch invokes these instrumentation routines in the intuitive order, except for InstrumentModule(Before) and InstrumentProgram(Before), which are not invoked until after all of the procedures, basic blocks, and instructions have been instrumented. We do this because it is often convenient to allocate data structures just before the program runs (or before the module is initialized) based on the results of instrumentation. For example, our branch prediction tool allocates a table whose size is the number of unique branch targets in the application, and this number is not known until InstrumentInstruction() has been called on every instruction.
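To make the deferred InstrumentProgram(Before) concrete, here is a small self-contained model, not Etch code, that replays the documented callback order for one module containing one procedure with one basic block. The callback names in the trace, `simulate()`, and the branch table are hypothetical; counting branch instructions stands in for the branch prediction tool's count of unique branch targets. Because the program-before step runs only after every instruction has been visited, the table can be sized exactly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not Etch code: replays the documented callback
 * order for one module containing one procedure with one basic block. */
#define MAX_EVENTS 64

static const char *trace[MAX_EVENTS];  /* callbacks in the order issued */
static int ntrace = 0;

static int branch_count = 0;       /* tallied per instruction */
static int *branch_table = NULL;   /* allocated once the count is final */
static int branch_table_size = -1;

static void log_event(const char *name) {
    if (ntrace < MAX_EVENTS)
        trace[ntrace++] = name;
}

/* is_branch[i] says whether instruction i is a branch. */
static void simulate(const int *is_branch, int ninsts) {
    log_event("Init");
    log_event("ProcedureBefore");
    log_event("BasicBlockBefore");
    for (int i = 0; i < ninsts; i++) {
        log_event("InstructionBefore");
        if (is_branch[i])
            branch_count++;
        log_event("InstructionAfter");
    }
    log_event("BasicBlockAfter");
    log_event("ProcedureAfter");
    log_event("ModuleBefore");
    log_event("ModuleAfter");  /* issued only for DLLs; assume one here */
    /* ProgramBefore comes after every instruction has been seen, so the
     * table can be sized exactly. */
    log_event("ProgramBefore");
    branch_table_size = branch_count;
    branch_table = calloc(branch_count > 0 ? (size_t)branch_count : 1,
                          sizeof *branch_table);
    log_event("ProgramAfter");
    log_event("Cleanup");
}
```

For five instructions of which three are branches, the table ends up with three entries, and "ProgramBefore" appears in the trace only after all ten instruction events.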
Invocations of the instrumentation routines InstrumentInstruction, InstrumentBasicBlock, and InstrumentProcedure pass a handle to the corresponding program component. For example, InstrumentProcedure passes a ProcNum, an integer that identifies the procedure on whose behalf the callback is made. If debug information is available, the support routine ProcGetName can be used to find the name of a procedure, given its ProcNum. Similarly, Etch's call to InstrumentInstruction passes a handle to the instruction on whose behalf the call is made; this handle, of type InstPtr, can be passed to the appropriate support routines to determine the instruction's type, its PC, its byte encoding, and so on.
Note that InstrumentModule(After) is only called for DLLs, not executables. Furthermore, it may be called after InstrumentProgram(After) in some cases.
Note that InstrumentProcedure(After) will be invoked once for every exit point in the procedure. Also, any code inserted during InstrumentProcedure(After) will be inserted directly before the control flow instruction (typically ret) corresponding to that exit point.
Note that if the last instruction of a basic block is a control flow instruction, then any code inserted during InstrumentBasicBlock(After) will be inserted directly before that instruction.
Note that if the instruction is a control flow instruction, then any code inserted during InstrumentInstruction(After) will be inserted before the instruction. The inserted code still executes in the expected order with respect to other inserted code (e.g., after code inserted during InstrumentInstruction(Before), InstrumentBasicBlock(Before), InstrumentProcedure(Before), etc.), but it executes before the control flow instruction itself.
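The placement rules in the three notes above can be modeled as a small layout routine. This is a hypothetical sketch, not Etch's implementation: it emits one marker string for each callback's inserted code and shows how the "After" code for a block-ending control flow instruction is hoisted above that instruction. The relative order of the hoisted InstrumentInstruction(After) and InstrumentBasicBlock(After) code is an assumption based on the callback order.

```c
#include <assert.h>
#include <string.h>

#define MAX_OUT 64

/* Hypothetical model of one basic block: instruction names plus a flag
 * saying whether the last instruction transfers control (jmp, ret, ...). */
typedef struct {
    const char *insts[8];
    int ninsts;
    int ends_in_cti;
} Block;

static const char *out[MAX_OUT];  /* resulting code stream */
static int nout;

static void emit(const char *s) {
    if (nout < MAX_OUT)
        out[nout++] = s;
}

/* Lay out a block, emitting a marker wherever inserted code would go. */
static void layout_block(const Block *b) {
    nout = 0;
    emit("BB(Before)");
    for (int i = 0; i < b->ninsts; i++) {
        int is_cti = b->ends_in_cti && i == b->ninsts - 1;
        emit("Inst(Before)");
        if (is_cti) {
            /* After-code cannot follow a control transfer, so it is
             * placed directly before the instruction instead. */
            emit("Inst(After)");
            emit("BB(After)");
            emit(b->insts[i]);
        } else {
            emit(b->insts[i]);
            emit("Inst(After)");
        }
    }
    if (!b->ends_in_cti)
        emit("BB(After)");
}
```

For the block mov, add, jmp, the last four entries of the stream are Inst(Before), Inst(After), BB(After), jmp: every piece of inserted code still runs, and it all runs before the jump.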
The following instrumentation support routines are exported by Etch. Instrumentation routines can call them during the instrumentation phase, either to instruct Etch to make changes to the executable or to query Etch for additional information about the executable or its components.
The support routines that insert Call instructions save all general-purpose integer registers and flags. If the "save floating point state" flag is set for the module, floating-point state is saved as well.
void InsertInst(char *newInst, int newInstLen);
void ReplaceInst(char *newInst, int newInstLen);
A set of arguments can be passed to the procedure; the details of how to do this are described in the Argument Passing section.
void InsertCallLoadRefs(InstPtr inst, char *ProcName, int argc, void **argv, ArgType *argt);
A set of arguments can be passed to the procedure; the details of how to do this are described in the Argument Passing section.
void InsertCallStoreRefs(InstPtr inst, char *ProcName, int argc, void **argv, ArgType *argt);
A set of arguments can be passed to the procedure; the details of how to do this are described in the Argument Passing section.
char *InstGetBytes(InstPtr inst);
int InstGetLength(InstPtr inst);
Address_t InstGetPC(InstPtr inst);
Address_t InstGetRVA(InstPtr inst);
IType InstGetType(InstPtr inst);
Address_t InstGetBranchTarget(InstPtr inst);
DCInstruction_t InstGetInfo(InstPtr inst);
int InstIsExternalCallSite(InstPtr inst);
InstIsExternalCallSite() is not yet implemented.
int InstIsDllCallSite(InstPtr inst);
char *InstGetDllProcName(InstPtr inst);
char *InstGetDllFileName(InstPtr inst);
Address_t BbGetPC(BbPtr bb);
int BbGetLength(BbPtr bb);
int BbGetNumInsts(BbPtr bb);
char *ProcGetName(int ProcNum);
int ProcGetNum(char *ProcName);
int ProcGetNumFromAddress(Address_t address);
With the second routine, the procedure is specified using the ProcPtr argument passed into the InstrumentProcedure callback. Since the ProcPtr passed in is always valid, this routine will always return a valid address. However, because it uses a ProcPtr, this routine can only be used in the InstrumentProcedure callback.
Address_t ProcGetEndAddr(int ProcNum);
Address_t ProcGetEndAddr(ProcPtr proc);
With the second routine, the procedure is specified using the ProcPtr argument passed into the InstrumentProcedure callback. Since the ProcPtr passed in is always valid, this routine will always return a valid address. However, because it uses a ProcPtr, this routine can only be used in the InstrumentProcedure callback.
int ProcIsThunk (ProcPtr proc);
char *ModuleGetName();
char *ModuleGetPath();
char *ModuleGetEtchedName();
int DebugGetProcCount();
The instrumentation support procedures InsertCall(), InsertCallLoadRefs(), and InsertCallStoreRefs() all allow the programmer to specify a set of arguments to the procedure call that is being inserted into the instrumented binary. The arguments that will be passed to the procedure are specified by the argc, argv, and argt parameters.
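The three parameters are parallel arrays: entry i of argv and entry i of argt together describe argument i. The sketch below builds them the same way the unaligned-access example later in this document does. The enum is a hypothetical stand-in for the declaration in Etch's header; only the three ArgType names used in that example are assumed, and their numeric values here are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Etch's ArgType; values are arbitrary. */
typedef enum { ArgImmed, ArgEffAddr, ArgEffAddrLen } ArgType;

/* Fill argv/argt for a call of the form
 *   Handler(effective_address, access_length, pc)
 * and return argc. */
static int build_mem_args(void **argv, ArgType *argt, uintptr_t pc) {
    argv[0] = NULL;       argt[0] = ArgEffAddr;    /* value computed at run time */
    argv[1] = NULL;       argt[1] = ArgEffAddrLen; /* value computed at run time */
    argv[2] = (void *)pc; argt[2] = ArgImmed;      /* constant fixed at instrumentation time */
    return 3;
}
```

For run-time values such as ArgEffAddr the argv slot is only a placeholder; for ArgImmed the slot itself carries the constant that is passed to the runtime routine.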
The argt parameter is an array of type ArgType; it defines how the other parameters (argc and argv) are interpreted. Following are the valid ArgType values and their definitions:
Each tool callback to InstrumentInstruction(), InstrumentBasicBlock(), and InstrumentProcedure() provides an argument called procNum, which, for modules containing debug information, is the unique number of the procedure that contains the instruction, block, or procedure being instrumented. A measurement tool typically keeps an array of event counts whose size is the total number of unique procedures, and increments the appropriate count by using procNum as an index into the array. When the module being etched does not have CodeView debug information, procNum is always -1.
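A minimal sketch of that per-procedure counting pattern follows. The names `counts_init`, `count_event`, and `unknown_count` are hypothetical; obtaining the array size from DebugGetProcCount() is an assumption, since that routine's description is not given here. Guarding the index keeps a procNum of -1 (no debug information) from writing outside the array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical runtime-side counters indexed by procNum. The size would
 * plausibly come from DebugGetProcCount() (assumption). */
static long *proc_counts = NULL;
static int nprocs = 0;
static long unknown_count = 0;  /* events with procNum == -1 */

static int counts_init(int n) {
    proc_counts = calloc((size_t)n, sizeof *proc_counts);
    nprocs = n;
    return proc_counts != NULL;
}

static void count_event(int procNum) {
    if (procNum >= 0 && procNum < nprocs)
        proc_counts[procNum]++;
    else
        unknown_count++;  /* no CodeView info, or out of range */
}
```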
Another use for debug information is to support selective instrumentation; the Malloc and Free Usage tool (memdebug) is an example of this style of use. A tool can use ProcGetName() together with strcmp() to find the procedure it wants to instrument, and then check the procNum argument in the InstrumentInstruction() callback so that it instruments only those instructions whose procNum matches the desired number. memdebug uses this technique to instrument the entry point and all exit points of any procedure named malloc, free, or realloc.
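The name test at the heart of that technique can be sketched as follows. `FakeProcGetName` and its name table are hypothetical stand-ins for Etch's ProcGetName(), so the example runs without Etch; `is_allocator` is likewise a hypothetical helper, not memdebug's actual code. A NULL name (as for procNum -1) is treated as "do not instrument".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Etch's ProcGetName(): a fixed name table. */
static const char *fake_names[] = { "main", "malloc", "compute", "free", "realloc" };
static const int fake_count = 5;

static const char *FakeProcGetName(int procNum) {
    return (procNum >= 0 && procNum < fake_count) ? fake_names[procNum] : NULL;
}

/* Nonzero if the procedure is one of the allocator routines of interest. */
static int is_allocator(int procNum) {
    static const char *targets[] = { "malloc", "free", "realloc" };
    const char *name = FakeProcGetName(procNum);
    if (name == NULL)
        return 0;  /* no debug information for this procedure */
    for (size_t i = 0; i < sizeof targets / sizeof targets[0]; i++)
        if (strcmp(name, targets[i]) == 0)
            return 1;
    return 0;
}
```

Inside InstrumentInstruction(), a tool would call such a test on its procNum argument and insert calls only when it returns nonzero.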
Detailed descriptions of how individual API calls behave differently with and without debug information can be found in the previous sections of this document.
The following is a list of possible future uses for debug information within Etch (none of these has been implemented).
Etch provides an interface that allows instrumentation tools to change the ordering of the instructions in the instrumented or optimized binary. This interface is provided primarily to support code layout optimization.
To specify a layout, the tool writer must provide a file containing PC ranges, where the PCs correspond to text addresses in the original (uninstrumented) binary. Typically this file is generated automatically by some other analysis tool. To use the layout, Etch must be invoked with the '-l layout_file' option.
When applying the specified layout, Etch automatically applies checks and transformations to guarantee the correctness of the transformed code. In particular, Etch checks that the layout does not split instructions or data embedded in the text section, and it inserts branches to fall-through code as needed.
The layout file consists of multiple lines, each containing a pair of whitespace-separated hex numbers that specify a base and a limit address. Each (base, limit) pair indicates that the instrumented text corresponding to the original PC range [base..limit-1] should be written to the output binary. Warning: code in any range not included in the layout file will not be included in the instrumented binary.
Two special tokens are recognized, 'BEGIN' and 'END', which correspond to the first PC in the text section and the (last PC in the text section)+1 respectively.
Consider, for example, the following layout file, for a program whose original text section ran from 0x0000 to 0x0F00. In effect, this layout just moves the code from 0x0100 to 0x01FF to the beginning of the output text section, followed by the remainder of the program.
layout file:
0100 0200
BEGIN 0100
0200 END
So the order of the instructions in the resulting binary would be:
0x100
0x101
...
0x1FF
0x000
...
0x0FF
0x200
...
0xF00
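A tool that generates or checks layout files might parse them along these lines. This is a minimal sketch of the format described above, not Etch's own parser: `Range`, `parse_addr`, and `parse_layout` are hypothetical, and the caller is assumed to supply the text section's first PC and its last PC + 1 for the BEGIN and END tokens. Ranges are returned in file order, which is the order in which the code is emitted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One (base, limit) pair: original PCs [base .. limit-1]. */
typedef struct {
    unsigned long base;
    unsigned long limit;
} Range;

/* BEGIN -> first PC of the text section, END -> last PC + 1,
 * anything else is read as a hex number. Returns 0 on a bad token. */
static int parse_addr(const char *tok, unsigned long text_begin,
                      unsigned long text_end, unsigned long *out) {
    char *end;
    if (strcmp(tok, "BEGIN") == 0) { *out = text_begin; return 1; }
    if (strcmp(tok, "END") == 0)   { *out = text_end;   return 1; }
    *out = strtoul(tok, &end, 16);
    return end != tok && *end == '\0';
}

/* Parse layout-file text into ranges. Blank lines are skipped. Returns
 * the number of ranges, or -1 on a malformed line, an empty range, or
 * more than max entries. */
static int parse_layout(const char *text, unsigned long text_begin,
                        unsigned long text_end, Range *ranges, int max) {
    char line[128], a[64], b[64];
    int n = 0;
    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        if (len >= sizeof line)
            return -1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += len + (text[len] == '\n');

        int got = sscanf(line, "%63s %63s", a, b);
        if (got <= 0)
            continue;  /* blank line */
        if (got != 2 || n >= max)
            return -1;
        if (!parse_addr(a, text_begin, text_end, &ranges[n].base) ||
            !parse_addr(b, text_begin, text_end, &ranges[n].limit) ||
            ranges[n].base >= ranges[n].limit)
            return -1;
        n++;
    }
    return n;
}
```

Applied to the example layout file above, it yields the ranges [0x100, 0x200), [0x000, 0x100), and [0x200, END), matching the instruction order shown.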
This section describes support for obtaining the hardware performance counter values (supported on both the Pentium and PentiumPro) and manipulating these results. The files are contained in two DLLs: hwcounter.dll and math64.dll. hwcounter.dll contains the routines for obtaining and manipulating the performance counters while math64.dll contains the routines for manipulating 64-bit numbers. (The math64.dll library is provided for backward compatibility only; new tools should use the builtin Visual C++ support for 64-bit integers.)
typedef struct {
    unsigned int cycle_lo;
    unsigned int cycle_hi;
    unsigned int ctr0_lo;
    unsigned int ctr0_hi;
    unsigned int ctr1_lo;
    unsigned int ctr1_hi;
} counter_t;
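Each counter in counter_t is split into 32-bit low and high halves. The sketch below combines the halves and computes the cycles elapsed between two readings; it assumes unsigned int is 32 bits, and `join64` and `cycles_between` are hypothetical helpers. It uses C99 `uint64_t`; on older Visual C++ the equivalent type is `unsigned __int64`.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as counter_t above (unsigned int assumed 32 bits). */
typedef struct {
    unsigned int cycle_lo, cycle_hi;
    unsigned int ctr0_lo, ctr0_hi;
    unsigned int ctr1_lo, ctr1_hi;
} counter_t;

static uint64_t join64(unsigned int lo, unsigned int hi) {
    return ((uint64_t)hi << 32) | lo;
}

/* Cycles elapsed between two readings. Because full 64-bit values are
 * subtracted, a carry from the low word into the high word between the
 * two readings is handled automatically. */
static uint64_t cycles_between(const counter_t *start, const counter_t *end) {
    return join64(end->cycle_lo, end->cycle_hi) -
           join64(start->cycle_lo, start->cycle_hi);
}
```

The same `join64` call applies to the ctr0 and ctr1 fields.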
From math64.h:
typedef struct { DWORD lo, hi; } Int64, *PInt64;
Int64 defines a 64-bit unsigned integer.
Makes counters c1 and c2 the active counters.
Resets the active counters to zero, including the cycle counter.
Reads the active counters and puts their two results in ctr. Note that this call takes about 4,000 cycles, so it should not be used to time events that require finer precision than that.
Returns the system type: one of SYS_P5_WNT, SYS_P6_WNT, SYS_P5_W95, SYS_P6_W95, SYS_UNKNOWN.
Returns the name of counter number c on the system specified.
Closes the device driver file associated with the counters.
PInt64 Add64(PInt64 op1, PInt64 op2, PInt64 result)
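For readers maintaining older tools, the following sketch shows what a 64-bit add over lo/hi halves involves. It is a hypothetical reimplementation for illustration, not math64.dll's code: `Int64Sketch` and `Add64Sketch` are made-up names, `uint32_t` stands in for DWORD, and returning the result pointer is an assumption based on the PInt64 return type. New tools should use the compiler's built-in 64-bit integers instead.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for math64.h's Int64 (DWORD is 32-bit unsigned). */
typedef struct { uint32_t lo, hi; } Int64Sketch, *PInt64Sketch;

/* Add op1 and op2 into result, propagating the carry from the low word.
 * All inputs are read before result is written, so result may alias
 * op1 or op2. */
static PInt64Sketch Add64Sketch(PInt64Sketch op1, PInt64Sketch op2,
                                PInt64Sketch result) {
    uint32_t lo = op1->lo + op2->lo;  /* wraps modulo 2^32 */
    uint32_t carry = lo < op1->lo;    /* a wrap means a carry out */
    uint32_t hi = op1->hi + op2->hi + carry;
    result->lo = lo;
    result->hi = hi;
    return result;
}
```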
As an example of how to write a simple tool, this section presents a tool that counts unaligned memory accesses. Following are the two components of the tool: the instrumentation code, unaligned-inst.cxx, and the runtime code, unaligned-rt.c.
/*
 * unaligned-inst.cxx
 *
 * Instrumentation code for a simple unaligned-access counter tool.
 *
 * Etch invokes InstrumentInstruction before and after each instruction
 * it discovers. On the Before call, it uses the Etch routines
 * InsertCallLoadRefs and InsertCallStoreRefs to arrange for the tool's
 * runtime routines (LoadReference and StoreReference) to be called
 * whenever the instruction executes a memory load or store.
 */

#define _ETCHLIB_IMPORTER_
#include "etch-inst.h"

/*
 * Make the callbacks visible to the Etch engine.
 */
extern "C" {
    DllExport void InstrumentInstruction(WhenType when, InstPtr inst, int procNum);
    DllExport void InstrumentProgram(WhenType when);
}

void InstrumentInstruction(WhenType when, InstPtr inst, int procNum)
{
    int argc;
    Address_t pc;
    void *array[3];
    ArgType argt[3];

    if (when == Before) {
        argc = 3;
        pc = InstGetPC(inst);

        /* Set up the arguments for the calls to LoadReference and
         * StoreReference: effective address, access length, and the
         * instruction's PC passed as an immediate. */
        array[0] = 0;          argt[0] = ArgEffAddr;
        array[1] = 0;          argt[1] = ArgEffAddrLen;
        array[2] = (void *)pc; argt[2] = ArgImmed;

        /* Insert calls to LoadReference and StoreReference. These
         * routines are called only for instructions that actually
         * make memory references. */
        InsertCallLoadRefs(inst, "LoadReference", argc, array, argt);
        InsertCallStoreRefs(inst, "StoreReference", argc, array, argt);
    }
}

void InstrumentProgram(WhenType when)
{
    /* Call the ProgramAfter routine, with no arguments, once the
     * program completes. */
    if (when == After) {
        InsertCall("ProgramAfter", 0, 0, 0);
    }
}
/*
 * unaligned-rt.c
 *
 * Runtime code for the unaligned-access counting tool.
 *
 * LoadReference and StoreReference are called at each memory access
 * by code inserted into the executable at instrumentation time.
 * ProgramAfter is called before the program exits, so that the
 * counters can be written out.
 */

#include <stdio.h>
#include <assert.h>
#include "util.h"

#define DllExport __declspec( dllexport )
#define DllImport __declspec( dllimport )

DllExport void ProgramAfter(void);
DllExport void LoadReference(unsigned long address, int len, unsigned long pc);
DllExport void StoreReference(unsigned long address, int len, unsigned long pc);

int loads = 0;
int stores = 0;
int unaligned_loads = 0;
int unaligned_stores = 0;
FILE *f;

void ProgramAfter(void)
{
    char *fname;

    /* Call our etchOutputName() utility routine to create an
     * output file name. */
    fname = etchOutputName("unaligned");
    f = fopen(fname, "w");
    assert(f);

    /* Dump the final counter values into the output file. */
    fprintf(f, "Category,Number\n");
    fprintf(f, "loads,%d\n", loads);
    fprintf(f, "stores,%d\n", stores);
    fprintf(f, "unaligned loads,%d\n", unaligned_loads);
    fprintf(f, "unaligned stores,%d\n", unaligned_stores);
    fclose(f);
}

void LoadReference(unsigned long address, int len, unsigned long pc)
{
    /* Count every load; a load is unaligned when its address is not
     * a multiple of its length (len > 0 guards against division by 0). */
    loads++;
    if (len > 0 && address % len)
        unaligned_loads++;
}

void StoreReference(unsigned long address, int len, unsigned long pc)
{
    /* Count every store; a store is unaligned when its address is not
     * a multiple of its length. */
    stores++;
    if (len > 0 && address % len)
        unaligned_stores++;
}