Copyright (c) 1997 by the University of Washington. All rights reserved. This is a proprietary and confidential document. Under no circumstances may this document be used, modified, copied, distributed, or sold without the express written permission of the copyright holder.

Etch Tool Writer's API

Overview

Etch is a binary rewriting engine that can be used to build tools that analyze or optimize a Win32 binary. The Etch engine is typically used with an Etch tool that has been programmed to perform a specific analysis or optimization of the binary application. We supply a standard set of tools with Etch, but it is easy to build new tools. The rest of this document describes how to build an Etch tool.

An Etch tool works in two phases: the instrumentation phase and the execution phase. In the instrumentation phase, Etch reads in a Win32 binary -- either an executable or a dynamic-link library (DLL) -- and modifies this binary based on the instrumentation code supplied by the tool creator. The instrumentation code defines (exports) a set of callback procedures, whose interface is given in this document. The Etch engine calls those procedures at defined points as it processes the executable. At each callback, Etch passes information about the input executable to the instrumentation routines. Based on that information, the callback routines can query Etch for further information or instruct Etch as to what modifications should be made to the executable. In the execution phase, the user runs the binary that was created as output of the instrumentation phase.

Building a new tool with Etch therefore requires (1) an instrumentation DLL "<toolname>-inst.dll" and optionally, (2) a runtime DLL "<toolname>-rt.dll". The instrumentation DLL must contain any callback routines that the tool creator wants Etch to invoke. The instrumentation routines may cause Etch to insert procedure calls into the executable, in which case these procedures must be implemented in the runtime DLL.

We provide a header file, "etch-api.h", to assist programmers in building instrumentation callbacks. This header file gives the tool creator access to a set of support routines for obtaining information about the binary and for rewriting the executable.

Any routines whose names and arguments were added to the executable by the instrumentation routines should have implementations in the runtime DLL, "<toolname>-rt.dll". One common task for a runtime routine is to measure some hardware event (such as cache-misses or CPU-cycles) over a certain code region. We provide a programming interface to the Pentium hardware performance counters that tool writers can use to measure such events.

Instrumentation Callbacks

Since Etch drives the instrumentation process, the programmer implementing the instrumentation code needs to understand how and when Etch will invoke the callbacks in the instrumentation DLL that he provides. The order is as follows:

InstrumentInit()
For each module that is etched as part of the total program:
   For each procedure in the program:
      InstrumentProcedure(Before)
      For each basic block in the Procedure:
         InstrumentBasicBlock(Before)
         For each instruction in the Basic Block:
            InstrumentInstruction(Before)
            InstrumentInstruction(After)
         InstrumentBasicBlock(After)
      InstrumentProcedure(After)
   InstrumentModule(Before)
   InstrumentModule(After)
InstrumentProgram(Before)
InstrumentProgram(After)
InstrumentCleanup()

Etch invokes all of these instrumentation routines in an intuitive order except for InstrumentModule(Before) and InstrumentProgram(Before), which aren't invoked until after all of the procedures, basic blocks, and instructions are instrumented. We do this because it is often convenient to allocate data structures just before the program runs (or before the module is initialized) based on the results of instrumentation. For example, our branch prediction tool allocates a table whose size is the number of unique branch targets in the application, and this number isn't known until InstrumentInstruction() has been called on each instruction.
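
The ordering above can be made concrete with a small stand-alone simulation. The sketch below does not use Etch at all: ModuleShape, emit, and trace_callbacks are hypothetical names, and the abbreviated labels (Proc(B), Bb(A), and so on) stand for the callbacks listed above. It also simplifies by emitting one Proc(A) per procedure, whereas a procedure with several exit points would see several.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one module: procedure count, plus a uniform
 * number of basic blocks per procedure and instructions per block. */
typedef struct {
    int procs;
    int blocks_per_proc;
    int insts_per_block;
} ModuleShape;

/* Append one label to the trace, space-separated, if it still fits. */
static void emit(char *trace, size_t cap, const char *name)
{
    size_t used = strlen(trace);
    if (used + strlen(name) + 2 <= cap) {
        if (used > 0) strcat(trace, " ");
        strcat(trace, name);
    }
}

/* Write the callback sequence for the given modules into trace. */
void trace_callbacks(const ModuleShape *mods, int nmods, char *trace, size_t cap)
{
    int m, p, b, i;
    if (cap == 0) return;
    trace[0] = '\0';
    emit(trace, cap, "Init");
    for (m = 0; m < nmods; m++) {
        for (p = 0; p < mods[m].procs; p++) {
            emit(trace, cap, "Proc(B)");
            for (b = 0; b < mods[m].blocks_per_proc; b++) {
                emit(trace, cap, "Bb(B)");
                for (i = 0; i < mods[m].insts_per_block; i++) {
                    emit(trace, cap, "Inst(B)");
                    emit(trace, cap, "Inst(A)");
                }
                emit(trace, cap, "Bb(A)");
            }
            emit(trace, cap, "Proc(A)");
        }
        /* Module callbacks come only after the module's code is processed. */
        emit(trace, cap, "Mod(B)");
        emit(trace, cap, "Mod(A)");
    }
    /* Program callbacks come after every module. */
    emit(trace, cap, "Prog(B)");
    emit(trace, cap, "Prog(A)");
    emit(trace, cap, "Cleanup");
}
```

Calling trace_callbacks with a single module of one procedure, one block, and one instruction shows Mod(B) and Prog(B) appearing after Proc(A), which is the late invocation described above.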

Invocations of the instrumentation routines InstrumentInstruction, InstrumentBasicBlock, and InstrumentProcedure pass a handle to the corresponding component. For example, InstrumentProcedure passes a ProcNum, which is an integer used to refer to the procedure on behalf of which the callback is made. If debug information is available, procedure ProcGetName can be used to find the name of a procedure, given its ProcNum. Similarly, the Etch call to InstrumentInstruction passes a handle to the instruction on behalf of which the call is made; the instruction handle, of type InstPtr, can be used to determine the instruction type, its PC, its byte encoding, and so on, when passed to the appropriate support routines.

void InstrumentProgram(WhenT when);

Etch calls this routine twice: once to allow the instrumentation routine to insert code that will be executed before the program executes (when == Before), and once to allow the routine to insert code that will be executed after the program completes (when == After). Note that any code inserted by InstrumentProgram(After) will be invoked whenever the program calls the exit() routine, even if the routine that calls exit() is an instrumentation routine.

void InstrumentModule(WhenT when);

Etch calls this routine twice for each instrumented DLL to allow the instrumentation routine to insert code before the entry point of the module (when == Before) or after the module is unloaded (when == After). For a DLL, if the instrumentation routine called by InstrumentModule with when == Before inserts code into that module, the code will be executed at runtime before execution of the initialization procedure of the DLL.

Note that InstrumentModule(After) is only called for DLLs, not executables. Furthermore, it may be called after InstrumentProgram(After) in some cases.

void InstrumentProcedure(WhenT when, ProcPtr proc, int ProcNum);

Etch calls this routine twice for each procedure in the program: once to allow the instrumentation routine to insert code that will be executed before the procedure executes (when == Before), and once to allow the user to insert code that will be executed after the procedure completes (when == After). If debugging information is available, then ProcNum will have a useful value; without debugging information, procedure numbers cannot be assigned, and ProcNum will have the value -1.

Note that InstrumentProcedure(After) will be invoked once for every exit point in the procedure. Also, any code inserted during InstrumentProcedure(After) will be inserted directly before the control flow instruction (typically ret) corresponding to the exit point.

void InstrumentBasicBlock(WhenT when, BbPtr bb, int ProcNum);

Etch calls this routine twice for each basic block: once to allow the instrumentation routine to insert code that will be executed before the basic block executes (when == Before), and once to allow the user to insert code that will be executed after the basic block has been executed (when == After). If debugging info is available, then ProcNum will have a useful value, and it is -1 otherwise.

Note that if the last instruction of a basic block is a control flow instruction, then any code inserted during InstrumentBasicBlock(After) will be inserted directly before that instruction.

void InstrumentInstruction(WhenT when, InstPtr inst, int ProcNum);

Etch calls this routine twice per instruction: once before each instruction in the entire executable (when == Before), and once after each instruction (when == After). If debugging info is available, then ProcNum will have a useful value, and it is -1 otherwise.

Note that if the instruction is a control flow instruction, then any code inserted during InstrumentInstruction(After) will be inserted before the instruction. The inserted code will be executed in the expected order with respect to other inserted code (e.g., after code inserted during InstrumentInstruction(Before), InstrumentBasicBlock(Before), InstrumentProcedure(Before), etc.), but will also be executed before the control flow instruction.

void InstrumentInit();

Etch calls this routine once at the very beginning of instrumentation.

void InstrumentCleanup();

Etch calls this routine once at the very end of instrumentation.

Instrumentation Support

Support Routines

The following instrumentation support routines are provided by (exported by) Etch. They can be called by instrumentation routines during the instrumentation phase. The support routines are used to instruct Etch to make changes to the executable or to query Etch for additional information about the executable or its components.

The support routines that insert Call instructions save all general-purpose integer registers and flags. If the "save floating point state" flag is set for the module, floating-point state is saved as well.

void InsertInst(char *newInst, int newInstLen);

Invocation of the InsertInst procedure causes a new instruction, specified by newInst, to be inserted in the executable at the current position, as defined by the Etch callback. For example, if Etch calls the instrumentation routine InstrumentInstruction with when == Before, then invocation of InsertInst by that routine causes the new instruction to be inserted preceding the instruction for which the callback occurred; if the instrumentation routine calls InsertInst on an InstrumentInstruction callback with when==After, then the new instruction is inserted following the instruction for which the callback occurred. The parameter newInstLen specifies the length of the new instruction. Note that no checking is done on newInst to ensure that it is a valid x86 instruction. If InsertInst is called repeatedly from the same instrumentation callback, Etch inserts the new instructions in the order that the InsertInst calls are made. The instructions will therefore be executed in that same order at runtime.
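
As an illustration of the ordering rule, the following sketch replaces InsertInst with a recording stand-in so that it can run without the Etch engine. Only the signature is taken from this document; the globals g_inserted and g_inserted_len and the helper insert_two_nops are hypothetical. The bytes are the standard one-byte x86 NOP (0x90) and the same NOP with an operand-size prefix (0x66 0x90).

```c
#include <assert.h>
#include <string.h>

/* Recording stand-in for InsertInst (the real routine is exported by Etch).
 * It appends the bytes to a buffer so the insertion order can be inspected. */
unsigned char g_inserted[16];
int g_inserted_len = 0;

void InsertInst(char *newInst, int newInstLen)
{
    assert(newInstLen > 0);
    assert(g_inserted_len + newInstLen <= (int)sizeof g_inserted);
    memcpy(g_inserted + g_inserted_len, newInst, (size_t)newInstLen);
    g_inserted_len += newInstLen;
}

/* What a callback body might do: insert a one-byte NOP, then a two-byte NOP.
 * The caller is responsible for the encoding; nothing validates the bytes. */
void insert_two_nops(void)
{
    char nop1[] = { (char)0x90 };              /* NOP */
    char nop2[] = { (char)0x66, (char)0x90 };  /* operand-size prefix + NOP */
    InsertInst(nop1, (int)sizeof nop1);
    InsertInst(nop2, (int)sizeof nop2);
}
```

After insert_two_nops runs, the buffer holds the three bytes in call order, mirroring the order in which the inserted instructions would execute.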

void ReplaceInst(char *newInst, int newInstLen);

ReplaceInst causes the current instruction to be replaced by the instruction newInst, whose length is specified by newInstLen. No checking is done on newInst to ensure that it is a valid x86 instruction. If this routine is called more than once, the last call will override all previous calls. ReplaceInst returns TRUE if the instruction was replaced, and FALSE if not. Currently, those instructions that contain branch targets or other embedded pointers are not replaceable.

void InsertCall(char *ProcName, int argc, void **argv, ArgType *argt);

The InsertCall procedure inserts code into the instrumented binary to call a procedure whose name is specified by ProcName. Etch inserts the procedure call at the current position, as defined by the current Etch callback. For example, if Etch calls the instrumentation routine InstrumentInstruction with when == After, then invocation of InsertCall causes the call to be inserted following the instruction for which the callback occurred. An instrumentation procedure can call InsertCall more than once from a given callback location; Etch inserts the calls in the order that the InsertCall invocations are made. The call instructions will therefore be executed in that same order at runtime.

A set of arguments can be passed to the procedure; the details of how to do this are described in the Argument Passing section.

void InsertCallLoadRefs(InstPtr inst, char *ProcName, int argc, void **argv, ArgType *argt);

Invocation of InsertCallLoadRefs causes a procedure call to be made at runtime to procedure ProcName on all memory reads by instruction inst. If the instruction makes multiple memory reads (e.g., in the case of a REP instruction), then the ProcName procedure will be called multiple times (in the order that the reference occurred in the actual instruction). This routine may be called repeatedly, and the calls will occur at runtime in the same order the InsertCallLoadRefs routine was called during instrumentation. If the instruction causes no memory loads, this routine has no effect.

A set of arguments can be passed to the procedure; the details of how to do this are described in the Argument Passing section.

void InsertCallStoreRefs(InstPtr inst, char *ProcName, int argc, void **argv, ArgType *argt);

Invocation of InsertCallStoreRefs causes a procedure call to be made at runtime to procedure ProcName on all memory writes by instruction inst. If the instruction makes multiple memory writes (e.g., in the case of a REP instruction), then the ProcName procedure will be called multiple times (in the order that the references occurred in the actual instruction). This routine may be called repeatedly, and the calls will occur at run-time in the same order the InsertCallStoreRefs routine was called during instrumentation. If the instruction causes no memory stores, this routine has no effect.

A set of arguments can be passed to the procedure; the details of how to do this are described in the Argument Passing section.

char *InstGetBytes(InstPtr inst);

This routine returns a pointer to the actual bytes of the original instruction. Use InstGetLength to find the length in bytes of the instruction.

int InstGetLength(InstPtr inst);

This routine returns the length (in bytes) of the original instruction.

Address_t InstGetPC(InstPtr inst);

This routine returns the program counter of the original instruction.

Address_t InstGetRVA(InstPtr inst);

This routine returns the relative address (RVA) of the instruction from the start of the module.

IType InstGetType(InstPtr inst);

This routine classifies an instruction into an IType. Currently, the possible IType values are: {InstTypeUnknown, InstTypeCall, InstTypeJmp, InstTypeJcc, InstTypeReturn, InstTypeMov, InstTypeALU, InstTypePush, InstTypePop}.

Address_t InstGetBranchTarget(InstPtr inst);

This routine returns the branch target for those branches where the target address can be determined statically. It returns 0 (zero) if the target address cannot be statically determined.
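
A tool that needs the number of unique branch targets (as the branch prediction tool mentioned earlier does) could collect the values returned by InstGetBranchTarget during instrumentation. The sketch below shows only the bookkeeping; the fixed-size table, the names g_targets and note_branch_target, and the width chosen for Address_t are assumptions made for illustration.

```c
#include <stddef.h>

typedef unsigned long Address_t;   /* assumed; the real typedef is in the Etch headers */

#define MAX_TARGETS 1024           /* arbitrary capacity for the sketch */

Address_t g_targets[MAX_TARGETS];
size_t g_num_targets = 0;

/* Record a branch target. A value of 0 means "not statically known" and is
 * skipped. Returns the table index of the target, or -1 if nothing was
 * recorded (unknown target or table full). */
int note_branch_target(Address_t target)
{
    size_t i;
    if (target == 0) return -1;
    for (i = 0; i < g_num_targets; i++)
        if (g_targets[i] == target) return (int)i;   /* already present */
    if (g_num_targets == MAX_TARGETS) return -1;
    g_targets[g_num_targets] = target;
    return (int)g_num_targets++;
}
```

In an InstrumentInstruction callback one would pass InstGetBranchTarget(inst) to note_branch_target; by the time InstrumentProgram(Before) is invoked, g_num_targets holds the final count.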

DCInstruction_t InstGetInfo(InstPtr inst);

This routine provides decoding help for tool writers. It decodes the instruction and presents a structure DCInstruction_t (defined in include/instruction.h) to the user with information about the instruction. The following macros can be used with InstGetInfo to provide more information:

DCOpcode_t InstGetMnemonic(DCInstruction_t)
Returns a mnemonic for the opcode of an instruction.
char InstGetPrefix(DCInstruction_t)
Returns the prefix byte of an instruction.
int InstGetNumOperands(DCInstruction_t)
Returns the number of operands of an instruction.
DCOperand_t InstGetOperand (DCInstruction_t, int)
Returns the requested operand of an instruction.
char InstGetOperandSize (DCOperand_t)
Returns the size of an operand.
int InstGetOperandMode (DCOperand_t)
Returns the mode of an operand.

int InstIsExternalCallSite(InstPtr inst);

This routine returns 1 if the instruction has type InstTypeCall and if the call site is a call to an uninstrumented routine. It returns 0 if the call site is not an external call. This only works if debugging information is available; it returns -1 otherwise.

InstIsExternalCallSite() is not yet implemented.

int InstIsDllCallSite(InstPtr inst);

This routine returns 1 if the instruction is a call to an implicitly-loaded DLL; it returns zero otherwise.

char *InstGetDllProcName(InstPtr inst)

If the specified instruction is a call to a DLL, this routine returns a pointer to the name of that DLL procedure.

char *InstGetDllFileName(InstPtr inst)

If the specified instruction is a call to a DLL, this routine returns a pointer to the name of the file that contains the procedure called by this instruction.

Address_t BbGetPC(BbPtr bb)

This routine returns the original address of the first instruction of the current basic block.

int BbGetLength(BbPtr bb)

This routine returns the total length of the current basic block, measured in bytes.

int BbGetNumInsts(BbPtr bb)

This routine returns the total length of the current basic block, measured in instructions.

char *ProcGetName(int ProcNum);

If debugging information is available, then this routine provides the number to name mapping for each instrumented procedure; it returns zero otherwise.

int ProcGetNum(char *ProcName);

If debugging information is available, then this routine provides the name to number mapping for each instrumented procedure. It returns -1 if unknown.

int ProcGetNumFromAddress(Address_t address);

If debugging information is available, then this routine returns the procedure number of the procedure containing the address. It returns -1 if unknown.

Address_t ProcGetStartAddr(int ProcNum)
Address_t ProcGetStartAddr(ProcPtr proc)

These two routines return the original starting address of the specified procedure. With the first routine, the procedure is specified using a ProcNum, and therefore can be called in any of the instrumentation callbacks. If the ProcNum is -1, corresponding to a procedure with no debugging information, then this routine will return 0.

With the second routine, the procedure is specified using the ProcPtr argument passed into the InstrumentProcedure callback. Since the ProcPtr passed in is always valid, this routine will always return a valid address. However, because it uses a ProcPtr, this routine can only be used in the InstrumentProcedure callback.

Address_t ProcGetEndAddr(int ProcNum)
Address_t ProcGetEndAddr(ProcPtr proc)

These two routines return the original ending address of the specified procedure. With the first routine, the procedure is specified using a ProcNum, and therefore can be called in any of the instrumentation callbacks. If the ProcNum is -1, corresponding to a procedure with no debugging information, then this routine will return 0.

With the second routine, the procedure is specified using the ProcPtr argument passed into the InstrumentProcedure callback. Since the ProcPtr passed in is always valid, this routine will always return a valid address. However, because it uses a ProcPtr, this routine can only be used in the InstrumentProcedure callback.

int ProcIsThunk (ProcPtr proc);

This routine returns 1 if the specified procedure is a thunk, and 0 otherwise. A procedure is a thunk if it immediately and unconditionally jumps to another address.
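
To make the definition concrete, the following sketch tests whether a byte sequence begins with one of three common 32-bit x86 unconditional jump encodings. It is not how Etch implements ProcIsThunk; the function name is hypothetical, and prefixes and other jump forms are deliberately ignored.

```c
#include <stddef.h>

/* Returns 1 if code starts with an unconditional x86 JMP in one of three
 * common 32-bit encodings, 0 otherwise. */
int starts_with_unconditional_jmp(const unsigned char *code, size_t len)
{
    if (len >= 2 && code[0] == 0xEB) return 1;   /* JMP rel8 */
    if (len >= 5 && code[0] == 0xE9) return 1;   /* JMP rel32 */
    if (len >= 6 && code[0] == 0xFF && code[1] == 0x25)
        return 1;                                /* JMP dword ptr [disp32] */
    return 0;
}
```

A tool could apply such a check to the bytes returned by InstGetBytes for the first instruction of a procedure, for example to cross-check its own notion of a thunk.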

char *ModuleGetName()

This routine returns a pointer to the filename of the executable or DLL being instrumented. If called from InstrumentProgram it will return a null pointer, since there is no single module that is unique to the entire program.

char *ModuleGetPath()

This routine returns a pointer to the directory of the executable or DLL being instrumented. If called from InstrumentProgram it will return a null pointer, since there is no single module that is unique to the entire program.

char *ModuleGetEtchedName()

This routine returns a pointer to the etched name of the executable or DLL being instrumented. If called from InstrumentProgram it will return a null pointer, since there is no single module that is unique to the entire program.

int DebugGetProcCount();

If debugging information is available, then this routine returns the total number of instrumented procedures; it returns -1 otherwise.

Argument Passing

The instrumentation support procedures InsertCall(), InsertCallLoadRefs(), and InsertCallStoreRefs() all allow the programmer to specify a set of arguments to the procedure call that is being inserted into the instrumented binary. The arguments that will be passed to the procedure are specified by the argc, argv, and argt parameters.

The argt parameter is an array of type ArgType; it defines how the other parameters (argc and argv) are interpreted. Following are the valid ArgType values and their definitions:

Note that not all of these ArgTypes are valid at every location. For example, the ArgBranchTarget type is only valid when InsertCall is invoked from InstrumentInstruction(), and only when the particular instruction is a branch instruction.
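
Because argv and argt are parallel arrays, it can help to fill them through one helper so they never fall out of step. The sketch below is a hypothetical convenience, not part of the Etch API: ArgList and arg_push are invented names, and the ArgType enum is a placeholder that reuses names mentioned in this document with made-up numeric values.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder ArgType: the names appear in this document, the values are
 * invented for the sketch. The real definition comes from the Etch headers. */
typedef enum { ArgImmed, ArgEffAddr, ArgEffAddrLen, ArgBranchTarget } ArgType;

#define MAX_ARGS 8

typedef struct {
    int     argc;
    void   *argv[MAX_ARGS];
    ArgType argt[MAX_ARGS];
} ArgList;

/* Append one (value, type) pair, keeping argv and argt aligned. */
void arg_push(ArgList *a, void *value, ArgType type)
{
    assert(a->argc < MAX_ARGS);
    a->argv[a->argc] = value;
    a->argt[a->argc] = type;
    a->argc++;
}
```

With an ArgList initialized to zero, three arg_push calls build the same argument set as the example tool at the end of this document, and the result can be handed on as InsertCall(name, a.argc, a.argv, a.argt).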

Runtime Support

These are support routines that the runtime tool can call. To use them, include the file etchrt-lib.h in the source and link with etchrt-lib.lib.

unsigned long EtchNewTargetToOld(unsigned long newVA);
Translates newVA into the corresponding address in the original unetched binary.

unsigned long EtchOldTargetToNew(unsigned long oldVA);
Translates an address from the original unetched binary into the corresponding address in the etched binary.

Using Debug Information

The Etch API provides extra functionality when modules being etched have Codeview debug information embedded in them. For modules containing such debug information, the tool programmer can associate a procedure name and a unique number with every address range of code in the module. Furthermore, Etch and Visual Etch ensure that the number is unique across all the modules being etched as part of an application. In this way, measurement tools can associate events being counted with a particular procedure. Both the countiref tool and the cache simulation tool provide implementation examples for this style of use.

Each tool callback to InstrumentInstruction(), InstrumentBasicBlock(), and InstrumentProcedure() provides an argument called procNum, which for modules containing debug information is the unique procedure number associated with that unit. Typically, the measurement tool keeps an array of event counts whose size is the total number of unique procedures; it then increments the appropriate count by using the procNum argument as an index into the array. When the module being etched does not have Codeview debug information, the procNum argument will always be -1.
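
The per-procedure counting scheme described above might look like the following on the runtime side. This is a minimal sketch with hypothetical names (ProcCounts, proc_counts_init, proc_counts_hit); it assumes the procedure count obtained during instrumentation (for example from DebugGetProcCount) has been passed to the runtime, and it routes a procNum of -1 to a separate bucket rather than indexing the array with it.

```c
#include <stdlib.h>

typedef struct {
    unsigned long *counts;   /* one slot per procedure number */
    unsigned long  unknown;  /* events whose procNum was -1 or out of range */
    int            nprocs;
} ProcCounts;

/* Allocate zeroed counters for nprocs procedures. A non-positive nprocs
 * (no debug information) yields an empty table. Returns 0 on success. */
int proc_counts_init(ProcCounts *pc, int nprocs)
{
    pc->nprocs = nprocs > 0 ? nprocs : 0;
    pc->unknown = 0;
    pc->counts = pc->nprocs ? calloc((size_t)pc->nprocs, sizeof *pc->counts) : NULL;
    return (pc->nprocs && !pc->counts) ? -1 : 0;
}

/* Count one event against procNum. */
void proc_counts_hit(ProcCounts *pc, int procNum)
{
    if (procNum >= 0 && procNum < pc->nprocs)
        pc->counts[procNum]++;
    else
        pc->unknown++;
}
```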

Another use for debug information is to support selective instrumentation. The memdebug tool provides an example for this style of use. One can use the ProcGetName() function along with the strcmp() function to find the function one wants to instrument. Then, one can check the procNum argument in the InstrumentInstruction() callback to instrument only those instructions whose procNum matches the desired number. The Malloc and Free Usage tool (memdebug) uses this technique to instrument the procedure entry point and all the procedure exit points for any function whose name is malloc, free, or realloc.
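
The name test at the heart of selective instrumentation can be sketched as a small predicate. The function name is_wanted_proc and the table k_wanted are hypothetical; the list of names simply mirrors the memdebug description above, and this is not memdebug's actual source.

```c
#include <string.h>

/* Procedures to instrument, matched by exact name. */
static const char *const k_wanted[] = { "malloc", "free", "realloc" };

/* Returns 1 if name is one of the wanted procedures, 0 otherwise. A NULL
 * name (ProcGetName returns zero without debug information) never matches. */
int is_wanted_proc(const char *name)
{
    size_t i;
    if (name == NULL) return 0;
    for (i = 0; i < sizeof k_wanted / sizeof k_wanted[0]; i++)
        if (strcmp(name, k_wanted[i]) == 0) return 1;
    return 0;
}
```

A callback would typically evaluate is_wanted_proc(ProcGetName(procNum)) and return early when the result is 0.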

Detailed descriptions of how individual API calls behave differently with and without debug information can be found in the previous sections of this document.

The following is a list of future possible uses for debug information within Etch (none of these have been implemented).

Code Layout Interface

Etch provides an interface that allows instrumentation tools to change the ordering of the instructions in the instrumented or optimized binary. This interface is provided primarily to support code layout optimization.

To specify a layout, the tool writer must provide a file containing PC ranges, where the PCs correspond to text addresses in the original (uninstrumented) binary. Typically this file is generated automatically by some other analysis tool. To use the layout, Etch must be invoked with the '-l layout_file' option.

When applying the specified layout, Etch automatically applies checks and transformations to guarantee the correctness of the transformed code. In particular, Etch checks to make sure that the layout does not split instructions or data embedded in the text section. Etch also inserts branches to fall-through code as needed.

The format of the layout file consists of multiple lines, each with a pair of whitespace-separated hex numbers specifying a base and limit address. Each (base,limit) pair indicates that the instrumented text corresponding to the original pc range [base..limit-1] should be written to the output binary. Warning: code that resides in any range not included in the layout file will not be included in the instrumented binary.

Two special tokens are recognized, 'BEGIN' and 'END', which correspond to the first PC in the text section and the (last PC in the text section)+1 respectively.

Consider, for example, the following layout file, for a program whose original text section ran from 0x0000 to 0x0F00. In effect, this layout just moves the code from 0x0100 to 0x01FF to the beginning of the output text section, followed by the remainder of the program.

layout file:
0100 0200
BEGIN 0100
0200 END

So the order of the instructions in the resulting binary would be:

0x100
0x101
...
0x1FF
0x000
...
0x0FF
0x200
...
0xF00
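
Since layout files are usually produced by another tool, it can be useful to sanity-check them before handing them to Etch. The following sketch parses one line in the format described above, resolving BEGIN and END against caller-supplied bounds. The names Range, parse_token, and parse_layout_line are hypothetical, and Etch's own parser may accept or reject input differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { unsigned long base, limit; } Range;

/* Parse one token: "BEGIN", "END", or a hex number. Returns 1 on success. */
static int parse_token(const char *tok, unsigned long begin,
                       unsigned long end, unsigned long *out)
{
    char *stop;
    if (strcmp(tok, "BEGIN") == 0) { *out = begin; return 1; }
    if (strcmp(tok, "END") == 0)   { *out = end;   return 1; }
    *out = strtoul(tok, &stop, 16);
    return stop != tok && *stop == '\0';   /* whole token must be hex */
}

/* Parse "base limit" from one line. begin is the first text PC and end is
 * (last text PC)+1. Returns 1 for a well-formed, non-empty range, else 0. */
int parse_layout_line(const char *line, unsigned long begin,
                      unsigned long end, Range *r)
{
    char a[32], b[32];
    if (sscanf(line, "%31s %31s", a, b) != 2) return 0;
    if (!parse_token(a, begin, end, &r->base)) return 0;
    if (!parse_token(b, begin, end, &r->limit)) return 0;
    return r->base < r->limit;
}
```

Applied to the three lines of the example file, this yields the ranges [0x100, 0x200), [BEGIN, 0x100), and [0x200, END), in that order. A fuller checker could also verify that the ranges together cover the whole text section, since uncovered code is dropped from the output.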

Performance Counter Support Routines

This section describes support for obtaining the hardware performance counter values (supported on both the Pentium and PentiumPro) and manipulating these results. The routines are contained in two DLLs: hwcounter.dll and math64.dll. hwcounter.dll contains the routines for obtaining and manipulating the performance counters, while math64.dll contains the routines for manipulating 64-bit numbers. (The math64.dll library is provided for backward compatibility only; new tools should use the built-in Visual C++ support for 64-bit integers.)

Data Types

From hwcounter.h:
typedef struct { 
    unsigned int cycle_lo;
    unsigned int cycle_hi;
    unsigned int ctr0_lo;
    unsigned int ctr0_hi;
    unsigned int ctr1_lo;
    unsigned int ctr1_hi;
} counter_t;

counter_t defines a structure to contain the results of a read of the counter values. It holds the results of three counters: the cycle counter (sometimes called the time-stamp counter) and two user-selectable performance counters.
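
Each counter is stored as two 32-bit halves, so a tool usually joins them before doing arithmetic. The sketch below assumes a 32-bit unsigned int and uses the C99 uint64_t type rather than a compiler-specific 64-bit type; join64 and cycles_between are hypothetical helper names, and the struct is restated so the block is self-contained.

```c
#include <stdint.h>

/* Mirrors the counter_t layout shown above. */
typedef struct {
    unsigned int cycle_lo, cycle_hi;
    unsigned int ctr0_lo,  ctr0_hi;
    unsigned int ctr1_lo,  ctr1_hi;
} counter_t;

/* Join the low and high 32-bit halves into one 64-bit value. */
uint64_t join64(unsigned int lo, unsigned int hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Cycles elapsed between two snapshots of the counters. */
uint64_t cycles_between(const counter_t *start, const counter_t *end)
{
    return join64(end->cycle_lo, end->cycle_hi)
         - join64(start->cycle_lo, start->cycle_hi);
}
```

The same pattern applies to ctr0 and ctr1. Joining first avoids having to handle the carry between the halves by hand.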

From math64.h:

typedef struct { 
   DWORD lo, hi; 
} Int64, *PInt64;

Int64 defines a 64-bit unsigned integer.

Functions

HANDLE InitHardwareCounters()

Initializes the hardware counters. Returns a handle to the driver if the counters and the driver are available. Otherwise it returns NULL. This has to be called before any other routines.

void GetCycleCounter(Int64 *cval)

GetCycleCounter fills cval with the current value of the cycle counter. This has much lower overhead than ReadActiveCounters (described below), because it does not require a call to the device driver.

void SetActiveCounters(int c1, int c2)

Makes counters c1 and c2 the active counters.

void ResetActiveCounters()

Resets the active counters to zero, including the cycle counter.

void ReadActiveCounters(counter_t *ctr)

Reads the active counters and stores the results in ctr, including the values of the two active counters. Note that this call takes about 4,000 cycles, so it should not be used to time events that require finer precision than that overhead allows.

SystemType GetSystemType()

Returns the system type: one of SYS_P5_WNT, SYS_P6_WNT, SYS_P5_W95, SYS_P6_W95, SYS_UNKNOWN.

char *GetCounterName(SystemType sys,int c)

Returns the name of counter number c on the system specified.

void CloseCounters()

Closes the device driver file associated with the counters.

PInt64 Add64(PInt64 op1, PInt64 op2, PInt64 result)

Does 64-bit addition: result = op1 + op2; returns result

PInt64 Sub64(PInt64 op1, PInt64 op2, PInt64 result);

Does 64-bit subtraction: result = op1 - op2 ; returns result
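
For readers curious what 64-bit arithmetic on a lo/hi pair involves, here is a minimal sketch of addition with carry and subtraction with borrow. It is not the math64.dll source: the _sketch suffix marks the functions as illustrative, and DWORD is defined locally as a 32-bit stand-in for the Windows type.

```c
#include <stdint.h>

typedef uint32_t DWORD;   /* stand-in for the Windows 32-bit DWORD */
typedef struct { DWORD lo, hi; } Int64, *PInt64;

/* result = op1 + op2, carrying from the low word into the high word.
 * result may alias either operand. */
PInt64 add64_sketch(PInt64 op1, PInt64 op2, PInt64 result)
{
    DWORD lo = op1->lo + op2->lo;             /* wraps modulo 2^32 */
    DWORD carry = (lo < op1->lo) ? 1u : 0u;   /* wrapped, so carry out */
    result->hi = op1->hi + op2->hi + carry;
    result->lo = lo;
    return result;
}

/* result = op1 - op2, borrowing from the high word when needed.
 * result may alias either operand. */
PInt64 sub64_sketch(PInt64 op1, PInt64 op2, PInt64 result)
{
    DWORD borrow = (op1->lo < op2->lo) ? 1u : 0u;
    DWORD lo = op1->lo - op2->lo;             /* wraps modulo 2^32 */
    result->hi = op1->hi - op2->hi - borrow;
    result->lo = lo;
    return result;
}
```

Both functions compute the low word and the carry or borrow before writing to result, which is what makes it safe to pass the same structure as an operand and as the destination.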

void SPrint64(PSTR buffer, PSTR fmt, PInt64 op);

The equivalent of sprintf except that the first %s in fmt is replaced by the value in decimal of op.

An Example Tool

As an example of how to write a simple tool, this section presents a tool that counts unaligned memory accesses. Following are the two components of the tool: the instrumentation code, unaligned-inst.cxx, and the runtime code, unaligned-rt.c.


Example Tool Instrumentation Code



/*
 * unaligned-inst.cxx
 *
 * Instrumentation code for simple unaligned access counter tool.
 *
 * The InstrumentInstruction procedure is invoked by Etch before and
 * after each instruction is discovered.  It calls Etch routines
 * InsertCallLoadRefs and InsertCallStoreRefs, which cause calls
 * to be made to tool runtime routines (LoadReference and StoreReference)
 * when a memory load or store is executed.
 *
 */

#define _ETCHLIB_IMPORTER_

#include "etch-inst.h"

/*
 * make the callbacks visible to the etch engine.
 */
extern "C" {
    DllExport
	void
	InstrumentInstruction(WhenType when, InstPtr inst, int procNum);
    DllExport
	void
	InstrumentProgram(WhenType when);
}

void
InstrumentInstruction(WhenType when, InstPtr inst, int procNum)
{
    int argc;
    Address_t pc;
    void * array[3];
    ArgType argt[3];

    if (when == Before) {
	argc = 3;
	pc = InstGetPC(inst);

	/* set up the arguments for the calls to LoadReference and
	 * StoreReference.
	 */
	array[0] = 0;	  argt[0] = ArgEffAddr;
	array[1] = 0;	  argt[1] = ArgEffAddrLen;
	array[2] = (void*)pc; argt[2] = ArgImmed;

	/* insert calls to LoadReference and StoreReference.  Note
	 * that these routines will only be called on instructions
	 * that actually make memory references.
	 */
	InsertCallLoadRefs(inst, "LoadReference", argc, array, argt);
	InsertCallStoreRefs(inst, "StoreReference", argc, array, argt);
    }
}

void
InstrumentProgram(WhenType when)
{
    /* call the ProgramAfter routine with no arguments, once the
     * program completes.
     */
    if (when == After) {
	InsertCall("ProgramAfter", 0, 0, 0);
    }
}

Example Tool Runtime Code



/*
 *  unaligned-rt.c
 * 
 * Runtime code for unaligned access counting Tool.
 *
 * The routines LoadReference and StoreReference are called
 * at each memory access by code inserted into the executable 
 * at instrumentation time.   ProgramAfter is called before the
 * program exits, so that counters can be written out.
 *
 *
 */

#include <assert.h>
#include <stdio.h>

#include "util.h"

#define DllExport   __declspec( dllexport )
#define DllImport   __declspec( dllimport )

DllExport void ProgramAfter();
DllExport void LoadReference(unsigned long address, int len, unsigned long pc);
DllExport void StoreReference(unsigned long address, int len, unsigned long pc);

int loads = 0;
int stores = 0;
int unaligned_loads = 0;
int unaligned_stores = 0;
FILE * f;

void
ProgramAfter()
{
    char *fname;

    /* call our etchOutputName() utility routine to create an
     * output file name.
     */
    fname = etchOutputName("unaligned");
    f = fopen(fname,"w");
    assert(f);

    /* dump the final counter information into the output file.
     */
    fprintf(f,"Category,Number\n");
    fprintf(f,"loads,%d\n", loads);
    fprintf(f,"stores,%d\n", stores);
    fprintf(f,"unaligned loads,%d\n", unaligned_loads);
    fprintf(f,"unaligned stores,%d\n", unaligned_stores);

    fclose(f);
}

void
LoadReference(unsigned long address, int len, unsigned long pc)
{
    /* count the number of loads and the number of unaligned loads
     */
    loads++;
    if (address % len) unaligned_loads++;
}

void
StoreReference(unsigned long address, int len, unsigned long pc)
{
    /* count the number of stores and the number of unaligned stores
     */
    stores++;
    if (address % len) unaligned_stores++;    
}
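
The alignment test used by LoadReference and StoreReference can be factored into one helper that also guards against a zero length. Whether Etch can ever report a length of zero is not stated in this document, so the guard is a defensive assumption, and is_unaligned is a hypothetical name.

```c
/* Returns 1 if an access of len bytes at address is not naturally aligned
 * (address is not a multiple of len), 0 if it is aligned. A non-positive
 * len is reported as aligned instead of dividing by zero. */
int is_unaligned(unsigned long address, int len)
{
    if (len <= 0) return 0;
    return (address % (unsigned long)len) != 0;
}
```

With this helper, the bodies of the two runtime routines reduce to incrementing the total and, when is_unaligned(address, len) is nonzero, the unaligned count.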