Valgrind Tools

A guide to writing new tools for Valgrind
This guide was last updated on 20030520

Valgrind is licensed under the GNU General Public License, version 2
An open-source tool for supervising execution of Linux-x86 executables.

1 Introduction

1.1 Supervised Execution

Valgrind provides a generic infrastructure for supervising the execution of programs. This is done by providing a way to instrument programs in very precise ways, making it relatively easy to support activities such as dynamic error detection and profiling.

Although writing a tool is not easy, and requires learning quite a few things about Valgrind, it is much easier than instrumenting a program from scratch yourself.

1.2 Tools

The key idea behind Valgrind's architecture is the division between its ``core'' and ``tools''.

The core provides the common low-level infrastructure to support program instrumentation, including the x86-to-x86 JIT compiler, low-level memory manager, signal handling and a scheduler (for pthreads). It also provides certain services that are useful to some but not all tools, such as support for error recording and suppression.

But the core leaves certain operations undefined, and these must be filled in by tools. Most notably, tools define how program code should be instrumented. They can also define certain variables to indicate to the core that they would like to use certain services, or be notified when certain interesting events occur. But the core takes care of all the hard work.

1.3 Execution Spaces

An important concept to understand before writing a tool is that there are three spaces in which program code executes:

User space: this covers most of the program's execution. The tool is given the code and can instrument it any way it likes, providing (more or less) total control over the code.
Code executed in user space includes all the program code, almost all of the C library (including things like the dynamic linker), and almost all parts of all other libraries.

Core space: a small proportion of the program's execution takes place entirely within Valgrind's core. This includes:
- Dynamic memory management (malloc() etc.)
- Pthread operations and scheduling
- Signal handling
A tool has no control over these operations; it never ``sees'' the code doing this work and thus cannot instrument it. However, the core provides hooks so a tool can be notified when certain interesting events happen, for example when dynamic memory is allocated or freed, the stack pointer is changed, or a pthread mutex is locked.
Note that these hooks only notify tools of events relevant to user space. For example, when the core allocates some memory for its own use, the tool is not notified of this, because it's not directly part of the supervised program's execution.

Kernel space: execution in the kernel. Two kinds:
1. System calls: can't be directly observed by either the tool or the core. But the core does have some idea of what happens to the arguments, and it provides hooks for a tool to wrap system calls.
2. Other: all other kernel activity (e.g. process scheduling) is totally opaque and irrelevant to the program.

It should be noted that a tool only has direct control over code executed in user space. This is the vast majority of code executed, but it is not absolutely all of it, so any profiling information recorded by a tool won't be totally accurate.

2 Writing a Tool

2.1 Why write a tool?

Before you write a tool, you should have some idea of what it should do. What is it you want to know about your programs of interest? Consider some existing tools:

memcheck: among other things, performs fine-grained validity and addressability checks of every memory reference performed by the program

addrcheck: performs lighter-weight addressability checks of every memory reference performed by the program

cachegrind: tracks every instruction and memory reference to simulate instruction and data caches, tracking cache accesses and misses that occur on every line in the program

helgrind: tracks every memory access and mutex lock/unlock to determine if a program contains any data races

lackey: does simple counting of various things: the number of calls to a particular function (_dl_runtime_resolve()); the number of basic blocks, x86 instructions and UCode instructions executed; the number of branches executed and the proportion of those which were taken.

These examples give a reasonable idea of what kinds of things Valgrind can be used for. The instrumentation can range from very lightweight (e.g. counting the number of times a particular function is called) to very intrusive (e.g. memcheck's memory checking).

2.2 Suggested tools

Here is a list of ideas we have had for tools that should not be too hard to implement.

branch profiler: A machine's branch prediction hardware could be simulated, and each branch annotated with the number of predicted and mispredicted branches. Would be implemented quite similarly to Cachegrind, and could reuse the cg_annotate script to annotate source code.
The biggest difficulty with this is the simulation; the chip-makers are very cagey about how their chips do branch prediction. But implementing one or more of the basic algorithms could still give good information.
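
As an illustration of what "one of the basic algorithms" might look like, here is a minimal sketch of a table of 2-bit saturating counters, a textbook prediction scheme. It is not taken from Valgrind or from any particular chip; the names BranchPredictor, bp_init, bp_record and the table size are all hypothetical. Standard C headers are used only to keep the sketch self-contained, and it keeps global totals where a real tool would keep counts per branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BP_TABLE_SIZE 1024u   /* hypothetical size; must be a power of two */

/* One 2-bit saturating counter per table slot:
   0,1 = predict not taken; 2,3 = predict taken. */
typedef struct {
    uint8_t       counter[BP_TABLE_SIZE];
    unsigned long predicted;      /* branches guessed correctly   */
    unsigned long mispredicted;   /* branches guessed incorrectly */
} BranchPredictor;

/* Start every counter at "weakly not taken" (1). */
static void bp_init(BranchPredictor *bp)
{
    assert(bp != NULL);
    for (size_t i = 0; i < BP_TABLE_SIZE; i++)
        bp->counter[i] = 1;
    bp->predicted = 0;
    bp->mispredicted = 0;
}

/* Record one executed conditional branch at address `addr`.
   Returns 1 if the simulated predictor guessed correctly, else 0. */
static int bp_record(BranchPredictor *bp, uintptr_t addr, int taken)
{
    assert(bp != NULL);
    uint8_t *c = &bp->counter[addr & (BP_TABLE_SIZE - 1)];
    int guess = (*c >= 2);
    int correct = (guess == (taken != 0));

    if (correct) bp->predicted++;
    else         bp->mispredicted++;

    /* Saturating update: move towards 3 if taken, towards 0 if not. */
    if (taken) { if (*c < 3) (*c)++; }
    else       { if (*c > 0) (*c)--; }
    return correct;
}
```

A tool built along these lines would call something like bp_record() from the instrumentation it adds at each conditional branch, and report the two totals when the program finishes.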

coverage tool: Cachegrind can already be used for doing test coverage, but it's massive overkill to use it just for that.
It would be easy to write a coverage tool that records how many times each basic block was executed. Again, the cg_annotate script could be used for annotating source code with the gathered information. However, cg_annotate is only designed for working with single program runs. It could be extended relatively easily to deal with multiple runs of a program, so that the coverage of a whole test suite could be determined.
In addition to the standard coverage information, such a tool could record extra information that would help a user generate test cases to exercise unexercised paths. For example, for each conditional branch, the tool could record all inputs to the conditional test, and print these out when annotating.

run-time type checking: A nice example of a dynamic checker is given in this paper:
Debugging via Run-Time Type Checking
Alexey Loginov, Suan Hsi Yong, Susan Horwitz and Thomas Reps
Proceedings of Fundamental Approaches to Software Engineering
April 2001.
Similar is the tool described in this paper:
Run-Time Type Checking for Binary Programs
Michael Burrows, Stephen N. Freund, Janet L. Wiener
Proceedings of the 12th International Conference on Compiler Construction (CC 2003)
April 2003.
These approaches can find quite a range of bugs, particularly in C and C++ programs, and could be implemented quite nicely as a Valgrind tool.
Ways to speed up this run-time type checking are described in this paper:
Reducing the Overhead of Dynamic Analysis
Suan Hsi Yong and Susan Horwitz
Proceedings of Runtime Verification '02
July 2002.
Valgrind's client requests could be used to pass information to a tool about which elements need instrumentation and which don't.

We would love to hear from anyone who implements these or other tools.

2.3 How tools work

Tools must define various functions for instrumenting programs that are called by Valgrind's core, yet they must be implemented in such a way that they can be written and compiled without touching Valgrind's core. This is important, because one of our aims is to allow people to write and distribute their own tools that can be plugged into Valgrind's core easily.

This is achieved by packaging each tool into a separate shared object which is then loaded ahead of the core shared object valgrind.so, using the dynamic linker's LD_PRELOAD variable. Any function defined in the tool that shares a name with a function defined in the core (such as the instrumentation function SK_(instrument)()) overrides the core's definition. Thus the core can call the necessary tool functions.

This magic is all done for you; the shared object used is chosen with the --tool option to the valgrind startup script. The default tool used is memcheck, Valgrind's original memory checker.

2.4 Getting the code

To write your own tool, you'll need to check out a copy of Valgrind from the CVS repository, rather than using a packaged distribution. This is because it contains several extra files needed for writing tools.

To check out the code from the CVS repository, first login:

cvs -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind login

Then checkout the code. To get a copy of the current development version (recommended for the brave only):

cvs -z3 -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind co valgrind

To get a copy of the stable released branch:

cvs -z3 -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind co -r TAG valgrind

where TAG has the form VALGRIND_X_Y_Z for version X.Y.Z.

2.5 Getting started

Valgrind uses GNU automake and autoconf for the creation of Makefiles and configuration. But don't worry, these instructions should be enough to get you started even if you know nothing about those tools.

In what follows, all filenames are relative to Valgrind's top-level directory valgrind/.

Choose a name for the tool, and an abbreviation that can be used as a short prefix. We'll use foobar and fb as an example.

Make a new directory foobar/ which will hold the tool.

Copy none/Makefile.am into foobar/. Edit it by replacing all occurrences of the string ``none'' with ``foobar'' and the one occurrence of the string ``nl_'' with ``fb_''. It might be worth trying to understand this file, at least a little; you might have to do more complicated things with it later on. In particular, the name of the vgskin_foobar_so_SOURCES variable determines the name of the tool's shared object, which determines what name must be passed to the --tool option to use the tool.

Copy none/nl_main.c into foobar/, renaming it as fb_main.c. Edit it by changing the lines in SK_(pre_clo_init)() to something appropriate for the tool. These fields are used in the startup message, except for bug_reports_to which is used if a tool assertion fails.

Edit Makefile.am, adding the new directory foobar to the SUBDIRS variable.

Edit configure.in, adding foobar/Makefile to the AC_OUTPUT list.

Run:
```
    autogen.sh
    ./configure --prefix=`pwd`/inst
    make install
```
It should automake, configure and compile without errors, putting copies of the tool's shared object vgskin_foobar.so in foobar/ and inst/lib/valgrind/.

You can test it with a command like

    inst/bin/valgrind --tool=foobar date

(almost any program should work; date is just an example). The output should be something like this:

==738== foobar-0.0.1, a foobarring tool for x86-linux.
==738== Copyright (C) 1066AD, and GNU GPL'd, by J. Random Hacker.
==738== Built with valgrind-1.1.0, a program execution monitor.
==738== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==738== Estimated CPU clock rate is 1400 MHz
==738== For more details, rerun with: -v
==738== 
Wed Sep 25 10:31:54 BST 2002
==738==

The tool does nothing except run the program uninstrumented.

These steps don't have to be followed exactly - you can choose different names for your source files, and use a different --prefix for ./configure.

Now that we've set up, built and tested the simplest possible tool, on to the interesting stuff...

2.6 Writing the code

A tool must define at least these four functions:

    SK_(pre_clo_init)()
    SK_(post_clo_init)()
    SK_(instrument)()
    SK_(fini)()

Also, it must use the macro VG_DETERMINE_INTERFACE_VERSION exactly once in its source code. If it doesn't, you will get a link error involving VG_(skin_interface_major_version). This macro is used to ensure the core/tool interface used by the core and a plugged-in tool are binary compatible. In addition, if a tool wants to use some of the optional services provided by the core, it may have to define other functions.
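
The following is only a sketch of how the four functions fit together, written so that it compiles on its own. The SK_ macro definition, the MockBlock type, the function signatures and mock_core_run() are stand-ins invented for this illustration; the real definitions live in include/vg_skin.h, and a real tool would also use VG_DETERMINE_INTERFACE_VERSION, which is omitted here because it needs that header. The call order shown is an assumption based on the descriptions in this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch is self-contained (assumptions, not the real API). */
#define SK_(name) fb_sketch_##name            /* stand-in for the SK_ macro     */
typedef struct { int n_instrs; } MockBlock;   /* stand-in for a block of code   */

static int fb_stage = 0;         /* records how far through the lifecycle we are */
static int fb_blocks_seen = 0;   /* number of blocks handed to instrument        */
static int fb_exit_code = -1;    /* exit code reported at finalisation           */

/* Initialisation before command line options are processed. */
void SK_(pre_clo_init)(void)
{
    assert(fb_stage == 0);
    fb_stage = 1;
}

/* Initialisation that depends on command line options. */
void SK_(post_clo_init)(void)
{
    assert(fb_stage == 1);
    fb_stage = 2;
}

/* Called for each block of code; returning it unchanged means the
   program runs uninstrumented. */
MockBlock *SK_(instrument)(MockBlock *block)
{
    assert(fb_stage == 2 && block != NULL);
    fb_blocks_seen++;
    return block;
}

/* Called once at the end; a real tool would present its results here. */
void SK_(fini)(int exitcode)
{
    assert(fb_stage == 2);
    fb_stage = 3;
    fb_exit_code = exitcode;
}

/* Mock "core" that drives the tool in the assumed order. */
void mock_core_run(MockBlock *blocks, size_t n_blocks, int exitcode)
{
    SK_(pre_clo_init)();
    /* ...the real core would process command line options here... */
    SK_(post_clo_init)();
    for (size_t i = 0; i < n_blocks; i++)
        (void)SK_(instrument)(&blocks[i]);
    SK_(fini)(exitcode);
}
```

For the real signatures, none/nl_main.c is the place to look; the tool built from it defines exactly these four functions and does nothing else.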

2.7 Initialisation

Most of the initialisation should be done in SK_(pre_clo_init)(). Only use SK_(post_clo_init)() if a tool provides command line options and must do some initialisation after option processing takes place (``clo'' stands for ``command line options'').

First of all, various ``details'' need to be set for a tool, using the functions VG_(details_*)(). Some are compulsory, some aren't. Some are used when constructing the startup message; detail_bug_reports_to is used if VG_(skin_panic)() is ever called, or a tool assertion fails. Others have other uses.

Second, various ``needs'' can be set for a tool, using the functions VG_(needs_*)(). They are mostly booleans, and can be left untouched (they default to False). They determine whether a tool can do various things such as: record, report and suppress errors; process command line options; wrap system calls; record extra information about malloc'd blocks, etc.

For example, if a tool wants the core's help in recording and reporting errors, it must set the skin_errors need to True, and then provide definitions of six functions for comparing errors, printing out errors, reading suppressions from a suppressions file, etc. While writing these functions requires some work, it's much less than doing error handling from scratch because the core is doing most of the work. See the type VgNeeds in include/vg_skin.h for full details of all the needs.

Third, the tool can indicate which events in core it wants to be notified about, using the functions VG_(track_*)(). These include things such as blocks of memory being malloc'd, the stack pointer changing, a mutex being locked, etc. If a tool wants to know about such an event, it should pass a pointer to one of its own functions to the relevant VG_(track_*)() function; that function will then be called each time the event happens.

For example, if the tool wants to be notified when a new block of memory is malloc'd, it should call VG_(track_new_mem_heap)() with an appropriate function pointer, and the assigned function will be called each time this happens.
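
The registration pattern can be sketched as follows. This is a self-contained mock, not the real interface: mock_track_new_mem_heap() merely plays the role of VG_(track_new_mem_heap)(), and the callback signature, the MockAddr type and the fb_ names are assumptions made for the illustration. Check include/vg_skin.h for the actual prototype.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long MockAddr;   /* stand-in for the core's address type */
typedef void (*NewMemHeapFn)(MockAddr start, size_t len);

/* --- mock core side --- */
static NewMemHeapFn mock_new_mem_heap_cb = NULL;   /* NULL: tool not interested */

/* Plays the role of the core's track function: remember the callback. */
static void mock_track_new_mem_heap(NewMemHeapFn fn)
{
    assert(fn != NULL);
    mock_new_mem_heap_cb = fn;
}

/* Invoked by the mock core whenever the supervised program mallocs a block. */
static void mock_core_client_malloc(MockAddr start, size_t len)
{
    if (mock_new_mem_heap_cb != NULL)
        mock_new_mem_heap_cb(start, len);
}

/* --- tool side --- */
static size_t fb_heap_blocks = 0;   /* blocks seen so far */
static size_t fb_heap_bytes  = 0;   /* bytes seen so far  */

/* The tool's handler: just keep running totals. */
static void fb_new_mem_heap(MockAddr start, size_t len)
{
    (void)start;
    fb_heap_blocks++;
    fb_heap_bytes += len;
}

/* The tool registers its interest during initialisation. */
static void fb_pre_clo_init(void)
{
    mock_track_new_mem_heap(fb_new_mem_heap);
}
```

Events that occur before the tool has registered a handler are simply not reported to it, which is why the registration belongs in the initialisation code.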

More information about ``details'', ``needs'' and ``trackable events'' can be found in include/vg_skin.h.

2.8 Instrumentation

SK_(instrument)() is the interesting one. It allows you to instrument UCode, which is Valgrind's RISC-like intermediate language. UCode is described in the technical docs for Memcheck. The easiest way to instrument UCode is to insert calls to C functions when interesting things happen. See the tool ``Lackey'' (lackey/lk_main.c) for a simple example of this, or Cachegrind (cachegrind/cg_main.c) for a more complex example.
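
To make the idea of "inserting calls to C functions" concrete, here is a toy model. It does not use UCode or any real Valgrind type: MockInstr, MockCodeBlock, mock_emit(), mock_execute() and the fb_ functions are all invented for this sketch, and standard C headers are used only so that it stands alone. The instrumenter copies a block and places a call to a counting helper in front of every branch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_OTHER, OP_BRANCH, OP_CALL_HELPER } MockOpcode;
typedef void (*HelperFn)(void);

typedef struct {
    MockOpcode opcode;
    HelperFn   helper;   /* only meaningful for OP_CALL_HELPER */
} MockInstr;

#define MOCK_MAX_INSTRS 64
typedef struct {
    MockInstr instrs[MOCK_MAX_INSTRS];
    size_t    used;
} MockCodeBlock;

/* Helper called at run time, once per executed branch. */
static unsigned long fb_branches_executed = 0;
static void fb_count_branch(void) { fb_branches_executed++; }

/* Append one instruction to a block. */
static void mock_emit(MockCodeBlock *b, MockOpcode op, HelperFn helper)
{
    assert(b != NULL && b->used < MOCK_MAX_INSTRS);
    b->instrs[b->used].opcode = op;
    b->instrs[b->used].helper = helper;
    b->used++;
}

/* Copy `in` to `out`, inserting a helper call in front of every branch. */
static void fb_instrument(const MockCodeBlock *in, MockCodeBlock *out)
{
    out->used = 0;
    for (size_t i = 0; i < in->used; i++) {
        if (in->instrs[i].opcode == OP_BRANCH)
            mock_emit(out, OP_CALL_HELPER, fb_count_branch);
        mock_emit(out, in->instrs[i].opcode, in->instrs[i].helper);
    }
}

/* "Run" a block: the only visible effect is the helper calls. */
static void mock_execute(const MockCodeBlock *b)
{
    for (size_t i = 0; i < b->used; i++)
        if (b->instrs[i].opcode == OP_CALL_HELPER)
            b->instrs[i].helper();
}
```

The point of the model is the split between the two phases: a block is instrumented once, but the helper runs every time the instrumented block is executed.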

A much more complicated way to instrument UCode, albeit one that might result in faster instrumented programs, is to extend UCode with new UCode instructions. This is recommended for advanced Valgrind hackers only! See Memcheck for an example.

2.9 Finalisation

SK_(fini)() is where you can present the final results, such as a summary of the information collected. Any log files should be written out at this point.

2.10 Other important information

Please note that the core/tool split infrastructure is quite complex and not brilliantly documented. Here are some important points, but there are undoubtedly many others that I should note but haven't thought of.

The file include/vg_skin.h contains all the types, macros, functions, etc. that a tool should (hopefully) need, and is the only .h file a tool should need to #include.

In particular, you probably shouldn't use anything from the C library (there are deep reasons for this, trust us). Valgrind provides an implementation of a reasonable subset of the C library, details of which are in vg_skin.h.

Similarly, when writing a tool, you shouldn't need to look at any of the code in Valgrind's core, although doing so might sometimes help you understand something.

vg_skin.h has a reasonable amount of documentation in it that should hopefully be enough to get you going. But ultimately, the tools distributed (Memcheck, Addrcheck, Cachegrind, Lackey, etc.) are probably the best documentation of all, for the moment.

Note that the VG_ and SK_ macros are used heavily. These just prepend longer strings in front of names to avoid potential namespace clashes. We strongly recommend using the SK_ macro for any global functions and variables in your tool, or writing a similar macro.
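
A "similar macro" could look like the sketch below. The macro name FB_, the prefix fbTool_ and the helper names are hypothetical, chosen only to match the foobar example; the prefixes that VG_ and SK_ really prepend are defined in Valgrind's headers and are not reproduced here.

```c
#include <assert.h>
#include <limits.h>

/* Prepend a tool-specific prefix to every global name. */
#define FB_(name) fbTool_##name

/* Two-level stringification, used only to show the expanded name. */
#define FB_STR2(x) #x
#define FB_STR(x)  FB_STR2(x)

/* This global is really called fbTool_n_calls. */
static int FB_(n_calls) = 0;

/* This function is really called fbTool_count_call. */
static void FB_(count_call)(void)
{
    assert(FB_(n_calls) < INT_MAX);   /* guard against overflow */
    FB_(n_calls)++;
}

/* Returns the name the compiler actually sees for FB_(count_call). */
static const char *fb_expanded_name(void)
{
    return FB_STR(FB_(count_call));
}
```

Because every global goes through the macro, a generic name such as n_calls cannot collide with an identically named symbol in the core or in another tool.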

2.11 Words of Advice

Writing and debugging tools is not trivial. Here are some suggestions for solving common problems.

If you are getting segmentation faults in C functions used by your tool, the usual GDB command:

gdb prog core

usually gives the location of the segmentation fault.

If you want to debug C functions used by your tool, you can attach GDB to Valgrind with some effort:

Enable the following code in coregrind/vg_main.c by changing if (0) into if (1):

   /* Hook to delay things long enough so we can get the pid and
      attach GDB in another shell. */
   if (0) { 
      Int p, q;
      for (p = 0; p < 50000; p++)
         for (q = 0; q < 50000; q++) ;
   }

and rebuild Valgrind.

Then run:
valgrind prog
Valgrind starts the program, printing its process id, and then delays for a few seconds (you may have to change the loop bounds to get a suitable delay).

In a second shell run:
gdb prog pid

GDB may be able to give you useful information. Note that by default most of the system is built with -fomit-frame-pointer, and you'll need to get rid of this to extract useful tracebacks from GDB.

If you just want to know whether a program point has been reached, using the OINK macro (in include/vg_skin.h) can be easier than using GDB.

If you are having problems with your UCode instrumentation, it's likely that GDB won't be able to help at all. In this case, Valgrind's --trace-codegen option is invaluable for observing the results of instrumentation.

The other debugging command line options can be useful too (run valgrind -h for the list).

3 Advanced Topics

Once a tool becomes more complicated, there are some extra things you may want/need to do.

3.1 Suppressions

If your tool reports errors and you want to suppress some common ones, you can add suppressions to the suppression files. The relevant files are valgrind/*.supp; the final suppression file is aggregated from these files by combining the relevant .supp files depending on the versions of Linux, X and glibc on a system.

Suppression types have the form tool_name:suppression_name. The tool_name here is the name you specify for the tool during initialisation with VG_(details_name)().
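
As an illustration of the tool_name:suppression_name form, the sketch below checks whether a suppression type string belongs to a given tool and, if so, returns the suppression name part. The function fb_supp_name_for_tool() and the example suppression name are hypothetical; this is not how the core itself parses suppressions, and the standard string functions are used only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If `kind` has the form "<tool_name>:<suppression_name>" for the given
   tool, return a pointer to the suppression name inside `kind`.
   Otherwise (other tool, missing colon, empty name) return NULL. */
static const char *fb_supp_name_for_tool(const char *kind,
                                         const char *tool_name)
{
    assert(kind != NULL && tool_name != NULL);
    size_t n = strlen(tool_name);

    if (n == 0 || strncmp(kind, tool_name, n) != 0)
        return NULL;              /* different tool                 */
    if (kind[n] != ':' || kind[n + 1] == '\0')
        return NULL;              /* no colon, or nothing after it  */
    return kind + n + 1;
}
```

With the foobar tool, a type such as foobar:SomeError would yield SomeError, while a type written for another tool would be rejected.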

3.2 Documentation

If you are feeling conscientious and want to write some HTML documentation for your tool, follow these steps (using foobar as the example tool name again):

Make a directory foobar/docs/.

Edit foobar/Makefile.am, adding docs to the SUBDIRS variable.

Edit configure.in, adding foobar/docs/Makefile to the AC_OUTPUT list.

Write foobar/docs/Makefile.am. Use memcheck/docs/Makefile.am as an example.

Write the documentation, putting it in foobar/docs/.

3.3 Regression tests

Valgrind has some support for regression tests. If you want to write regression tests for your tool:

Make a directory foobar/tests/.

Edit foobar/Makefile.am, adding tests to the SUBDIRS variable.

Edit configure.in, adding foobar/tests/Makefile to the AC_OUTPUT list.

Write foobar/tests/Makefile.am. Use memcheck/tests/Makefile.am as an example.

Write the tests, .vgtest test description files, .stdout.exp and .stderr.exp expected output files. (Note that Valgrind's output goes to stderr.) Some details on writing and running tests are given in the comments at the top of the testing script tests/vg_regtest.

Write a filter for stderr results foobar/tests/filter_stderr. It can call the existing filters in tests/. See memcheck/tests/filter_stderr for an example; in particular note the $dir trick that ensures the filter works correctly from any directory.

3.4 Profiling

To do simple tick-based profiling of a tool, include the line

#include "vg_profile.c"

in the tool somewhere, and rebuild (you may have to make clean first). Then run Valgrind with the --profile=yes option.

The profiler is stack-based; you can register a profiling event with VGP_(register_profile_event)() and then use the VGP_PUSHCC and VGP_POPCC macros to record time spent doing certain things. New profiling event numbers must not overlap with the core profiling event numbers. See include/vg_skin.h for details and Memcheck for an example.
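
The general idea of a stack-based, tick-based profiler can be modelled as below. This is a sketch of the technique, not the code in vg_profile.c: fbp_push(), fbp_pop() and fbp_tick() are invented names standing in for what VGP_PUSHCC, VGP_POPCC and the timer tick might do, and the array sizes are arbitrary.

```c
#include <assert.h>

#define FBP_MAX_EVENTS 16   /* hypothetical limit on event numbers */
#define FBP_MAX_DEPTH  32   /* hypothetical limit on nesting depth */

static unsigned long fbp_ticks[FBP_MAX_EVENTS];  /* ticks charged per event */
static int fbp_stack[FBP_MAX_DEPTH];             /* currently active events */
static int fbp_depth = 0;

/* Entering a region of interest: make `event` the innermost active one. */
static void fbp_push(int event)
{
    assert(event >= 0 && event < FBP_MAX_EVENTS);
    assert(fbp_depth < FBP_MAX_DEPTH);
    fbp_stack[fbp_depth++] = event;
}

/* Leaving a region: pushes and pops must be properly nested. */
static void fbp_pop(int event)
{
    assert(fbp_depth > 0);
    assert(fbp_stack[fbp_depth - 1] == event);
    fbp_depth--;
}

/* Called on every timer tick: charge it to the innermost active event. */
static void fbp_tick(void)
{
    if (fbp_depth > 0)
        fbp_ticks[fbp_stack[fbp_depth - 1]]++;
}
```

In this model a tick is charged only to the innermost event, so time spent in a nested region is not also counted against the region that encloses it.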

3.5 Other makefile hackery

If you add any directories under valgrind/foobar/, you will need to add an appropriate Makefile.am to it, and add a corresponding entry to the AC_OUTPUT list in valgrind/configure.in.

If you add any scripts to your tool (see Cachegrind for an example) you need to add them to the bin_SCRIPTS variable in valgrind/foobar/Makefile.am.

3.6 Core/tool interface versions

In order to allow for the core/tool interface to evolve over time, Valgrind uses a basic interface versioning system. All a tool has to do is use the VG_DETERMINE_INTERFACE_VERSION macro exactly once in its code. If not, a link error will occur when the tool is built.

The interface version number has the form X.Y. Changes in Y indicate binary compatible changes. Changes in X indicate binary incompatible changes. If the core and tool have the same major version number X they should work together. If X doesn't match, Valgrind will abort execution with an explanation of the problem.
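
The compatibility rule amounts to comparing the X parts only. A minimal sketch, assuming a hypothetical IfaceVersion type (the real check is performed by the core using the values recorded by VG_DETERMINE_INTERFACE_VERSION):

```c
#include <assert.h>

/* Hypothetical representation of an interface version X.Y. */
typedef struct {
    int x;   /* major: changes indicate binary incompatible changes */
    int y;   /* minor: changes indicate binary compatible changes   */
} IfaceVersion;

/* Returns 1 if a tool built against `tool` can run with a core that
   provides `core`, following the rule that only X has to match. */
static int iface_compatible(IfaceVersion core, IfaceVersion tool)
{
    assert(core.x >= 0 && tool.x >= 0);
    return core.x == tool.x;
}
```

Under this rule a tool built against 5.0 works with a core providing 5.2, whereas one built against 4.2 does not.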

This approach was chosen so that if the interface changes in the future, old tools won't work and the reason will be clearly explained, instead of possibly crashing mysteriously. We have attempted to minimise the potential for binary incompatible changes by means such as minimising the use of naked structs in the interface.

4 Final Words

This whole core/tool business is under active development, although it's slowly maturing.

The first consequence of this is that the core/tool interface will continue to change in the future; we have no intention of freezing it and then regretting the inevitable stupidities. Hopefully most of the future changes will be to add new features, hooks, functions, etc, rather than to change old ones, which should cause a minimum of trouble for existing tools, and we've put some effort into future-proofing the interface to avoid binary incompatibility. But we can't guarantee anything. The versioning system should catch any incompatibilities. Just something to be aware of.

The second consequence of this is that we'd love to hear your feedback about it:

If you love it or hate it

If you find bugs

If you write a tool

If you have suggestions for new features, needs, trackable events, functions

If you have suggestions for making tools easier to write

If you have suggestions for improving this documentation

If you don't understand something

or anything else!

Happy programming.