This is fftw3.info, produced by makeinfo version 4.2 from fftw3.texi.

This manual is for FFTW (version 3.0.1, 5 June 2003).

   Copyright (C) 2003 Matteo Frigo.

   Copyright (C) 2003 Massachusetts Institute of Technology.

     Permission is granted to make and distribute verbatim copies of
     this manual provided the copyright notice and this permission
     notice are preserved on all copies.

     Permission is granted to copy and distribute modified versions of
     this manual under the conditions for verbatim copying, provided
     that the entire resulting derived work is distributed under the
     terms of a permission notice identical to this one.

     Permission is granted to copy and distribute translations of this
     manual into another language, under the above conditions for
     modified versions, except that this permission notice may be
     stated in a translation approved by the Free Software Foundation.
   
INFO-DIR-SECTION Texinfo documentation system
START-INFO-DIR-ENTRY
* fftw3: (fftw3).	FFTW User's Manual.
END-INFO-DIR-ENTRY


File: fftw3.info,  Node: Top,  Next: Introduction,  Prev: (dir),  Up: (dir)

FFTW User Manual
****************

   Welcome to FFTW, the Fastest Fourier Transform in the West.  FFTW is
a collection of fast C routines to compute the discrete Fourier
transform.  This manual documents FFTW version 3.0.1.

* Menu:

* Introduction::
* Tutorial::
* Other Important Topics::
* FFTW Reference::
* Parallel FFTW::
* Calling FFTW from Fortran::
* Upgrading from FFTW version 2::
* Installation and Customization::
* Acknowledgments::
* License and Copyright::
* Concept Index::
* Library Index::

 --- The Detailed Node Listing ---

Tutorial

* Complex One-Dimensional DFTs::
* Complex Multi-Dimensional DFTs::
* One-Dimensional DFTs of Real Data::
* Multi-Dimensional DFTs of Real Data::
* More DFTs of Real Data::

More DFTs of Real Data

* The Halfcomplex-format DFT::
* Real even/odd DFTs (cosine/sine transforms)::
* The Discrete Hartley Transform::

Other Important Topics

* Data Alignment::
* Multi-dimensional Array Format::
* Words of Wisdom-Saving Plans::
* Caveats in Using Wisdom::

Data Alignment

* SIMD alignment and fftw_malloc::
* Stack alignment on x86::

Multi-dimensional Array Format

* Row-major Format::
* Column-major Format::
* Static Arrays in C::
* Dynamic Arrays in C::
* Dynamic Arrays in C-The Wrong Way::

FFTW Reference

* Data Types and Files::
* Using Plans::
* Basic Interface::
* Advanced Interface::
* Guru Interface::
* Wisdom::
* What FFTW Really Computes::

Data Types and Files

* Complex numbers::
* Precision::
* Memory Allocation::

Basic Interface

* Complex DFTs::
* Planner Flags::
* Real-data DFTs::
* Real-data DFT Array Format::
* Real-to-Real Transforms::
* Real-to-Real Transform Kinds::

Advanced Interface

* Advanced Complex DFTs::
* Advanced Real-data DFTs::
* Advanced Real-to-real Transforms::

Guru Interface

* Interleaved and split arrays::
* Guru vector and transform sizes::
* Guru Complex DFTs::
* Guru Real-data DFTs::
* Guru Real-to-real Transforms::
* Guru Execution of Plans::

Wisdom

* Wisdom Export::
* Wisdom Import::
* Forgetting Wisdom::
* Wisdom Utilities::

What FFTW Really Computes

* The 1d Discrete Fourier Transform (DFT)::
* The 1d Real-data DFT::
* 1d Real-even DFTs (DCTs)::
* 1d Real-odd DFTs (DSTs)::
* 1d Discrete Hartley Transforms (DHTs)::
* Multi-dimensional Transforms::

Parallel FFTW

* Multi-threaded FFTW::
* Thread safety::

Multi-threaded FFTW

* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::

Calling FFTW from Fortran

* Fortran-interface routines::
* FFTW Constants in Fortran::
* Fortran Examples::
* Wisdom of Fortran?::

Installation and Customization

* Installation on Unix::
* Installation on non-Unix systems::
* Cycle Counters::
* Generating your own code::


File: fftw3.info,  Node: Introduction,  Next: Tutorial,  Prev: Top,  Up: Top

Introduction
************

   This manual documents version 3.0.1 of FFTW, the _Fastest Fourier
Transform in the West_.  FFTW is a comprehensive collection of fast C
routines for computing the discrete Fourier transform (DFT) and various
special cases thereof.
   * FFTW computes the DFT of complex data, real data, even-   or
     odd-symmetric real data (these symmetric transforms are usually
     known as the discrete cosine or sine transform, respectively), and
     the   discrete Hartley transform (DHT) of real data.

   * The input data can have arbitrary size.         FFTW employs O(n
     log n)  algorithms for all sizes, including        prime numbers.

   * FFTW supports arbitrary n-dimensional data.

   * FFTW supports the SSE, SSE2, 3DNow!, and Altivec instruction
     sets.

   * FFTW 3.0.1 includes parallel (multi-threaded) transforms   for
     shared-memory systems.    FFTW 3.0.1 does not include
     distributed-memory parallel   transforms, but we plan to implement
     an MPI version soon.  (Meanwhile,   you can use the MPI
     implementation from FFTW 2.1.3.)

   We assume herein that you are familiar with the properties and uses
of the DFT that are relevant to your application.  Otherwise, see e.g.
`The Fast Fourier Transform and Its Applications' by E. O. Brigham
(Prentice-Hall, Englewood Cliffs, NJ, 1988).  Our web page
(http://www.fftw.org) also has links to FFT-related information online.

   In order to use FFTW effectively, you need to learn one basic concept
of FFTW's internal structure: FFTW does not use a fixed algorithm for
computing the transform, but instead it adapts the DFT algorithm to
details of the underlying hardware in order to maximize performance.
Hence, the computation of the transform is split into two phases.
First, FFTW's "planner" "learns" the fastest way to compute the
transform on your machine.  The planner produces a data structure
called a "plan" that contains this information.  Subsequently, the plan
is "executed" to transform the array of input data as dictated by the
plan.  The plan can be reused as many times as needed.  In typical
high-performance applications, many transforms of the same size are
computed and, consequently, a relatively expensive initialization of
this sort is acceptable.  On the other hand, if you need a single
transform of a given size, the one-time cost of the planner becomes
significant.  For this case, FFTW provides fast planners based on
heuristics or on previously computed plans.

   FFTW supports transforms of data with arbitrary size, rank,
multiplicity, and a general memory layout.  In simple cases, however,
this generality may be unnecessary and confusing.  Consequently, we
organized the interface to FFTW into three levels of increasing
generality.
   * The "basic interface" computes a single       transform of
     contiguous data.

   * The "advanced interface" computes transforms       of multiple or
     strided arrays.

   * The "guru" interface supports the most general data       layouts,
     multiplicities, and strides.
   We expect that most users will be best served by the basic interface,
whereas the guru interface requires careful attention to the
documentation to avoid problems.

   Besides the automatic performance adaptation performed by the
planner, it is also possible for advanced users to customize FFTW
manually.  For example, if code space is a concern, we provide a tool
that links only the subset of FFTW needed by your application.
Conversely, you may need to extend FFTW because the standard
distribution is not sufficient for your needs.  For example, the
standard FFTW distribution works most efficiently for arrays whose size
can be factored into small primes (2, 3, 5, and 7), and otherwise it
uses a slower general-purpose routine.  If you need efficient
transforms of other sizes, you can use FFTW's code generator, which
produces fast C programs ("codelets") for any particular array size you
may care about.  For example, if you need transforms of size 513 = 19 x
3^3, you can customize FFTW to support the factor 19 efficiently.

   For more information regarding FFTW, see the paper, "FFTW: An
adaptive software architecture for the FFT," by M. Frigo and S. G.
Johnson, which appeared in the 23rd International Conference on
Acoustics, Speech, and Signal Processing (`Proc. ICASSP 1998' 3, p.
1381).  See also, "The Fastest Fourier Transform in the West," by M.
Frigo and S. G. Johnson, which is the technical report MIT-LCS-TR-728
(Sep. '97).  The code generator is described in the paper "A fast
Fourier transform compiler", by M. Frigo, in the `Proceedings of the
1999 ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI), Atlanta, Georgia, May 1999'.  These papers,
along with the latest version of FFTW, the FAQ, benchmarks, and other
links, are available at the FFTW home page (http://www.fftw.org).

   The current version of FFTW incorporates many good ideas from the
past thirty years of FFT literature.  In one way or another, FFTW uses
the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
algorithm for prime sizes, and a split-radix algorithm (with a
variation due to Dan Bernstein).  FFTW's code generator also produces
new algorithms that we do not completely understand.  The reader is
referred to the cited papers for the appropriate references.

   The rest of this manual is organized as follows.  We first discuss
the sequential (single-processor) implementation.  We start by
describing the basic interface/features of FFTW in *Note Tutorial::.
The following chapter discusses *Note Other Important Topics::,
including *Note Data Alignment::, the storage scheme of
multi-dimensional arrays (*note Multi-dimensional Array Format::), and
FFTW's mechanism for storing plans on disk (*note Words of
Wisdom-Saving Plans::).  Next, *Note FFTW Reference:: provides
comprehensive documentation of all FFTW's features.  Parallel
transforms are discussed in their own chapter *Note Parallel FFTW::.
Fortran programmers can also use FFTW, as described in *Note Calling
FFTW from Fortran::.  *Note Installation and Customization:: explains
how to install FFTW in your computer system and how to adapt FFTW to
your needs.  License and copyright information is given in *Note
License and Copyright::.  Finally, we thank all the people who helped
us in *Note Acknowledgments::.


File: fftw3.info,  Node: Tutorial,  Next: Other Important Topics,  Prev: Introduction,  Up: Top

Tutorial
********

* Menu:

* Complex One-Dimensional DFTs::
* Complex Multi-Dimensional DFTs::
* One-Dimensional DFTs of Real Data::
* Multi-Dimensional DFTs of Real Data::
* More DFTs of Real Data::

   This chapter describes the basic usage of FFTW, i.e., how to compute
the Fourier transform of a single array.  This chapter tells the truth,
but not the _whole_ truth. Specifically, FFTW implements additional
routines and flags that are not documented here, although in many cases
we try to indicate where added capabilities exist.  For more complete
information, see *Note FFTW Reference::.  (Note that you need to
compile and install FFTW before you can use it in a program.  For the
details of the installation, see *Note Installation and
Customization::.)

   We recommend that you read this tutorial in order.(1)  At the least,
read the first section (*note Complex One-Dimensional DFTs::) before
reading any of the others, even if your main interest lies in one of
the other transform types.

   Users of FFTW version 2 and earlier may also want to read *Note
Upgrading from FFTW version 2::.

   ---------- Footnotes ----------

   (1) You can read the tutorial in bit-reversed order after computing
your first transform.


File: fftw3.info,  Node: Complex One-Dimensional DFTs,  Next: Complex Multi-Dimensional DFTs,  Prev: Tutorial,  Up: Tutorial

Complex One-Dimensional DFTs
============================

     Plan: To bother about the best method of accomplishing an
     accidental result.  [Ambrose Bierce, `The Enlarged Devil's
     Dictionary'.]

   The basic usage of FFTW to compute a one-dimensional DFT of size `N'
is simple, and it typically looks something like this code:

     #include <fftw3.h>
     ...
     {
         fftw_complex *in, *out;
         fftw_plan p;
         ...
         in = fftw_malloc(sizeof(fftw_complex) * N);
         out = fftw_malloc(sizeof(fftw_complex) * N);
         p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
         ...
         fftw_execute(p); /* repeat as needed */
         ...
         fftw_destroy_plan(p);
         fftw_free(in); fftw_free(out);
     }

   (When you compile, you must also link with the `fftw3' library, e.g.
`-lfftw3 -lm' on Unix systems.)

   First you allocate the input and output arrays.  You can allocate
them in any way that you like, but we recommend using `fftw_malloc',
which behaves like `malloc' except that it properly aligns the array
when SIMD instructions (such as SSE and Altivec) are available (*note
SIMD alignment and fftw_malloc::).

   The data is an array of type `fftw_complex', which is by default a
`double[2]' composed of the real (`in[i][0]') and imaginary
(`in[i][1]') parts of a complex number.

   The next step is to create a "plan", which is an object that
contains all the data that FFTW needs to compute the FFT.  This
function creates the plan:

     fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

   The first argument, `n', is the size of the transform you are trying
to compute.  The size `n' can be any positive integer, but sizes that
are products of small factors are transformed most efficiently
(although prime sizes still use an O(n log n)  algorithm).

   The next two arguments are pointers to the input and output arrays of
the transform.  These pointers can be equal, indicating an "in-place"
transform.

   The fourth argument, `sign', can be either `FFTW_FORWARD' (`-1') or
`FFTW_BACKWARD' (`+1'), and indicates the direction of the transform
you are interested in; technically, it is the sign of the exponent in
the transform.

   The `flags' argument is usually either `FFTW_MEASURE' or
`FFTW_ESTIMATE'.  `FFTW_MEASURE' instructs FFTW to run and measure the
execution time of several FFTs in order to find the best way to compute
the transform of size `n'.  This process takes some time (usually a few
seconds), depending on your machine and on the size of the transform.
`FFTW_ESTIMATE', on the contrary, does not run any computation and just
builds a reasonable plan that is probably sub-optimal.  In short, if
your program performs many transforms of the same size and
initialization time is not important, use `FFTW_MEASURE'; otherwise use
the estimate.  The data in the `in'/`out' arrays is _overwritten_
during `FFTW_MEASURE' planning, so such planning should be done
_before_ the input is initialized by the user.

   Once the plan has been created, you can use it as many times as you
like for transforms on the specified `in'/`out' arrays, computing the
actual transforms via `fftw_execute(plan)':
     void fftw_execute(const fftw_plan plan);

   If you want to transform a _different_ array of the same size, you
can create a new plan with `fftw_plan_dft_1d' and FFTW automatically
reuses the information from the previous plan, if possible.
(Alternatively, with the "guru" interface you can apply a given plan to
a different array, if you are careful.  *Note FFTW Reference::.)

   When you are done with the plan, you deallocate it by calling
`fftw_destroy_plan(plan)':
     void fftw_destroy_plan(fftw_plan plan);
   Arrays allocated with `fftw_malloc' should be deallocated by
`fftw_free' rather than the ordinary `free' (or, heaven forbid,
`delete').

   The DFT results are stored in-order in the array `out', with the
zero-frequency (DC) component in `out[0]'.  If `in != out', the
transform is "out-of-place" and the input array `in' is not modified.
Otherwise, the input array is overwritten with the transform.

   Users should note that FFTW computes an _unnormalized_ DFT.  Thus,
computing a forward followed by a backward transform (or vice versa)
results in the original array scaled by `n'.  For the definition of the
DFT, see *Note What FFTW Really Computes::.

   If you have a C compiler, such as `gcc', that supports the recent
C99 standard, and you `#include <complex.h>' _before_ `<fftw3.h>', then
`fftw_complex' is the native double-precision complex type and you can
manipulate it with ordinary arithmetic.  Otherwise, FFTW defines its
own complex type, which is bit-compatible with the C99 complex type.
*Note Complex numbers::.  (The C++ `<complex>' template class may also
be usable via a typecast.)

   Single and long-double precision versions of FFTW may be installed;
to use them, replace the `fftw_' prefix by `fftwf_' or `fftwl_' and
link with `-lfftw3f' or `-lfftw3l', but use the _same_ `<fftw3.h>'
header file.

   Many more flags exist besides `FFTW_MEASURE' and `FFTW_ESTIMATE'.
For example, use `FFTW_PATIENT' if you're willing to wait even longer
for a possibly even faster plan (*note FFTW Reference::).  You can also
save plans for future use, as described by *Note Words of Wisdom-Saving
Plans::.


File: fftw3.info,  Node: Complex Multi-Dimensional DFTs,  Next: One-Dimensional DFTs of Real Data,  Prev: Complex One-Dimensional DFTs,  Up: Tutorial

Complex Multi-Dimensional DFTs
==============================

   Multi-dimensional transforms work much the same way as
one-dimensional transforms: you allocate arrays of `fftw_complex'
(preferably using `fftw_malloc'), create an `fftw_plan', execute it as
many times as you want with `fftw_execute(plan)', and clean up with
`fftw_destroy_plan(plan)' (and `fftw_free').  The only difference is
the routine you use to create the plan:

     fftw_plan fftw_plan_dft_2d(int nx, int ny,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft_3d(int nx, int ny, int nz,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft(int rank, const int *n,
                             fftw_complex *in, fftw_complex *out,
                             int sign, unsigned flags);

   These routines create plans for `nx' by `ny' two-dimensional (2d)
transforms, `nx' by `ny' by `nz' 3d transforms, and arbitrary
`rank'-dimensional transforms, respectively.  In the third case, `n' is
a pointer to an array `n[rank]' denoting an `n[0]' by `n[1]' by ... by
`n[rank-1]' transform.  All of these transforms operate on contiguous
arrays in the C-standard "row-major" order, so that the last dimension
has the fastest-varying index in the array.  This layout is described
further in *Note Multi-dimensional Array Format::.

   You may have noticed that all the planner routines described so far
have overlapping functionality.  For example, you can plan a 1d or 2d
transform by using `fftw_plan_dft' with a `rank' of `1' or `2', or even
by calling `fftw_plan_dft_3d' with `nx' and/or `ny' equal to `1' (with
no loss in efficiency).  This pattern continues, and FFTW's planning
routines in general form a "partial order," sequences of interfaces
with strictly increasing generality but correspondingly greater
complexity.

   `fftw_plan_dft' is the most general complex-DFT routine that we
describe in this tutorial, but there are also the advanced and guru
interfaces, which allow one to efficiently combine multiple/strided
transforms into a single FFTW plan, transform a subset of a larger
multi-dimensional array, and/or to handle more general complex-number
formats.  For more information, see *Note FFTW Reference::.


File: fftw3.info,  Node: One-Dimensional DFTs of Real Data,  Next: Multi-Dimensional DFTs of Real Data,  Prev: Complex Multi-Dimensional DFTs,  Up: Tutorial

One-Dimensional DFTs of Real Data
=================================

   In many practical applications, the input data `in[i]' are purely
real numbers, in which case the DFT output satisfies the "Hermitian"
redundancy: `out[i]' is the conjugate of `out[n-i]'.  It is possible to
take advantage of these circumstances in order to achieve roughly a
factor of two improvement in both speed and memory usage.

   In exchange for these speed and space advantages, the user sacrifices
some of the simplicity of FFTW's complex transforms. First of all, the
input and output arrays are of _different sizes and types_: the input
is `n' real numbers, while the output is `n/2+1' complex numbers (the
non-redundant outputs); this also requires slight "padding" of the
input array for in-place transforms.  Second, the inverse transform
(complex to real) has the side-effect of _destroying its input array_,
by default.  Neither of these inconveniences should pose a serious
problem for users, but it is important to be aware of them.

   The routines to perform real-data transforms are almost the same as
those for complex transforms: you allocate arrays of `double' and/or
`fftw_complex' (preferably using `fftw_malloc'), create an `fftw_plan',
execute it as many times as you want with `fftw_execute(plan)', and
clean up with `fftw_destroy_plan(plan)' (and `fftw_free').  The only
differences are that the input (or output) is of type `double' and
there are new routines to create the plan.  In one dimension:

     fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
                                    unsigned flags);

   for the real input to complex-Hermitian output ("r2c") and
complex-Hermitian input to real output ("c2r") transforms.  Unlike the
complex DFT planner, there is no `sign' argument.  Instead, r2c DFTs
are always `FFTW_FORWARD' and c2r DFTs are always `FFTW_BACKWARD'.
(For single/long-double precision `fftwf' and `fftwl', `double' should
be replaced by `float' and `long double', respectively.)

   Here, `n' is the "logical" size of the DFT, not necessarily the
physical size of the array.  In particular, the real (`double') array
has `n' elements, while the complex (`fftw_complex') array has `n/2+1'
elements (where the division is rounded down).  For an in-place
transform, `in' and `out' are aliased to the same array, which must be
big enough to hold both; so, the real array would actually have
`2*(n/2+1)' elements, where the elements beyond the first `n' are
unused padding.  The kth element of the complex array is exactly the
same as the kth element of the corresponding complex DFT.  All positive
`n' are supported; products of small factors are most efficient, but an
O(n log n)  algorithm is used even for prime sizes.

   As noted above, the c2r transform destroys its input array even for
out-of-place transforms.  This can be prevented, if necessary, by
including `FFTW_PRESERVE_INPUT' in the `flags', with unfortunately some
sacrifice in performance.  This flag is also not currently supported
for multi-dimensional real DFTs (next section).

   Readers familiar with DFTs of real data will recall that the 0th (the
"DC") and `n/2'-th (the "Nyquist" frequency, when `n' is even) elements
of the complex output are purely real.  Some implementations therefore
store the Nyquist element where the DC imaginary part would go, in
order to make the input and output arrays the same size.  Such packing,
however, does not generalize well to multi-dimensional transforms, and
the space savings are miniscule in any case; FFTW does not support it.

   An alternate interface for one-dimensional r2c and c2r DFTs can be
found in the `r2r' interface (*note The Halfcomplex-format DFT::), with
"halfcomplex"-format output that _is_ the same size (and type) as the
input array.  That interface, although it is not very useful for
multi-dimensional transforms, may sometimes yield better performance.


File: fftw3.info,  Node: Multi-Dimensional DFTs of Real Data,  Next: More DFTs of Real Data,  Prev: One-Dimensional DFTs of Real Data,  Up: Tutorial

Multi-Dimensional DFTs of Real Data
===================================

   Multi-dimensional DFTs of real data use the following planner
routines:

     fftw_plan fftw_plan_dft_r2c_2d(int nx, int ny,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c_3d(int nx, int ny, int nz,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
                                 double *in, fftw_complex *out,
                                 unsigned flags);

   as well as the corresponding `c2r' routines with the input/output
types swapped.  These routines work similarly to their complex
analogues, except for the fact that here the complex output array is cut
roughly in half and the real array requires padding for in-place
transforms (as in 1d, above).

   As before, `n' is the logical size of the array, and the
consequences of this on the the format of the complex arrays deserve
careful attention.  Suppose that the real data has dimensions n1 x n2 x
n3 x ... x nd  (in row-major order).  Then, after an r2c transform, the
output is an n1 x n2 x n3 x ... x (nd/2 + 1)  array of `fftw_complex'
values in row-major order, corresponding to slightly over half of the
output of the corresponding complex DFT.  (The division is rounded
down.)  The ordering of the data is otherwise exactly the same as in
the complex-DFT case.

   Since the complex data is slightly larger than the real data, some
complications arise for in-place transforms.  In this case, the final
dimension of the real data must be padded with extra values to
accommodate the size of the complex data--two values if the last
dimension is even and one if it is odd.  That is, the last dimension of
the real data must physically contain 2 * (nd/2+1) `double' values
(exactly enough to hold the complex data).  This physical array size
does not, however, change the _logical_ array size--only nd values are
actually stored in the last dimension, and nd is the last dimension
passed to the plan-creation routine.

   For example, consider the transform of a two-dimensional real array
of size `nx' by `ny'.  The output of the r2c transform is a
two-dimensional complex array of size `nx' by `ny/2+1', where the `y'
dimension has been cut nearly in half because of redundancies in the
output.  Because `fftw_complex' is twice the size of `double', the
output array is slightly bigger than the input array.  Thus, if we want
to compute the transform in place, we must _pad_ the input array so
that it is of size `nx' by `2*(ny/2+1)'.  If `ny' is even, then there
are two padding elements at the end of each row (which need not be
initialized, as they are only used for output).

   These transforms are unnormalized, so an r2c followed by a c2r
transform (or vice versa) will result in the original data scaled by
the number of real data elements--that is, the product of the (logical)
dimensions of the real data.

   (Because the last dimension is treated specially, if it is equal to
`1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
transform.  In that case, the last complex dimension also has size `1'
(`=1/2+1'), and no advantage is gained over the complex transforms.)


File: fftw3.info,  Node: More DFTs of Real Data,  Prev: Multi-Dimensional DFTs of Real Data,  Up: Tutorial

More DFTs of Real Data
======================

* Menu:

* The Halfcomplex-format DFT::
* Real even/odd DFTs (cosine/sine transforms)::
* The Discrete Hartley Transform::

   FFTW supports several other transform types via a unified "r2r"
(real-to-real) interface, so called because it takes a real (`double')
array and outputs a real array of the same size.  These r2r transforms
currently fall into three categories: DFTs of real input and
complex-Hermitian output in halfcomplex format, DFTs of real input with
even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs),
and discrete Hartley transforms (DHTs), all described in more detail by
the following sections.

   The r2r transforms follow the by now familiar interface of creating
an `fftw_plan', executing it with `fftw_execute(plan)', and destroying
it with `fftw_destroy_plan(plan)'.  Furthermore, all r2r transforms
share the same planner interface:

     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
                                fftw_r2r_kind kind, unsigned flags);
     fftw_plan fftw_plan_r2r_2d(int nx, int ny, double *in, double *out,
                                fftw_r2r_kind kindx, fftw_r2r_kind kindy,
                                unsigned flags);
     fftw_plan fftw_plan_r2r_3d(int nx, int ny, int nz,
                                double *in, double *out,
                                fftw_r2r_kind kindx,
                                fftw_r2r_kind kindy,
                                fftw_r2r_kind kindz,
                                unsigned flags);
     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
                             const fftw_r2r_kind *kind, unsigned flags);

   Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
transforms for contiguous arrays in row-major order, transforming (real)
input to output of the same size, where `n' specifies the _physical_
dimensions of the arrays.  All positive `n' are supported (with the
exception of `n=1' for the `FFTW_REDFT00' kind, noted in the real-even
subsection below); products of small factors are most efficient
(factorizing `n-1' and `n+1' for `FFTW_REDFT00' and `FFTW_RODFT00'
kinds, described below), but an O(n log n)  algorithm is used even for
prime sizes.

   Each dimension has a "kind" parameter, of type `fftw_r2r_kind',
specifying the kind of r2r transform to be used for that dimension.
(In the case of `fftw_plan_r2r', this is an array `kind[rank]' where
`kind[i]' is the transform kind for the dimension `n[i]'.)  The kind
can be one of a set of predefined constants, defined in the following
subsections.

   In other words, FFTW computes the separable product of the specified
r2r transforms over each dimension, which can be used e.g. for partial
differential equations with mixed boundary conditions.  (For some r2r
kinds, notably the halfcomplex DFT and the DHT, such a separable
product is somewhat problematic in more than one dimension, however, as
is described below.)

   In the current version of FFTW, all r2r transforms except for the
halfcomplex type are computed via pre- or post-processing of
halfcomplex transforms, and they are therefore not as fast as they
could be.  Since most other general DCT/DST codes employ a similar
algorithm, however, FFTW's implementation should provide at least
competitive performance.


File: fftw3.info,  Node: The Halfcomplex-format DFT,  Next: Real even/odd DFTs (cosine/sine transforms),  Prev: More DFTs of Real Data,  Up: More DFTs of Real Data

The Halfcomplex-format DFT
--------------------------

   An r2r kind of `FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
output, and may sometimes be faster and/or more convenient than the
latter.  The inverse "hc2r" transform is of kind `FFTW_HC2R'.  This
consists of the non-redundant half of the complex output for a 1d
real-input DFT of size `n', stored as a sequence of `n' real numbers
(`double') in the format:

   r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1

   Here, rk is the real part of the kth output, and ik is the imaginary
part.  (Division by 2 is rounded down.) For a halfcomplex array
`hc[n]', the kth component thus has its real part in `hc[k]' and its
imaginary part in `hc[n-k]', with the exception of `k' `==' `0' or
`n/2' (the latter only if `n' is even)--in these two cases, the
imaginary part is zero due to symmetries of the real-input DFT, and is
not stored.  Thus, the r2hc transform of `n' real values is a
halfcomplex array of length `n', and vice versa for hc2r.

   Aside from the differing format, the output of
`FFTW_R2HC'/`FFTW_HC2R' is otherwise exactly the same as for the
corresponding 1d r2c/c2r transform (i.e. `FFTW_FORWARD'/`FFTW_BACKWARD'
transforms, respectively).  Recall that these transforms are
unnormalized, so r2hc followed by hc2r will result in the original data
multiplied by `n'.  Furthermore, like the c2r transform, an
out-of-place hc2r transform will _destroy its input_ array.

   Although these halfcomplex transforms can be used with the
multi-dimensional r2r interface, the interpretation of such a separable
product of transforms along each dimension is problematic.  For example,
consider a two-dimensional `nx' by `ny', r2hc by r2hc transform planned
by `fftw_plan_r2r_2d(nx, ny, in, out, FFTW_R2HC, FFTW_R2HC,
FFTW_MEASURE)'.  Conceptually, FFTW first transforms the rows (of size
`ny') to produce halfcomplex rows, and then transforms the columns (of
size `nx').  Half of these column transforms, however, are of imaginary
parts, and should therefore be multiplied by i and combined with the
r2hc transforms of the real columns to produce the 2d DFT amplitudes;
FFTW's r2r transform does _not_ perform this combination for you.
Thus, if a multi-dimensional real-input/output DFT is required, we
recommend using the ordinary r2c/c2r interface (*note Multi-Dimensional
DFTs of Real Data::).


File: fftw3.info,  Node: Real even/odd DFTs (cosine/sine transforms),  Next: The Discrete Hartley Transform,  Prev: The Halfcomplex-format DFT,  Up: More DFTs of Real Data

Real even/odd DFTs (cosine/sine transforms)
-------------------------------------------

   The Fourier transform of a real-even function f(-x) = f(x) is
real-even, and i times the Fourier transform of a real-odd function
f(-x) = -f(x) is real-odd.  Similar results hold for a discrete Fourier
transform, and thus for these symmetries the need for complex
inputs/outputs is entirely eliminated.  Moreover, one gains a factor of
two in speed/space from the fact that the data are real, and an
additional factor of two from the even/odd symmetry: only the
non-redundant (first) half of the array need be stored.  The result is
the real-even DFT ("REDFT") and the real-odd DFT ("RODFT"), also known
as the discrete cosine and sine transforms ("DCT" and "DST"),
respectively.

   (In this section, we describe the 1d transforms; multi-dimensional
transforms are just a separable product of these transforms operating
along each dimension.)

   Because of the discrete sampling, one has an additional choice: is
the data even/odd around a sampling point, or around the point halfway
between two samples?  The latter corresponds to _shifting_ the samples
by _half_ an interval, and gives rise to several transform variants
denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate
whether the input (a) and/or output (b) are shifted by half a sample (1
means it is shifted).  These are also known as types I-IV of the DCT
and DST, and all four types are supported by FFTW's r2r interface.(1)

   The r2r kinds for the various REDFT and RODFT types supported by
FFTW, along with the boundary conditions at both ends of the _input_
array (`n' real numbers `in[j=0..n-1]'), are:

   * `FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.

   * `FFTW_REDFT10' (DCT-II): even around j=-0.5 and even around
     j=n-0.5.

   * `FFTW_REDFT01' (DCT-III): even around j=0 and odd around j=n.

   * `FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.

   * `FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.

   * `FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.

   * `FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.

   * `FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.


   Note that these symmetries apply to the "logical" array being
transformed; *there are no constraints on your physical input data*.
So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
abcde, it corresponds to the DFT of the logical even array abcdedcb of
size 8.  A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
size-8 logical DFT of the even array abcddcba, shifted by half a sample.

   All of these transforms are invertible.  The inverse of R*DFT00 is
R*DFT00; of R*DFT10 is R*DFT01 and vice versa; and of R*DFT11 is
R*DFT11.  However, the transforms computed by FFTW are unnormalized,
exactly like the corresponding real and complex DFTs, so computing a
transform followed by its inverse yields the original array scaled by
N, where N is the _logical_ DFT size.  For REDFT00, N=2(n-1); for
RODFT00, N=2(n+1); otherwise, N=2n.

   Note that the boundary conditions of the transform output array are
given by the input boundary conditions of the inverse transform.  Thus,
the above transforms are all inequivalent in terms of input/output
boundary conditions, even neglecting the 0.5 shift difference.

   FFTW is most efficient when N is a product of small factors; note
that this _differs_ from the factorization of the physical size `n' for
REDFT00 and RODFT00!  There is another oddity: `n=1' REDFT00 transforms
correspond to N=0, and so are _not defined_ (the planner will return
`NULL').  Otherwise, any positive `n' is supported.

   For the precise mathematical definitions of these transforms as used
by FFTW, see *Note What FFTW Really Computes::.  (For people accustomed
to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
the cos/sin functions so that they correspond precisely to an even/odd
DFT of size N.)

Which type do you need?
.......................

   Since the required flavor of even/odd DFT depends upon your problem,
you are the best judge of this choice, but we can make a few comments on
relative efficiency to help you in your selection.  In particular,
R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
for odd sizes), while the R*DFT00 transforms are significantly
slower.(2)

   Thus, if only the boundary conditions on the transform inputs are
specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
R*DFT11 (unless the half-sample shift or the self-inverse property is
significant for your problem).

   If performance is important to you and you are using only small sizes
(say n<200), e.g. for multi-dimensional transforms, then you might
consider generating hard-coded transforms of those sizes and types that
you are interested in (*note Generating your own code::).

   We are interested in hearing what types of symmetric transforms you
find most useful.

   ---------- Footnotes ----------

   (1) There are also type V-VIII transforms, which correspond to a
logical DFT of _odd_ size N, independent of whether the physical size
`n' is odd, but we do not support these variants.

   (2) R*DFT00 is slower in FFTW because we discovered that the
standard algorithm for computing this by a pre/post-processed real
DFT--the algorithm used in FFTPACK, Numerical Recipes, and other
sources for decades now--has serious numerical problems: it already
loses several decimal places of accuracy for 16k sizes.  There seem to
be only two alternatives in the literature that do not suffer similarly:
a recursive decomposition into smaller DCTs, which would require a large
set of codelets for efficiency and generality, or sacrificing a factor
of two in speed to use a real DFT of twice the size.  We currently
employ the latter technique.


File: fftw3.info,  Node: The Discrete Hartley Transform,  Prev: Real even/odd DFTs (cosine/sine transforms),  Up: More DFTs of Real Data

The Discrete Hartley Transform
------------------------------

   The discrete Hartley transform (DHT) is an invertible linear
transform closely related to the DFT.  In the DFT, one multiplies each
input by cos - i * sin (a complex exponential), whereas in the DHT each
input is multiplied by simply cos + sin.  Thus, the DHT transforms `n'
real numbers to `n' real numbers, and has the convenient property of
being its own inverse.  In FFTW, a DHT (of any positive `n') can be
specified by an r2r kind of `FFTW_DHT'.

   If you are planning to use the DHT because you've heard that it is
"faster" than the DFT (FFT), *stop here*.  That story is an old but
enduring misconception that was debunked in 1987: a properly designed
real-input FFT (such as FFTW's) has no more operations in general than
an FHT.  Moreover, in FFTW, the DHT is ordinarily _slower_ than the DFT
for composite sizes (see below).

   Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
size `n' followed by another DHT of the same size will result in the
original array multiplied by `n'.

   The DHT was originally proposed as a more efficient alternative to
the DFT for real data, but it was subsequently shown that a specialized
DFT (such as FFTW's r2hc or r2c transforms) could be just as fast.  In
FFTW, the DHT is actually computed by post-processing an r2hc
transform, so there is ordinarily no reason to prefer it from a
performance perspective.(1) However, we have heard rumors that the DHT
might be the most appropriate transform in its own right for certain
applications, and we would be very interested to hear from anyone who
finds it useful.

   If `FFTW_DHT' is specified for multiple dimensions of a
multi-dimensional transform, FFTW computes the separable product of 1d
DHTs along each dimension.  Unfortunately, this is not quite the same
thing as a true multi-dimensional DHT; you can compute the latter, if
necessary, with at most `rank-1' post-processing passes [see e.g. H.
Hao and R. N. Bracewell, Proc. IEEE 75, 264-266 (1987)].

   For the precise mathematical definition of the DHT as used by FFTW,
see *Note What FFTW Really Computes::.

   ---------- Footnotes ----------

   (1) We provide the DHT mainly as a byproduct of some internal
algorithms. FFTW computes a real input/output DFT of _prime_ size by
re-expressing it as a DHT plus post/pre-processing and then using
Rader's prime-DFT algorithm adapted to the DHT.


File: fftw3.info,  Node: Other Important Topics,  Next: FFTW Reference,  Prev: Tutorial,  Up: Top

Other Important Topics
**********************

* Menu:

* Data Alignment::
* Multi-dimensional Array Format::
* Words of Wisdom-Saving Plans::
* Caveats in Using Wisdom::


File: fftw3.info,  Node: Data Alignment,  Next: Multi-dimensional Array Format,  Prev: Other Important Topics,  Up: Other Important Topics

Data Alignment
==============

* Menu:

* SIMD alignment and fftw_malloc::
* Stack alignment on x86::

   In order to get the best performance from FFTW, one needs to be
somewhat aware of two problems related to data alignment on x86
(Pentia) architectures: alignment of allocated arrays (for use with
SIMD acceleration), and alignment of the stack.


File: fftw3.info,  Node: SIMD alignment and fftw_malloc,  Next: Stack alignment on x86,  Prev: Data Alignment,  Up: Data Alignment

SIMD alignment and fftw_malloc
------------------------------

   SIMD, which stands for "Single Instruction Multiple Data," is a set
of special operations supported by some processors to perform a single
operation on several numbers (usually 2 or 4) simultaneously.  SIMD
floating-point instructions are available on several popular CPUs:
SSE/SSE2 (single/double precision) on Pentium III/IV and higher, 3DNow!
(single precision) on the AMD K7 and higher, and AltiVec (single
precision) on the PowerPC G4 and higher.  FFTW can be compiled to
support the SIMD instructions on any of these systems.

   A program linking to an FFTW library compiled with SIMD support can
obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
In order to obtain this speedup, however, the arrays of complex (or
real) data passed to FFTW must be specially aligned in memory
(typically 16-byte aligned), and often this alignment is more stringent
than that provided by the usual `malloc' (etc.)  allocation routines.

   In order to guarantee proper alignment for SIMD, therefore, in case
your program is ever linked against a SIMD-using FFTW, we recommend
allocating your transform data with `fftw_malloc' and de-allocating it
with `fftw_free'.  These have exactly the same interface and behavior as
`malloc'/`free', except that for a SIMD FFTW they ensure that the
returned pointer has the necessary alignment (by calling `memalign' or
its equivalent on your OS).

   You are not _required_ to use `fftw_malloc'.  You can allocate your
data in any way that you like, from `malloc' to `new' (in C++) to a
static array declaration.  If the array happens not to be properly
aligned, FFTW will not use the SIMD extensions.


File: fftw3.info,  Node: Stack alignment on x86,  Prev: SIMD alignment and fftw_malloc,  Up: Data Alignment

Stack alignment on x86
----------------------

   On the Pentium and subsequent x86 processors, there is a substantial
performance penalty if double-precision variables are not stored 8-byte
aligned; a factor of two or more is not unusual.  Unfortunately, the
stack (the place that local variables and subroutine arguments live) is
not guaranteed by the Intel ABI to be 8-byte aligned.

   Recent versions of `gcc' (as well as most other compilers, we are
told, such as Intel's, Metrowerks', and Microsoft's) are able to keep
the stack 8-byte aligned; `gcc' does this by default (see
`-mpreferred-stack-boundary' in the `gcc' documentation).  If you are
not certain whether your compiler maintains stack alignment by default,
it is a good idea to make sure.

   Unfortunately, `gcc' only _preserves_ the stack alignment--as a
result, if the stack starts off misaligned, it will always be
misaligned, with a disastrous effect on performance (in double
precision).  Fortunately, recent versions of glibc (on GNU/Linux)
provide a properly-aligned starting stack, but this was not the case
with a number of older versions, and we are not certain of the
situation on other operating systems.  Hopefully, as time goes by this
will become less of a concern, but if you want to be paranoid you can
copy the code from FFTW's `libbench2/aligned-main.c' to guarantee
alignment of your `main' function (with `gcc').


File: fftw3.info,  Node: Multi-dimensional Array Format,  Next: Words of Wisdom-Saving Plans,  Prev: Data Alignment,  Up: Other Important Topics

Multi-dimensional Array Format
==============================

   This section describes the format in which multi-dimensional arrays
are stored in FFTW.  We felt that a detailed discussion of this topic
was necessary.  Since several different formats are common, this topic
is often a source of confusion among users.

* Menu:

* Row-major Format::
* Column-major Format::
* Static Arrays in C::
* Dynamic Arrays in C::
* Dynamic Arrays in C-The Wrong Way::


File: fftw3.info,  Node: Row-major Format,  Next: Column-major Format,  Prev: Multi-dimensional Array Format,  Up: Multi-dimensional Array Format

Row-major Format
----------------

   The multi-dimensional arrays passed to `fftw_plan_dft' etcetera are
expected to be stored as a single contiguous block in "row-major" order
(sometimes called "C order").  Basically, this means that as you step
through adjacent memory locations, the first dimension's index varies
most slowly and the last dimension's index varies most quickly.

   To be more explicit, let us consider an array of rank d whose
dimensions are n1 x n2 x n3 x ... x nd . Now, we specify a location in
the array by a sequence of (zero-based) indices, one for each dimension:
(i1, i2, ..., id).  If the array is stored in row-major order, then
this element is located at the position id + nd * (id-1 + nd-1 * (... +
n2 * i1)).

   Note that, for the ordinary complex DFT, each element of the array
must be of type `fftw_complex'; i.e. a (real, imaginary) pair of
(double-precision) numbers.

   In the advanced FFTW interface, the physical dimensions n from which
the indices are computed can be different from (larger than) the
logical dimensions of the transform to be computed, in order to
transform a subset of a larger array.  Note also that, in the advanced
interface, the expression above is multiplied by a "stride" to get the
actual array index--this is useful in situations where each element of
the multi-dimensional array is actually a data structure (or another
array), and you just want to transform a single field. In the basic
interface, however, the stride is 1.


File: fftw3.info,  Node: Column-major Format,  Next: Static Arrays in C,  Prev: Row-major Format,  Up: Multi-dimensional Array Format

Column-major Format
-------------------

   Readers from the Fortran world are used to arrays stored in
"column-major" order (sometimes called "Fortran order").  This is
essentially the exact opposite of row-major order in that, here, the
_first_ dimension's index varies most quickly.

   If you have an array stored in column-major order and wish to
transform it using FFTW, it is quite easy to do.  When creating the
plan, simply pass the dimensions of the array to the planner in
_reverse order_.  For example, if your array is a rank three `N x M x
L' matrix in column-major order, you should pass the dimensions of the
array as if it were an `L x M x N' matrix (which it is, from the
perspective of FFTW).  This is done for you _automatically_ by the FFTW
Fortran interface (*note Calling FFTW from Fortran::).


File: fftw3.info,  Node: Static Arrays in C,  Next: Dynamic Arrays in C,  Prev: Column-major Format,  Up: Multi-dimensional Array Format

Static Arrays in C
------------------

   Multi-dimensional arrays declared statically (that is, at compile
time, not necessarily with the `static' keyword) in C are _already_ in
row-major order.  You don't have to do anything special to transform
them.  For example:

     {
          fftw_complex data[NX][NY][NZ];
          fftw_plan plan;
          ...
          plan = fftw_plan_dft_3d(NX, NY, NZ, &data[0][0][0], &data[0][0][0],
                                  FFTW_FORWARD, FFTW_ESTIMATE);
          ...
     }

   This will plan a 3d in-place transform of size `NX x NY x NZ'.
Notice how we took the address of the zero-th element to pass to the
planner (we could also have used a typecast).

   However, we tend to _discourage_ users from declaring their arrays
statically in this way, for two reasons.  First, this allocates the
array on the stack, which has a very limited size on most operating
systems (declaring an array with more than a few thousand elements will
often cause a crash).  Second, it may not optimally align the array if
you link with a SIMD FFTW (*note SIMD alignment and fftw_malloc::).
Instead, we recommend using `fftw_malloc', as described below.