This is fftw3.info, produced by makeinfo version 4.2 from fftw3.texi.

This manual is for FFTW (version 3.0.1, 5 June 2003).

   Copyright (C) 2003 Matteo Frigo.

   Copyright (C) 2003 Massachusetts Institute of Technology.

     Permission is granted to make and distribute verbatim copies of
     this manual provided the copyright notice and this permission
     notice are preserved on all copies.

     Permission is granted to copy and distribute modified versions of
     this manual under the conditions for verbatim copying, provided
     that the entire resulting derived work is distributed under the
     terms of a permission notice identical to this one.

     Permission is granted to copy and distribute translations of this
     manual into another language, under the above conditions for
     modified versions, except that this permission notice may be
     stated in a translation approved by the Free Software Foundation.
   
INFO-DIR-SECTION Texinfo documentation system
START-INFO-DIR-ENTRY
* fftw3: (fftw3).	FFTW User's Manual.
END-INFO-DIR-ENTRY


File: fftw3.info,  Node: Guru Real-data DFTs,  Next: Guru Real-to-real Transforms,  Prev: Guru Complex DFTs,  Up: Guru Interface

Guru Real-data DFTs
-------------------

     fftw_plan fftw_plan_guru_dft_r2c(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *in, fftw_complex *out,
          unsigned flags);
     
     fftw_plan fftw_plan_guru_split_dft_r2c(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *in, double *ro, double *io,
          unsigned flags);
     
     fftw_plan fftw_plan_guru_dft_c2r(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          fftw_complex *in, double *out,
          unsigned flags);
     
     fftw_plan fftw_plan_guru_split_dft_c2r(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *ri, double *ii, double *out,
          unsigned flags);

   Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT
with transform dimensions given by (`rank', `dims') over a
multi-dimensional vector (loop) of dimensions (`howmany_rank',
`howmany_dims').  `dims' and `howmany_dims' should point to
`fftw_iodim' arrays of length `rank' and `howmany_rank', respectively.
As for the basic and advanced interfaces, an r2c transform is
`FFTW_FORWARD' and a c2r transform is `FFTW_BACKWARD'.

   The _last_ dimension of `dims' is interpreted specially: that
dimension of the real array has size `dims[rank-1].n', but that
dimension of the complex array has size `dims[rank-1].n/2+1' (division
rounded down).  The strides, on the other hand, are taken to be exactly
as specified.  It is up to the user to specify the strides
appropriately for the peculiar dimensions of the data, and we do not
guarantee that the planner will succeed (return non-`NULL') for any
dimensions other than those described in *Note Real-data DFT Array
Format:: and generalized in *Note Advanced Real-data DFTs::.  (That is,
for an in-place transform, each individual dimension should be able to
operate in place.)

   `in' and `out' point to the input and output arrays for r2c and c2r
transforms, respectively.  For split arrays, `ri' and `ii' point to the
real and imaginary input arrays for a c2r transform, and `ro' and `io'
point to the real and imaginary output arrays for an r2c transform.
`in' and `ro' or `ri' and `out' may be the same, indicating an in-place
transform.

   `flags' is a bitwise OR (`|') of zero or more planner flags, as
defined in *Note Planner Flags::.

   In-place transforms of rank greater than 1 are currently only
supported for interleaved arrays.  For split arrays, the planner will
return `NULL'.


File: fftw3.info,  Node: Guru Real-to-real Transforms,  Next: Guru Execution of Plans,  Prev: Guru Real-data DFTs,  Up: Guru Interface

Guru Real-to-real Transforms
----------------------------

     fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
                                  int howmany_rank,
                                  const fftw_iodim *howmany_dims,
                                  double *in, double *out,
                                  const fftw_r2r_kind *kind,
                                  unsigned flags);

   Plan a real-to-real (r2r) multi-dimensional `FFTW_FORWARD' transform
with transform dimensions given by (`rank', `dims') over a
multi-dimensional vector (loop) of dimensions (`howmany_rank',
`howmany_dims').  `dims' and `howmany_dims' should point to
`fftw_iodim' arrays of length `rank' and `howmany_rank', respectively.

   The transform kind of each dimension is given by the `kind'
parameter, which should point to an array of length `rank'.  Valid
`fftw_r2r_kind' constants are given in *Note Real-to-Real Transform
Kinds::.

   `in' and `out' point to the real input and output arrays; they may
be the same, indicating an in-place transform.

   `flags' is a bitwise OR (`|') of zero or more planner flags, as
defined in *Note Planner Flags::.


File: fftw3.info,  Node: Guru Execution of Plans,  Prev: Guru Real-to-real Transforms,  Up: Guru Interface

Guru Execution of Plans
-----------------------

   Normally, one executes a plan for the arrays with which the plan was
created, by calling `fftw_execute(plan)' as described in *Note Using
Plans::.  However, it is possible to apply a given plan to a _different_
array using the guru functions detailed below, provided that the
following conditions are met:

   * The array size, strides, etcetera are the same (since those are
     set by the plan).

   * The input and output arrays are the same (in-place) or different
     (out-of-place) if the plan was originally created to be in-place or
     out-of-place, respectively.

   * For split arrays, the separations between the real and imaginary
     parts, `ii-ri' and `io-ro', are the same as they were for the
     input and output arrays when the plan was created.  (This
     condition is automatically satisfied for interleaved arrays.)

   * The "alignment" of the new input/output arrays is the same as that
     of the input/output arrays when the plan was created, unless the
     plan was created with the `FFTW_UNALIGNED' flag.  Here, the
     alignment is a platform-dependent quantity (for example, it is the
     address modulo 16 if SSE SIMD instructions are used, but the
     address modulo 4 for non-SIMD single-precision FFTW on the same
     machine).  In general, only arrays allocated with `fftw_malloc'
     are guaranteed to be equally aligned.


   If you are tempted to use this guru interface because you want to
transform a known bunch of arrays of the same size, *stop here* and go
use the advanced interface instead (*note Advanced Interface::)).

   The guru execute functions are:

     void fftw_execute_dft(
          const fftw_plan p,
          fftw_complex *in, fftw_complex *out);
     
     void fftw_execute_split_dft(
          const fftw_plan p,
          double *ri, double *ii, double *ro, double *io);
     
     void fftw_execute_dft_r2c(
          const fftw_plan p,
          double *in, fftw_complex *out);
     
     void fftw_execute_split_dft_r2c(
          const fftw_plan p,
          double *in, double *ro, double *io);
     
     void fftw_execute_dft_c2r(
          const fftw_plan p,
          fftw_complex *in, double *out);
     
     void fftw_execute_split_dft_c2r(
          const fftw_plan p,
          double *ri, double *ii, double *out);
     
     void fftw_execute_r2r(
          const fftw_plan p,
          double *in, double *out);

   These execute the `plan' to compute the corresponding transform on
the input/output arrays specified by the subsequent arguments.  The
input/output array arguments have the same meanings as the ones passed
to the guru planner routines in the preceding sections.  The `plan' is
not modified, and these routines can be called as many times as
desired, or intermixed with calls to the ordinary `fftw_execute'.

   The `plan' _must_ have been created for the transform type
corresponding to the execute function, e.g. it must be a complex-DFT
plan for `fftw_execute_dft'.  Any of the planner routines for that
transform type, from the basic to the guru interface, could have been
used to create the plan, however.


File: fftw3.info,  Node: Wisdom,  Next: What FFTW Really Computes,  Prev: Guru Interface,  Up: FFTW Reference

Wisdom
======

   This section documents the FFTW mechanism for saving and restoring
plans from disk.  This mechanism is called "wisdom".

* Menu:

* Wisdom Export::
* Wisdom Import::
* Forgetting Wisdom::
* Wisdom Utilities::


File: fftw3.info,  Node: Wisdom Export,  Next: Wisdom Import,  Prev: Wisdom,  Up: Wisdom

Wisdom Export
-------------

     void fftw_export_wisdom_to_file(FILE *output_file);
     char *fftw_export_wisdom_to_string(void);
     void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);

   These functions allow you to export all currently accumulated wisdom
in a form from which it can be later imported and restored, even during
a separate run of the program. (*Note Words of Wisdom-Saving Plans::.)
The current store of wisdom is not affected by calling any of these
routines.

   `fftw_export_wisdom' exports the wisdom to any output medium, as
specified by the callback function `write_char'. `write_char' is a
`putc'-like function that writes the character `c' to some output; its
second parameter is the `data' pointer passed to `fftw_export_wisdom'.
For convenience, the following two "wrapper" routines are provided:

   `fftw_export_wisdom_to_file' writes the wisdom to the current
position in `output_file', which should be open with write permission.
Upon exit, the file remains open and is positioned at the end of the
wisdom data.

   `fftw_export_wisdom_to_string' returns a pointer to a
`NULL'-terminated string holding the wisdom data. This string is
dynamically allocated, and it is the responsibility of the caller to
deallocate it with `fftw_free' when it is no longer needed.

   All of these routines export the wisdom in the same format, which we
will not document here except to say that it is LISP-like ASCII text
that is insensitive to white space.


File: fftw3.info,  Node: Wisdom Import,  Next: Forgetting Wisdom,  Prev: Wisdom Export,  Up: Wisdom

Wisdom Import
-------------

     int fftw_import_system_wisdom(void);
     int fftw_import_wisdom_from_file(FILE *input_file);
     int fftw_import_wisdom_from_string(const char *input_string);
     int fftw_import_wisdom(int (*read_char)(void *), void *data);

   These functions import wisdom into a program from data stored by the
`fftw_export_wisdom' functions above. (*Note Words of Wisdom-Saving
Plans::.)  The imported wisdom replaces any wisdom already accumulated
by the running program.

   `fftw_import_wisdom' imports wisdom from any input medium, as
specified by the callback function `read_char'. `read_char' is a
`getc'-like function that returns the next character in the input; its
parameter is the `data' pointer passed to `fftw_import_wisdom'. If the
end of the input data is reached (which should never happen for valid
data), `read_char' should return `EOF' (as defined in `<stdio.h>').
For convenience, the following two "wrapper" routines are provided:

   `fftw_import_wisdom_from_file' reads wisdom from the current
position in `input_file', which should be open with read permission.
Upon exit, the file remains open, but the position of the read pointer
is unspecified.

   `fftw_import_wisdom_from_string' reads wisdom from the
`NULL'-terminated string `input_string'.

   `fftw_import_system_wisdom' reads wisdom from an
implementation-defined standard file (`/etc/fftw/wisdom' on Unix and
GNU systems).

   The return value of these import routines is `1' if the wisdom was
read successfully and `0' otherwise. Note that, in all of these
functions, any data in the input stream past the end of the wisdom data
is simply ignored.


File: fftw3.info,  Node: Forgetting Wisdom,  Next: Wisdom Utilities,  Prev: Wisdom Import,  Up: Wisdom

Forgetting Wisdom
-----------------

     void fftw_forget_wisdom(void);

   Calling `fftw_forget_wisdom' causes all accumulated `wisdom' to be
discarded and its associated memory to be freed. (New `wisdom' can
still be gathered subsequently, however.)


File: fftw3.info,  Node: Wisdom Utilities,  Prev: Forgetting Wisdom,  Up: Wisdom

Wisdom Utilities
----------------

   FFTW includes two standalone utility programs that deal with wisdom.
We merely summarize them here, since they come with their own `man'
pages for Unix and GNU systems (with HTML versions on our web site).

   The first program is `fftw-wisdom' (or `fftwf-wisdom' in single
precision, etcetera), which can be used to create a wisdom file
containing plans for any of the transform sizes and types supported by
FFTW.  It is preferable to create wisdom directly from your executable
(*note Caveats in Using Wisdom::), but this program is useful for
creating global wisdom files for `fftw_import_system_wisdom'.

   The second program is `fftw-wisdom-to-conf', which takes a wisdom
file as input and produces a "configuration routine" as output.  The
latter is a C subroutine that you can compile and link into your
program, replacing a routine of the same name in the FFTW library, that
determines which parts of FFTW are callable by your program.
`fftw-wisdom-to-conf' produces a configuration routine that links to
only those parts of FFTW needed by the saved plans in the wisdom,
greatly reducing the size of statically linked executables (which should
only attempt to create plans corresponding to those in the wisdom,
however).


File: fftw3.info,  Node: What FFTW Really Computes,  Prev: Wisdom,  Up: FFTW Reference

What FFTW Really Computes
=========================

   In this section, we provide precise mathematical definitions for the
transforms that FFTW computes.  These transform definitions are fairly
standard, but some authors follow slightly different conventions for the
normalization of the transform (the constant factor in front) and the
sign of the complex exponent.  We begin by presenting the
one-dimensional (1d) transform definitions, and then give the
straightforward extension to multi-dimensional transforms.

* Menu:

* The 1d Discrete Fourier Transform (DFT)::
* The 1d Real-data DFT::
* 1d Real-even DFTs (DCTs)::
* 1d Real-odd DFTs (DSTs)::
* 1d Discrete Hartley Transforms (DHTs)::
* Multi-dimensional Transforms::


File: fftw3.info,  Node: The 1d Discrete Fourier Transform (DFT),  Next: The 1d Real-data DFT,  Prev: What FFTW Really Computes,  Up: What FFTW Really Computes

The 1d Discrete Fourier Transform (DFT)
---------------------------------------

   The forward (`FFTW_FORWARD') discrete Fourier transform (DFT) of a
1d complex array X of size n computes an array Y, where:  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
The
backward (`FFTW_BACKWARD') DFT computes:  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
FFTW computes an unnormalized transform, in that there is no coefficient
in front of the summation in the DFT.  In other words, applying the
forward and then the backward transform will multiply the input by n.

   From above, an `FFTW_FORWARD' transform corresponds to a sign of -1
in the exponent of the DFT.  Note also that we use the standard
"in-order" output ordering--the k-th output corresponds to the
frequency k/n (or k/T, where T is your total sampling period).  For
those who like to think in terms of positive and negative frequencies,
this means that the positive frequencies are stored in the first half
of the output and the negative frequencies are stored in backwards
order in the second half of the output.  (The frequency -k/n is the
same as the frequency (n-k)/n.)


File: fftw3.info,  Node: The 1d Real-data DFT,  Next: 1d Real-even DFTs (DCTs),  Prev: The 1d Discrete Fourier Transform (DFT),  Up: What FFTW Really Computes

The 1d Real-data DFT
--------------------

   The real-input (r2c) DFT in FFTW computes the _forward_ transform Y
of the size `n' real array X, exactly as defined above, i.e.   Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
This
output array Y can easily be shown to possess the "Hermitian" symmetry
Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] = Y[0].

   As a result of this symmetry, half of the output Y is redundant
(being the complex conjugate of the other half), and so the 1d r2c
transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
where the division by 2 is rounded down.

   Moreover, the Hermitian symmetry implies that Y[0] and, if n is
even, the Y[n/2] element, are purely real.  So, for the `R2HC' r2r
transform, these elements are not stored in the halfcomplex output
format.

   The c2r and `H2RC' r2r transforms compute the backward DFT of the
_complex_ array X with Hermitian symmetry, stored in the r2c/`R2HC'
output formats, respectively, where the backward transform is defined
exactly as for the complex case:  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
The
outputs `Y' of this transform can easily be seen to be purely real, and
are stored as an array of real numbers.

   Like FFTW's complex DFT, these transforms are unnormalized.  In other
words, applying the real-to-complex (forward) and then the
complex-to-real (backward) transform will multiply the input by n.


File: fftw3.info,  Node: 1d Real-even DFTs (DCTs),  Next: 1d Real-odd DFTs (DSTs),  Prev: The 1d Real-data DFT,  Up: What FFTW Really Computes

1d Real-even DFTs (DCTs)
------------------------

   The Real-even DFTs in FFTW are exactly equivalent to the unnormalized
forward (and backward) DFTs as defined above, where the input array X
of length N is purely real and is also "even".  In this case, the
output array is likewise real and even.

   For the case of `REDFT00', this even symmetry means that X[j] =
X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
this redundancy, only the first n real numbers are actually stored,
where N = 2(n-1).

   The proper definition of even symmetry for `REDFT10', `REDFT01', and
`REDFT11' transforms is somewhat more intricate because of the shifts
by 1/2 of the input and/or output, although the corresponding boundary
conditions are given in *Note Real even/odd DFTs (cosine/sine
transforms)::.  Because of the even symmetry, however, the sine terms
in the DFT all cancel and the remaining cosine terms are written
explicitly below.  This formulation often leads people to call such a
transform a "discrete cosine transform" (DCT), although it is really
just a special case of the DFT.

   In each of the definitions below, we transform a real array X of
length n to a real array Y of length n:

REDFT00 (DCT-I)
...............

   An `REDFT00' transform (type-I DCT) in FFTW is defined by: Y[k] =
X[0] + (-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk
/(n-1))).  Note that this transform is not defined for n=1.  For n=2,
the summation term above is dropped as you might expect.

REDFT10 (DCT-II)
................

   An `REDFT10' transform (type-II DCT) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k / n)).

REDFT01 (DCT-III)
.................

   An `REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] =
X[0] + 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)).  In the
case of n=1, this reduces to Y[0] = X[0].

REDFT11 (DCT-IV)
................

   An `REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).

Inverses and Normalization
..........................

   These definitions correspond directly to the unnormalized DFTs used
elsewhere in FFTW (hence the factors of 2 in front of the summations).
The unnormalized inverse of `REDFT00' is `REDFT00', of `REDFT10' is
`REDFT01' and vice versa, and of `REDFT11' is `REDFT11'.  Each
unnormalized inverse results in the original array multiplied by N,
where N is the _logical_ DFT size.  For `REDFT00', N=2(n-1) (note that
n=1 is not defined); otherwise, N=2n.


File: fftw3.info,  Node: 1d Real-odd DFTs (DSTs),  Next: 1d Discrete Hartley Transforms (DHTs),  Prev: 1d Real-even DFTs (DCTs),  Up: What FFTW Really Computes

1d Real-odd DFTs (DSTs)
-----------------------

   The Real-odd DFTs in FFTW are exactly equivalent to the unnormalized
forward (and backward) DFTs as defined above, where the input array X
of length N is purely real and is also "odd".  In this case, the output
is odd and purely imaginary.

   For the case of `RODFT00', this odd symmetry means that X[j] =
-X[N-j], where we take X to be periodic so that X[N] = X[0].  Because
of this redundancy, only the first n real numbers starting at j=1 are
actually stored (the j=0 element is zero), where N = 2(n+1).

   The proper definition of odd symmetry for `RODFT10', `RODFT01', and
`RODFT11' transforms is somewhat more intricate because of the shifts
by 1/2 of the input and/or output, although the corresponding boundary
conditions are given in *Note Real even/odd DFTs (cosine/sine
transforms)::.  Because of the odd symmetry, however, the cosine terms
in the DFT all cancel and the remaining sine terms are written
explicitly below.  This formulation often leads people to call such a
transform a "discrete sine transform" (DST), although it is really just
a special case of the DFT.

   In each of the definitions below, we transform a real array X of
length n to a real array Y of length n:

RODFT00 (DST-I)
...............

   An `RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).

RODFT10 (DST-II)
................

   An `RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).

RODFT01 (DST-III)
.................

   An `RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
(-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
n)).  In the case of n=1, this reduces to Y[0] = X[0].

RODFT11 (DST-IV)
................

   An `RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).

Inverses and Normalization
..........................

   These definitions correspond directly to the unnormalized DFTs used
elsewhere in FFTW (hence the factors of 2 in front of the summations).
The unnormalized inverse of `RODFT00' is `RODFT00', of `RODFT10' is
`RODFT01' and vice versa, and of `RODFT11' is `RODFT11'.  Each
unnormalized inverse results in the original array multiplied by N,
where N is the _logical_ DFT size.  For `RODFT00', N=2(n+1); otherwise,
N=2n.


File: fftw3.info,  Node: 1d Discrete Hartley Transforms (DHTs),  Next: Multi-dimensional Transforms,  Prev: 1d Real-odd DFTs (DSTs),  Up: What FFTW Really Computes

1d Discrete Hartley Transforms (DHTs)
-------------------------------------

   The discrete Hartley transform (DHT) of a 1d real array X of size n
computes a real array Y of the same size, where: Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].
FFTW computes an unnormalized transform, in that there is no coefficient
in front of the summation in the DHT.  In other words, applying the
transform twice (the DHT is its own inverse) will multiply the input by
n.


File: fftw3.info,  Node: Multi-dimensional Transforms,  Prev: 1d Discrete Hartley Transforms (DHTs),  Up: What FFTW Really Computes

Multi-dimensional Transforms
----------------------------

   The multi-dimensional transforms of FFTW, in general, compute simply
the separable product of the given 1d transform along each dimension of
the array.  Since each of these transforms is unnormalized, computing
the forward followed by the backward/inverse multi-dimensional transform
will result in the original array scaled by the product of the
normalization factors for each dimension (e.g. the product of the
dimension sizes, for a multi-dimensional DFT).

   The definition of FFTW's multi-dimensional DFT of real data (r2c)
deserves special attention.  In this case, we logically compute the full
multi-dimensional DFT of the input data; since the input data are purely
real, the output data have the Hermitian symmetry and therefore only one
non-redundant half need be stored.  More specifically, for an n1 x n2 x
n3 x ... x nd  multi-dimensional real-input DFT, the full (logical)
complex output array Y[k1, k2, ..., kd] has the symmetry: Y[k1, k2,
..., kd] = Y[n1 - k1, n2 - k2, ..., nd - kd]* (where each dimension is
periodic).  Because of this symmetry, we only store the kd = 0...nd/2
elements of the _last_ dimension (division by 2 is rounded down).  (We
could instead have cut any other dimension in half, but the last
dimension proved computationally convenient.)  This results in the
peculiar array format described in more detail by *Note Real-data DFT
Array Format::.

   The multi-dimensional c2r transform is simply the unnormalized
inverse of the r2c transform.  i.e. it is the same as FFTW's complex
backward multi-dimensional DFT, operating on a Hermitian input array in
the peculiar format mentioned above and outputting a real array (since
the DFT output is purely real).

   We should remind the user that the separable product of 1d transforms
along each dimension, as computed by FFTW, is not always the same thing
as the usual multi-dimensional transform.  A multi-dimensional `R2HC'
(or `HC2R') transform is not identical to the multi-dimensional DFT,
requiring some post-processing to combine the requisite real and
imaginary parts, as was described in *Note The Halfcomplex-format
DFT::.  Likewise, FFTW's multidimensional `FFTW_DHT' r2r transform is
not the same thing as the logical multi-dimensional discrete Hartley
transform defined in the literature, as discussed in *Note The Discrete
Hartley Transform::.


File: fftw3.info,  Node: Parallel FFTW,  Next: Calling FFTW from Fortran,  Prev: FFTW Reference,  Up: Top

Parallel FFTW
*************

   In this chapter we discuss the use of FFTW in a parallel environment.

   Currently, FFTW 3 includes parallel transforms for shared-memory
machiens with some flavor of threads (e.g. POSIX threads); any program
using FFTW can be trivially modified to use these transforms, which are
documented in *Note Multi-threaded FFTW::.

   Users calling FFTW from a multi-threaded program should also consult
*Note Thread safety::.  This section tells you which routines of FFTW it
is safe to call in parallel from different shared-memory threads.

   FFTW 2 also contains distributed-memory parallel transforms using the
MPI message-passing standard.  MPI transforms are not yet available in
FFTW 3, so users requiring that capability must use FFTW 2 for now.

* Menu:

* Multi-threaded FFTW::
* Thread safety::


File: fftw3.info,  Node: Multi-threaded FFTW,  Next: Thread safety,  Prev: Parallel FFTW,  Up: Parallel FFTW

Multi-threaded FFTW
===================

   In this section we document the parallel FFTW routines for
shared-memory threads on SMP hardware.  These routines, which support
parallel one- and multi-dimensional transforms of both real and complex
data, are the easiest way to take advantage of multiple processors with
FFTW.  They work just like the corresponding uniprocessor transform
routines, except that you have an extra initialization routine to call,
and there is a routine to set the number of threads to employ.  Any
program that uses the uniprocessor FFTW can therefore be trivially
modified to use the multi-threaded FFTW.

* Menu:

* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::


File: fftw3.info,  Node: Installation and Supported Hardware/Software,  Next: Usage of Multi-threaded FFTW,  Prev: Multi-threaded FFTW,  Up: Multi-threaded FFTW

Installation and Supported Hardware/Software
--------------------------------------------

   All of the FFTW threads code is located in the `threads'
subdirectory of the FFTW package.  On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including `--enable-threads' in the flags to the `configure' script
(*note Installation on Unix::).

   The threads routines require your operating system to have some sort
of shared-memory threads support.  Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants, from
GNU/Linux to MacOS X) and Win32 threads.  We also support using OpenMP
(http://www.openmp.org) or SGI MP compiler directives to launch
threads, enabled by using `--with-openmp' or `--with-sgimp' in addition
to `--enable-threads'.  (This may be useful if you are employing that
sort of directive in your own code, in order to minimize conflicts.)
If you have a shared-memory machine that uses a different threads API,
it should be a simple matter of programming to include support for it;
see the file `fftw_threads-int.h' for more detail.

   Ideally, of course, you should also have multiple processors in
order to get any benefit from the threaded transforms.


File: fftw3.info,  Node: Usage of Multi-threaded FFTW,  Next: How Many Threads to Use?,  Prev: Installation and Supported Hardware/Software,  Up: Multi-threaded FFTW

Usage of Multi-threaded FFTW
----------------------------

   Here, it is assumed that the reader is already familiar with the
usage of the uniprocessor FFTW routines, described elsewhere in this
manual.  We only describe what one has to change in order to use the
multi-threaded routines.

   First, programs using the parallel complex transforms should be
linked with `-lfftw3_threads -lfftw3 -lm' on Unix. You will also need
to link with whatever library is responsible for threads on your system
(e.g. `-lpthread' on GNU/Linux).

   Second, before calling _any_ FFTW routines, you should call the
function:

     int fftw_init_threads(void);

   This function, which need only be called once, performs any one-time
initialization required to use threads on your system.  It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

   Third, before creating a plan that you want to parallelize, you
should call:

     void fftw_plan_with_nthreads(int nthreads);

   The `nthreads' argument indicates the number of threads you want
FFTW to use (or actually, the maximum number).  All plans subsequently
created with any planner routine will use that many threads.  You can
call `fftw_plan_with_nthreads', create some plans, call
`fftw_plan_with_nthreads' again with a different argument, and create
some more plans for a new number threads.  Plans already created before
a call to `fftw_plan_with_nthreads' are unaffected.  If you pass an
`nthreads' argument of `1' (the default), threads are disabled for
subsequent plans.

   Given a plan, you then execute it as usual with
`fftw_execute(plan)', and the execution will use the number of threads
specified when the plan was created.  When done, you destroy it as
usual with `fftw_destroy_plan'.

   There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

     void fftw_cleanup_threads(void);

   which is much like the `fftw_cleanup()' function except that it also
gets rid of threads-related data.  You must _not_ execute any
previously created plans after calling this function.

   We should also mention one other restriction: if you save wisdom
from a program using the multi-threaded FFTW, that wisdom _cannot be
used_ by a program using only the single-threaded FFTW (i.e. not calling
`fftw_init_threads').  *Note Words of Wisdom-Saving Plans::.


File: fftw3.info,  Node: How Many Threads to Use?,  Prev: Usage of Multi-threaded FFTW,  Up: Multi-threaded FFTW

How Many Threads to Use?
------------------------

   There is a fair amount of overhead involved in spawning and
synchronizing threads, so the optimal number of threads to use depends
upon the size of the transform as well as on the number of processors
you have.

   As a general rule, you don't want to use more threads than you have
processors.  (Using more threads will work, but there will be extra
overhead with no benefit.)  In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

   You will have to experiment with your system to see what level of
parallelization is best for your problem size.  Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial.  If you plan with `FFTW_PATIENT', it will
automatically disable threads for sizes that don't benefit from
parallelization.


File: fftw3.info,  Node: Thread safety,  Prev: Multi-threaded FFTW,  Up: Parallel FFTW

Thread safety
=============

   Users writing multi-threaded programs must concern themselves with
the "thread safety" of the libraries they use--that is, whether it is
safe to call routines in parallel from multiple threads.  FFTW can be
used in such an environment, but some care must be taken because the
planner routines share data (e.g. wisdom and trigonometric tables)
between calls and plans.

   The upshot is that the only thread-safe (re-entrant) routine in FFTW
is `fftw_execute' (and the guru variants thereof).  All other routines
(e.g. the planner) should only be called from one thread at a time.  So,
for example, you can wrap a semaphore lock around any calls to the
planner; even more simply, you can just create all of your plans from
one thread.  We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

   Note also that, since the plan is not modified by `fftw_execute', it
is safe to execute the _same plan_ in parallel by multiple threads.

   (Users should note that these comments only apply to programs using
shared-memory threads.  Parallelism using MPI or forked processes
involves a separate address-space and global variables for each process,
and is not susceptible to problems of this sort.)


File: fftw3.info,  Node: Calling FFTW from Fortran,  Next: Upgrading from FFTW version 2,  Prev: Parallel FFTW,  Up: Top

Calling FFTW from Fortran
*************************

   This chapter describes the Fortran-callable interface to FFTW, which
differs from the C interface only in the prefix (`dfftw_' instead of
`fftw_'), and a few other minor details.  The Fortran interface is
included in the FFTW libraries by default, unless a Fortran compiler
isn't found on your system or `--disable-fortran' is included in the
`configure' flags.  We assume here that the reader is already familiar
with the usage of FFTW in C, as described elsewhere in this manual.

* Menu:

* Fortran-interface routines::
* FFTW Constants in Fortran::
* Fortran Examples::
* Wisdom of Fortran?::


File: fftw3.info,  Node: Fortran-interface routines,  Next: FFTW Constants in Fortran,  Prev: Calling FFTW from Fortran,  Up: Calling FFTW from Fortran

Fortran-interface routines
==========================

   Nearly all of the FFTW functions have Fortran-callable equivalents.
The name of the Fortran routine is the same as that of the
corresponding C routine, but with the `fftw_' prefix replaced by
`dfftw_'.  (The single and long-double precision versions use `sfftw_'
and `lfftw_', respectively, instead of `fftwf_' and `fftwl_'.)(1)

   For the most part, all of the arguments to the functions are the
same, with the following exceptions:

   * `plan' variables (what would be of type `fftw_plan' in C), must be
     declared as a type that is at least as big as a pointer (address)
     on your machine.  We recommend using `integer*8'.

   * Any function that returns a value (e.g. `fftw_plan_dft') is
     converted into a _subroutine_.  The return value is converted into
     an additional _first_ parameter of this subroutine.(2)

   * The Fortran routines expect multi-dimensional arrays to be in
     _column-major_ order, which is the ordinary format of Fortran
     arrays (*note Multi-dimensional Array Format::).  They do this
     transparently and costlessly simply by reversing the order of the
     dimensions passed to FFTW, but this has one important consequence
     for multi-dimensional real-complex transforms, discussed below.

   * Wisdom import and export is somewhat more tricky because one cannot
     easily pass files or strings between C and Fortran; see *Note
     Wisdom of Fortran?::.

   * Fortran cannot use the `fftw_malloc' dynamic-allocation routine.
     If you want to exploit the SIMD FFTW (*note Data Alignment::),
     you'll need to figure out some other way to ensure that your
     arrays are at least 16-byte aligned.

   * Since Fortran 77 does not have data structures, the `fftw_iodim'
     structure from the guru interface (*note Guru vector and transform
     sizes::) must be split into separate arguments.  In particular, any
     `fftw_iodim' array arguments in the C guru interface become three
     integer array arguments (`n', `is', and `os') in the Fortran guru
     interface, all of whose length should be equal to the
     corresponding `rank' argument.


   In general, you should take care to use Fortran data types that
correspond to (i.e. are the same size as) the C types used by FFTW.  If
your C and Fortran compilers are made by the same vendor, the
correspondence is usually straightforward (i.e. `integer' corresponds
to `int', `real' corresponds to `float', etcetera).  The native Fortran
double/single-precision complex type should be compatible with
`fftw_complex'/`fftwf_complex'.  Such simple correspondences are
assumed in the examples below.

   ---------- Footnotes ----------

   (1) Technically, Fortran 77 identifiers are not allowed to have more
than 6 characters, nor may they contain underscores.  Any compiler that
enforces this limitation doesn't deserve to link to FFTW.

   (2) The reason for this is that some Fortran implementations seem to
have trouble with C function return values, and vice versa.


File: fftw3.info,  Node: FFTW Constants in Fortran,  Next: Fortran Examples,  Prev: Fortran-interface routines,  Up: Calling FFTW from Fortran

FFTW Constants in Fortran
=========================

   When creating plans in FFTW, a number of constants are used to
specify options, such as `FFTW_FORWARD' or `FFTW_ESTIMATE'.  The same
constants must be used with the wrapper routines, but of course the C
header files where the constants are defined can't be incorporated
directly into Fortran code.

   Instead, we have placed Fortran equivalents of the FFTW constant
definitions in the file `fftw3.f', which can be found in the same
directory as `fftw3.h'.  If your Fortran compiler supports a
preprocessor of some sort, you should be able to `include' or
`#include' this file; otherwise, you can paste it directly into your
code.

   In C, you combine different flags (like `FFTW_PRESERVE_INPUT' and
`FFTW_MEASURE') using the ``|'' operator; in Fortran you should just
use ``+''.  (Take care not to add in the same flag more than once,
though.)


File: fftw3.info,  Node: Fortran Examples,  Next: Wisdom of Fortran?,  Prev: FFTW Constants in Fortran,  Up: Calling FFTW from Fortran

Fortran Examples
================

   In C, you might have something like the following to transform a
one-dimensional complex array:

             fftw_complex in[N], out[N];
             fftw_plan plan;
     
             plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
             fftw_execute(plan);
             fftw_destroy_plan(plan);

   In Fortran, you would use the following to accomplish the same thing:

             double complex in, out
             dimension in(N), out(N)
             integer*8 plan
     
             call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
             call dfftw_execute(plan)
             call dfftw_destroy_plan(plan)

   Notice how all routines are called as Fortran subroutines, and the
plan is returned via the first argument to `dfftw_plan_dft_1d'.  To do
the same thing, but using 8 threads in parallel (*note Multi-threaded
FFTW::), you would simply prefix these calls with:

             call dfftw_init_threads
             call dfftw_plan_with_nthreads(8)

   To transform a three-dimensional array in-place with C, you might do:

             fftw_complex arr[L][M][N];
             fftw_plan plan;
     
             plan = fftw_plan_dft_3d(L,M,N, arr,arr,
                                     FFTW_FORWARD, FFTW_ESTIMATE);
             fftw_execute(plan);
             fftw_destroy_plan(plan);

   In Fortran, you would use this instead:

             double complex arr
             dimension arr(L,M,N)
             integer*8 plan
     
             call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
            &                       FFTW_FORWARD, FFTW_ESTIMATE)
             call dfftw_execute(plan)
             call dfftw_destroy_plan(plan)

   Note that we pass the array dimensions in the "natural" order in
both C and Fortran.

   To transform a one-dimensional real array in Fortran, you might do:

             double precision in
             dimension in(N)
             double complex out
             dimension out(N/2 + 1)
             integer*8 plan
     
             call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
             call dfftw_execute(plan)
             call dfftw_destroy_plan(plan)

   To transform a two-dimensional real array, out of place, you might
use the following:

             double precision in
             dimension in(M,N)
             double complex out
             dimension out(M/2 + 1, N)
             integer*8 plan
     
             call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
             call dfftw_execute(plan)
             call dfftw_destroy_plan(plan)

   *Important:* Notice that it is the _first_ dimension of the complex
output array that is cut in half in Fortran, rather than the last
dimension as in C.  This is a consequence of the interface routines
reversing the order of the array dimensions passed to FFTW so that the
Fortran program can use its ordinary column-major order.


File: fftw3.info,  Node: Wisdom of Fortran?,  Prev: Fortran Examples,  Up: Calling FFTW from Fortran

Wisdom of Fortran?
==================

   In this section, we discuss how one can import/export FFTW wisdom
(saved plans) to/from a Fortran program; we assume that the reader is
already familiar with wisdom, as described in *Note Words of
Wisdom-Saving Plans::.

   The basic problem is that is difficult to (portably) pass files and
strings between Fortran and C, so we cannot provide a direct Fortran
equivalent to the `fftw_export_wisdom_to_file', etcetera, functions.
Fortran interfaces _are_ provided for the functions that do not take
file/string arguments, however: `dfftw_import_system_wisdom',
`dfftw_import_wisdom', `dfftw_export_wisdom', and `dfftw_forget_wisdom'.

   So, for examplem to import the system-wide wisdom, you would do:

             integer isuccess
             call dfftw_import_system_wisdom(isuccess)

   As usual, the C return value is turned into a first parameter;
`isuccess' is non-zero on success and zero on failure (e.g. if there is
no system wisdom installed).

   If you want to import/export wisdom from/to an arbitrary file or
elsewhere, you can employ the generic `dfftw_import_wisdom' and
`dfftw_export_wisdom' functions, for which you must supply a subroutine
to read/write one character at a time.  The FFTW package contains an
example file `doc/f77_wisdom.f' demonstrating how to implement
`import_wisdom_from_file' and `export_wisdom_to_file' subroutines in
this way.  (These routines cannot be compiled into the FFTW library
itself, lest all FFTW-using programs be required to link with the
Fortran I/O library.)