This is fftw3.info, produced by makeinfo version 4.2 from fftw3.texi. This manual is for FFTW (version 3.0.1, 5 June 2003). Copyright (C) 2003 Matteo Frigo. Copyright (C) 2003 Massachusetts Institute of Technology. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. INFO-DIR-SECTION Texinfo documentation system START-INFO-DIR-ENTRY * fftw3: (fftw3). FFTW User's Manual. END-INFO-DIR-ENTRY  File: fftw3.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir) FFTW User Manual **************** Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a collection of fast C routines to compute the discrete Fourier transform. This manual documents FFTW version 3.0.1. * Menu: * Introduction:: * Tutorial:: * Other Important Topics:: * FFTW Reference:: * Parallel FFTW:: * Calling FFTW from Fortran:: * Upgrading from FFTW version 2:: * Installation and Customization:: * Acknowledgments:: * License and Copyright:: * Concept Index:: * Library Index:: --- The Detailed Node Listing --- Tutorial * Complex One-Dimensional DFTs:: * Complex Multi-Dimensional DFTs:: * One-Dimensional DFTs of Real Data:: * Multi-Dimensional DFTs of Real Data:: * More DFTs of Real Data:: More DFTs of Real Data * The Halfcomplex-format DFT:: * Real even/odd DFTs (cosine/sine transforms):: * The Discrete Hartley Transform:: Other Important Topics * Data Alignment:: * Multi-dimensional Array Format:: * Words of Wisdom-Saving Plans:: * Caveats in Using Wisdom:: Data Alignment * SIMD alignment and fftw_malloc:: * Stack alignment on x86:: Multi-dimensional Array Format * Row-major Format:: * Column-major Format:: * Static Arrays in C:: * Dynamic Arrays in C:: * Dynamic Arrays in C-The Wrong Way:: FFTW Reference * Data Types and Files:: * Using Plans:: * Basic Interface:: * Advanced Interface:: * Guru Interface:: * Wisdom:: * What FFTW Really Computes:: Data Types and Files * Complex numbers:: * Precision:: * Memory Allocation:: Basic Interface * Complex DFTs:: * Planner Flags:: * Real-data DFTs:: * Real-data DFT Array Format:: * Real-to-Real Transforms:: * Real-to-Real Transform Kinds:: Advanced Interface * Advanced Complex DFTs:: * Advanced Real-data DFTs:: * Advanced Real-to-real Transforms:: Guru Interface * Interleaved and split arrays:: * Guru vector and transform sizes:: * Guru Complex DFTs:: * Guru Real-data DFTs:: * Guru Real-to-real Transforms:: * Guru Execution of Plans:: Wisdom * Wisdom Export:: * Wisdom Import:: * Forgetting Wisdom:: * Wisdom Utilities:: What FFTW Really Computes * The 1d Discrete Fourier Transform (DFT):: * The 1d Real-data DFT:: * 1d Real-even DFTs (DCTs):: * 1d Real-odd DFTs (DSTs):: * 1d Discrete Hartley Transforms (DHTs):: * Multi-dimensional Transforms:: Parallel FFTW * Multi-threaded FFTW:: * Thread safety:: Multi-threaded FFTW * Installation and Supported Hardware/Software:: * Usage of Multi-threaded FFTW:: * How Many Threads to Use?:: Calling FFTW from Fortran * Fortran-interface routines:: * FFTW Constants in Fortran:: * Fortran Examples:: * Wisdom of Fortran?:: Installation and Customization * Installation on Unix:: * Installation on non-Unix systems:: * Cycle Counters:: * Generating your own code::  File: fftw3.info, Node: Introduction, Next: Tutorial, Prev: Top, Up: Top Introduction ************ This manual documents version 3.0.1 of FFTW, the _Fastest Fourier Transform in the West_. FFTW is a comprehensive collection of fast C routines for computing the discrete Fourier transform (DFT) and various special cases thereof. * FFTW computes the DFT of complex data, real data, even- or odd-symmetric real data (these symmetric transforms are usually known as the discrete cosine or sine transform, respectively), and the discrete Hartley transform (DHT) of real data. * The input data can have arbitrary size. FFTW employs O(n log n) algorithms for all sizes, including prime numbers. * FFTW supports arbitrary n-dimensional data. * FFTW supports the SSE, SSE2, 3DNow!, and Altivec instruction sets. * FFTW 3.0.1 includes parallel (multi-threaded) transforms for shared-memory systems. FFTW 3.0.1 does not include distributed-memory parallel transforms, but we plan to implement an MPI version soon. (Meanwhile, you can use the MPI implementation from FFTW 2.1.3.) We assume herein that you are familiar with the properties and uses of the DFT that are relevant to your application. Otherwise, see e.g. `The Fast Fourier Transform and Its Applications' by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1988). Our web page (http://www.fftw.org) also has links to FFT-related information online. In order to use FFTW effectively, you need to learn one basic concept of FFTW's internal structure: FFTW does not use a fixed algorithm for computing the transform, but instead it adapts the DFT algorithm to details of the underlying hardware in order to maximize performance. Hence, the computation of the transform is split into two phases. First, FFTW's "planner" "learns" the fastest way to compute the transform on your machine. The planner produces a data structure called a "plan" that contains this information. Subsequently, the plan is "executed" to transform the array of input data as dictated by the plan. The plan can be reused as many times as needed. In typical high-performance applications, many transforms of the same size are computed and, consequently, a relatively expensive initialization of this sort is acceptable. On the other hand, if you need a single transform of a given size, the one-time cost of the planner becomes significant. For this case, FFTW provides fast planners based on heuristics or on previously computed plans. FFTW supports transforms of data with arbitrary size, rank, multiplicity, and a general memory layout. In simple cases, however, this generality may be unnecessary and confusing. Consequently, we organized the interface to FFTW into three levels of increasing generality. * The "basic interface" computes a single transform of contiguous data. * The "advanced interface" computes transforms of multiple or strided arrays. * The "guru" interface supports the most general data layouts, multiplicities, and strides. We expect that most users will be best served by the basic interface, whereas the guru interface requires careful attention to the documentation to avoid problems. Besides the automatic performance adaptation performed by the planner, it is also possible for advanced users to customize FFTW manually. For example, if code space is a concern, we provide a tool that links only the subset of FFTW needed by your application. Conversely, you may need to extend FFTW because the standard distribution is not sufficient for your needs. For example, the standard FFTW distribution works most efficiently for arrays whose size can be factored into small primes (2, 3, 5, and 7), and otherwise it uses a slower general-purpose routine. If you need efficient transforms of other sizes, you can use FFTW's code generator, which produces fast C programs ("codelets") for any particular array size you may care about. For example, if you need transforms of size 513 = 19 x 3^3, you can customize FFTW to support the factor 19 efficiently. For more information regarding FFTW, see the paper, "FFTW: An adaptive software architecture for the FFT," by M. Frigo and S. G. Johnson, which appeared in the 23rd International Conference on Acoustics, Speech, and Signal Processing (`Proc. ICASSP 1998' 3, p. 1381). See also, "The Fastest Fourier Transform in the West," by M. Frigo and S. G. Johnson, which is the technical report MIT-LCS-TR-728 (Sep. '97). The code generator is described in the paper "A fast Fourier transform compiler", by M. Frigo, in the `Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Atlanta, Georgia, May 1999'. These papers, along with the latest version of FFTW, the FAQ, benchmarks, and other links, are available at the FFTW home page (http://www.fftw.org). The current version of FFTW incorporates many good ideas from the past thirty years of FFT literature. In one way or another, FFTW uses the Cooley-Tukey algorithm, the prime factor algorithm, Rader's algorithm for prime sizes, and a split-radix algorithm (with a variation due to Dan Bernstein). FFTW's code generator also produces new algorithms that we do not completely understand. The reader is referred to the cited papers for the appropriate references. The rest of this manual is organized as follows. We first discuss the sequential (single-processor) implementation. We start by describing the basic interface/features of FFTW in *Note Tutorial::. The following chapter discusses *Note Other Important Topics::, including *Note Data Alignment::, the storage scheme of multi-dimensional arrays (*note Multi-dimensional Array Format::), and FFTW's mechanism for storing plans on disk (*note Words of Wisdom-Saving Plans::). Next, *Note FFTW Reference:: provides comprehensive documentation of all FFTW's features. Parallel transforms are discussed in their own chapter *Note Parallel FFTW::. Fortran programmers can also use FFTW, as described in *Note Calling FFTW from Fortran::. *Note Installation and Customization:: explains how to install FFTW in your computer system and how to adapt FFTW to your needs. License and copyright information is given in *Note License and Copyright::. Finally, we thank all the people who helped us in *Note Acknowledgments::.  File: fftw3.info, Node: Tutorial, Next: Other Important Topics, Prev: Introduction, Up: Top Tutorial ******** * Menu: * Complex One-Dimensional DFTs:: * Complex Multi-Dimensional DFTs:: * One-Dimensional DFTs of Real Data:: * Multi-Dimensional DFTs of Real Data:: * More DFTs of Real Data:: This chapter describes the basic usage of FFTW, i.e., how to compute the Fourier transform of a single array. This chapter tells the truth, but not the _whole_ truth. Specifically, FFTW implements additional routines and flags that are not documented here, although in many cases we try to indicate where added capabilities exist. For more complete information, see *Note FFTW Reference::. (Note that you need to compile and install FFTW before you can use it in a program. For the details of the installation, see *Note Installation and Customization::.) We recommend that you read this tutorial in order.(1) At the least, read the first section (*note Complex One-Dimensional DFTs::) before reading any of the others, even if your main interest lies in one of the other transform types. Users of FFTW version 2 and earlier may also want to read *Note Upgrading from FFTW version 2::. ---------- Footnotes ---------- (1) You can read the tutorial in bit-reversed order after computing your first transform.  File: fftw3.info, Node: Complex One-Dimensional DFTs, Next: Complex Multi-Dimensional DFTs, Prev: Tutorial, Up: Tutorial Complex One-Dimensional DFTs ============================ Plan: To bother about the best method of accomplishing an accidental result. [Ambrose Bierce, `The Enlarged Devil's Dictionary'.] The basic usage of FFTW to compute a one-dimensional DFT of size `N' is simple, and it typically looks something like this code: #include ... { fftw_complex *in, *out; fftw_plan p; ... in = fftw_malloc(sizeof(fftw_complex) * N); out = fftw_malloc(sizeof(fftw_complex) * N); p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE); ... fftw_execute(p); /* repeat as needed */ ... fftw_destroy_plan(p); fftw_free(in); fftw_free(out); } (When you compile, you must also link with the `fftw3' library, e.g. `-lfftw3 -lm' on Unix systems.) First you allocate the input and output arrays. You can allocate them in any way that you like, but we recommend using `fftw_malloc', which behaves like `malloc' except that it properly aligns the array when SIMD instructions (such as SSE and Altivec) are available (*note SIMD alignment and fftw_malloc::). The data is an array of type `fftw_complex', which is by default a `double[2]' composed of the real (`in[i][0]') and imaginary (`in[i][1]') parts of a complex number. The next step is to create a "plan", which is an object that contains all the data that FFTW needs to compute the FFT. This function creates the plan: fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); The first argument, `n', is the size of the transform you are trying to compute. The size `n' can be any positive integer, but sizes that are products of small factors are transformed most efficiently (although prime sizes still use an O(n log n) algorithm). The next two arguments are pointers to the input and output arrays of the transform. These pointers can be equal, indicating an "in-place" transform. The fourth argument, `sign', can be either `FFTW_FORWARD' (`-1') or `FFTW_BACKWARD' (`+1'), and indicates the direction of the transform you are interested in; technically, it is the sign of the exponent in the transform. The `flags' argument is usually either `FFTW_MEASURE' or `FFTW_ESTIMATE'. `FFTW_MEASURE' instructs FFTW to run and measure the execution time of several FFTs in order to find the best way to compute the transform of size `n'. This process takes some time (usually a few seconds), depending on your machine and on the size of the transform. `FFTW_ESTIMATE', on the contrary, does not run any computation and just builds a reasonable plan that is probably sub-optimal. In short, if your program performs many transforms of the same size and initialization time is not important, use `FFTW_MEASURE'; otherwise use the estimate. The data in the `in'/`out' arrays is _overwritten_ during `FFTW_MEASURE' planning, so such planning should be done _before_ the input is initialized by the user. Once the plan has been created, you can use it as many times as you like for transforms on the specified `in'/`out' arrays, computing the actual transforms via `fftw_execute(plan)': void fftw_execute(const fftw_plan plan); If you want to transform a _different_ array of the same size, you can create a new plan with `fftw_plan_dft_1d' and FFTW automatically reuses the information from the previous plan, if possible. (Alternatively, with the "guru" interface you can apply a given plan to a different array, if you are careful. *Note FFTW Reference::.) When you are done with the plan, you deallocate it by calling `fftw_destroy_plan(plan)': void fftw_destroy_plan(fftw_plan plan); Arrays allocated with `fftw_malloc' should be deallocated by `fftw_free' rather than the ordinary `free' (or, heaven forbid, `delete'). The DFT results are stored in-order in the array `out', with the zero-frequency (DC) component in `out[0]'. If `in != out', the transform is "out-of-place" and the input array `in' is not modified. Otherwise, the input array is overwritten with the transform. Users should note that FFTW computes an _unnormalized_ DFT. Thus, computing a forward followed by a backward transform (or vice versa) results in the original array scaled by `n'. For the definition of the DFT, see *Note What FFTW Really Computes::. If you have a C compiler, such as `gcc', that supports the recent C99 standard, and you `#include ' _before_ `', then `fftw_complex' is the native double-precision complex type and you can manipulate it with ordinary arithmetic. Otherwise, FFTW defines its own complex type, which is bit-compatible with the C99 complex type. *Note Complex numbers::. (The C++ `' template class may also be usable via a typecast.) Single and long-double precision versions of FFTW may be installed; to use them, replace the `fftw_' prefix by `fftwf_' or `fftwl_' and link with `-lfftw3f' or `-lfftw3l', but use the _same_ `' header file. Many more flags exist besides `FFTW_MEASURE' and `FFTW_ESTIMATE'. For example, use `FFTW_PATIENT' if you're willing to wait even longer for a possibly even faster plan (*note FFTW Reference::). You can also save plans for future use, as described by *Note Words of Wisdom-Saving Plans::.  File: fftw3.info, Node: Complex Multi-Dimensional DFTs, Next: One-Dimensional DFTs of Real Data, Prev: Complex One-Dimensional DFTs, Up: Tutorial Complex Multi-Dimensional DFTs ============================== Multi-dimensional transforms work much the same way as one-dimensional transforms: you allocate arrays of `fftw_complex' (preferably using `fftw_malloc'), create an `fftw_plan', execute it as many times as you want with `fftw_execute(plan)', and clean up with `fftw_destroy_plan(plan)' (and `fftw_free'). The only difference is the routine you use to create the plan: fftw_plan fftw_plan_dft_2d(int nx, int ny, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft_3d(int nx, int ny, int nz, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft(int rank, const int *n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); These routines create plans for `nx' by `ny' two-dimensional (2d) transforms, `nx' by `ny' by `nz' 3d transforms, and arbitrary `rank'-dimensional transforms, respectively. In the third case, `n' is a pointer to an array `n[rank]' denoting an `n[0]' by `n[1]' by ... by `n[rank-1]' transform. All of these transforms operate on contiguous arrays in the C-standard "row-major" order, so that the last dimension has the fastest-varying index in the array. This layout is described further in *Note Multi-dimensional Array Format::. You may have noticed that all the planner routines described so far have overlapping functionality. For example, you can plan a 1d or 2d transform by using `fftw_plan_dft' with a `rank' of `1' or `2', or even by calling `fftw_plan_dft_3d' with `nx' and/or `ny' equal to `1' (with no loss in efficiency). This pattern continues, and FFTW's planning routines in general form a "partial order," sequences of interfaces with strictly increasing generality but correspondingly greater complexity. `fftw_plan_dft' is the most general complex-DFT routine that we describe in this tutorial, but there are also the advanced and guru interfaces, which allow one to efficiently combine multiple/strided transforms into a single FFTW plan, transform a subset of a larger multi-dimensional array, and/or to handle more general complex-number formats. For more information, see *Note FFTW Reference::.  File: fftw3.info, Node: One-Dimensional DFTs of Real Data, Next: Multi-Dimensional DFTs of Real Data, Prev: Complex Multi-Dimensional DFTs, Up: Tutorial One-Dimensional DFTs of Real Data ================================= In many practical applications, the input data `in[i]' are purely real numbers, in which case the DFT output satisfies the "Hermitian" redundancy: `out[i]' is the conjugate of `out[n-i]'. It is possible to take advantage of these circumstances in order to achieve roughly a factor of two improvement in both speed and memory usage. In exchange for these speed and space advantages, the user sacrifices some of the simplicity of FFTW's complex transforms. First of all, the input and output arrays are of _different sizes and types_: the input is `n' real numbers, while the output is `n/2+1' complex numbers (the non-redundant outputs); this also requires slight "padding" of the input array for in-place transforms. Second, the inverse transform (complex to real) has the side-effect of _destroying its input array_, by default. Neither of these inconveniences should pose a serious problem for users, but it is important to be aware of them. The routines to perform real-data transforms are almost the same as those for complex transforms: you allocate arrays of `double' and/or `fftw_complex' (preferably using `fftw_malloc'), create an `fftw_plan', execute it as many times as you want with `fftw_execute(plan)', and clean up with `fftw_destroy_plan(plan)' (and `fftw_free'). The only differences are that the input (or output) is of type `double' and there are new routines to create the plan. In one dimension: fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags); for the real input to complex-Hermitian output ("r2c") and complex-Hermitian input to real output ("c2r") transforms. Unlike the complex DFT planner, there is no `sign' argument. Instead, r2c DFTs are always `FFTW_FORWARD' and c2r DFTs are always `FFTW_BACKWARD'. (For single/long-double precision `fftwf' and `fftwl', `double' should be replaced by `float' and `long double', respectively.) Here, `n' is the "logical" size of the DFT, not necessarily the physical size of the array. In particular, the real (`double') array has `n' elements, while the complex (`fftw_complex') array has `n/2+1' elements (where the division is rounded down). For an in-place transform, `in' and `out' are aliased to the same array, which must be big enough to hold both; so, the real array would actually have `2*(n/2+1)' elements, where the elements beyond the first `n' are unused padding. The kth element of the complex array is exactly the same as the kth element of the corresponding complex DFT. All positive `n' are supported; products of small factors are most efficient, but an O(n log n) algorithm is used even for prime sizes. As noted above, the c2r transform destroys its input array even for out-of-place transforms. This can be prevented, if necessary, by including `FFTW_PRESERVE_INPUT' in the `flags', with unfortunately some sacrifice in performance. This flag is also not currently supported for multi-dimensional real DFTs (next section). Readers familiar with DFTs of real data will recall that the 0th (the "DC") and `n/2'-th (the "Nyquist" frequency, when `n' is even) elements of the complex output are purely real. Some implementations therefore store the Nyquist element where the DC imaginary part would go, in order to make the input and output arrays the same size. Such packing, however, does not generalize well to multi-dimensional transforms, and the space savings are miniscule in any case; FFTW does not support it. An alternate interface for one-dimensional r2c and c2r DFTs can be found in the `r2r' interface (*note The Halfcomplex-format DFT::), with "halfcomplex"-format output that _is_ the same size (and type) as the input array. That interface, although it is not very useful for multi-dimensional transforms, may sometimes yield better performance.  File: fftw3.info, Node: Multi-Dimensional DFTs of Real Data, Next: More DFTs of Real Data, Prev: One-Dimensional DFTs of Real Data, Up: Tutorial Multi-Dimensional DFTs of Real Data =================================== Multi-dimensional DFTs of real data use the following planner routines: fftw_plan fftw_plan_dft_r2c_2d(int nx, int ny, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c_3d(int nx, int ny, int nz, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c(int rank, const int *n, double *in, fftw_complex *out, unsigned flags); as well as the corresponding `c2r' routines with the input/output types swapped. These routines work similarly to their complex analogues, except for the fact that here the complex output array is cut roughly in half and the real array requires padding for in-place transforms (as in 1d, above). As before, `n' is the logical size of the array, and the consequences of this on the the format of the complex arrays deserve careful attention. Suppose that the real data has dimensions n1 x n2 x n3 x ... x nd (in row-major order). Then, after an r2c transform, the output is an n1 x n2 x n3 x ... x (nd/2 + 1) array of `fftw_complex' values in row-major order, corresponding to slightly over half of the output of the corresponding complex DFT. (The division is rounded down.) The ordering of the data is otherwise exactly the same as in the complex-DFT case. Since the complex data is slightly larger than the real data, some complications arise for in-place transforms. In this case, the final dimension of the real data must be padded with extra values to accommodate the size of the complex data--two values if the last dimension is even and one if it is odd. That is, the last dimension of the real data must physically contain 2 * (nd/2+1) `double' values (exactly enough to hold the complex data). This physical array size does not, however, change the _logical_ array size--only nd values are actually stored in the last dimension, and nd is the last dimension passed to the plan-creation routine. For example, consider the transform of a two-dimensional real array of size `nx' by `ny'. The output of the r2c transform is a two-dimensional complex array of size `nx' by `ny/2+1', where the `y' dimension has been cut nearly in half because of redundancies in the output. Because `fftw_complex' is twice the size of `double', the output array is slightly bigger than the input array. Thus, if we want to compute the transform in place, we must _pad_ the input array so that it is of size `nx' by `2*(ny/2+1)'. If `ny' is even, then there are two padding elements at the end of each row (which need not be initialized, as they are only used for output). These transforms are unnormalized, so an r2c followed by a c2r transform (or vice versa) will result in the original data scaled by the number of real data elements--that is, the product of the (logical) dimensions of the real data. (Because the last dimension is treated specially, if it is equal to `1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r transform. In that case, the last complex dimension also has size `1' (`=1/2+1'), and no advantage is gained over the complex transforms.)  File: fftw3.info, Node: More DFTs of Real Data, Prev: Multi-Dimensional DFTs of Real Data, Up: Tutorial More DFTs of Real Data ====================== * Menu: * The Halfcomplex-format DFT:: * Real even/odd DFTs (cosine/sine transforms):: * The Discrete Hartley Transform:: FFTW supports several other transform types via a unified "r2r" (real-to-real) interface, so called because it takes a real (`double') array and outputs a real array of the same size. These r2r transforms currently fall into three categories: DFTs of real input and complex-Hermitian output in halfcomplex format, DFTs of real input with even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete Hartley transforms (DHTs), all described in more detail by the following sections. The r2r transforms follow the by now familiar interface of creating an `fftw_plan', executing it with `fftw_execute(plan)', and destroying it with `fftw_destroy_plan(plan)'. Furthermore, all r2r transforms share the same planner interface: fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, fftw_r2r_kind kind, unsigned flags); fftw_plan fftw_plan_r2r_2d(int nx, int ny, double *in, double *out, fftw_r2r_kind kindx, fftw_r2r_kind kindy, unsigned flags); fftw_plan fftw_plan_r2r_3d(int nx, int ny, int nz, double *in, double *out, fftw_r2r_kind kindx, fftw_r2r_kind kindy, fftw_r2r_kind kindz, unsigned flags); fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, const fftw_r2r_kind *kind, unsigned flags); Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional transforms for contiguous arrays in row-major order, transforming (real) input to output of the same size, where `n' specifies the _physical_ dimensions of the arrays. All positive `n' are supported (with the exception of `n=1' for the `FFTW_REDFT00' kind, noted in the real-even subsection below); products of small factors are most efficient (factorizing `n-1' and `n+1' for `FFTW_REDFT00' and `FFTW_RODFT00' kinds, described below), but an O(n log n) algorithm is used even for prime sizes. Each dimension has a "kind" parameter, of type `fftw_r2r_kind', specifying the kind of r2r transform to be used for that dimension. (In the case of `fftw_plan_r2r', this is an array `kind[rank]' where `kind[i]' is the transform kind for the dimension `n[i]'.) The kind can be one of a set of predefined constants, defined in the following subsections. In other words, FFTW computes the separable product of the specified r2r transforms over each dimension, which can be used e.g. for partial differential equations with mixed boundary conditions. (For some r2r kinds, notably the halfcomplex DFT and the DHT, such a separable product is somewhat problematic in more than one dimension, however, as is described below.) In the current version of FFTW, all r2r transforms except for the halfcomplex type are computed via pre- or post-processing of halfcomplex transforms, and they are therefore not as fast as they could be. Since most other general DCT/DST codes employ a similar algorithm, however, FFTW's implementation should provide at least competitive performance.  File: fftw3.info, Node: The Halfcomplex-format DFT, Next: Real even/odd DFTs (cosine/sine transforms), Prev: More DFTs of Real Data, Up: More DFTs of Real Data The Halfcomplex-format DFT -------------------------- An r2r kind of `FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note One-Dimensional DFTs of Real Data::) but with "halfcomplex" format output, and may sometimes be faster and/or more convenient than the latter. The inverse "hc2r" transform is of kind `FFTW_HC2R'. This consists of the non-redundant half of the complex output for a 1d real-input DFT of size `n', stored as a sequence of `n' real numbers (`double') in the format: r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1 Here, rk is the real part of the kth output, and ik is the imaginary part. (Division by 2 is rounded down.) For a halfcomplex array `hc[n]', the kth component thus has its real part in `hc[k]' and its imaginary part in `hc[n-k]', with the exception of `k' `==' `0' or `n/2' (the latter only if `n' is even)--in these two cases, the imaginary part is zero due to symmetries of the real-input DFT, and is not stored. Thus, the r2hc transform of `n' real values is a halfcomplex array of length `n', and vice versa for hc2r. Aside from the differing format, the output of `FFTW_R2HC'/`FFTW_HC2R' is otherwise exactly the same as for the corresponding 1d r2c/c2r transform (i.e. `FFTW_FORWARD'/`FFTW_BACKWARD' transforms, respectively). Recall that these transforms are unnormalized, so r2hc followed by hc2r will result in the original data multiplied by `n'. Furthermore, like the c2r transform, an out-of-place hc2r transform will _destroy its input_ array. Although these halfcomplex transforms can be used with the multi-dimensional r2r interface, the interpretation of such a separable product of transforms along each dimension is problematic. For example, consider a two-dimensional `nx' by `ny', r2hc by r2hc transform planned by `fftw_plan_r2r_2d(nx, ny, in, out, FFTW_R2HC, FFTW_R2HC, FFTW_MEASURE)'. Conceptually, FFTW first transforms the rows (of size `ny') to produce halfcomplex rows, and then transforms the columns (of size `nx'). Half of these column transforms, however, are of imaginary parts, and should therefore be multiplied by i and combined with the r2hc transforms of the real columns to produce the 2d DFT amplitudes; FFTW's r2r transform does _not_ perform this combination for you. Thus, if a multi-dimensional real-input/output DFT is required, we recommend using the ordinary r2c/c2r interface (*note Multi-Dimensional DFTs of Real Data::).  File: fftw3.info, Node: Real even/odd DFTs (cosine/sine transforms), Next: The Discrete Hartley Transform, Prev: The Halfcomplex-format DFT, Up: More DFTs of Real Data Real even/odd DFTs (cosine/sine transforms) ------------------------------------------- The Fourier transform of a real-even function f(-x) = f(x) is real-even, and i times the Fourier transform of a real-odd function f(-x) = -f(x) is real-odd. Similar results hold for a discrete Fourier transform, and thus for these symmetries the need for complex inputs/outputs is entirely eliminated. Moreover, one gains a factor of two in speed/space from the fact that the data are real, and an additional factor of two from the even/odd symmetry: only the non-redundant (first) half of the array need be stored. The result is the real-even DFT ("REDFT") and the real-odd DFT ("RODFT"), also known as the discrete cosine and sine transforms ("DCT" and "DST"), respectively. (In this section, we describe the 1d transforms; multi-dimensional transforms are just a separable product of these transforms operating along each dimension.) Because of the discrete sampling, one has an additional choice: is the data even/odd around a sampling point, or around the point halfway between two samples? The latter corresponds to _shifting_ the samples by _half_ an interval, and gives rise to several transform variants denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate whether the input (a) and/or output (b) are shifted by half a sample (1 means it is shifted). These are also known as types I-IV of the DCT and DST, and all four types are supported by FFTW's r2r interface.(1) The r2r kinds for the various REDFT and RODFT types supported by FFTW, along with the boundary conditions at both ends of the _input_ array (`n' real numbers `in[j=0..n-1]'), are: * `FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1. * `FFTW_REDFT10' (DCT-II): even around j=-0.5 and even around j=n-0.5. * `FFTW_REDFT01' (DCT-III): even around j=0 and odd around j=n. * `FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5. * `FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n. * `FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5. * `FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1. * `FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5. Note that these symmetries apply to the "logical" array being transformed; *there are no constraints on your physical input data*. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data abcde, it corresponds to the DFT of the logical even array abcdedcb of size 8. A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the size-8 logical DFT of the even array abcddcba, shifted by half a sample. All of these transforms are invertible. The inverse of R*DFT00 is R*DFT00; of R*DFT10 is R*DFT01 and vice versa; and of R*DFT11 is R*DFT11. However, the transforms computed by FFTW are unnormalized, exactly like the corresponding real and complex DFTs, so computing a transform followed by its inverse yields the original array scaled by N, where N is the _logical_ DFT size. For REDFT00, N=2(n-1); for RODFT00, N=2(n+1); otherwise, N=2n. Note that the boundary conditions of the transform output array are given by the input boundary conditions of the inverse transform. Thus, the above transforms are all inequivalent in terms of input/output boundary conditions, even neglecting the 0.5 shift difference. FFTW is most efficient when N is a product of small factors; note that this _differs_ from the factorization of the physical size `n' for REDFT00 and RODFT00! There is another oddity: `n=1' REDFT00 transforms correspond to N=0, and so are _not defined_ (the planner will return `NULL'). Otherwise, any positive `n' is supported. For the precise mathematical definitions of these transforms as used by FFTW, see *Note What FFTW Really Computes::. (For people accustomed to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of the cos/sin functions so that they correspond precisely to an even/odd DFT of size N.) Which type do you need? ....................... Since the required flavor of even/odd DFT depends upon your problem, you are the best judge of this choice, but we can make a few comments on relative efficiency to help you in your selection. In particular, R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially for odd sizes), while the R*DFT00 transforms are significantly slower.(2) Thus, if only the boundary conditions on the transform inputs are specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over R*DFT11 (unless the half-sample shift or the self-inverse property is significant for your problem). If performance is important to you and you are using only small sizes (say n<200), e.g. for multi-dimensional transforms, then you might consider generating hard-coded transforms of those sizes and types that you are interested in (*note Generating your own code::). We are interested in hearing what types of symmetric transforms you find most useful. ---------- Footnotes ---------- (1) There are also type V-VIII transforms, which correspond to a logical DFT of _odd_ size N, independent of whether the physical size `n' is odd, but we do not support these variants. (2) R*DFT00 is slower in FFTW because we discovered that the standard algorithm for computing this by a pre/post-processed real DFT--the algorithm used in FFTPACK, Numerical Recipes, and other sources for decades now--has serious numerical problems: it already loses several decimal places of accuracy for 16k sizes. There seem to be only two alternatives in the literature that do not suffer similarly: a recursive decomposition into smaller DCTs, which would require a large set of codelets for efficiency and generality, or sacrificing a factor of two in speed to use a real DFT of twice the size. We currently employ the latter technique.  File: fftw3.info, Node: The Discrete Hartley Transform, Prev: Real even/odd DFTs (cosine/sine transforms), Up: More DFTs of Real Data The Discrete Hartley Transform ------------------------------ The discrete Hartley transform (DHT) is an invertible linear transform closely related to the DFT. In the DFT, one multiplies each input by cos - i * sin (a complex exponential), whereas in the DHT each input is multiplied by simply cos + sin. Thus, the DHT transforms `n' real numbers to `n' real numbers, and has the convenient property of being its own inverse. In FFTW, a DHT (of any positive `n') can be specified by an r2r kind of `FFTW_DHT'. If you are planning to use the DHT because you've heard that it is "faster" than the DFT (FFT), *stop here*. That story is an old but enduring misconception that was debunked in 1987: a properly designed real-input FFT (such as FFTW's) has no more operations in general than an FHT. Moreover, in FFTW, the DHT is ordinarily _slower_ than the DFT for composite sizes (see below). Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of size `n' followed by another DHT of the same size will result in the original array multiplied by `n'. The DHT was originally proposed as a more efficient alternative to the DFT for real data, but it was subsequently shown that a specialized DFT (such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW, the DHT is actually computed by post-processing an r2hc transform, so there is ordinarily no reason to prefer it from a performance perspective.(1) However, we have heard rumors that the DHT might be the most appropriate transform in its own right for certain applications, and we would be very interested to hear from anyone who finds it useful. If `FFTW_DHT' is specified for multiple dimensions of a multi-dimensional transform, FFTW computes the separable product of 1d DHTs along each dimension. Unfortunately, this is not quite the same thing as a true multi-dimensional DHT; you can compute the latter, if necessary, with at most `rank-1' post-processing passes [see e.g. H. Hao and R. N. Bracewell, Proc. IEEE 75, 264-266 (1987)]. For the precise mathematical definition of the DHT as used by FFTW, see *Note What FFTW Really Computes::. ---------- Footnotes ---------- (1) We provide the DHT mainly as a byproduct of some internal algorithms. FFTW computes a real input/output DFT of _prime_ size by re-expressing it as a DHT plus post/pre-processing and then using Rader's prime-DFT algorithm adapted to the DHT.  File: fftw3.info, Node: Other Important Topics, Next: FFTW Reference, Prev: Tutorial, Up: Top Other Important Topics ********************** * Menu: * Data Alignment:: * Multi-dimensional Array Format:: * Words of Wisdom-Saving Plans:: * Caveats in Using Wisdom::  File: fftw3.info, Node: Data Alignment, Next: Multi-dimensional Array Format, Prev: Other Important Topics, Up: Other Important Topics Data Alignment ============== * Menu: * SIMD alignment and fftw_malloc:: * Stack alignment on x86:: In order to get the best performance from FFTW, one needs to be somewhat aware of two problems related to data alignment on x86 (Pentia) architectures: alignment of allocated arrays (for use with SIMD acceleration), and alignment of the stack.  File: fftw3.info, Node: SIMD alignment and fftw_malloc, Next: Stack alignment on x86, Prev: Data Alignment, Up: Data Alignment SIMD alignment and fftw_malloc ------------------------------ SIMD, which stands for "Single Instruction Multiple Data," is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2 (single/double precision) on Pentium III/IV and higher, 3DNow! (single precision) on the AMD K7 and higher, and AltiVec (single precision) on the PowerPC G4 and higher. FFTW can be compiled to support the SIMD instructions on any of these systems. A program linking to an FFTW library compiled with SIMD support can obtain a nonnegligible speedup for most complex and r2c/c2r transforms. In order to obtain this speedup, however, the arrays of complex (or real) data passed to FFTW must be specially aligned in memory (typically 16-byte aligned), and often this alignment is more stringent than that provided by the usual `malloc' (etc.) allocation routines. In order to guarantee proper alignment for SIMD, therefore, in case your program is ever linked against a SIMD-using FFTW, we recommend allocating your transform data with `fftw_malloc' and de-allocating it with `fftw_free'. These have exactly the same interface and behavior as `malloc'/`free', except that for a SIMD FFTW they ensure that the returned pointer has the necessary alignment (by calling `memalign' or its equivalent on your OS). You are not _required_ to use `fftw_malloc'. You can allocate your data in any way that you like, from `malloc' to `new' (in C++) to a static array declaration. If the array happens not to be properly aligned, FFTW will not use the SIMD extensions.  File: fftw3.info, Node: Stack alignment on x86, Prev: SIMD alignment and fftw_malloc, Up: Data Alignment Stack alignment on x86 ---------------------- On the Pentium and subsequent x86 processors, there is a substantial performance penalty if double-precision variables are not stored 8-byte aligned; a factor of two or more is not unusual. Unfortunately, the stack (the place that local variables and subroutine arguments live) is not guaranteed by the Intel ABI to be 8-byte aligned. Recent versions of `gcc' (as well as most other compilers, we are told, such as Intel's, Metrowerks', and Microsoft's) are able to keep the stack 8-byte aligned; `gcc' does this by default (see `-mpreferred-stack-boundary' in the `gcc' documentation). If you are not certain whether your compiler maintains stack alignment by default, it is a good idea to make sure. Unfortunately, `gcc' only _preserves_ the stack alignment--as a result, if the stack starts off misaligned, it will always be misaligned, with a disastrous effect on performance (in double precision). Fortunately, recent versions of glibc (on GNU/Linux) provide a properly-aligned starting stack, but this was not the case with a number of older versions, and we are not certain of the situation on other operating systems. Hopefully, as time goes by this will become less of a concern, but if you want to be paranoid you can copy the code from FFTW's `libbench2/aligned-main.c' to guarantee alignment of your `main' function (with `gcc').  File: fftw3.info, Node: Multi-dimensional Array Format, Next: Words of Wisdom-Saving Plans, Prev: Data Alignment, Up: Other Important Topics Multi-dimensional Array Format ============================== This section describes the format in which multi-dimensional arrays are stored in FFTW. We felt that a detailed discussion of this topic was necessary. Since several different formats are common, this topic is often a source of confusion among users. * Menu: * Row-major Format:: * Column-major Format:: * Static Arrays in C:: * Dynamic Arrays in C:: * Dynamic Arrays in C-The Wrong Way::  File: fftw3.info, Node: Row-major Format, Next: Column-major Format, Prev: Multi-dimensional Array Format, Up: Multi-dimensional Array Format Row-major Format ---------------- The multi-dimensional arrays passed to `fftw_plan_dft' etcetera are expected to be stored as a single contiguous block in "row-major" order (sometimes called "C order"). Basically, this means that as you step through adjacent memory locations, the first dimension's index varies most slowly and the last dimension's index varies most quickly. To be more explicit, let us consider an array of rank d whose dimensions are n1 x n2 x n3 x ... x nd . Now, we specify a location in the array by a sequence of (zero-based) indices, one for each dimension: (i1, i2, ..., id). If the array is stored in row-major order, then this element is located at the position id + nd * (id-1 + nd-1 * (... + n2 * i1)). Note that, for the ordinary complex DFT, each element of the array must be of type `fftw_complex'; i.e. a (real, imaginary) pair of (double-precision) numbers. In the advanced FFTW interface, the physical dimensions n from which the indices are computed can be different from (larger than) the logical dimensions of the transform to be computed, in order to transform a subset of a larger array. Note also that, in the advanced interface, the expression above is multiplied by a "stride" to get the actual array index--this is useful in situations where each element of the multi-dimensional array is actually a data structure (or another array), and you just want to transform a single field. In the basic interface, however, the stride is 1.  File: fftw3.info, Node: Column-major Format, Next: Static Arrays in C, Prev: Row-major Format, Up: Multi-dimensional Array Format Column-major Format ------------------- Readers from the Fortran world are used to arrays stored in "column-major" order (sometimes called "Fortran order"). This is essentially the exact opposite of row-major order in that, here, the _first_ dimension's index varies most quickly. If you have an array stored in column-major order and wish to transform it using FFTW, it is quite easy to do. When creating the plan, simply pass the dimensions of the array to the planner in _reverse order_. For example, if your array is a rank three `N x M x L' matrix in column-major order, you should pass the dimensions of the array as if it were an `L x M x N' matrix (which it is, from the perspective of FFTW). This is done for you _automatically_ by the FFTW Fortran interface (*note Calling FFTW from Fortran::).  File: fftw3.info, Node: Static Arrays in C, Next: Dynamic Arrays in C, Prev: Column-major Format, Up: Multi-dimensional Array Format Static Arrays in C ------------------ Multi-dimensional arrays declared statically (that is, at compile time, not necessarily with the `static' keyword) in C are _already_ in row-major order. You don't have to do anything special to transform them. For example: { fftw_complex data[NX][NY][NZ]; fftw_plan plan; ... plan = fftw_plan_dft_3d(NX, NY, NZ, &data[0][0][0], &data[0][0][0], FFTW_FORWARD, FFTW_ESTIMATE); ... } This will plan a 3d in-place transform of size `NX x NY x NZ'. Notice how we took the address of the zero-th element to pass to the planner (we could also have used a typecast). However, we tend to _discourage_ users from declaring their arrays statically in this way, for two reasons. First, this allocates the array on the stack, which has a very limited size on most operating systems (declaring an array with more than a few thousand elements will often cause a crash). Second, it may not optimally align the array if you link with a SIMD FFTW (*note SIMD alignment and fftw_malloc::). Instead, we recommend using `fftw_malloc', as described below.