NAG Library Function Document
nag_sum_fast_gauss (c06sac)
1
Purpose
nag_sum_fast_gauss (c06sac) calculates the multidimensional fast Gauss transform.
2
Specification
#include <nag.h> 
#include <nagc06.h> 
void 
nag_sum_fast_gauss (Integer d,
const double srcs[],
Integer n,
const double trgs[],
Integer m,
const double q[],
Integer *p1,
Integer *p2,
Integer *k,
const double hin[],
Integer lhin,
double tol,
double v[],
double term[],
NagError *fail) 

3
Description
nag_sum_fast_gauss (c06sac) calculates the
$d$dimensional fast Gauss transform (FGT),
$\hat{G}\left(y\right)$, that approximates the discrete Gauss transform (DGT),
$G\left(y\right)$, evaluated at a set of target points
${y}_{\mathit{j}}$, for
$\mathit{j}=1,2,\dots ,m\in {\mathbb{R}}^{d}$. The DGT is defined as:
where
${x}_{\mathit{i}}$, for
$\mathit{i}=1,2,\dots ,n\in {\mathbb{R}}^{d}$, are the Gaussian source points,
${q}_{\mathit{i}}$, for
$\mathit{i}=1,2,\dots ,n\in {\mathbb{R}}^{+}$, are the source weights and
${h}_{\mathit{i}}$, for
$\mathit{i}=1,2,\dots ,n\in {\mathbb{R}}^{+}$, are the source standard deviations (alternatively source scales or source bandwidths).
This function implements the improved FGT algorithm presented in
Raykar and Duraiswami (2005). The algorithm clusters the sources into
$k$ distinct clusters and then computes two Taylor series approximations per cluster with
${p}_{1}$ and
${p}_{2}$ terms respectively. You must provide
${p}_{1}$,
${p}_{2}$ and
$k$ when calling the function. See
Section 7 below for a further discussion on accuracy when choosing their values.
The input array
${\mathbf{hin}}$ of this function is designed to allow maximum flexibility in the supply of the standard deviation arguments by reusing, in a cyclic manner, elements of the array when it is less than
$n$ elements long. For example, if all Gaussian sources have the same standard deviation then it is only necessary to set
${\mathbf{lhin}}$ to
$1$ and to provide the value of the standard deviation in
${\mathbf{hin}}\left(1\right)$; the function will then automatically expand
${\mathbf{hin}}$ to be of length
$n$. For further details please see
Section 2.6 in the g01 Chapter Introduction.
4
References
Greengard L and Strain J (1991) The Fast Gauss Transform SIAM J. Sci. Statist. Comput. 12(1) 79–94
Raykar V C and Duraiswami R (2005) Improved Fast Gauss Transform With Variable Source Scales University of Maryland Technical Report CSTR4727/UMIACSTR200534
5
Arguments
 1:
$\mathbf{d}$ – IntegerInput

On entry: $d$, the number of dimensions.
Constraint:
${\mathbf{d}}>0$.
 2:
$\mathbf{srcs}\left[{\mathbf{d}}\times {\mathbf{n}}\right]$ – const doubleInput

Note: the $\left(i,j\right)$th element of the matrix is stored in ${\mathbf{srcs}}\left[\left(j1\right)\times {\mathbf{d}}+i1\right]$.
On entry: $x$, the locations of the Gaussian sources.
 3:
$\mathbf{n}$ – IntegerInput

On entry: $n$, the number of Gaussian sources.
Constraint:
${\mathbf{n}}>0$.
 4:
$\mathbf{trgs}\left[{\mathbf{d}}\times {\mathbf{m}}\right]$ – const doubleInput

Note: the $\left(i,j\right)$th element of the matrix is stored in ${\mathbf{trgs}}\left[\left(j1\right)\times {\mathbf{d}}+i1\right]$.
On entry: $y$, the locations of the target points at which the FGT will be evaluated.
 5:
$\mathbf{m}$ – IntegerInput

On entry: $m$, the number of target points.
Constraint:
${\mathbf{m}}>0$.
 6:
$\mathbf{q}\left[{\mathbf{n}}\right]$ – const doubleInput

On entry: $q$, the weights of the Gaussian sources.
 7:
$\mathbf{p1}$ – Integer *Input/Output

On entry: ${p}_{1}$, the number of terms of the first Taylor series to be evaluated.
On exit:
p1 is unchanged.
Constraint:
${\mathbf{p1}}>0$.
 8:
$\mathbf{p2}$ – Integer *Input/Output

On entry: ${p}_{2}$, the number of terms of the second Taylor series to be evaluated.
On exit:
p2 is unchanged.
Constraint:
${\mathbf{p2}}>0$.
 9:
$\mathbf{k}$ – Integer *Input/Output

On entry: $k$, the number of clusters into which the source points will be aggregated.
Constraint:
$1\le {\mathbf{k}}\le {\mathbf{n}}$.
 10:
$\mathbf{hin}\left[{\mathbf{lhin}}\right]$ – const doubleInput

On entry:
$h$, the standard deviations of the Gaussian sources. If
${\mathbf{lhin}}<{\mathbf{n}}$, the array will be expanded automatically by repeating
hin until it is of length
n. See
Section 2.6 in the g01 Chapter Introduction for further information.
Constraint:
${\mathbf{hin}}\left[\mathit{i}\right]>0.0$, for $\mathit{i}=0,1,\dots ,{\mathbf{lhin}}1$.
 11:
$\mathbf{lhin}$ – IntegerInput

On entry: the length of the array
hin.
Constraint:
$1\le {\mathbf{lhin}}\le {\mathbf{n}}$.
 12:
$\mathbf{tol}$ – doubleInput

On entry: $\epsilon $, the desired accuracy of the FGT approximation of the DGT. Determines the radius of the source clusters: the contribution of a source point to the FGT approximation at a target point is disregarded if the source is outside the corresponding cluster radius.
Constraint:
${\mathbf{tol}}>0.0$.
 13:
$\mathbf{v}\left[{\mathbf{m}}\right]$ – doubleOutput

On exit: $\hat{G}\left(y\right)$, the value of the FGT evaluated at $y$.
 14:
$\mathbf{term}\left[{\mathbf{m}}\right]$ – doubleOutput

On exit: ${\mathbf{term}}\left[j1\right]$ contains the absolute value of the final Taylor series term that is largest, relative to the size of the sum of the corresponding series, across all clusters that contribute to the FGT at target point ${\mathbf{v}}\left[j1\right]$.
 15:
$\mathbf{fail}$ – NagError *Input/Output

The NAG error argument (see
Section 3.7 in How to Use the NAG Library and its Documentation).
6
Error Indicators and Warnings
 NE_ALLOC_FAIL

Dynamic memory allocation failed.
See
Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
 NE_ARRAY_SIZE

On entry, ${\mathbf{lhin}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: $1\le {\mathbf{lhin}}\le {\mathbf{n}}$.
 NE_BAD_PARAM

On entry, argument $\u2329\mathit{\text{value}}\u232a$ had an illegal value.
 NE_INT

On entry, ${\mathbf{d}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{d}}>0$.
On entry, ${\mathbf{m}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{m}}>0$.
On entry, ${\mathbf{n}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{n}}>0$.
On entry, ${\mathbf{p1}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{p1}}>0$.
On entry, ${\mathbf{p2}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{p2}}>0$.
 NE_INT_2

On entry, ${\mathbf{k}}=\u2329\mathit{\text{value}}\u232a$ and ${\mathbf{n}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: $1\le {\mathbf{k}}\le {\mathbf{n}}$.
 NE_INTERNAL_ERROR

An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
See
Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
 NE_NO_LICENCE

Your licence key may have expired or may not have been installed correctly.
See
Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
 NE_REAL

On entry, ${\mathbf{tol}}=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{tol}}>0.0$.
 NE_REAL_ARRAY_INPUT

On entry, ${\mathbf{hin}}\left[\u2329\mathit{\text{value}}\u232a\right]=\u2329\mathit{\text{value}}\u232a$.
Constraint: ${\mathbf{hin}}\left[\mathit{i}\right]>0.0$, for $\mathit{i}=0,1,\dots ,{\mathbf{lhin}}1$.
 NW_TOO_FEW_TERMS

On exit,
${\mathbf{p1}}=\u2329\mathit{\text{value}}\u232a$,
${\mathbf{p2}}=\u2329\mathit{\text{value}}\u232a$ and
${\mathbf{k}}=\u2329\mathit{\text{value}}\u232a$.
p1,
p2 or
k may have been too small to calculate
${\mathbf{v}}\left[{\mathbf{m}}1\right]$ to the required accuracy
tol.
7
Accuracy
The function does not currently implement the procedure described in
Raykar and Duraiswami (2005) for automatically determining values for
p1,
p2 and
k. Nonzero values must therefore be provided for these parameters when calling the function.
For a given set of source and target points and a specified tolerance, there is an interaction between the number of clusters,
k, and the number of Taylor series terms,
p1 and
p2: if the sources are clustered together in fewer clusters (small
k) then more terms will be needed in each cluster's Taylor series (large
p1 and
p2) to capture the effect of the source points further from the cluster centres. Increasing the number of clusters reduces their individual radii and requires fewer terms in their Taylor series, but increases the number of Taylor series that must be evaluated overall.
If the source and target points are uniformly distributed in a unit hypercube,
Raykar and Duraiswami (2005) advise users to select
${\mathbf{k}}\sim \u2308{\left({h}_{\mathrm{max}}+{h}_{\mathrm{min}}/2\right)}^{d}\u2309$. If the points are not uniformly distributed then more clusters than this will be needed to calculate the FGT to within the specified
tol without requiring prohibitively large values for
p1 and
p2.
There is less guidance available for selecting good values for
p1 and
p2. As the number of Taylor series terms is a major factor on the computation time taken by this function, you are advised to initially try a small number, e.g.
$20$ or so, and then tune
p1 and
p2 up or down based on the values returned. Note that
p1 and
p2 are not required to have identical values.
To aid the selection of values for
p1,
p2 and
k, the function returns in
${\mathbf{term}}\left[j1\right]$ the absolute value of the final Taylor series term that is largest, relative to the size of the sum of the corresponding series, across all clusters that contribute to the FGT at target point
$j$. If this value is larger than the requested
tol, the function will additionally return a nonzero
fail value and you are advised to rerun the function with larger
p1,
p2 or
k.
8
Parallelism and Performance
nag_sum_fast_gauss (c06sac) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
nag_sum_fast_gauss (c06sac) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
x06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the
Users' Note for your implementation for any additional implementationspecific information.
The time complexity of the algorithm implemented by this function is $O\left(M+N\right)$, versus the $O\left(MN\right)$ time complexity of evaluating the DGT directly.
10
Example
In this example values for $x$, $y$, ${p}_{1}$, ${p}_{2}$, $k$ and $\epsilon $ are read in, $\hat{G}\left(y\right)$ calculated and the results displayed.
10.1
Program Text
Program Text (c06sace.c)
10.2
Program Data
Program Data (c06sace.d)
10.3
Program Results
Program Results (c06sace.r)