NAG Library Function Document
nag_2d_spline_fit_scat (e02ddc)
 
1
 Purpose
nag_2d_spline_fit_scat (e02ddc) computes a bicubic spline approximation to a set of scattered data.  The knots of the spline are located automatically, but a single argument must be specified to control the trade-off between closeness of fit and smoothness of fit.
 
2
 Specification
#include <nag.h>
#include <nage02.h>

void nag_2d_spline_fit_scat (Nag_Start start, Integer m,
        const double x[], const double y[], const double f[],
        const double weights[], double s, Integer nxest, Integer nyest,
        double *fp, Integer *rank, double *warmstartinf,
        Nag_2dSpline *spline, NagError *fail)
 
3
 Description
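nag_2d_spline_fit_scat (e02ddc) determines a smooth bicubic spline approximation $s(x,y)$ to the set of data points $(x_r, y_r, f_r)$ with weights $w_r$, for $r = 1, 2, \ldots, m$.
The approximation domain is considered to be the rectangle $[x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$, where $x_{\min}$ ($y_{\min}$) and $x_{\max}$ ($y_{\max}$) denote the lowest and highest data values of $x$ ($y$).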
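The spline is given in the B-spline representation
$$ s(x,y) = \sum_{i=1}^{n_x-4} \sum_{j=1}^{n_y-4} c_{ij} M_i(x) N_j(y), \qquad (1) $$
where $M_i(x)$ and $N_j(y)$ denote normalized cubic B-splines, the former defined on the knots $\lambda_i$ to $\lambda_{i+4}$ and the latter on the knots $\mu_j$ to $\mu_{j+4}$.  For further details, see Hayes and Halliday (1974) for bicubic splines and de Boor (1972) for normalized B-splines.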
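As an illustration of the B-spline representation (1), the following C sketch evaluates a bicubic spline directly from its knots and coefficients with the Cox-de Boor recurrence. It is not the library's evaluation routine (nag_2d_spline_eval (e02dec) serves that purpose). The names bspline_basis and bicubic_eval are hypothetical, and the sketch assumes the storage layout described for the spline argument in Section 5: 0-based knot arrays lamda and mu of lengths nx and ny, with coefficient $c_{ij}$ held in c[(ny-4)*(i-1)+j-1].

```c
/* Illustrative sketch only; not part of the NAG Library. */

/* Value at t of the normalized B-spline of degree deg that starts at
   knot k[i] (0-based), via the Cox-de Boor recurrence.  nk is the total
   number of knots.  Terms with a zero knot span are dropped. */
double bspline_basis(const double *k, int nk, int i, int deg, double t)
{
    if (deg == 0) {
        if (k[i] <= t && t < k[i + 1])
            return 1.0;
        /* Close the last non-empty interval on the right so that
           t equal to the final knot is still inside the domain. */
        if (t == k[nk - 1] && k[i] < k[i + 1] && k[i + 1] == k[nk - 1])
            return 1.0;
        return 0.0;
    }
    double left = 0.0, right = 0.0;
    double d1 = k[i + deg] - k[i];
    double d2 = k[i + deg + 1] - k[i + 1];
    if (d1 > 0.0)
        left = (t - k[i]) / d1 * bspline_basis(k, nk, i, deg - 1, t);
    if (d2 > 0.0)
        right = (k[i + deg + 1] - t) / d2
                * bspline_basis(k, nk, i + 1, deg - 1, t);
    return left + right;
}

/* s(x,y) = sum_i sum_j c_ij M_i(x) N_j(y), with c stored row by row:
   c_ij (1-based i, j) lives in c[(ny-4)*(i-1)+j-1]. */
double bicubic_eval(int nx, const double *lamda, int ny, const double *mu,
                    const double *c, double x, double y)
{
    double s = 0.0;
    for (int i = 0; i < nx - 4; ++i) {
        double mi = bspline_basis(lamda, nx, i, 3, x);
        if (mi == 0.0)
            continue; /* this B-spline does not cover x */
        for (int j = 0; j < ny - 4; ++j)
            s += c[(ny - 4) * i + j] * mi * bspline_basis(mu, ny, j, 3, y);
    }
    return s;
}
```

Because normalized B-splines sum to one at every point of the domain, setting every coefficient to 1 should give $s(x,y) = 1$ throughout, which is a convenient sanity check on the knot and coefficient layout.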
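The total numbers $n_x$ and $n_y$ of these knots and their values $\lambda_1, \ldots, \lambda_{n_x}$ and $\mu_1, \ldots, \mu_{n_y}$ are chosen automatically by the function.  The knots $\lambda_5, \ldots, \lambda_{n_x-4}$ and $\mu_5, \ldots, \mu_{n_y-4}$ are the interior knots; they divide the approximation domain $[x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$ into $(n_x-7) \times (n_y-7)$ subpanels $[\lambda_i, \lambda_{i+1}] \times [\mu_j, \mu_{j+1}]$, for $i = 4, 5, \ldots, n_x-4$ and $j = 4, 5, \ldots, n_y-4$.  Then, much as in the curve case (see nag_1d_spline_fit (e02bec)), the coefficients $c_{ij}$ are determined as the solution of the following constrained minimization problem:
$$ \text{minimize} \quad \eta, \qquad (2) $$
subject to the constraint
$$ \theta = \sum_{r=1}^{m} \epsilon_r^2 \le S, \qquad (3) $$
where $\eta$ is a measure of the (lack of) smoothness of $s(x,y)$.  Its value depends on the discontinuity jumps in $s(x,y)$ across the boundaries of the subpanels.  It is zero only when there are no discontinuities and is positive otherwise, increasing with the size of the jumps (see Dierckx (1981b) for details).  $\epsilon_r$ denotes the weighted residual $w_r (f_r - s(x_r, y_r))$, and $S$ is a non-negative number to be specified.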
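The constraint in (3) involves the weighted sum of squared residuals, that is, the sum over all data points of $(w_r (f_r - s(x_r, y_r)))^2$. A minimal C sketch of that quantity is given below. The function name and the array fit, which is assumed to hold the spline values $s(x_r, y_r)$ already evaluated at the data points, are hypothetical and not part of the library interface.

```c
#include <stddef.h>

/* Illustrative sketch only; not part of the NAG Library.
   Returns theta = sum over r of (w[r] * (f[r] - fit[r]))^2, where fit[r]
   is assumed to hold the spline value at the r-th data point. */
double weighted_sum_sq_residuals(size_t m, const double *f,
                                 const double *w, const double *fit)
{
    double theta = 0.0;
    for (size_t r = 0; r < m; ++r) {
        double eps = w[r] * (f[r] - fit[r]); /* weighted residual */
        theta += eps * eps;
    }
    return theta;
}
```

A point with zero weight contributes nothing to the sum, which matches the statement in Section 5 that such points are ignored in the fit.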
By means of the argument $S$, 'the smoothing factor', you control the balance between smoothness and closeness of fit, as measured by the sum of squares of residuals in (3).  If $S$ is too large, the spline will be too smooth and signal will be lost (underfit); if $S$ is too small, the spline will pick up too much noise (overfit).  In the extreme cases the method would return an interpolating spline ($\theta = 0$) if $S$ were set to zero, and the least squares bicubic polynomial ($\eta = 0$) if $S$ is set very large.  Experimenting with $S$ values between these two extremes should result in a good compromise.  (See Section 9.2 for advice on choice of $S$.)  Note, however, that this function, unlike nag_1d_spline_fit (e02bec) and nag_2d_spline_fit_grid (e02dcc), does not allow $S$ to be set exactly to zero.
The method employed is outlined in Section 9.5 and fully described in Dierckx (1981a) and Dierckx (1981b).  It involves an adaptive strategy for locating the knots of the bicubic spline (depending on the function underlying the data and on the value of $S$), and an iterative method for solving the constrained minimization problem once the knots have been determined.
Values and derivatives of the computed spline can subsequently be computed by calling 
nag_2d_spline_eval (e02dec), 
nag_2d_spline_eval_rect (e02dfc) and 
nag_2d_spline_deriv_rect (e02dhc) as described in 
Section 9.6.
 
4
 References
de Boor C (1972)  On calculating with B-splines J. Approx. Theory 6 50–62 
Dierckx P (1981a)  An improved algorithm for curve fitting with spline functions Report TW54 Department of Computer Science, Katholieke Universiteit Leuven 
Dierckx P (1981b)  An algorithm for surface fitting with spline functions IMA J. Numer. Anal. 1 267–283 
Hayes J G and Halliday J (1974)  The least squares fitting of cubic spline surfaces to general data sets J. Inst. Math. Appl. 14 89–103 
Peters G and Wilkinson J H (1970)  The least squares problem and pseudo-inverses Comput. J. 13 309–316 
Reinsch C H (1967)  Smoothing by spline functions Numer. Math. 10 177–183 
 
5
 Arguments
- 1: start – Nag_Start Input
On entry: start must be set to Nag_Cold or Nag_Warm.
 
- start = Nag_Cold (cold start)
- The function will build up the knot set starting with no interior knots.  No values need be assigned to spline->nx and spline->ny, and memory will be internally allocated to spline->lamda, spline->mu and spline->c.
- start = Nag_Warm (warm start)
- The function will restart the knot-placing strategy using the knots found in a previous call of the function.  In this case, all arguments except s must be unchanged from that previous call.  This warm start can save much time in searching for a satisfactory value of $S$.
 
 Constraint: start = Nag_Cold or Nag_Warm.
 
- 2: m – Integer Input
On entry: $m$, the number of data points.
 The number of data points with nonzero weight (see weights) must be at least 16. 
 
- 3: x[m] – const double Input
- 4: y[m] – const double Input
- 5: f[m] – const double Input
On entry: x[r-1], y[r-1], f[r-1] must be set to the coordinates of $(x_r, y_r, f_r)$, the $r$th data point, for $r = 1, 2, \ldots, m$.  The order of the data points is immaterial. 
- 6: weights[m] – const double Input
On entry: weights[r-1] must be set to $w_r$, the $r$th value in the set of weights, for $r = 1, 2, \ldots, m$.  Zero weights are permitted and the corresponding points are ignored, except when determining $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$ (see Section 9.4).  For advice on the choice of weights, see the e02 Chapter Introduction. 
 Constraint: the number of data points with nonzero weight must be at least 16. 
- 7: s – double Input
On entry: the smoothing factor, $S$.  For advice on the choice of $S$, see Section 3 and Section 9.2. 
 Constraint: s > 0.0.
 
- 8: nxest – Integer Input
- 9: nyest – Integer Input
On entry: an upper bound for the number of knots $n_x$ and $n_y$ required in the $x$ and $y$ directions respectively.  In most practical situations, nxest = nyest = $4 + \sqrt{m/2}$ is sufficient.  See also Section 9.3. 
 Constraint: nxest ≥ 8 and nyest ≥ 8.
 
- 10: fp – double * Output
On exit: the weighted sum of squared residuals, $\theta$, of the computed spline approximation.  fp should equal $S$ within a relative tolerance of 0.001 unless spline->nx = spline->ny = 8, when the spline has no interior knots and so is simply a bicubic polynomial.  For knots to be inserted, s must be set to a value below the value of fp produced in this case. 
 
- 11: rank – Integer * Output
On exit: rank gives the rank of the system of equations used to compute the final spline (as determined by a suitable machine-dependent threshold).  When rank = (spline->nx - 4) × (spline->ny - 4), the solution is unique; otherwise the system is rank-deficient and the minimum-norm solution is computed.  The latter case may be caused by too small a value of $S$. 
 
- 12: warmstartinf – double * Output
On exit: if the warm start option is used, its value must be left unchanged from the previous call. 
- 13: spline – Nag_2dSpline *
Pointer to structure of type Nag_2dSpline with the following members: 
- nx – Integer Input/Output
On entry: if the warm start option is used, the value of nx must be left unchanged from the previous call. On exit: the total number of knots, $n_x$, of the computed spline with respect to the $x$ variable. 
 
- lamda – double * Input/Output
On entry: a pointer to which, if start = Nag_Cold, memory of size nxest is internally allocated.  If the warm start option is used, the values lamda[0], lamda[1], ..., lamda[nx-1] must be left unchanged from the previous call. 
 On exit: lamda contains the complete set of knots $\lambda_i$ associated with the $x$ variable, i.e., the interior knots lamda[4], lamda[5], ..., lamda[nx-5] as well as the additional knots lamda[0] = lamda[1] = lamda[2] = lamda[3] = $x_{\min}$ and lamda[nx-4] = lamda[nx-3] = lamda[nx-2] = lamda[nx-1] = $x_{\max}$ needed for the B-spline representation (where $x_{\min}$ and $x_{\max}$ are as described in Section 3). 
 
- ny – Integer Input/Output
On entry: if the warm start option is used, the value of ny must be left unchanged from the previous call. On exit: the total number of knots, $n_y$, of the computed spline with respect to the $y$ variable. 
 
- mu – double * Input/Output
On entry: a pointer to which, if start = Nag_Cold, memory of size nyest is internally allocated.  If the warm start option is used, the values mu[0], mu[1], ..., mu[ny-1] must be left unchanged from the previous call. 
 On exit: mu contains the complete set of knots $\mu_i$ associated with the $y$ variable, i.e., the interior knots mu[4], mu[5], ..., mu[ny-5] as well as the additional knots mu[0] = mu[1] = mu[2] = mu[3] = $y_{\min}$ and mu[ny-4] = mu[ny-3] = mu[ny-2] = mu[ny-1] = $y_{\max}$ needed for the B-spline representation (where $y_{\min}$ and $y_{\max}$ are as described in Section 3). 
 
- c – double * Output
On exit: a pointer to which, if start = Nag_Cold, memory of size (nxest - 4) × (nyest - 4) is internally allocated.  c[(ny-4)×(i-1)+j-1] is the coefficient $c_{ij}$ defined in Section 3. 
 
 Note that when the information contained in the pointers lamda, mu and c is no longer of use, or before a new call to nag_2d_spline_fit_scat (e02ddc) with the same spline, you should free this storage using the NAG macro NAG_FREE.  This storage will have been allocated only if this function returns with fail.code = NE_NOERROR, NE_NUM_KNOTS_2D_GT_SCAT, NE_NUM_COEFF_GT, NE_NO_ADDITIONAL_KNOTS, NE_SPLINE_COEFF_CONV or, possibly, NE_ALLOC_FAIL.
 
 
- 14: fail – NagError * Input/Output
The NAG error argument (see  Section 3.7 in How to Use the NAG Library and its Documentation). 
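
The input constraints listed in this section can be checked before the fitting function is called. The C sketch below does so under stated assumptions: the enum InputCheck and the function check_fit_inputs are hypothetical names, long stands in for the NAG Integer type, and the checks mirror the documented constraints (at least 16 points with nonzero weight, s > 0.0, nxest ≥ 8 and nyest ≥ 8, and x and y values that are not all equal). It does not reproduce the library's own error handling.

```c
#include <stddef.h>

/* Illustrative sketch only; names below are not part of the NAG Library. */
typedef enum {
    CHECK_OK = 0,
    CHECK_TOO_FEW_WEIGHTED_POINTS, /* fewer than 16 nonzero weights */
    CHECK_S_NOT_POSITIVE,          /* s <= 0.0 */
    CHECK_KNOT_BOUND_TOO_SMALL,    /* nxest < 8 or nyest < 8 */
    CHECK_COORDS_ALL_EQUAL         /* all x equal or all y equal */
} InputCheck;

/* Returns 1 if every element of v[0..m-1] has the same value. */
int all_equal(size_t m, const double *v)
{
    for (size_t r = 1; r < m; ++r)
        if (v[r] != v[0])
            return 0;
    return 1;
}

InputCheck check_fit_inputs(size_t m, const double *x, const double *y,
                            const double *weights, double s,
                            long nxest, long nyest)
{
    size_t nonzero = 0;
    for (size_t r = 0; r < m; ++r)
        if (weights[r] != 0.0)
            ++nonzero;
    if (nonzero < 16)
        return CHECK_TOO_FEW_WEIGHTED_POINTS;
    if (!(s > 0.0))
        return CHECK_S_NOT_POSITIVE;
    if (nxest < 8 || nyest < 8)
        return CHECK_KNOT_BOUND_TOO_SMALL;
    if (all_equal(m, x) || all_equal(m, y))
        return CHECK_COORDS_ALL_EQUAL;
    return CHECK_OK;
}
```

Checking the weights first means that a data set that is too small is reported as such, whatever the other arguments contain.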
 
6
 Error Indicators and Warnings
- If the function fails with an error exit of NE_NUM_KNOTS_2D_GT_SCAT, NE_NUM_COEFF_GT, NE_NO_ADDITIONAL_KNOTS or NE_SPLINE_COEFF_CONV, then a spline approximation is returned, but it fails to satisfy the fitting criterion (see (2) and (3)) – perhaps by only a small amount, however.
- NE_ALL_ELEMENTS_EQUAL
- 
On entry, all the values in the array  x must not be equal.
 
On entry, all the values in the array  y must not be equal.
 
- NE_ALLOC_FAIL
- 
Dynamic memory allocation failed.
 
- NE_BAD_PARAM
- 
On entry, argument  start had an illegal value.
 
- NE_ENUMTYPE_WARM
- 
start = Nag_Warm at the first call of this function.  start must be set to Nag_Cold at the first call.
 
- NE_INT_ARG_LT
- 
On entry, nxest = ⟨value⟩.
 Constraint: nxest ≥ 8.
 On entry, nyest = ⟨value⟩.
 Constraint: nyest ≥ 8.
 
- NE_NO_ADDITIONAL_KNOTS
- 
No more knots added; the additional knot would coincide with an old one.  Possibly an inaccurate data point has too large a weight, or s is too small.  s = ⟨value⟩.
 
- NE_NON_ZERO_WEIGHTS
- 
On entry, the number of data points with nonzero weights = ⟨value⟩.
 Constraint: the number of nonzero weights ≥ 16.
 
- NE_NUM_COEFF_GT
- 
No more knots can be added because the number of B-spline coefficients already exceeds m.  Either m or s is probably too small: m = ⟨value⟩, s = ⟨value⟩.
 
- NE_NUM_KNOTS_2D_GT_SCAT
- 
The number of knots required is greater than allowed by nxest or nyest, nxest = ⟨value⟩, nyest = ⟨value⟩.  Possibly s is too small, especially if nxest, nyest > $5 + \sqrt{m}$.  s = ⟨value⟩, m = ⟨value⟩.
 
- NE_REAL_ARG_LE
- 
On entry, s must not be less than or equal to 0.0: s = ⟨value⟩.
 
- NE_SPLINE_COEFF_CONV
- 
The iterative process has failed to converge.  Possibly s is too small: s = ⟨value⟩.
 
 
7
 Accuracy
On successful exit, the approximation returned is such that its weighted sum of squared residuals fp is equal to the smoothing factor $S$, up to a specified relative tolerance of 0.001.  The exception is when $n_x = 8$ and $n_y = 8$, in which case fp may be significantly less than $S$: the computed spline is then simply the least squares bicubic polynomial approximation of degree 3, i.e., a spline with no interior knots.
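
The accuracy statement above can be expressed as a small check on the values returned by a call. The sketch below is an illustration only: the function names are hypothetical, long stands in for the NAG Integer type, and "relative tolerance" is assumed here to mean that the difference between fp and s is at most 0.001 times s.

```c
#include <math.h>

/* Illustrative sketch only; not part of the NAG Library. */

/* 1 if fp agrees with the smoothing factor s to within the documented
   relative tolerance of 0.001 (interpreted as |fp - s| <= 0.001 * s). */
int fp_matches_s(double fp, double s)
{
    return fabs(fp - s) <= 0.001 * s;
}

/* 1 if the spline has no interior knots (nx = ny = 8), i.e. it is the
   least squares bicubic polynomial and fp may be well below s. */
int is_polynomial_fit(long nx, long ny)
{
    return nx == 8 && ny == 8;
}
```

A result for which fp_matches_s is false and is_polynomial_fit is true usually indicates that s was chosen larger than necessary.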
 
8
 Parallelism and Performance
nag_2d_spline_fit_scat (e02ddc) is not threaded in any implementation.
 
9
 Further Comments
 
9.1
 Timing 
The time taken for a call of nag_2d_spline_fit_scat (e02ddc) depends on the complexity of the shape of the data, the value of the smoothing factor $S$, and the number of data points.  If nag_2d_spline_fit_scat (e02ddc) is to be called for different values of $S$, much time can be saved by setting start = Nag_Warm after the first call.
It should be noted that choosing $S$ very small considerably increases computation time.
 
9.2
 Choice of s 
If the weights have been correctly chosen (see the e02 Chapter Introduction), the standard deviation of $w_r f_r$ would be the same for all $r$, equal to $\sigma$, say.  In this case, choosing the smoothing factor $S$ in the range $\sigma^2 (m \pm \sqrt{2m})$, as suggested by Reinsch (1967), is likely to give a good start in the search for a satisfactory value.  Otherwise, experimenting with different values of $S$ will be required from the start.
In that case, in view of computation time and memory requirements, it is recommended to start with a very large value for $S$ and so determine the least squares bicubic polynomial; the value returned for fp, call it $\mathrm{fp}_0$, gives an upper bound for $S$.  Then progressively decrease the value of $S$ to obtain closer fits, say by a factor of 10 in the beginning, i.e., $S = \mathrm{fp}_0 / 10$, $S = \mathrm{fp}_0 / 100$, and so on, and more carefully as the approximation shows more details.
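
The initial factor-of-10 reduction described above can be written out as a short schedule of trial values. In the sketch below, fp0 is assumed to be the value of fp returned by a first call with a very large smoothing factor; the function name s_schedule is hypothetical. Only the first few trial values are generated, since the text recommends proceeding more carefully once the fit starts to show detail.

```c
#include <stddef.h>

/* Illustrative sketch only; not part of the NAG Library.
   Fills out[0..count-1] with fp0/10, fp0/100, ... as trial values
   for the smoothing factor. */
void s_schedule(double fp0, size_t count, double *out)
{
    double s = fp0;
    for (size_t k = 0; k < count; ++k) {
        s /= 10.0; /* one further factor of 10 per trial */
        out[k] = s;
    }
}
```

Each trial value would be passed as the argument s, typically with a warm start after the first call as discussed in Section 9.1.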
To choose $S$ very small is strongly discouraged.  This considerably increases computation time and memory requirements.  It may also cause rank-deficiency (as indicated by the argument rank) and endanger numerical stability.
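
Rank-deficiency can be detected from the returned values by comparing rank with the number of B-spline coefficients, $(n_x - 4)(n_y - 4)$, as described for the argument rank in Section 5. The function name below is hypothetical and long stands in for the NAG Integer type.

```c
/* Illustrative sketch only; not part of the NAG Library.
   Returns 1 if the final least squares system was rank-deficient,
   i.e. rank is below the number of coefficients (nx-4)*(ny-4). */
int spline_is_rank_deficient(long rank, long nx, long ny)
{
    return rank < (nx - 4) * (ny - 4);
}
```

A nonzero result suggests repeating the fit with a larger smoothing factor.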
The number of knots of the spline returned, and their location, generally depend on the value of $S$ and on the behaviour of the function underlying the data.  However, if nag_2d_spline_fit_scat (e02ddc) is called with start = Nag_Warm, the knots returned may also depend on the smoothing factors of the previous calls.  Therefore if, after a number of trials with different values of $S$ and start = Nag_Warm, a fit can finally be accepted as satisfactory, it may be worthwhile to call nag_2d_spline_fit_scat (e02ddc) once more with the selected value for $S$ but now using start = Nag_Cold.  Often, nag_2d_spline_fit_scat (e02ddc) then returns an approximation with the same quality of fit but with fewer knots, which is therefore better if data reduction is also important.
 
9.3
 Choice of nxest and nyest 
The number of knots may also depend on the upper bounds nxest and nyest.  Indeed, if at a certain stage in nag_2d_spline_fit_scat (e02ddc) the number of knots in one direction (say $n_x$) has reached the value of its upper bound (nxest), then from that moment on all subsequent knots are added in the other ($y$) direction.  This may indicate that the value of nxest is too small.  On the other hand, it gives you the option of limiting the number of knots the function locates in any direction.  For example, by setting nxest = 8 (the lowest allowable value for nxest), you can indicate that you want an approximation which is a simple cubic polynomial in the variable $x$.
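
Section 5 suggests that nxest = nyest = $4 + \sqrt{m/2}$ is sufficient in most practical situations, subject to the lower limit of 8. The C sketch below turns that rule of thumb into an integer. Rounding the square root up is an assumption made here, not something the documentation specifies; the function name is hypothetical and long stands in for the NAG Integer type.

```c
/* Illustrative sketch only; not part of the NAG Library.
   Returns max(8, 4 + ceil(sqrt(m / 2))), using integer arithmetic
   to find the smallest k with 2*k*k >= m. */
long suggested_knot_bound(long m)
{
    long k = 0;
    while (2 * k * k < m)
        ++k;             /* k ends as ceil(sqrt(m / 2)) */
    long bound = 4 + k;
    return bound < 8 ? 8 : bound; /* constraint: nxest, nyest >= 8 */
}
```

For 200 data points this gives 14, since $\sqrt{200/2} = 10$.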
 
9.4
 Restriction of the Approximation Domain 
The fit obtained is not defined outside the rectangle $[\lambda_4, \lambda_{n_x-3}] \times [\mu_4, \mu_{n_y-3}]$.  The reason for taking the extreme data values of $x$ and $y$ for these four knots is that, as is usual in data fitting, the fit cannot be expected to give satisfactory values outside the data region.  If, nevertheless, you require values over a larger rectangle, this can be achieved by augmenting the data with two artificial data points $(a, c, 0)$ and $(b, d, 0)$ with zero weight, where $[a, b] \times [c, d]$ denotes the enlarged rectangle.
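
A minimal C sketch of this augmentation is shown below. It appends the two zero-weight points at opposite corners of the enlarged rectangle, with the lower limits a, c in one point and the upper limits b, d in the other. The function name is hypothetical, and the arrays are assumed to have room for at least m + 2 elements.

```c
#include <stddef.h>

/* Illustrative sketch only; not part of the NAG Library.
   Appends the zero-weight points (a, c, 0) and (b, d, 0) to the data so
   that the approximation domain becomes [a,b] x [c,d].  Each array must
   have capacity for m + 2 elements.  Returns the new point count. */
size_t add_domain_corners(size_t m, double *x, double *y, double *f,
                          double *weights,
                          double a, double b, double c, double d)
{
    x[m] = a;     y[m] = c;     f[m] = 0.0;     weights[m] = 0.0;
    x[m + 1] = b; y[m + 1] = d; f[m + 1] = 0.0; weights[m + 1] = 0.0;
    return m + 2;
}
```

Because the added points have zero weight they do not influence the fit itself, only the limits $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$; they also do not count towards the minimum of 16 points with nonzero weight.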
 
9.5
 Outline of Method Used 
First suitable knot sets are built up in stages (starting with no interior knots in the case of a cold start but with the knot set found in a previous call if a warm start is chosen).  At each stage, a bicubic spline is fitted to the data by least squares and $\theta$, the sum of squares of residuals, is computed.  If $\theta > S$, a new knot is added to one knot set or the other so as to reduce $\theta$ at the next stage.  The new knot is located in an interval where the fit is particularly poor.  Sooner or later, we find that $\theta \le S$ and at that point the knot sets are accepted.  The function then goes on to compute a spline which has these knot sets and which satisfies the full fitting criterion specified by (2) and (3).  The theoretical solution has $\theta = S$.  The function computes the spline by an iterative scheme which is ended when $\theta = S$ within a relative tolerance of 0.001.  The main part of each iteration consists of a linear least squares computation of special form.  The minimal least squares solution is computed wherever the linear system is found to be rank-deficient.
An exception occurs when the function finds at the start that, even with no interior knots ($n_x = n_y = 8$), the least squares spline already has its sum of squares of residuals $\le S$.  In this case, since this spline (which is simply a bicubic polynomial) also has an optimal value for the smoothness measure $\eta$, namely zero, it is returned at once as the (trivial) solution.  It will usually mean that $S$ has been chosen too large.
For further details of the algorithm and its use see 
Dierckx (1981b).
 
9.6
 Evaluation of Computed Spline
The values of the computed spline at the points (tx[r-1], ty[r-1]), for r = 1, 2, ..., n, may be obtained in the array ff, of length at least n, by the following code:
    e02dec(n, tx, ty, ff, &spline, &fail)
where spline is a structure of type Nag_2dSpline which is an output argument of nag_2d_spline_fit_scat (e02ddc).
To evaluate the computed spline on a kx by ky rectangular grid of points in the $x$-$y$ plane, which is defined by the $x$ coordinates stored in tx[q-1], for q = 1, 2, ..., kx, and the $y$ coordinates stored in ty[r-1], for r = 1, 2, ..., ky, returning the results in the array fg which is of length at least kx × ky, the following call may be used:
    e02dfc(kx, ky, tx, ty, fg, &spline, &fail)
where spline is a structure of type Nag_2dSpline which is an output argument of nag_2d_spline_fit_scat (e02ddc).  The result of the spline evaluated at grid point $(x_q, y_r)$ is returned in element [ky×(q-1)+r-1] of the array fg.
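
The grid layout can be made concrete with two small helpers, sketched here under stated assumptions: fill_axis builds equally spaced coordinates for tx or ty (equal spacing is an assumption made for the example, since any coordinates may be used), and fg_index maps the 1-based grid indices q and r to the position ky×(q-1)+r-1 in fg. Both names are hypothetical.

```c
#include <stddef.h>

/* Illustrative sketch only; not part of the NAG Library. */

/* Fills t[0..n-1] with n equally spaced values from lo to hi. */
void fill_axis(double lo, double hi, size_t n, double *t)
{
    if (n == 1) {
        t[0] = lo;
        return;
    }
    for (size_t i = 0; i < n; ++i)
        t[i] = lo + (hi - lo) * (double)i / (double)(n - 1);
}

/* Position in fg of the value at grid point (x_q, y_r), with q in
   1..kx and r in 1..ky: the y index varies fastest. */
size_t fg_index(size_t ky, size_t q, size_t r)
{
    return ky * (q - 1) + (r - 1);
}
```

With ky = 3, the values for the first $x$ coordinate occupy fg[0] to fg[2], those for the second occupy fg[3] to fg[5], and so on.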
 
10
 Example
This example program reads in a value of m, followed by a set of m data points $(x_r, y_r, f_r)$ and their weights $w_r$.  It then calls nag_2d_spline_fit_scat (e02ddc) to compute a bicubic spline approximation for one specified value of $S$, and prints the values of the computed knots and B-spline coefficients.  Finally it evaluates the spline at a small sample of points on a rectangular grid.
 
10.1
 Program Text
Program Text (e02ddce.c)
 
10.2
 Program Data
Program Data (e02ddce.d)
 
10.3
 Program Results
Program Results (e02ddce.r)