void	nag_smooth_spline_estim (Nag_SmoothParamMethods method, Integer n, const double x[], const double y[], const double weights[], double yhat[], double coeff[], double rss, double df, double res[], double h[], double crit, double rho, double u, double tol, Integer maxcal, NagError *fail)

3

Description

For a set of

n

observations

(x_{i}, y_{i})

, for

i = 1, 2, \dots, n

, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.

Cubic smoothing splines arise as the unique real-valued solution function,

f

, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:

\sum_{i = 1}^{n} w_{i} {\{y_{i} - f (x_{i})\}}^{2} + ρ \int_{- \infty}^{\infty} {\{f^{''} (x)\}}^{2} d x,

where

w_{i}

is the (optional) weight for the

i

th observation and

ρ

is the smoothing argument. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing argument

ρ

weights these two aspects; larger values of

ρ

give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).

The fitted values,

\hat{y} = {({\hat{y}}_{1}, {\hat{y}}_{2}, \dots, {\hat{y}}_{n})}^{T}

, and weighted residuals,

r_{i}

, can be written as:

\hat{y} = H y and r_{i} = \sqrt{w_{i}} (y_{i} - {\hat{y}}_{i})

for a matrix

H

. The residual degrees of freedom for the spline is trace

(I - H)

and the diagonal elements of

H

are the leverages.

The argument

ρ

can be estimated in a number of ways.

The degrees of freedom for the spline can be specified, i.e., find

ρ

such that trace

(H) = ν_{0}

for given

ν_{0}

Minimize the cross-validation (CV), i.e., find

ρ

such that the CV is minimized, where

CV = \frac{1}{\sum_{i = 1}^{n} w_{i}} \sum_{i = 1}^{n} {[\frac{r_{i}}{1 - h_{i i}}]}^{2} .

Minimize the generalized cross-validation (GCV), i.e., find

ρ

such that the GCV is minimized, where

GCV = \frac{n^{2}}{\sum_{i = 1}^{n} w_{i}} [\frac{\sum_{i = 1}^{n} r_{i}^{2}}{{(\sum_{i = 1}^{n} (1 - h_{i i}))}^{2}}] .

nag_smooth_spline_estim (g10acc) requires the

x_{i}

to be strictly increasing. If two or more observations have the same

x_{i}

value then they should be replaced by a single observation with

y_{i}

equal to the (weighted) mean of the

y

values and weight,

w_{i}

, equal to the sum of the weights. This operation can be performed by nag_order_data (g10zac).

The algorithm is based on Hutchinson (1986).

4

References

Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall

Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153

Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106

Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

5

Arguments

1: $method$ – Nag_SmoothParamMethodsInput

On entry: indicates whether the smoothing argument is to be found by minimization of the CV or GCV functions, or by finding the smoothing argument corresponding to a specified degrees of freedom value.

$method = Nag_SmoothParamCV$: Cross-validation is used.
$method = Nag_SmoothParamDF$: The degrees of freedom are specified.
$method = Nag_SmoothParamGCV$: Generalized cross-validation is used.

Constraint:

method = Nag_SmoothParamCV

Nag_SmoothParamDF

Nag_SmoothParamGCV

2: $n$ – IntegerInput

On entry: the number of observations,

n

Constraint:

n \geq 3

3: $x [n]$ – const doubleInput

On entry: the distinct and ordered values

x_{i}

, for

i = 1, 2, \dots, n

Constraint:

x [i - 1] < x [i]

, for

i = 1, 2, \dots, n - 1

4: $y [n]$ – const doubleInput

On entry: the values

y_{i}

, for

i = 1, 2, \dots, n

5: $weights [n]$ – const doubleInput

On entry: weights must contain the

n

weights, if they are required. Otherwise, weights must be set to NULL.

Constraint: if weights are required, then

weights [i - 1] > 0.0

, for

i = 1, 2, \dots, n

6: $yhat [n]$ – doubleOutput

On exit: the fitted values,

{\hat{y}}_{i}

, for

i = 1, 2, \dots, n

7: $coeff [(n - 1) \times 3]$ – doubleOutput

On exit: the spline coefficients. More precisely, the value of the spline approximation at

t

is given by

((coeff [(i - 1) \times (n - 1) + 2] \times d + coeff [(i - 1) \times (n - 1) + 1]) \times d + coeff [(i - 1) \times (n - 1)]) \times d + {\hat{y}}_{i}

, where

x_{i} \leq t < x_{i + 1}

and

d = {t - x}_{i}

8: $rss$ – double *Output

On exit: the (weighted) residual sum of squares.

9: $df$ – double *Output

On exit: the residual degrees of freedom. If

method = Nag_SmoothParamDF

, this will be

n - crit

to the required accuracy.

10: $res [n]$ – doubleOutput

On exit: the (weighted) residuals,

r_{i}

, for

i = 1, 2, \dots, n

11: $h [n]$ – doubleOutput

On exit: the leverages,

h_{i i}

, for

i = 1, 2, \dots, n

12: $crit$ – double *Input/Output

On entry: if

method = Nag_SmoothParamDF

, the required degrees of freedom for the spline.

method = Nag_SmoothParamCV

Nag_SmoothParamGCV

, crit need not be set.

Constraint:

2.0 < crit \leq n

On exit: if

method = Nag_SmoothParamCV

, the value of the cross-validation, or if

method = Nag_SmoothParamGCV

, the value of the generalized cross-validation function, evaluated at the value of

ρ

returned in rho.

13: $rho$ – double *Output

On exit: the smoothing argument,

ρ

14: $u$ – doubleInput

On entry: the upper bound on the smoothing argument. See Section 9 for details on how this argument is used.

Suggested value:

u = 1000.0

Constraint:

u > tol

15: $tol$ – doubleInput

On entry: the accuracy to which the smoothing argument rho is required. tol should be preferably not much less than

\sqrt{ε}

, where

ε

is the machine precision.

Constraint:

tol \geq machine precision

16: $maxcal$ – IntegerInput

On entry: the maximum number of spline evaluations to be used in finding the value of

ρ

Suggested value:

maxcal = 30

Constraint:

maxcal \geq 3

17: $fail$ – NagError *Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6

Error Indicators and Warnings

NE_2_REAL_ARG_LE: On entry, $u = 〈value〉$ while $tol = 〈value〉$ . These arguments must satisfy $u > tol$ .
NE_ALLOC_FAIL: Dynamic memory allocation failed.
NE_BAD_PARAM: On entry, argument method had an illegal value.
NE_G10AC_ACC: A solution to the accuracy given by tol has not been achieved in maxcal iterations. Try increasing the value of tol and/or maxcal.
NE_G10AC_CG_RHO: $method = Nag_SmoothParamCV$ or $Nag_SmoothParamGCV$ and the optimal value of $rho > u$ . Try a larger value of u.
NE_G10AC_DF_RHO: $method = Nag_SmoothParamDF$ and the required value of rho for specified degrees of freedom $> u$ . Try a larger value of u.
NE_G10AC_DF_TOL: $method = Nag_SmoothParamDF$ and the accuracy given by tol cannot be achieved. Try increasing the value of tol.
NE_INT_ARG_LT: On entry, $maxcal = 〈value〉$ .
Constraint: $maxcal \geq 3$ .

On entry, $n = 〈value〉$ .
Constraint: $n \geq 3$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_STRICTLY_INCREASING: The sequence x is not strictly increasing: $x [〈value〉] = 〈value〉$ , $x [〈value〉] = 〈value〉$ .
NE_REAL: On entry, $crit = 〈value〉$ .
Constraint: $crit > 2.0$ , if $method = Nag_SmoothParamDF$ .
NE_REAL_ARRAY_CONS: On entry, $weights [〈value〉] = 〈value〉$ .
Constraint: $weights [i] > 0$ , for $i = 0, 1, \dots, n - 1$ .
NE_REAL_INT_ARG_CONS: On entry, $crit = 〈value〉$ and $n = 〈value〉$ . These arguments must satisfy $crit \leq n$ , if $method = Nag_SmoothParamDF$ .
NE_REAL_MACH_PREC: On entry, $tol = 〈value〉$ , $machine precision (nag_machine_precision) = 〈value〉$ .
Constraint: $tol \geq machine precision$ .

7

Accuracy

When minimizing the cross-validation or generalized cross-validation, the error in the estimate of

ρ

should be within

\pm 3 \times (tol \times rho + tol)

. When finding

ρ

for a fixed number of degrees of freedom the error in the estimate of

ρ

should be within

\pm 2 \times tol \times \max (1, rho)

Given the value of

ρ

, the accuracy of the fitted spline depends on the value of

ρ

and the position of the

x

values. The values of

x_{i} - x_{i - 1}

and

w_{i}

are scaled and

ρ

is transformed to avoid underflow and overflow problems.

8

Parallelism and Performance

nag_smooth_spline_estim (g10acc) is not threaded in any implementation.

9

Further Comments

The time to fit the spline for a given value of

ρ

is of order

n

When finding the value of

ρ

that gives the required degrees of freedom, the algorithm examines the interval 0.0 to u. For small degrees of freedom the value of

ρ

can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and

ρ

is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval 0.0 to u. If the function is decreasing in that range then the boundary value of u will be returned. In either case, the larger the value of u the more likely is the interval to contain the required solution, but the process will be less efficient.

Regression splines with a small

(< n)

number of knots can be fitted by nag_1d_spline_fit_knots (e02bac) and nag_1d_spline_fit (e02bec).

10

Example

The data, given by Hastie and Tibshirani (1990), is the age,

x_{i}

, and C-peptide concentration (pmol/ml),

y_{i}

, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by nag_order_data (g10zac) and a spline with 5 degrees of freedom is fitted by nag_smooth_spline_estim (g10acc). The fitted values and residuals are printed.

NAG Library Function Document

nag_smooth_spline_estim (g10acc)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

1

Purpose

2

Specification

3

Description

4

References

5

Arguments

6

Error Indicators and Warnings

7

Accuracy

8

Parallelism and Performance

9

Further Comments

10

Example

10.1

Program Text

10.2

Program Data

10.3

Program Results