When selecting a linear regression model it is sometimes useful to drop independent variables from the model and to examine the resulting sub-model. nag_regsn_mult_linear_delete_var (g02dfc) updates the $QR$ decomposition used in the computation of the linear regression model. The $QR$ decomposition may come from nag_regsn_mult_linear (g02dac), nag_regsn_mult_linear_addrem_obs (g02dcc), nag_regsn_mult_linear_add_var (g02dec) or a previous call to nag_regsn_mult_linear_delete_var (g02dfc).

For the general linear regression model with $p$ independent variables fitted, nag_regsn_mult_linear (g02dac) or nag_regsn_mult_linear_add_var (g02dec) computes a $QR$ decomposition of the (weighted) independent variables and forms an upper triangular matrix $R$ and a vector $c$. To remove an independent variable, $R$ and $c$ have to be updated. The column of $R$ corresponding to the variable to be dropped is removed and the matrix is then restored to upper triangular form by applying a series of Givens rotations. The rotations are then applied to $c$. Note that only the first $p$ elements of $c$ are affected.

The method used means that, while the updated values of $R$ and $c$ are computed, an updated value of $Q$ from the $QR$ decomposition is not available, so a call to nag_regsn_mult_linear_add_var (g02dec) cannot be made after a call to nag_regsn_mult_linear_delete_var (g02dfc).

nag_regsn_mult_linear_upd_model (g02ddc) can be used to calculate the parameter estimates, $\hat{\beta}$, from the information provided by nag_regsn_mult_linear_delete_var (g02dfc).
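For a full-rank model the estimates satisfy the triangular system $R \hat{\beta} = c$, which can be solved by back-substitution. The sketch below illustrates that step only; it is not a description of how nag_regsn_mult_linear_upd_model (g02ddc) works internally, and the function name `back_substitute` and its row-major layout with leading dimension `ld` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: solve R * beta = c for an upper triangular n x n
 * matrix R stored row-major with leading dimension ld.
 * Returns 0 on success, -1 if a diagonal element of R is exactly zero. */
static int back_substitute(const double *r, size_t ld, size_t n,
                           const double *c, double *beta)
{
    assert(ld >= n);
    for (size_t i = n; i-- > 0;) {
        double s = c[i];
        /* Subtract the contributions of the estimates already computed. */
        for (size_t j = i + 1; j < n; ++j)
            s -= r[i * ld + j] * beta[j];
        if (r[i * ld + i] == 0.0)
            return -1; /* rank deficient: plain back-substitution fails */
        beta[i] = s / r[i * ld + i];
    }
    return 0;
}
```

The zero-diagonal check mirrors the full-rank assumption: when it fails, plain back-substitution is not applicable and a different approach is needed.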

4 References

Golub G H and Van Loan C F (1996) Matrix Computations (3rd Edition) Johns Hopkins University Press, Baltimore

Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25

5 Arguments

1: $ip$ – Integer, Input

On entry: the number of independent variables already in the model, $p$.

Constraint: $ip \geq 1$.
2: $q [ip \times tdq]$ – double, Input/Output

Note: the $(i, j)$th element of the matrix $Q$ is stored in $q [(i - 1) \times tdq + j - 1]$.

On entry: the results of the $QR$ decomposition as returned by nag_regsn_mult_linear (g02dac), nag_regsn_mult_linear_addrem_obs (g02dcc), nag_regsn_mult_linear_add_var (g02dec) or previous calls to nag_regsn_mult_linear_delete_var (g02dfc).

On exit: the updated $QR$ decomposition. The first ip elements of the first column of q contain the updated value of $c$; the upper triangular part of columns 2 to ip contains the updated $R$ matrix.
3: $tdq$ – Integer, Input

On entry: the stride separating matrix column elements in the array q.

Constraint: $tdq \geq ip + 1$.
4: $indx$ – Integer, Input

On entry: indicates which independent variable is to be deleted from the model.

Constraint: $1 \leq indx \leq ip$.
5: $rss$ – double *, Input/Output

On entry: the residual sum of squares for the full regression.

Constraint: $rss \geq 0.0$.

On exit: the residual sum of squares with the (indx)th variable removed. Note that the residual sum of squares will only be valid if the regression is of full rank.
6: $fail$ – NagError *, Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
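The storage rule for q can be illustrated with a few small accessors. This is a sketch under stated assumptions: the names `q_elem`, `c_elem` and `r_elem` are hypothetical, `int` stands in for the library's Integer type, and the placement of $R(i, j)$ in column $j + 1$ of $Q$ is inferred from the description of q (column 1 holds $c$, the following columns hold $R$).

```c
#include <assert.h>

/* Hypothetical accessor: the (i, j)th element of Q, with 1-based i and j,
 * following the documented rule q[(i - 1) * tdq + j - 1]. */
static double q_elem(const double *q, int tdq, int i, int j)
{
    assert(i >= 1 && j >= 1 && j <= tdq);
    return q[(i - 1) * tdq + (j - 1)];
}

/* Element i of c: stored in the first column of Q. */
static double c_elem(const double *q, int tdq, int i)
{
    return q_elem(q, tdq, i, 1);
}

/* Element (i, j) of R, i <= j: assumed to sit one column to the right,
 * because column 1 of Q is occupied by c. */
static double r_elem(const double *q, int tdq, int i, int j)
{
    assert(i <= j);
    return q_elem(q, tdq, i, j + 1);
}
```

With ip = 2 and tdq = 3, for instance, the array `{c1, r11, r12, c2, x, r22}` holds $c$ in positions 0 and 3 and the upper triangle of $R$ in positions 1, 2 and 5; position 4 lies below the diagonal and is not part of $R$.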

6 Error Indicators and Warnings

NE_2_INT_ARG_GT: On entry, $indx = 〈value〉$ while $ip = 〈value〉$. These arguments must satisfy $indx \leq ip$.
NE_2_INT_ARG_LT: On entry, $tdq = 〈value〉$ while $ip + 1 = 〈value〉$. These arguments must satisfy $tdq \geq ip + 1$.
NE_ALLOC_FAIL: Dynamic memory allocation failed.
NE_DIAG_ELEM_ZERO: On entry, a diagonal element, $〈value〉$, of $R$ is zero.
NE_INT_ARG_LT: On entry, $indx = 〈value〉$.
Constraint: $indx \geq 1$.

On entry, $ip = 〈value〉$.
Constraint: $ip \geq 1$.
NE_REAL_ARG_LT: On entry, rss must not be less than 0.0: $rss = 〈value〉$.
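A caller may wish to check the documented constraints before making the call. The sketch below mirrors the conditions listed above; the function `check_args`, the `arg_status` codes and the order of the checks are all hypothetical and do not reproduce the library's own error handling. As in the storage description of q, it assumes that the diagonal element $R(i, i)$ is stored in column $i + 1$ of $Q$, and `int` stands in for the library's Integer type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, one per documented constraint. */
typedef enum {
    ARGS_OK,
    BAD_IP,    /* ip < 1 */
    BAD_TDQ,   /* tdq < ip + 1 */
    BAD_INDX,  /* indx outside 1..ip */
    BAD_RSS,   /* rss < 0.0 */
    ZERO_DIAG  /* a diagonal element of R is zero */
} arg_status;

/* Hypothetical pre-call check of the documented argument constraints. */
static arg_status check_args(int ip, const double *q, int tdq, int indx,
                             double rss)
{
    assert(q != NULL);
    if (ip < 1)
        return BAD_IP;
    if (tdq < ip + 1)
        return BAD_TDQ;
    if (indx < 1 || indx > ip)
        return BAD_INDX;
    if (rss < 0.0)
        return BAD_RSS;
    /* R(i+1, i+1) is assumed to be stored at Q(i+1, i+2), 0-based row i. */
    for (int i = 0; i < ip; ++i)
        if (q[i * tdq + i + 1] == 0.0)
            return ZERO_DIAG;
    return ARGS_OK;
}
```

The size checks come before the loop so that the diagonal is only read once tdq is known to be large enough for the assumed layout.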

7 Accuracy

There will inevitably be some loss in accuracy in fitting a model by dropping terms from a more complex model rather than fitting it afresh using nag_regsn_mult_linear (g02dac).

8 Parallelism and Performance

nag_regsn_mult_linear_delete_var (g02dfc) is not threaded in any implementation.

9 Further Comments

None.

10 Example

A dataset consisting of 12 observations on four independent variables and one dependent variable is read in. The full model, including a mean term, is fitted using nag_regsn_mult_linear (g02dac). The value of indx is read in and that variable is dropped from the regression. The parameter estimates are calculated by nag_regsn_mult_linear_upd_model (g02ddc) and printed. This process is repeated until indx is 0.

NAG Library Function Document

nag_regsn_mult_linear_delete_var (g02dfc)

