NAG Library Function Document
nag_step_regsn (g02eec)
1
Purpose
nag_step_regsn (g02eec) carries out one step of a forward selection procedure in order to enable the ‘best’ linear regression model to be found.
2
Specification
#include <nag.h>
#include <nagg02.h>
void nag_step_regsn (Nag_OrderType order,
     Integer *istep,
     Nag_IncludeMean mean,
     Integer n,
     Integer m,
     const double x[],
     Integer pdx,
     const char *var_names[],
     const Integer sx[],
     Integer maxip,
     const double y[],
     const double wt[],
     double fin,
     Nag_Boolean *addvar,
     const char *newvar[],
     double *chrss,
     double *f,
     const char *model[],
     Integer *nterm,
     double *rss,
     Integer *idf,
     Integer *ifr,
     const char *free_vars[],
     double exss[],
     double q[],
     Integer pdq,
     double p[],
     NagError *fail)
3
Description
One method of selecting a linear regression model from a given set of independent variables is by forward selection. The following procedure is used:
(i) Select the best fitting independent variable, i.e., the independent variable which gives the smallest residual sum of squares. If the $F$ statistic for this variable exceeds a chosen critical value, $F_c$, include the variable in the model; otherwise stop.
(ii) Find the independent variable that leads to the greatest reduction in the residual sum of squares when added to the current model.
(iii) If the $F$ statistic for this variable exceeds $F_c$, include the variable in the model and go to (ii); otherwise stop.
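The text does not spell out the $F$ statistic used in steps (i) and (iii). Assuming it is the usual partial $F$ statistic for adding a single term (an assumption, not taken from this document), with $RSS_{\text{old}}$ the residual sum of squares of the current model, $RSS_{\text{new}}$ that of the enlarged model and $\nu_{\text{new}}$ its residual degrees of freedom, the entry test reads:

$$
F = \frac{RSS_{\text{old}} - RSS_{\text{new}}}{RSS_{\text{new}} / \nu_{\text{new}}}, \qquad \text{include the variable if } F > F_c .
$$

For example, $RSS_{\text{old}} = 100$, $RSS_{\text{new}} = 60$ and $\nu_{\text{new}} = 20$ give $F = 40/3 \approx 13.3$.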
At any step the variables not in the model are known as the free terms.
nag_step_regsn (g02eec) allows you to specify some independent variables that must be in the model; these are known as forced variables.
The computational procedure involves the use of $QR$ decompositions, the $R$ and the $Q$ matrices being updated as each new variable is added to the model. In addition the matrix $Q^{T}X_{\text{free}}$, where $X_{\text{free}}$ is the matrix of variables not included in the model, is updated.
nag_step_regsn (g02eec) computes one step of the forward selection procedure per call. The results produced at each step may be printed or used as inputs to
nag_regsn_mult_linear_upd_model (g02ddc) in order to compute the regression coefficients for the model fitted at that step. Repeated calls to
nag_step_regsn (g02eec) should be made until
addvar = Nag_FALSE is returned.
4
References
Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Weisberg S (1985) Applied Linear Regression Wiley
5
Arguments
Note: after the initial call to nag_step_regsn (g02eec) with istep = 0, you must not change any argument except fin between calls.
- 1: order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order = Nag_RowMajor. See Section 3.3.1.3 in How to Use the NAG Library and its Documentation for a more detailed explanation of the use of this argument.
Constraint: order = Nag_RowMajor or Nag_ColMajor.
- 2: istep – Integer * Input/Output
On entry: istep indicates which step in the forward selection process is to be carried out.
If istep = 0, the process is initialized.
Constraint: istep ≥ 0.
On exit: istep is incremented by 1.
- 3: mean – Nag_IncludeMean Input
On entry: indicates if a mean term is to be included.
mean = Nag_MeanInclude
  A mean term, intercept, will be included in the model.
mean = Nag_MeanZero
  The model will pass through the origin, zero-point.
Constraint: mean = Nag_MeanInclude or Nag_MeanZero.
- 4: n – Integer Input
On entry: n, the number of observations.
Constraint:
.
- 5: m – Integer Input
On entry: m, the total number of independent variables in the dataset.
Constraint:
.
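- 6: x[dim] – const double Input
Note: the dimension, dim, of the array x must be at least
- max(1, pdx × m) when order = Nag_ColMajor;
- max(1, n × pdx) when order = Nag_RowMajor.
Where X(i,j) appears in this document, it refers to the array element
- x[(j−1) × pdx + i − 1] when order = Nag_ColMajor;
- x[(i−1) × pdx + j − 1] when order = Nag_RowMajor.
On entry: X(i,j) must contain the ith observation for the jth independent variable, for i = 1, 2, …, n and j = 1, 2, …, m.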
- 7: pdx – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
- if order = Nag_ColMajor, pdx ≥ n;
- if order = Nag_RowMajor, pdx ≥ m.
- 8: var_names – const char * Input
On entry: var_names[i−1] must contain the name of the independent variable in column i of x, for i = 1, 2, …, m.
- 9: sx – const Integer Input
On entry: indicates which independent variables could be considered for inclusion in the regression.
sx[j−1] ≥ 2
  The variable contained in the jth column of x is automatically included in the regression model, for j = 1, 2, …, m.
sx[j−1] = 1
  The variable contained in the jth column of x is considered for inclusion in the regression model, for j = 1, 2, …, m.
sx[j−1] = 0
  The variable in the jth column is not considered for inclusion in the model, for j = 1, 2, …, m.
Constraint: sx[j−1] ≥ 0 and at least one value of sx[j−1] = 1, for j = 1, 2, …, m.
- 10: maxip – Integer Input
On entry: the maximum number of independent variables to be included in the model.
Constraints:
- if , number of values of ;
- if , number of values of .
- 11: y – const double Input
On entry: the dependent variable.
- 12: wt[dim] – const double Input
Note: the dimension, dim, of the array wt must be at least n.
On entry: if wt is not NULL, wt must contain the weights to be used in the weighted regression.
If wt[i−1] = 0.0, the ith observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weights are not provided then wt must be set to the null pointer, i.e., (double *)0, and the effective number of observations is n.
Constraint: if wt is not NULL, wt[i−1] ≥ 0.0, for i = 1, 2, …, n.
- 13: fin – double Input
On entry: the critical value of the $F$ statistic for the term to be included in the model, $F_c$.
Suggested value:
is a commonly used value in exploratory modelling.
Constraint:
.
- 14: addvar – Nag_Boolean * Output
On exit: indicates if a variable has been added to the model.
addvar = Nag_TRUE
  A variable has been added to the model.
addvar = Nag_FALSE
  No variable had an $F$ value greater than $F_c$ and none were added to the model.
- 15: newvar – const char * Output
On exit: if addvar = Nag_TRUE, newvar contains the name of the variable added to the model.
- 16: chrss – double * Output
On exit: if addvar = Nag_TRUE, chrss contains the change in the residual sum of squares due to adding variable newvar.
- 17: f – double * Output
On exit: if addvar = Nag_TRUE, f contains the $F$ statistic for the inclusion of the variable in newvar.
- 18: model – const char * Input/Output
On entry: if istep = 0, model need not be set.
If istep ≠ 0, model must contain the values returned by the previous call to nag_step_regsn (g02eec).
On exit: the names of the variables in the current model.
- 19: nterm – Integer * Input/Output
On entry: if istep = 0, nterm need not be set.
If istep ≠ 0, nterm must contain the value returned by the previous call to nag_step_regsn (g02eec).
On exit: the number of independent variables in the current model, not including the mean, if any.
- 20: rss – double * Input/Output
On entry: if istep = 0, rss need not be set.
If istep ≠ 0, rss must contain the value returned by the previous call to nag_step_regsn (g02eec).
On exit: the residual sum of squares for the current model.
- 21: idf – Integer * Input/Output
On entry: if istep = 0, idf need not be set.
If istep ≠ 0, idf must contain the value returned by the previous call to nag_step_regsn (g02eec).
On exit: the degrees of freedom for the residual sum of squares for the current model.
- 22: ifr – Integer * Input/Output
On entry: if istep = 0, ifr need not be set.
If istep ≠ 0, ifr must contain the value returned by the previous call to nag_step_regsn (g02eec).
On exit: the number of free independent variables, i.e., the number of variables not in the model that are still being considered for selection.
- 23: free_vars – const char * Input/Output
On entry: if istep = 0, free_vars need not be set.
If istep ≠ 0, free_vars must contain the values returned by the previous call to nag_step_regsn (g02eec).
On exit: the first ifr values of free_vars contain the names of the free variables.
- 24: exss – double Output
On exit: the first ifr values of exss contain what would be the change in regression sum of squares if the free variables had been added to the model, i.e., the extra sum of squares for the free variables.
exss[i−1] contains what would be the change in regression sum of squares if the variable free_vars[i−1] had been added to the model.
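- 25: q[dim] – double Input/Output
Note: the dimension, dim, of the array q must be at least
- max(1, pdq × (m+2)) when order = Nag_ColMajor;
- max(1, n × pdq) when order = Nag_RowMajor.
The (i,j)th element of the matrix $Q$ is stored in
- q[(j−1) × pdq + i − 1] when order = Nag_ColMajor;
- q[(i−1) × pdq + j − 1] when order = Nag_RowMajor.
On entry: if istep = 0, q need not be set.
If istep ≠ 0, q must contain the values returned by the previous call to nag_step_regsn (g02eec).
On exit: the results of the $QR$ decomposition for the current model:
- the first column of q contains $Q^{T}y$ (or $Q^{T}W^{1/2}y$, where $W$ holds the weights, if used);
- the upper triangular part of columns 2 to $p+1$ contains the $R$ matrix;
- the strictly lower triangular part of columns 2 to $p+1$ contains details of the $Q$ matrix;
- the remaining columns $p+2$ to $p+1+$ifr of q contain $Q^{T}X_{\text{free}}$ (or $Q^{T}W^{1/2}X_{\text{free}}$),
where $p =$ nterm, or $p =$ nterm + 1 if mean = Nag_MeanInclude.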
- 26: pdq – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array q.
Constraints:
- if order = Nag_ColMajor, pdq ≥ n;
- if order = Nag_RowMajor, pdq ≥ m + 2.
- 27: p – double Input/Output
On entry: if istep = 0, p need not be set.
If istep ≠ 0, p must contain the values returned by the previous call to nag_step_regsn (g02eec).
On exit: the first $p$ elements of p contain details of the $QR$ decomposition, where $p =$ nterm, or $p =$ nterm + 1 if mean = Nag_MeanInclude.
- 28: fail – NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
6
Error Indicators and Warnings
- NE_ALLOC_FAIL
-
Dynamic memory allocation failed.
See
Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
- NE_BAD_PARAM
-
On entry, an argument had an illegal value.
- NE_DENOM_ZERO
-
The value of the change in the sum of squares is greater than the input value of
rss. This may occur due to rounding errors if the true residual sum of squares for the new model is small relative to the residual sum of squares for the previous model.
- NE_FREE_VARS
-
There are no free variables, i.e., no element of sx = 1.
- NE_FULL_RANK
-
On entry, the variables forced into the model are not of full rank, i.e., some of these variables are linear combinations of others.
- NE_INT
-
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
- NE_INT_2
-
On entry, and .
Constraint: if , .
On entry, and .
Constraint: .
On entry, and .
Constraint: .
On entry, and .
Constraint: .
- NE_INT_ARRAY
-
On entry,
.
Constraint:
maxip must be large enough to accommodate the number of terms given by
sx.
- NE_INT_ARRAY_ELEM_CONS
-
On entry, sx[j−1] < 0 for some j.
Constraint: sx[j−1] ≥ 0, for j = 1, 2, …, m.
- NE_INTERNAL_ERROR
-
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
See
Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
- NE_NO_LICENCE
-
Your licence key may have expired or may not have been installed correctly.
See
Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
- NE_REAL
-
On entry, .
Constraint: .
On entry, .
Constraint: .
- NE_REAL_ARRAY_ELEM_CONS
-
On entry, wt[i−1] < 0.0 for some i.
Constraint: wt[i−1] ≥ 0.0, for i = 1, 2, …, n.
- NE_ZERO_DF
-
Degrees of freedom for error will equal 0 if the new variable is added, i.e., the number of variables in the model plus 1 is equal to the effective number of observations.
On entry, number of forced variables .
- NE_ZERO_VARS
-
On entry, sx[i−1] = 0, for all i = 1, 2, …, m.
Constraint: at least one value of sx must be nonzero.
7
Accuracy
As nag_step_regsn (g02eec) uses a $QR$ transformation, the results will often be more accurate than those of traditional algorithms based on the cross-products of the dependent and independent variables.
8
Parallelism and Performance
nag_step_regsn (g02eec) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
nag_step_regsn (g02eec) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
x06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
None.
10
Example
The data, from an oxygen uptake experiment, are given by
Weisberg (1985). The names of the variables are as given in
Weisberg (1985). The independent and dependent variables are read and
nag_step_regsn (g02eec) is repeatedly called until
addvar = Nag_FALSE. At each step the
$F$ statistic, the free variables and their extra sum of squares are printed; also, except when
addvar = Nag_FALSE, the new variable, the change in the residual sum of squares and the terms in the model are printed.
10.1
Program Text
Program Text (g02eece.c)
10.2
Program Data
Program Data (g02eece.d)
10.3
Program Results
Program Results (g02eece.r)