NAG Library Function Document

1 Purpose

nag_cp_stat (g02ecc) calculates ${R}^{2}$ and ${C}_{p}$-values from the residual sums of squares for a series of linear regression models.

2 Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_cp_stat (Nag_IncludeMean mean, Integer n, double sigsq, double tss, Integer nmod, const Integer nterms[], const double rss[], double rsq[], double cp[], NagError *fail)

3 Description

When selecting a linear regression model for a set of $n$ observations, a balance has to be found between the number of independent variables in the model and the fit, as measured by the residual sum of squares. The more variables that are included, the smaller the residual sum of squares will be. Two statistics can help in selecting the best model.
(a) ${R}^{2}$ represents the proportion of variation in the dependent variable that is explained by the independent variables.
 $R^{2}=\frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}},$
 where $\text{Total Sum of Squares}={\mathbf{tss}}=\sum {\left(y-\stackrel{-}{y}\right)}^{2}$ (if a mean is fitted; otherwise ${\mathbf{tss}}=\sum {y}^{2}$) and $\text{Regression Sum of Squares}=\text{RegSS}={\mathbf{tss}}-{\mathbf{rss}}$, where ${\mathbf{rss}}=\text{residual sum of squares}=\sum {\left(y-\stackrel{^}{y}\right)}^{2}$.
The ${R}^{2}$-values can be examined to find a model with a high ${R}^{2}$-value but with a small number of independent variables.
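The quantities in (a) can be computed directly from the observations and the fitted values. The following is a minimal C sketch under that assumption; the helper names total_ss, residual_ss and r_squared are hypothetical and are not part of the NAG Library. The include_mean flag selects between the two forms of tss given above.

```c
#include <stddef.h>

/* Hypothetical helper: total sum of squares. With include_mean nonzero the
   sum is taken about the sample mean; otherwise it is the uncorrected sum of y^2. */
static double total_ss(const double *y, size_t n, int include_mean)
{
    double mean = 0.0;
    if (include_mean) {
        for (size_t i = 0; i < n; i++)
            mean += y[i];
        mean /= (double) n;
    }
    double tss = 0.0;
    for (size_t i = 0; i < n; i++)
        tss += (y[i] - mean) * (y[i] - mean);
    return tss;
}

/* Hypothetical helper: residual sum of squares from fitted values yhat. */
static double residual_ss(const double *y, const double *yhat, size_t n)
{
    double rss = 0.0;
    for (size_t i = 0; i < n; i++)
        rss += (y[i] - yhat[i]) * (y[i] - yhat[i]);
    return rss;
}

/* R^2 = RegSS / tss, with RegSS = tss - rss. */
static double r_squared(double tss, double rss)
{
    return (tss - rss) / tss;
}
```

For example, with y = {1, 2, 3, 4} and a mean fitted, tss is 5; fitted values {1.5, 1.5, 3.5, 3.5} give rss = 1 and therefore R² = 0.8.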
(b) ${C}_{p}$ statistic.
 $C_{p}=\frac{\mathbf{rss}}{\hat{\sigma}^{2}}-\left(n-2p\right),$
where $p$ is the number of parameters (including the mean) in the model and ${\stackrel{^}{\sigma }}^{2}$ is an estimate of the true variance of the errors. This can often be obtained by fitting the full model.
A well-fitting model will have ${C}_{p}\simeq p$. ${C}_{p}$ is often plotted against $p$ to see which models are closest to the ${C}_{p}=p$ line.
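The steps in (b) can be sketched in plain C as follows. This is an illustration of the formula, not the NAG implementation: the helper names are hypothetical, and estimating $\hat{\sigma}^2$ as the residual mean square of the full model, $\mathbf{rss}_{\text{full}}/(n-p_{\text{full}})$, is one common choice assumed here rather than something this function requires. The last helper picks the model whose $C_p$ lies nearest the $C_p=p$ line, which is only one simple heuristic in place of inspecting a plot.

```c
#include <stddef.h>

/* Number of parameters p in a model: the independent variables it fits
   (as counted in nterms) plus one if a mean term is included. */
static int n_params(int nterms, int include_mean)
{
    return nterms + (include_mean ? 1 : 0);
}

/* Assumed estimate of sigma-hat^2: residual mean square of the full model. */
static double sigsq_from_full(double rss_full, int n, int p_full)
{
    return rss_full / (double) (n - p_full);
}

/* Mallows' C_p = rss / sigma-hat^2 - (n - 2p). */
static double cp_value(double rss, double sigsq, int n, int p)
{
    return rss / sigsq - (double) (n - 2 * p);
}

/* Absolute vertical distance of a model's C_p from the C_p = p line. */
static double dist_to_line(double cp, int p)
{
    double d = cp - (double) p;
    return d < 0.0 ? -d : d;
}

/* Index of the model whose C_p is closest to the C_p = p line. */
static size_t closest_to_line(const double cp[], const int p[], size_t nmod)
{
    size_t best = 0;
    for (size_t i = 1; i < nmod; i++)
        if (dist_to_line(cp[i], p[i]) < dist_to_line(cp[best], p[best]))
            best = i;
    return best;
}
```

When $\hat{\sigma}^2$ is taken from the full model in this way, the full model itself always has $C_p=p$ exactly, because $\mathbf{rss}_{\text{full}}/\hat{\sigma}^2 = n-p_{\text{full}}$; the informative comparison is therefore among the smaller models.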
nag_cp_stat (g02ecc) may be called after nag_all_regsn (g02eac) which calculates the residual sums of squares for all possible linear regression models.
4 References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Weisberg S (1985) Applied Linear Regression Wiley

5 Arguments

1: $\mathbf{mean}$ (Nag_IncludeMean, Input)
On entry: indicates whether a mean term is to be included.
${\mathbf{mean}}=\mathrm{Nag\_MeanInclude}$
A mean term (intercept) will be included in the model.
${\mathbf{mean}}=\mathrm{Nag\_MeanZero}$
The model will pass through the origin (zero-point).
Constraint: ${\mathbf{mean}}=\mathrm{Nag\_MeanInclude}$ or $\mathrm{Nag\_MeanZero}$.
2: $\mathbf{n}$ (Integer, Input)
On entry: $n$, the number of observations used in the regression model.
Constraint: ${\mathbf{n}}$ must be greater than $2×{p}_{\mathrm{max}}$, where ${p}_{\mathrm{max}}$ is the largest number of independent variables fitted (including the mean if fitted).
3: $\mathbf{sigsq}$ (double, Input)
On entry: the best estimate of the true variance of the errors, ${\stackrel{^}{\sigma }}^{2}$.
Constraint: ${\mathbf{sigsq}}>0.0$.
4: $\mathbf{tss}$ (double, Input)
On entry: the total sum of squares for the regression model.
Constraint: ${\mathbf{tss}}>0.0$.
5: $\mathbf{nmod}$ (Integer, Input)
On entry: the number of regression models.
Constraint: ${\mathbf{nmod}}>0$.
6: $\mathbf{nterms}\left[{\mathbf{nmod}}\right]$ (const Integer, Input)
On entry: ${\mathbf{nterms}}\left[\mathit{i}-1\right]$ must contain the number of independent variables (not counting the mean) fitted to the $\mathit{i}$th model, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
7: $\mathbf{rss}\left[{\mathbf{nmod}}\right]$ (const double, Input)
On entry: ${\mathbf{rss}}\left[\mathit{i}-1\right]$ must contain the residual sum of squares for the $\mathit{i}$th model, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
Constraint: ${\mathbf{rss}}\left[\mathit{i}-1\right]\le {\mathbf{tss}}$, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
8: $\mathbf{rsq}\left[{\mathbf{nmod}}\right]$ (double, Output)
On exit: ${\mathbf{rsq}}\left[\mathit{i}-1\right]$ contains the ${R}^{2}$-value for the $\mathit{i}$th model, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
9: $\mathbf{cp}\left[{\mathbf{nmod}}\right]$ (double, Output)
On exit: ${\mathbf{cp}}\left[\mathit{i}-1\right]$ contains the ${C}_{p}$-value for the $\mathit{i}$th model, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
10: $\mathbf{fail}$ (NagError *, Input/Output)
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
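To show how the arguments above fit together, here is a plain C sketch that checks the documented constraints and then fills rsq and cp using the formulas in Section 3. It is not the NAG implementation and does not call the library: the function name cp_stat_sketch, the cp_status codes, and the use of int in place of Integer and of an int flag in place of Nag_IncludeMean are all hypothetical. For the sample-size check it uses the form stated under n, namely n greater than 2 × p_max, where p counts the mean when one is fitted.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are NOT NAG error codes. */
enum cp_status {
    CP_OK = 0, CP_BAD_NMOD, CP_BAD_SIGSQ, CP_BAD_TSS,
    CP_RSS_GT_TSS, CP_N_TOO_SMALL
};

/* Sketch mirroring the argument layout of nag_cp_stat (g02ecc).
   Assumptions: include_mean nonzero plays the role of Nag_MeanInclude,
   and int replaces the NAG Integer type. */
static enum cp_status cp_stat_sketch(int include_mean, int n, double sigsq,
                                     double tss, int nmod, const int nterms[],
                                     const double rss[], double rsq[],
                                     double cp[])
{
    if (nmod <= 0)
        return CP_BAD_NMOD;
    if (sigsq <= 0.0)
        return CP_BAD_SIGSQ;
    if (tss <= 0.0)
        return CP_BAD_TSS;

    /* Find p_max (parameters including the mean) and check rss <= tss. */
    int pmax = 0;
    for (int i = 0; i < nmod; i++) {
        int p = nterms[i] + (include_mean ? 1 : 0);
        if (p > pmax)
            pmax = p;
        if (rss[i] > tss)
            return CP_RSS_GT_TSS;
    }
    if (n <= 2 * pmax)
        return CP_N_TOO_SMALL;

    /* R^2 = (tss - rss) / tss and C_p = rss / sigsq - (n - 2p). */
    for (int i = 0; i < nmod; i++) {
        int p = nterms[i] + (include_mean ? 1 : 0);
        rsq[i] = (tss - rss[i]) / tss;
        cp[i] = rss[i] / sigsq - (double) (n - 2 * p);
    }
    return CP_OK;
}
```

A real caller would instead pass a NagError structure through fail and inspect the error codes listed in Section 6.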

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_INT
On entry, ${\mathbf{nmod}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nmod}}>0$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_MODEL_PARAMETERS
On entry: the number of parameters, $p$, is $〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 2p$.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_REAL
On entry, ${\mathbf{sigsq}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{sigsq}}>0.0$.
On entry, ${\mathbf{tss}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{tss}}>0.0$.
NE_REAL_ARRAY_ELEM_CONS
A value of ${C}_{p}$ is less than $0.0$. This may occur if sigsq is too large or if rss, n or nterms are incorrect.
On entry, ${\mathbf{rss}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$ and ${\mathbf{tss}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{rss}}\left[i\right]\le {\mathbf{tss}}$, for all $i$.
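Rearranging the ${C}_{p}$ formula from Section 3 shows when the negative-${C}_{p}$ warning can arise. Assuming $\hat{\sigma}^{2}>0$ and $n>2p$, as the constraints on sigsq and n require, each step below multiplies or divides by a positive quantity:

```latex
C_p < 0
\iff \frac{\mathbf{rss}}{\hat{\sigma}^{2}} < n - 2p
\iff \mathbf{rss} < (n - 2p)\,\hat{\sigma}^{2}
\iff \hat{\sigma}^{2} > \frac{\mathbf{rss}}{n - 2p}
```

So a negative ${C}_{p}$ means that sigsq exceeds $\mathbf{rss}/(n-2p)$ for that model, which is why an overly large sigsq, or rss, n or nterms values inconsistent with it, can trigger the warning.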

7 Accuracy

Accuracy is sufficient for all practical purposes.

8 Parallelism and Performance

nag_cp_stat (g02ecc) is not threaded in any implementation.

9 Further Comments

None.

10 Example

The data, from an oxygen uptake experiment, are given by Weisberg (1985). The independent and dependent variables are read, and the residual sums of squares for all possible models are computed using nag_all_regsn (g02eac). The values of ${R}^{2}$ and ${C}_{p}$ are then computed and printed along with the names of the variables in the models.

10.1 Program Text

Program Text (g02ecce.c)

10.2 Program Data

Program Data (g02ecce.d)

10.3 Program Results

Program Results (g02ecce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017