NAG Library Function Document

1 Purpose

nag_regsn_mult_linear_newyvar (g02dgc) calculates the estimates of the parameters of a general linear regression model for a new dependent variable after a call to nag_regsn_mult_linear (g02dac).

2 Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_regsn_mult_linear_newyvar (Integer n, const double wt[], double *rss, Integer ip, Integer rank, double cov[], double q[], Integer tdq, Nag_Boolean svd, const double p[], const double y[], double b[], double se[], double res[], const double com_ar[], NagError *fail)

3 Description

nag_regsn_mult_linear_newyvar (g02dgc) uses the results given by nag_regsn_mult_linear (g02dac) to fit the same set of independent variables to a new dependent variable.
nag_regsn_mult_linear (g02dac) computes a $QR$ decomposition of the matrix of $p$ independent variables and also, if the model is not of full rank, a singular value decomposition (SVD). These results can be used to compute estimates of the parameters of a general linear model with a new dependent variable. The $QR$ decomposition leads to the formation of an upper triangular $p$ by $p$ matrix $R$ and an $n$ by $n$ orthogonal matrix $Q$. In addition, the vector $c = Q^{\mathrm{T}} y$ (or $c = Q^{\mathrm{T}} W^{1/2} y$ for a weighted regression, where $W$ is the diagonal matrix of weights) is computed. For a new dependent variable, $y_{\mathrm{new}}$, nag_regsn_mult_linear_newyvar (g02dgc) computes a new value of $c$, namely $Q^{\mathrm{T}} y_{\mathrm{new}}$ or $Q^{\mathrm{T}} W^{1/2} y_{\mathrm{new}}$.
If $R$ is of full rank, then the least squares parameter estimates, $\hat{\beta}$, are the solution of $R\hat{\beta} = c_1$, where $c_1$ consists of the first $p$ elements of $c$.
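Because $R$ is upper triangular, the full-rank system $R\hat{\beta} = c_1$ is solved by back substitution. The following is a minimal C sketch, assuming a dense row-major $R$ (a layout chosen here for illustration, not the storage used by nag_regsn_mult_linear (g02dac)); the name solve_upper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: solves R * beta = c1 by back substitution, where
 * R is p by p upper triangular, stored row-major (R(i,j) = r[i*p + j]),
 * and c1 holds the first p elements of c.
 * Returns 0 on success, or -1 if a zero diagonal element is met
 * (R is then not of full rank and the SVD route is needed). */
int solve_upper(size_t p, const double *r, const double *c1, double *beta)
{
    assert(r != NULL && c1 != NULL && beta != NULL);
    for (size_t k = p; k-- > 0;) {          /* rows p-1, ..., 0 */
        double sum = c1[k];
        for (size_t j = k + 1; j < p; ++j)  /* subtract known terms */
            sum -= r[k * p + j] * beta[j];
        if (r[k * p + k] == 0.0)
            return -1;
        beta[k] = sum / r[k * p + k];
    }
    return 0;
}
```

For example, $R = \begin{pmatrix} 2 & 1 \\ 0 & 4 \end{pmatrix}$ and $c_1 = (5, 8)$ give $\hat{\beta} = (1.5, 2)$.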
If $R$ is not of full rank, then nag_regsn_mult_linear (g02dac) will have computed the SVD of $R$,
 $R = Q_* \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} P^{\mathrm{T}}$
where $D$ is a $k$ by $k$ diagonal matrix with nonzero diagonal elements, $k$ being the rank of $R$, and $Q_*$ and $P$ are $p$ by $p$ orthogonal matrices. This gives the solution
 $\hat{\beta} = P_1 D^{-1} Q_{*_1}^{\mathrm{T}} c_1$
$P_1$ being the first $k$ columns of $P$, i.e., $P = (P_1 \; P_0)$, and $Q_{*_1}$ being the first $k$ columns of $Q_*$. Details of the SVD are made available by nag_regsn_mult_linear (g02dac) in the form of the matrix $P^*$:
 $P^* = \begin{pmatrix} D^{-1} P_1^{\mathrm{T}} \\ P_0^{\mathrm{T}} \end{pmatrix}.$
The matrix ${Q}_{*}$ is made available through the com_ar argument of nag_regsn_mult_linear (g02dac).
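The rank-deficient solution $\hat{\beta} = P_1 D^{-1} Q_{*_1}^{\mathrm{T}} c_1$ can be formed directly from $P^*$: its first $k$ rows hold $D^{-1} P_1^{\mathrm{T}}$, whose transpose is $P_1 D^{-1}$. The following minimal C sketch assumes that $P^*$ and $Q_*$ are both given as plain dense $p$ by $p$ row-major arrays. That is an assumption for illustration only: the internal form in which com_ar holds $Q_*$ is not documented here. The name svd_solution is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: beta = P1 D^{-1} Q*1^T c1 for a model of rank k.
 * Assumed layouts (illustration only, not the g02dac storage):
 *   pstar : p by p matrix P*, row-major; rows 0..k-1 hold D^{-1} P1^T
 *   qstar : p by p orthogonal matrix Q_*, row-major, Q_*(i,l) = qstar[i*p + l]
 *   c1    : the first p elements of c
 * With z = Q*1^T c1 (k elements), beta[j] = sum_l P*(l,j) * z[l]. */
void svd_solution(size_t p, size_t k, const double *pstar,
                  const double *qstar, const double *c1, double *beta)
{
    assert(k >= 1 && k <= p);
    for (size_t j = 0; j < p; ++j)
        beta[j] = 0.0;
    for (size_t l = 0; l < k; ++l) {
        double z = 0.0;                    /* z_l = (column l of Q_*)^T c1 */
        for (size_t i = 0; i < p; ++i)
            z += qstar[i * p + l] * c1[i];
        for (size_t j = 0; j < p; ++j)     /* add row l of P*, scaled by z_l */
            beta[j] += pstar[l * p + j] * z;
    }
}
```

Working with the stored rows of $P^*$ avoids forming $P_1$ and $D^{-1}$ separately; the columns of $P_0$ (rows $k$ onwards of $P^*$) are not needed for $\hat{\beta}$.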
In addition to the parameter estimates, the new residuals are computed, and the variance-covariance matrix of the parameter estimates is found by scaling the variance-covariance matrix for the original regression.

5 Arguments

1:    $\mathbf{n}$ – Integer Input
On entry: the number of observations, $n$.
Constraint: ${\mathbf{n}}\ge 2$.
2:    $\mathbf{wt}\left[{\mathbf{n}}\right]$ – const double Input
On entry: optionally, the weights to be used in the weighted regression.
If ${\mathbf{wt}}\left[i-1\right]=0.0$, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights. The corresponding elements of res will be set to zero for observations with zero weights (see nag_regsn_mult_linear (g02dac)).
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if wt is not NULL, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
3:    $\mathbf{rss}$ – double * Input/Output
On entry: the residual sum of squares for the original dependent variable.
On exit: the residual sum of squares for the new dependent variable.
Constraint: ${\mathbf{rss}}>0.0$.
4:    $\mathbf{ip}$ – Integer Input
On entry: the number $p$ of independent variables in the model (including the mean if fitted).
Constraint: $1\le {\mathbf{ip}}\le {\mathbf{n}}$.
5:    $\mathbf{rank}$ – Integer Input
On entry: the rank of the independent variables, as given by nag_regsn_mult_linear (g02dac).
Constraint: ${\mathbf{rank}}>0$; if ${\mathbf{svd}}=\mathrm{Nag\_FALSE}$, ${\mathbf{rank}}={\mathbf{ip}}$, otherwise ${\mathbf{rank}}\le {\mathbf{ip}}$.
6:    $\mathbf{cov}\left[{\mathbf{ip}}×\left({\mathbf{ip}}+1\right)/2\right]$ – double Input/Output
On entry: the covariance matrix of the parameter estimates as given by nag_regsn_mult_linear (g02dac).
On exit: the upper triangular part of the variance-covariance matrix of the ip parameter estimates given in b. They are stored packed by column, i.e., the covariance between the parameter estimate given in ${\mathbf{b}}\left[i\right]$ and the parameter estimate given in ${\mathbf{b}}\left[j\right]$, $j\ge i$, is stored in ${\mathbf{cov}}\left[j\left(j+1\right)/2+i\right]$ for $i=0,1,\dots ,{\mathbf{ip}}-1$ and $j=i,i+1,\dots ,{\mathbf{ip}}-1$.
7:    $\mathbf{q}\left[{\mathbf{n}}×{\mathbf{tdq}}\right]$ – double Input/Output
Note: the $\left(i,j\right)$th element of the matrix $Q$ is stored in ${\mathbf{q}}\left[\left(i-1\right)×{\mathbf{tdq}}+j-1\right]$.
On entry: the results of the $QR$ decomposition as returned by nag_regsn_mult_linear (g02dac).
On exit: the first column of q contains the new values of $c$; the remainder of q is unchanged.
8:    $\mathbf{tdq}$ – Integer Input
On entry: the stride separating matrix column elements in the array q.
Constraint: ${\mathbf{tdq}}\ge {\mathbf{ip}}+1$.
9:    $\mathbf{svd}$ – Nag_Boolean Input
On entry: indicates whether a singular value decomposition was used by nag_regsn_mult_linear (g02dac).
${\mathbf{svd}}=\mathrm{Nag\_TRUE}$
A singular value decomposition was used by nag_regsn_mult_linear (g02dac).
${\mathbf{svd}}=\mathrm{Nag\_FALSE}$
A singular value decomposition was not used by nag_regsn_mult_linear (g02dac).
10:  $\mathbf{p}\left[2×{\mathbf{ip}}+{\mathbf{ip}}×{\mathbf{ip}}\right]$ – const double Input
On entry: details of the $QR$ decomposition and SVD, if used, as returned in array p by nag_regsn_mult_linear (g02dac).
If ${\mathbf{svd}}=\mathrm{Nag\_FALSE}$, only the first ip elements of p are used; these contain details of the Householder vector in the $QR$ decomposition (Sections 2.2.1 and 3.3.6 in the f08 Chapter Introduction).
If ${\mathbf{svd}}=\mathrm{Nag\_TRUE}$, the first ip elements of p contain details of the Householder vector in the $QR$ decomposition (Sections 2.2.1 and 3.3.6 in the f08 Chapter Introduction), the next ip elements contain the singular values, and the following ip by ip elements contain the matrix ${P}^{*}$ stored by rows.
11:  $\mathbf{y}\left[{\mathbf{n}}\right]$ – const double Input
On entry: the new dependent variable ${y}_{\mathrm{new}}$.
12:  $\mathbf{b}\left[{\mathbf{ip}}\right]$ – double Output
On exit: ${\mathbf{b}}\left[i\right]$, $i=0,1,\dots ,{\mathbf{ip}}-1$, contain the least squares estimates of the parameters of the regression model, $\hat{\beta}$.
13:  $\mathbf{se}\left[{\mathbf{ip}}\right]$ – double Output
On exit: ${\mathbf{se}}\left[i\right]$, $i=0,1,\dots ,{\mathbf{ip}}-1$ contain the standard errors of the ip parameter estimates given in b.
14:  $\mathbf{res}\left[{\mathbf{n}}\right]$ – double Output
On exit: the residuals for the new regression model.
15:  $\mathbf{com\_ar}\left[{\mathbf{ip}}+{\mathbf{ip}}×{\mathbf{ip}}\right]$ – const double Input
On entry: if ${\mathbf{svd}}=\mathrm{Nag\_TRUE}$, com_ar must be unaltered from the previous call to nag_regsn_mult_linear (g02dac).
16:  $\mathbf{fail}$ – NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
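The index formulas given for q, cov, p and com_ar can be collected into a few accessors. The following C sketch is illustrative only: every function name is hypothetical, indices are 0-based, and the helpers simply transcribe the layouts described in this section (row-major q with row stride tdq, cov packed by column, singular values and $P^*$ following the first ip elements of p, and a minimum com_ar length of ${\mathbf{ip}}+{\mathbf{ip}}×{\mathbf{ip}}$).

```c
#include <assert.h>
#include <stddef.h>

/* Element (i,j) of q, 0-based; documented as q[(i-1)*tdq + j-1] for
 * 1-based (i,j). */
double q_elem(const double *q, size_t tdq, size_t i, size_t j)
{
    return q[i * tdq + j];
}

/* On exit the new values of c occupy the first column of q. */
void q_first_column(const double *q, size_t n, size_t tdq, double *c)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = q[i * tdq];
}

/* Covariance between b[i] and b[j]: stored at cov[j*(j+1)/2 + i] for
 * i <= j; symmetry is used when i > j. */
double cov_elem(const double *cov, size_t i, size_t j)
{
    if (i > j) { size_t t = i; i = j; j = t; }
    return cov[j * (j + 1) / 2 + i];
}

/* With svd = Nag_TRUE: singular values are p[ip], ..., p[2*ip - 1]. */
double singular_value(const double *p, size_t ip, size_t l)
{
    assert(l < ip);
    return p[ip + l];
}

/* With svd = Nag_TRUE: element (i,j) of P*, stored by rows from p[2*ip]. */
double pstar_elem(const double *p, size_t ip, size_t i, size_t j)
{
    assert(i < ip && j < ip);
    return p[2 * ip + i * ip + j];
}

/* Minimum number of doubles to allocate for com_ar. */
size_t com_ar_length(size_t ip)
{
    return ip + ip * ip;
}
```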

6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$ while ${\mathbf{ip}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{n}}\ge {\mathbf{ip}}$.
On entry, ${\mathbf{tdq}}=〈\mathit{\text{value}}〉$ while ${\mathbf{ip}}+1=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdq}}\ge {\mathbf{ip}}+1$.
NE_INT_ARG_LE
On entry, ${\mathbf{rank}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{rank}}>0$.
NE_INT_ARG_LT
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ip}}\ge 1$.
NE_REAL_ARG_LE
On entry, rss must not be less than or equal to 0.0: ${\mathbf{rss}}=〈\mathit{\text{value}}〉$.
NE_REAL_ARG_LT
On entry, ${\mathbf{wt}}\left[〈\mathit{\text{value}}〉\right]$ must not be less than 0.0: ${\mathbf{wt}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$.
NE_SVD_RANK_GT_IP
On entry, the Boolean variable, svd, is Nag_TRUE and rank must not be greater than ip: rank = $〈\mathit{\text{value}}〉$, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
NE_SVD_RANK_NE_IP
On entry, the Boolean variable, svd, is Nag_FALSE and rank must be equal to ip: ${\mathbf{rank}}=〈\mathit{\text{value}}〉$, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
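The argument constraints behind these error exits can be checked by the caller before the call. The following is a minimal C sketch: the enumeration mirrors the error names listed above, but the function check_g02dgc_args, its order of checks and its use of plain C types (long, bool) in place of Integer and Nag_Boolean are hypothetical and do not reproduce NAG's internal validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes named after the documented error exits. */
typedef enum {
    CHECK_OK,
    CHECK_NE_INT_ARG_LT,      /* ip < 1 */
    CHECK_NE_2_INT_ARG_LT,    /* n < ip, or tdq < ip + 1 */
    CHECK_NE_INT_ARG_LE,      /* rank <= 0 */
    CHECK_NE_SVD_RANK_GT_IP,  /* svd true and rank > ip */
    CHECK_NE_SVD_RANK_NE_IP,  /* svd false and rank != ip */
    CHECK_NE_REAL_ARG_LE,     /* rss <= 0.0 */
    CHECK_NE_REAL_ARG_LT      /* some wt[i] < 0.0 */
} check_status;

/* Returns the first violated constraint, or CHECK_OK. wt may be NULL. */
check_status check_g02dgc_args(long n, const double *wt, double rss,
                               long ip, long rank, long tdq, bool svd)
{
    if (ip < 1) return CHECK_NE_INT_ARG_LT;
    if (n < ip || tdq < ip + 1) return CHECK_NE_2_INT_ARG_LT;
    if (rank <= 0) return CHECK_NE_INT_ARG_LE;
    if (svd && rank > ip) return CHECK_NE_SVD_RANK_GT_IP;
    if (!svd && rank != ip) return CHECK_NE_SVD_RANK_NE_IP;
    if (rss <= 0.0) return CHECK_NE_REAL_ARG_LE;
    if (wt != NULL)
        for (long i = 0; i < n; ++i)
            if (wt[i] < 0.0) return CHECK_NE_REAL_ARG_LT;
    return CHECK_OK;
}
```

Zero weights pass the check, matching the constraint that weights be nonnegative rather than strictly positive.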

7 Accuracy

The same accuracy as nag_regsn_mult_linear (g02dac) is obtained.

8 Parallelism and Performance

nag_regsn_mult_linear_newyvar (g02dgc) is not threaded in any implementation.

9 Further Comments

The values of the leverages, ${h}_{i}$, are unaltered by a change in the dependent variable, so a call to nag_regsn_std_resid_influence (g02fac) can be made using the value of h from nag_regsn_mult_linear (g02dac).

9.1 Internal Changes

Internal changes have been made to this function as follows:
• At Mark 26.1: The documented minimum length of the array argument com_ar was too large. The documented minimum length was given as ${\mathbf{ip}}×{\mathbf{ip}}×5×\left({\mathbf{ip}}-1\right)$ but the actual minimum length is ${\mathbf{ip}}×{\mathbf{ip}}+{\mathbf{ip}}$ which is much less for non-trivial cases, ${\mathbf{ip}}>1$.
In addition, provided example programs that called nag_regsn_mult_linear_newyvar (g02dgc) allocated lengths of ${\mathbf{ip}}×{\mathbf{ip}}+5×\left({\mathbf{ip}}-1\right)$ for the array argument, which was also larger than necessary for non-trivial problems.
The nag_regsn_mult_linear_newyvar (g02dgc) routine document has been updated to document the actual minimum length requirement for com_ar, and those example programs that call nag_regsn_mult_linear_newyvar (g02dgc) have been updated to allocate the actual minimum length required for com_ar.
For details of all known issues which have been reported for the NAG Library please refer to the Known Issues list.

10 Example

A dataset consisting of 12 observations with four independent variables and two dependent variables is read in. A model with all four independent variables is fitted to the first dependent variable by nag_regsn_mult_linear (g02dac) and the results printed. The model is then fitted to the second dependent variable by nag_regsn_mult_linear_newyvar (g02dgc) and those results printed.

10.1 Program Text

Program Text (g02dgce.c)

10.2 Program Data

Program Data (g02dgce.d)

10.3 Program Results

Program Results (g02dgce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017