NAG Library Function Document

nag_hier_mixed_init (g02jcc)

 Contents

    1  Purpose
    7  Accuracy
    10  Example

1
Purpose

nag_hier_mixed_init (g02jcc) preprocesses a dataset prior to fitting a linear mixed effects regression model of the following form via either nag_reml_hier_mixed_regsn (g02jdc) or nag_ml_hier_mixed_regsn (g02jec).

2
Specification

#include <nag.h>
#include <nagg02.h>
void  nag_hier_mixed_init (Nag_OrderType order, Integer n, Integer ncol, const double dat[], Integer pddat, const Integer levels[], const double y[], const double wt[], const Integer fixed[], Integer lfixed, Integer nrndm, const Integer rndm[], Integer lrndm, Integer *nff, Integer *nlsv, Integer *nrf, double rcomm[], Integer lrcomm, Integer icomm[], Integer licomm, NagError *fail)

3
Description

nag_hier_mixed_init (g02jcc) must be called prior to fitting a linear mixed effects regression model with either nag_reml_hier_mixed_regsn (g02jdc) or nag_ml_hier_mixed_regsn (g02jec).
The model fitting functions nag_reml_hier_mixed_regsn (g02jdc) and nag_ml_hier_mixed_regsn (g02jec) fit a model of the following form:
y=Xβ+Zν+ε  
where y is a vector of n observations on the dependent variable,
X is an n by p design matrix of fixed independent variables,
β is a vector of p unknown fixed effects,
Z is an n by q design matrix of random independent variables,
ν is a vector of length q of unknown random effects,
ε is a vector of length n of unknown random errors,
and ν and ε are Normally distributed with expectation zero and variance/covariance matrix defined by
Var ν ε = G 0 0 R  
where R= σ R 2 I , I is the n×n identity matrix and G is a diagonal matrix.
Case weights can be incorporated into the model by replacing X and Z with Wc1/2X and Wc1/2Z respectively where Wc is a diagonal weight matrix.

4
References

None.

5
Arguments

1:     order Nag_OrderTypeInput
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.3.1.3 in How to Use the NAG Library and its Documentation for a more detailed explanation of the use of this argument.
Constraint: order=Nag_RowMajor or Nag_ColMajor.
2:     n IntegerInput
On entry: n, the number of observations.
The effective number of observations, that is the number of observations with nonzero weight (see wt for more detail), must be greater than the number of fixed effects in the model (as returned in nff).
Constraint: n1.
3:     ncol IntegerInput
On entry: the number of columns in the data matrix, dat.
Constraint: ncol0.
4:     dat[dim] const doubleInput
Note: the dimension, dim, of the array dat must be at least
  • max1,pddat×ncol when order=Nag_ColMajor;
  • max1,n×pddat when order=Nag_RowMajor.
Where DATi,j appears in this document, it refers to the array element
  • dat[j-1×pddat+i-1] when order=Nag_ColMajor;
  • dat[i-1×pddat+j-1] when order=Nag_RowMajor.
On entry: a matrix of data, with DATi,j holding the ith observation on the jth variable. The two design matrices X and Z are constructed from dat and the information given in fixed (for X) and rndm (for Z).
Constraint: if levels[j-1]1,1DATi,jlevels[j-1].
5:     pddat IntegerInput
On entry: the stride separating row or column elements (depending on the value of order) in the array dat.
Constraints:
  • if order=Nag_ColMajor, pddatn;
  • if order=Nag_RowMajor, pddatncol.
6:     levels[ncol] const IntegerInput
On entry: levels[i-1] contains the number of levels associated with the ith variable held in dat.
If the ith variable is continuous or binary (i.e., only takes the values zero or one) then levels[i-1] must be set to 1. Otherwise the ith variable is assumed to take an integer value between 1 and levels[i-1], (i.e., the ith variable is discrete with levels[i-1] levels).
Constraint: levels[i-1]1, for i=1,2,,ncol.
7:     y[n] const doubleInput
On entry: y, the vector of observations on the dependent variable.
8:     wt[n] const doubleInput
On entry: optionally, the weights to be used in the weighted regression.
If wt[i-1]=0.0, the ith observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weights are not provided then wt must be set to the null pointer, i.e., (double *)0, and the effective number of observations is n.
Constraint: if wtis notNULL, wt[i-1]0.0, for i=1,2,,n.
9:     fixed[lfixed] const IntegerInput
On entry: defines the structure of the fixed effects design matrix, X.
fixed[0]
The number of variables, NF, to include as fixed effects (not including the intercept if present).
fixed[1]
The fixed intercept flag which must contain 1 if a fixed intercept is to be included and 0 otherwise.
fixed[2+i-1]
The column of dat holding the ith fixed variable, for i=1,2,,fixed[0].
See Section 9.1 for more details on the construction of X.
Constraints:
  • fixed[0]0;
  • fixed[1]=0​ or ​1;
  • 1fixed[2+i-1]ncol, for i=1,2,,fixed[0].
10:   lfixed IntegerInput
On entry: length of the vector fixed.
Constraint: lfixed2+fixed[0].
11:   nrndm IntegerInput
On entry: the second dimension of the random effects design matrix rndm.
Constraint: nrndm>0.
12:   rndm[lrndm×nrndm] const IntegerInput
Note: where RNDMi,j appears in this document, it refers to the array element
  • rndm[j-1×lrndm+i-1] when order=Nag_ColMajor;
  • rndm[i-1×nrndm+j-1] when order=Nag_RowMajor.
On entry: RNDMi,j defines the structure of the random effects design matrix, Z. The bth column of rndm defines a block of columns in the design matrix Z.
RNDM1,b
The number of variables, NRb, to include as random effects in the bth block (not including the random intercept if present).
RNDM2,b
The random intercept flag which must contain 1 if block b includes a random intercept and 0 otherwise.
RNDM2+i,b
The column of dat holding the ith random variable in the bth block, for i=1,2,,RNDM1,b.
RNDM3+NRb,b
The number of subject variables, NSb, for the bth block. The subject variables define the nesting structure for this block.
RNDM3+NRb+i,b
The column of dat holding the ith subject variable in the bth block, for i=1,2,,RNDM3+NRb,b.
See Section 9.2 for more details on the construction of Z.
Constraints:
  • RNDM1,b0;
  • RNDM2,b=0​ or ​1;
  • at least one random variable or random intercept must be specified in each block, i.e., RNDM1,b + RNDM2,b > 0 ;
  • the column identifiers associated with the random variables must be in the range 1 to ncol, i.e., 1 RNDM2+i,b ncol , for i=1,2,,RNDM1,b;
  • RNDM3+NRb,b 0 ;
  • the column identifiers associated with the subject variables must be in the range 1 to ncol, i.e., 1 RNDM3+ N R b +i ,b ncol , for i=1,2,,RNDM3+NRb,b.
13:   lrndm IntegerInput
On entry: maximum number of entries in any column of rndm.
Constraint: lrndm max b 3+NRb+NSb .
14:   nff Integer *Output
On exit: p, the number of fixed effects estimated, i.e., the number of columns in the design matrix X.
15:   nlsv Integer *Output
On exit: the number of levels for the overall subject variable (see Section 9.2 for a description of what this means). If there is no overall subject variable, nlsv=1.
16:   nrf Integer *Output
On exit: the number of random effects estimated in each of the overall subject blocks. The number of columns in the design matrix Z is given by q=nrf×nlsv.
17:   rcomm[lrcomm] doubleCommunication Array
On exit: communication array as required by the analysis functions nag_reml_hier_mixed_regsn (g02jdc) and nag_ml_hier_mixed_regsn (g02jec).
18:   lrcomm IntegerInput
On entry: the dimension of the array rcomm.
Constraint: lrcommnrf×nlsv+nff+nff×nlsv+nrf×nlsv+nff+2.
19:   icomm[licomm] IntegerCommunication Array
On exit: if licomm=2, icomm[0] holds the minimum required value for licomm and icomm[1] holds the minimum required value for lrcomm, otherwise icomm is a communication array as required by the analysis functions nag_reml_hier_mixed_regsn (g02jdc) and nag_ml_hier_mixed_regsn (g02jec).
20:   licomm IntegerInput
On entry: the dimension of the array icomm.
Constraint: licomm=2 or licomm34+ NF×MFL+1+ nrndm×MNR×MRL+LRNDM+2×nrndm+ ncol+LDID×LB,
where
  • MNR = maxb N R b ,
  • MFL=maxi levels[fixed[2+i-1]-1] ,
  • MRL=maxb,i levels[RNDM2+i,b-1] ,
  • LDID=maxb NSb ,
  • LB=nff+nrf×nlsv, and
  • LRNDM= max b 3+NRb+NSb  
21:   fail NagError *Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6
Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_INT
On entry, lfixed=value.
Constraint: lfixedvalue.
On entry, licomm=value.
Constraint: licommvalue.
On entry, lrcomm=value.
Constraint: lrcommvalue.
On entry, lrndm=value.
Constraint: lrndmvalue.
On entry, n=value.
Constraint: n1.
On entry, ncol=value.
Constraint: ncol0.
On entry, nrndm=value.
Constraint: nrndm>0.
NE_INT_2
On entry, pddat=value and n=value.
Constraint: pddatn.
On entry, pddat=value and ncol=value.
Constraint: pddatncol.
NE_INT_ARRAY
On entry, index of fixed variable j is less than 1 or greater than ncol: j=value, index =value and ncol=value.
On entry, index of random variable j in random statement i is less than 1 or greater than ncol: i=value, j=value, index =value and ncol=value.
On entry, invalid value for fixed intercept flag: value =value.
On entry, invalid value for random intercept flag for random statement i: i=value, value =value.
On entry, levels[value]=value.
Constraint: levels[i-1]1.
On entry, must be at least one parameter, or an intercept in each random statement i: i=value.
On entry, nesting variable j in random statement i has one level: j=value, i=value.
On entry, number of fixed parameters, value is less than zero.
On entry, number of random parameters for random statement i is less than 0: i=value, number of parameters =value.
On entry, number of subject parameters for random statement i is less than 0: i=value, number of parameters =value.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_REAL_ARRAY
On entry, no observations due to zero weights.
On entry, variable j of observation i is less than 1 or greater than levels[j-1]: i=value, j=value, value =value, levels[j-1]=value.
On entry, wt[value]=value.
Constraint: wt[i-1]0.0.
NE_TOO_MANY
On entry, more fixed factors than observations, n=value.
Constraint: nvalue.

7
Accuracy

Not applicable.

8
Parallelism and Performance

nag_hier_mixed_init (g02jcc) is not threaded in any implementation.

9
Further Comments

9.1
Construction of the fixed effects design matrix, X

Let
The design matrix for the fixed effects, X, is constructed as follows:
The number of columns in the design matrix, X, is therefore given by
p= 1+ j=1 N F levels[fixed[2+j-1]-1] -1 .  
This quantity is returned in nff.
In summary, nag_hier_mixed_init (g02jcc) converts all non-binary categorical variables (i.e., where LFj>1) to dummy variables. If a fixed intercept is included in the model then the first level of all such variables is dropped. If a fixed intercept is not included in the model then the first level of all such variables, other than the first, is dropped. The variables are added into the model in the order they are specified in fixed.

9.2
Construction of random effects design matrix, Z

Let
The design matrix for the random effects, Z, is constructed as follows:
In summary, each column of rndm defines a block of consecutive columns in Z. nag_hier_mixed_init (g02jcc) converts all non-binary categorical variables (i.e., where LRjb or LSjb>1) to dummy variables. All random variables defined within a column of rndm are nested within all subject variables defined in the same column of rndm. In addition each of the subject variables are nested within each other, starting with the first (i.e., each of the Rjb,j=1,2,,NRb are nested within S1b which in turn is nested within S2b, which in turn is nested within S3b, etc.).
If the last subject variable in each column of rndm are the same (i.e., SNS11 = SNS22 = = SNSbb ) then all random effects in the model are nested within this variable. In such instances the last subject variable ( SNS11 ) is called the overall subject variable. The fact that all of the random effects in the model are nested within the overall subject variable means that ZTZ is block diagonal in structure. This fact can be utilised to improve the efficiency of the underlying computation and reduce the amount of internal storage required. The number of levels in the overall subject variable is returned in nlsv=LSNS11.
If the last k subject variables in each column of rndm are the same, for k>1 then the overall subject variable is defined as the interaction of these k variables and
nlsv= j=NS1-k+1 NS1 LSj1 .  
If there is no overall subject variable then nlsv=1.
The number of columns in the design matrix Z is given by q=nrf×nlsv.

9.3
The rndm argument

To illustrate some additional points about the rndm argument, we assume that we have a dataset with three discrete variables, V1, V2 and V3, with 2,4 and 3 levels respectively, and that V1 is in the first column of dat, V2 in the second and V3 the third. Also assume that we wish to fit a model containing V1 along with V2 nested within V3, as random effects. In order to do this the rndm matrix requires two columns:
rndm= 1 1 0 0 1 2 0 1 0 3  
The first column, 1,0,1,0,0, indicates one random variable (RNDM1,1=1), no intercept (RNDM2,1=0), the random variable is in the first column of dat (RNDM3,1=1), there are no subject variables; as no nesting is required for V1 (RNDM4,1=0). The last element in this column is ignored.
The second column, 1,0,2,1,3, indicates one random variable (RNDM1,2=1), no intercept (RNDM2,2=0), the random variable is in the second column of dat RNDM3,2=2, there is one subject variable (RNDM4,2=1), and the subject variable is in the third column of dat RNDM5,2=3.
The corresponding Z matrix would have 14 columns, with 2 coming from V1 and 12 (4×3) from V2 nested within V3. The, symmetric, ZTZ matrix has the form
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - -  
where 0 indicates a structural zero, i.e., it always takes the value 0, irrespective of the data, and - a value that is not a structural zero. The first two rows and columns of ZTZ correspond to V1. The block diagonal matrix in the 12 rows and columns in the bottom right correspond to V2 nested within V3. With the 4×4 blocks corresponding to the levels of V2. There are three blocks as the subject variable (V3) has three levels.
The model fitting functions, nag_reml_hier_mixed_regsn (g02jdc) and nag_ml_hier_mixed_regsn (g02jec), use the sweep algorithm to calculate the log-likelihood function for a given set of variance components. This algorithm consists of moving down the diagonal elements (called pivots) of a matrix which is similar in structure to ZTZ, and updating each element in that matrix. When using the k diagonal element of a matrix A, an element a i j ,ik,jk , is adjusted by an amount equal to a i k a i j / a k k . This process can be referred to as sweeping on the kth pivot. As there are no structural zeros in the first row or column of the above ZTZ, sweeping on the first pivot of ZTZ would alter each element of the matrix and therefore destroy the structural zeros, i.e., we could no longer guarantee they would be zero.
Reordering the rndm matrix to
rndm= 1 1 0 0 2 1 1 0 3 0  
i.e., the swapping the two columns, results in a ZTZ matrix of the form
- - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 - - - - 0 0 0 0 - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - 0 0 0 0 0 0 0 0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
This matrix is identical to the previous one, except the first two rows and columns have become the last two rows and columns. Sweeping a matrix, A=aij, of this form on the first pivot will only affect those elements aij, where ai10​ and ​a1j0, which is only the 13th and 14th row and columns, and the top left hand block of 4 rows and columns. The block diagonal nature of the first 12 rows and columns therefore greatly reduces the amount of work the algorithm needs to perform.
nag_hier_mixed_init (g02jcc) constructs the ZTZ as specified by the rndm matrix, and does not attempt to reorder it to improve performance. Therefore for best performance some thought is required on what ordering to use. In general it is more efficient to structure rndm in such a way that the first row relates to the deepest level of nesting, the second to the next level, etc..

10
Example

See Section 10 in nag_reml_hier_mixed_regsn (g02jdc) and nag_ml_hier_mixed_regsn (g02jec).
© The Numerical Algorithms Group Ltd, Oxford, UK. 2017