NAG Library Function Document

nag_mv_discrim (g03dac)

1
Purpose

nag_mv_discrim (g03dac) computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.

2
Specification

#include <nag.h>
#include <nagg03.h>
void  nag_mv_discrim (Integer n, Integer m, const double x[], Integer tdx,
                      const Integer isx[], Integer nvar, const Integer ing[], Integer ng,
                      const double wt[], Integer nig[], double gmean[], Integer tdg,
                      double det[], double gc[], double *stat, double *df, double *sig,
                      NagError *fail)

3
Description

Let a sample of $n$ observations on $p$ variables come from $n_g$ groups with $n_j$ observations in the $j$th group and $\sum_{j=1}^{n_g} n_j = n$. If the data are assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group $\Sigma_j$, then to test for equality of the variance-covariance matrices between groups, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g} = \Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
$$G = C \left\{ (n - n_g) \log|S| - \sum_{j=1}^{n_g} (n_j - 1) \log|S_j| \right\},$$
where
$$C = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(n_g - 1)} \left( \sum_{j=1}^{n_g} \frac{1}{n_j - 1} - \frac{1}{n - n_g} \right),$$
and $S_j$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
$$S = \frac{\sum_{j=1}^{n_g} (n_j - 1) S_j}{n - n_g}.$$
For large $n$, $G$ is approximately distributed as a $\chi^2$ variable with $\frac{1}{2} p (p+1)(n_g - 1)$ degrees of freedom; see Morrison (1967) for further comments. If weights are used, then $S$ and $S_j$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.
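As a rough illustration of the formulae for $G$, $C$ and the degrees of freedom, the statistic can be evaluated directly once the log-determinants are available. The sketch below is not part of the NAG Library: the type `CovTest` and the function `cov_equality_stat` are hypothetical names, and the effective group sizes and log-determinants are assumed to have been computed elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type (not a NAG type). */
typedef struct {
    double g;   /* likelihood-ratio statistic G */
    double df;  /* degrees of freedom, p(p+1)(ng-1)/2 */
} CovTest;

/* Evaluate G and its degrees of freedom.
   nj[j]        effective number of observations in group j (must exceed 1)
   logdet_sj[j] log|S_j| for group j
   logdet_s     log|S| for the pooled matrix */
CovTest cov_equality_stat(size_t ng, size_t p, const double nj[],
                          const double logdet_sj[], double logdet_s)
{
    assert(ng >= 2 && p >= 1);

    double n = 0.0, sum_inv = 0.0, sum_log = 0.0;
    for (size_t j = 0; j < ng; ++j) {
        assert(nj[j] > 1.0);
        n += nj[j];
        sum_inv += 1.0 / (nj[j] - 1.0);
        sum_log += (nj[j] - 1.0) * logdet_sj[j];
    }

    double pd = (double)p, gd = (double)ng;

    /* Correction factor C. */
    double c = 1.0 - (2.0 * pd * pd + 3.0 * pd - 1.0)
                     / (6.0 * (pd + 1.0) * (gd - 1.0))
                     * (sum_inv - 1.0 / (n - gd));

    CovTest t;
    t.g = c * ((n - gd) * logdet_s - sum_log);
    t.df = 0.5 * pd * (pd + 1.0) * (gd - 1.0);
    return t;
}
```

For example, with two groups of three observations on one variable, $\log|S_1| = \log|S_2| = 0$ and $\log|S| = 1$, this gives $C = 0.75$, $G = 3$ and one degree of freedom. These inputs are chosen only to make the arithmetic easy to follow.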
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, nag_mv_discrim (g03dac) uses a $QR$ decomposition. The group means are subtracted from the data and then, for each group, a $QR$ decomposition is computed to give an upper triangular matrix $R_j^*$. This matrix can be scaled to give a matrix $R_j$ such that $S_j = R_j^T R_j$. The pooled $R$ matrix is then computed from the $R_j$ matrices. The values of $|S|$ and the $|S_j|$ can then be calculated from the diagonal elements of $R$ and the $R_j$.
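Because $S_j = R_j^T R_j$ with $R_j$ triangular, the determinant is the product of the squared diagonal elements of $R_j$. A minimal sketch of that step follows, assuming the scaled factor is held as an upper triangle packed by columns (the layout used for the gc output); `packed_index` and `det_from_r` are hypothetical helper names, not NAG functions.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (i, j), i <= j, of an upper triangular matrix
   packed by columns; indices are 0-based. */
size_t packed_index(size_t i, size_t j)
{
    assert(i <= j);
    return j * (j + 1) / 2 + i;
}

/* |S| for S = R^T R, using only the diagonal of the packed factor R.
   For large p, summing 2*log|r_ii| instead avoids overflow and gives
   the logarithm of the determinant directly. */
double det_from_r(const double r[], size_t p)
{
    double d = 1.0;
    for (size_t i = 0; i < p; ++i) {
        double rii = r[packed_index(i, i)];
        d *= rii * rii;
    }
    return d;
}
```

For the 2 by 2 factor with rows (2, 1) and (0, 3), stored as {2, 1, 3}, the function returns 36, which agrees with the determinant of $R^T R$ computed directly.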
This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as $z^T z$, where $R_j^T z = x - \bar{x}_j$, $\bar{x}_j$ being the vector of means of the $j$th group. These distances can be calculated by nag_mv_discrim_mahaldist (g03dbc). The distances are used in discriminant analysis, and nag_mv_discrim_group (g03dcc) uses the results of nag_mv_discrim (g03dac) to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
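The distance calculation can be sketched as a single triangular solve. The code below is an illustration only and does not reproduce nag_mv_discrim_mahaldist (g03dbc): it assumes $S_j = R_j^T R_j$ with $R_j$ an upper triangle packed by columns, so the system solved by forward substitution is $R_j^T z = x - \bar{x}_j$. The function name `mahalanobis_sq` and the caller-supplied workspace `z` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Mahalanobis distance z^T z, where R^T z = x - xbar.
   r     upper triangular factor, packed by columns, p(p+1)/2 elements
   x     observation, length p
   xbar  group mean, length p
   z     workspace of length p, overwritten with the solution */
double mahalanobis_sq(const double r[], const double x[],
                      const double xbar[], double z[], size_t p)
{
    double d2 = 0.0;
    for (size_t i = 0; i < p; ++i) {
        /* Column i of R holds rows 0..i contiguously; these are the
           entries of row i of the lower triangular matrix R^T. */
        const double *col = r + i * (i + 1) / 2;
        double s = x[i] - xbar[i];
        for (size_t k = 0; k < i; ++k)
            s -= col[k] * z[k];
        assert(col[i] != 0.0);   /* R must be of full rank */
        z[i] = s / col[i];
        d2 += z[i] * z[i];
    }
    return d2;
}
```

With the factor {2, 1, 3} (rows (2, 1) and (0, 3)), zero means and $x = (2, 4)$, the solve gives $z = (1, 1)$ and a squared distance of 2, the same value obtained from $x^T (R^T R)^{-1} x$.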

4
References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

5
Arguments

1:     n Integer Input
On entry: the number of observations, $n$.
Constraint: n ≥ 1.
2:     m Integer Input
On entry: the number of variables in the data array x.
Constraint: m ≥ nvar.
3:     x[n×tdx] const double Input
On entry: x[(k-1)×tdx + l-1] must contain the kth observation for the lth variable, for k = 1, 2, …, n and l = 1, 2, …, m.
4:     tdx Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
5:     isx[m] const Integer Input
On entry: isx[l-1] indicates whether or not the lth variable in x is to be included in the variance-covariance matrices.
If isx[l-1] > 0 the lth variable is included, for l = 1, 2, …, m; otherwise it is not referenced.
Constraint: isx[l-1] > 0 for nvar values of l.
6:     nvar Integer Input
On entry: the number of variables in the variance-covariance matrices, $p$.
Constraint: nvar ≥ 1.
7:     ing[n] const Integer Input
On entry: ing[k-1] indicates to which group the kth observation belongs, for k = 1, 2, …, n.
Constraint: 1 ≤ ing[k-1] ≤ ng, for k = 1, 2, …, n.
The values of ing must be such that each group has at least nvar members.
8:     ng Integer Input
On entry: the number of groups, $n_g$.
Constraint: ng ≥ 2.
9:     wt[n] const double Input
On entry: the elements of wt must contain the weights to be used in the analysis; the effective number of observations for a group is the sum of the weights of the observations in that group. If wt[k-1] = 0.0 then the kth observation is excluded from the calculations.
If weights are not provided then wt must be set to NULL and the effective number of observations for a group is the number of observations in that group.
Constraints:
  • if wt is not NULL, wt[k-1] ≥ 0.0, for k = 1, 2, …, n;
  • the effective number of observations for each group must be greater than 1.
10:   nig[ng] Integer Output
On exit: nig[j-1] contains the number of observations in the jth group, for j = 1, 2, …, $n_g$.
11:   gmean[ng×tdg] double Output
Note: the (i, j)th element of the matrix is stored in gmean[(i-1)×tdg + j-1].
On exit: the jth row of gmean contains the means of the $p$ selected variables for the jth group, for j = 1, 2, …, $n_g$.
12:   tdg Integer Input
On entry: the stride separating matrix column elements in the array gmean.
Constraint: tdg ≥ nvar.
13:   det[ng] double Output
On exit: the logarithms of the determinants of the within-group variance-covariance matrices.
14:   gc[dim] double Output
Note: the dimension, dim, of the array gc must be at least (ng+1)×nvar×(nvar+1)/2.
On exit: the first $p(p+1)/2$ elements of gc contain $R$ and the remaining $n_g$ blocks of $p(p+1)/2$ elements contain the $R_j$ matrices. All are stored in packed form by columns.
15:   stat double * Output
On exit: the likelihood-ratio test statistic, $G$.
16:   df double * Output
On exit: the degrees of freedom for the distribution of $G$.
17:   sig double * Output
On exit: the significance level for $G$.
18:   fail NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
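The storage conventions described for x, gc and wt can be sketched with a few small helpers. These are illustrative only and are not NAG functions: the names `x_elem`, `gc_dim`, `gc_block` and `effective_counts` are hypothetical, and plain `int` stands in for the library's `Integer` type, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Observation k, variable l of x (both 1-based, as in the text),
   with tdx elements between the starts of consecutive rows. */
double x_elem(const double x[], size_t tdx, size_t k, size_t l)
{
    assert(k >= 1 && l >= 1 && l <= tdx);
    return x[(k - 1) * tdx + (l - 1)];
}

/* Minimum length of gc: (ng + 1) * nvar * (nvar + 1) / 2. */
size_t gc_dim(size_t ng, size_t nvar)
{
    return (ng + 1) * nvar * (nvar + 1) / 2;
}

/* Start of a packed triangle inside gc: j = 0 selects the pooled R,
   j = 1..ng selects R_j for the jth group. */
const double *gc_block(const double gc[], size_t nvar, size_t j)
{
    return gc + j * (nvar * (nvar + 1) / 2);
}

/* Effective number of observations per group: the sum of the weights,
   or the plain count when wt is NULL. Zero weights contribute nothing. */
void effective_counts(size_t n, const int ing[], const double wt[],
                      size_t ng, double neff[])
{
    for (size_t j = 0; j < ng; ++j)
        neff[j] = 0.0;
    for (size_t k = 0; k < n; ++k) {
        assert(ing[k] >= 1 && (size_t)ing[k] <= ng);
        neff[ing[k] - 1] += wt ? wt[k] : 1.0;
    }
}
```

For three groups and two variables, `gc_dim(3, 2)` gives 12: one block of three elements for the pooled $R$ followed by three blocks of three elements for the $R_j$.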

6
Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, m = value while nvar = value. These arguments must satisfy m ≥ nvar.
On entry, tdg = value while nvar = value. These arguments must satisfy tdg ≥ nvar.
On entry, tdx = value while m = value. These arguments must satisfy tdx ≥ m.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_GROUP_OBSERV
On entry, group value has value effective observations.
Constraint: in each group the effective number of observations must be > 1.
NE_GROUP_VAR
On entry, group value has value members, while nvar = value.
Constraint: number of members in each group ≥ nvar.
NE_GROUP_VAR_RANK
The variables in group value are not of full rank.
NE_INT_ARG_LT
On entry, n = value.
Constraint: n ≥ 1.
On entry, ng = value.
Constraint: ng ≥ 2.
On entry, nvar = value.
Constraint: nvar ≥ 1.
NE_INTARR_INT
On entry, ing[value] = value, ng = value.
Constraint: 1 ≤ ing[i-1] ≤ ng, for i = 1, 2, …, n.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NEG_WEIGHT_ELEMENT
On entry, wt[value] = value.
Constraint: when referenced, all elements of wt must be non-negative.
NE_VAR_INCL_INDICATED
The number of variables in the analysis, nvar = value, while the number of variables included in the analysis via array isx = value. Constraint: these two numbers must be the same.
NE_VAR_RANK
The variables are not of full rank.

7
Accuracy

The accuracy is dependent on the accuracy of the computation of the QR  decomposition.

8
Parallelism and Performance

nag_mv_discrim (g03dac) is not threaded in any implementation.

9
Further Comments

The time taken will be approximately proportional to $n p^2$.

10
Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the statistics computed by nag_mv_discrim (g03dac). The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.

10.1
Program Text

Program Text (g03dace.c)

10.2
Program Data

Program Data (g03dace.d)

10.3
Program Results

Program Results (g03dace.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017