NAG Library Function Document

1 Purpose

nag_corr_cov (g02bxc) calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

2 Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_corr_cov (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)

3 Description

For $n$ observations on $m$ variables, the one-pass algorithm of West (1979), as implemented in nag_sum_sqs (g02buc), is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
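(a) The means
 $\bar{x}_{j} = \frac{\sum_{i=1}^{n} w_{i} x_{ij}}{\sum_{i=1}^{n} w_{i}}, \quad j = 1, \dots, p$
(b) The variance-covariance matrix
 $C_{jk} = \frac{\sum_{i=1}^{n} w_{i} \left(x_{ij} - \bar{x}_{j}\right)\left(x_{ik} - \bar{x}_{k}\right)}{\sum_{i=1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p$
(c) The standard deviations
 $s_{j} = \sqrt{C_{jj}}, \quad j = 1, \dots, p$
(d) The Pearson product-moment correlation coefficients
 $R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \dots, p$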
where ${x}_{ij}$ is the value of the $i$th observation on the $j$th variable and ${w}_{i}$ is the weight for the $i$th observation which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is ${\sum }_{i=1}^{n}{w}_{i}-1$, so the weights should be scaled so that the sum of weights reflects the true sample size.

4 References

Chan T F, Golub G H and Leveque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method. Comm. ACM 22 532–555

5 Arguments

1:    $\mathbf{n}$    Integer    Input
On entry: the number of observations in the dataset, $n$.
Constraint: ${\mathbf{n}}>1$.
2:    $\mathbf{m}$    Integer    Input
On entry: the total number of variables, $m$.
Constraint: ${\mathbf{m}}\ge 1$.
3:    $\mathbf{x}\left[{\mathbf{n}}×{\mathbf{tdx}}\right]$    const double    Input
On entry: the data ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdx}}+\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, ${x}_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
4:    $\mathbf{tdx}$    Integer    Input
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
5:    $\mathbf{sx}\left[{\mathbf{m}}\right]$    const Integer    Input
On entry: indicates which $p$ variables to include in the analysis.
${\mathbf{sx}}\left[j-1\right]>0$
The $j$th variable is to be included.
${\mathbf{sx}}\left[j-1\right]=0$
The $j$th variable is not to be included.
sx is set to NULL
All variables are included in the analysis, i.e., $p=m$.
Constraint: if sx is not NULL, ${\mathbf{sx}}\left[\mathit{i}-1\right]\ge 0$, for $\mathit{i}=1,2,\dots ,m$, and at least one element of sx must be positive.
6:    $\mathbf{wt}\left[{\mathbf{n}}\right]$    const double    Input
On entry: $w$, the optional frequency weighting for each observation, with ${\mathbf{wt}}\left[i-1\right]={w}_{i}$. Usually ${w}_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then ${w}_{i}$ is set to $1$ for all $i$.
Constraint: if wt is not NULL, $\sum _{\mathit{i}=1}^{{\mathbf{n}}}{\mathbf{wt}}\left[\mathit{i}-1\right]>1.0$, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
7:    $\mathbf{sw}$    double *    Output
On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, $n$.
8:    $\mathbf{wmean}\left[{\mathbf{m}}\right]$    double    Output
On exit: the sample means. ${\mathbf{wmean}}\left[j-1\right]$ contains the mean for the $j$th variable.
9:    $\mathbf{std}\left[{\mathbf{m}}\right]$    double    Output
On exit: the standard deviations. ${\mathbf{std}}\left[j-1\right]$ contains the standard deviation for the $j$th variable.
10:  $\mathbf{r}\left[{\mathbf{m}}×{\mathbf{tdr}}\right]$    double    Output
On exit: the matrix of Pearson product-moment correlation coefficients. ${\mathbf{r}}\left[\left(j-1\right)×{\mathbf{tdr}}+k-1\right]$ contains the correlation between variables $j$ and $k$, for $j,k=1,\dots ,p$.
11:  $\mathbf{tdr}$    Integer    Input
On entry: the stride separating matrix column elements in the array r.
Constraint: ${\mathbf{tdr}}\ge {\mathbf{m}}$.
12:  $\mathbf{v}\left[{\mathbf{m}}×{\mathbf{tdv}}\right]$    double    Output
On exit: the variance-covariance matrix. ${\mathbf{v}}\left[\left(j-1\right)×{\mathbf{tdv}}+k-1\right]$ contains the covariance between variables $j$ and $k$, for $j,k=1,\dots ,p$.
13:  $\mathbf{tdv}$    Integer    Input
On entry: the stride separating matrix column elements in the array v.
Constraint: ${\mathbf{tdv}}\ge {\mathbf{m}}$.
14:  $\mathbf{fail}$    NagError *    Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
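
The storage convention for x and the meaning of sx can be illustrated with two small C helpers. This is a sketch under stated assumptions: the names `obs_value` and `count_selected` are hypothetical and are not NAG functions, and `long` stands in for the NAG Integer type.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only (not part of the NAG Library). */

/* Returns x_ij for 1-based i and j, where x is stored row by row and
 * consecutive observations are tdx elements apart (tdx >= m). */
double obs_value(const double *x, size_t tdx, size_t i, size_t j)
{
    return x[(i - 1) * tdx + (j - 1)];
}

/* Returns p, the number of variables included in the analysis:
 * all m variables if sx is NULL, otherwise the number of positive entries.
 * Entries are assumed to have been checked as non-negative already. */
size_t count_selected(const long *sx, size_t m)
{
    if (sx == NULL)
        return m;
    size_t p = 0;
    for (size_t j = 0; j < m; ++j)
        if (sx[j] > 0)
            ++p;
    return p;
}
```

A stride larger than m lets the data sit inside a wider array: with m = 3 and tdx = 4, the third variable of the second observation is element 6 of x.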

6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{tdr}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdr}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdv}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdv}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdx}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
NE_INT_ARG_LT
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge 1$.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
A variable has zero variance.
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.
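
The input constraints behind these error indicators can be checked before the call. The following C sketch mirrors the constraints stated in the Arguments section. It is illustrative only: the enum values and the function `check_inputs` are hypothetical, `long` stands in for the NAG Integer type, the order of the checks is an assumption, and the sum of weights is tested against the documented constraint (strictly greater than 1.0).

```c
#include <stddef.h>

/* Hypothetical result codes, for illustration only (not NAG error codes). */
typedef enum {
    CHK_OK = 0,
    CHK_BAD_N,        /* n <= 1 */
    CHK_BAD_M,        /* m < 1 */
    CHK_BAD_STRIDE,   /* tdx, tdr or tdv is less than m */
    CHK_NEG_SX,       /* an element of sx is negative */
    CHK_NO_POS_SX,    /* no element of sx is positive */
    CHK_NEG_WEIGHT,   /* a weight is negative */
    CHK_SUM_WEIGHT    /* sum of weights is not greater than 1.0 */
} check_result;

/* Checks the documented input constraints; sx and wt may be NULL. */
check_result check_inputs(long n, long m, long tdx, long tdr, long tdv,
                          const long *sx, const double *wt)
{
    if (n <= 1)
        return CHK_BAD_N;
    if (m < 1)
        return CHK_BAD_M;
    if (tdx < m || tdr < m || tdv < m)
        return CHK_BAD_STRIDE;

    if (sx != NULL) {
        int any_positive = 0;
        for (long j = 0; j < m; ++j) {
            if (sx[j] < 0)
                return CHK_NEG_SX;
            if (sx[j] > 0)
                any_positive = 1;
        }
        if (!any_positive)
            return CHK_NO_POS_SX;
    }

    if (wt != NULL) {
        double sw = 0.0;
        for (long i = 0; i < n; ++i) {
            if (wt[i] < 0.0)
                return CHK_NEG_WEIGHT;
            sw += wt[i];
        }
        if (sw <= 1.0)
            return CHK_SUM_WEIGHT;
    }
    return CHK_OK;
}
```

A zero-variance variable cannot be detected this way, since it depends on the data; it is only reported after the computation.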

7 Accuracy

For a discussion of the accuracy of the one-pass algorithm, see Chan et al. (1982) and West (1979).
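
To show what a one-pass update looks like, the following C sketch applies a West-style weighted update to a single variable, keeping a running sum of weights, mean, and sum of squared deviations. It is an illustration of the general technique, not the source of nag_sum_sqs (g02buc); the type `running_stats` and the functions below are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical running accumulator, for illustration only. */
typedef struct {
    double sumw;  /* sum of weights seen so far */
    double mean;  /* weighted mean so far */
    double ss;    /* weighted sum of squared deviations about the mean */
} running_stats;

void running_init(running_stats *s)
{
    s->sumw = 0.0;
    s->mean = 0.0;
    s->ss = 0.0;
}

/* Adds one observation x with weight w in a single pass.
 * Non-positive weights are skipped; negative weights are assumed
 * to have been rejected before this point. */
void running_add(running_stats *s, double x, double w)
{
    if (w <= 0.0)
        return;
    double new_sumw = s->sumw + w;
    double delta = x - s->mean;
    double incr = delta * w / new_sumw;
    s->mean += incr;
    s->ss += s->sumw * delta * incr;
    s->sumw = new_sumw;
}

/* Variance with denominator sum(w) - 1; requires sum(w) > 1. */
double running_variance(const running_stats *s)
{
    return s->ss / (s->sumw - 1.0);
}
```

Because each observation is combined with the current mean rather than accumulated as a raw sum of squares, the update avoids subtracting two large, nearly equal quantities at the end.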

8 Parallelism and Performance

nag_corr_cov (g02bxc) is not threaded in any implementation.

9 Further Comments

Correlation coefficients based on ranks can be computed using nag_ken_spe_corr_coeff (g02brc).

10 Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.

10.1 Program Text

Program Text (g02bxce.c)

10.2 Program Data

Program Data (g02bxce.d)

10.3 Program Results

Program Results (g02bxce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017