void	nag_mv_prin_coord_analysis (Nag_Eigenvalues roots, Integer n, const double d[], Integer ndim, double x[], Integer tdx, double eval[], NagError *fail)

3

Description

For a set of

n

objects, a distance matrix

D

can be calculated such that

d_{i j}

is a measure of how ‘far apart’ objects

i

and

j

are (see nag_mv_distance_mat (g03eac) for example). Principal coordinate analysis or metric scaling starts with a distance matrix and finds points

X

in Euclidean space such that those points have the same distance matrix. The aim is to find a small number of dimensions,

k < < (n - 1)

, that provide an adequate representation of the distances.

The principal coordinates of the points are computed from the eigenvectors of the matrix

E

where

e_{i j} = - \frac{1}{2} (d_{i j}^{2} - d_{i .}^{2} - d_{. j}^{2} - d_{. .}^{2})

with

d_{i}^{2}

denoting the average of

d_{i j}^{2}

over the suffix

j

etc.. The eigenvectors are then scaled by multiplying by the square root of the value of the corresponding eigenvalue.

Provided that the ordered eigenvalues,

λ_{i}

, of the matrix

E

are all positive,

\sum_{i = 1}^{k} λ_{i} / \sum_{i = 1}^{n - 1} λ_{i}

shows how well the data is represented in

k

dimensions. The eigenvalues will be non-negative if

E

is positive semidefinite. This will be true provided

d_{i j}

satisfies the inequality:

d_{i j} \leq d_{i k} + d_{j k}

for all

i, j, k

. If this is not the case the size of the negative eigenvalue reflects the amount of deviation from this condition and the results should be treated cautiously in the presence of large negative eigenvalues. See Krzanowski (1990) for further discussion. nag_mv_prin_coord_analysis (g03fac) provides the option for all eigenvalues to be computed so that the smallest eigenvalues can be checked.

4

References

Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall

Gower J C (1966) Some distance properties of latent root and vector methods used in multivariate analysis Biometrika 53 325–338

Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5

Arguments

1: $roots$ – Nag_EigenvaluesInput

On entry: indicates if all the eigenvalues are to be computed or just the ndim largest.

$roots = Nag_AllEigVals$: All the eigenvalues are computed.
$roots = Nag_LargeEigVals$: Only the largest ndim eigenvalues are computed.

Constraint:

roots = Nag_AllEigVals

Nag_LargeEigVals

2: $n$ – IntegerInput

On entry: the number of objects in the distance matrix,

n

Constraint:

n > ndim

3: $d [n \times (n - 1) / 2]$ – const doubleInput

On entry: the lower triangle of the distance matrix

D

stored packed by rows. That is,

d [(i - 1) \times (i - 2) / 2 + j - 1]

must contain

d_{i j}

, for

i = 2, 3, \dots, n

and

j = 1, 2, \dots, i - 1

Constraint:

d [i - 1] \geq 0.0

, for

i = 1, 2, \dots, n (n - 1) / 2

4: $ndim$ – IntegerInput

On entry: the number of dimensions used to represent the data,

k

Constraint:

ndim \geq 1

5: $x [n \times tdx]$ – doubleOutput

Note: the

(i, j)

th element of the matrix

X

is stored in

x [(i - 1) \times tdx + j - 1]

On exit: the

i

th row contains

k

coordinates for the

i

th point,

i = 1, 2, \dots, n

6: $tdx$ – IntegerInput

On entry: the stride separating matrix column elements in the array x.

Constraint:

tdx \geq ndim

7: $eval [n]$ – doubleOutput

On exit: if

roots = Nag_AllEigVals

, eval contains the

n

scaled eigenvalues of the matrix

E

. If

roots = Nag_LargeEigVals

, eval contains the largest

k

scaled eigenvalues of the matrix

E

. In both cases the eigenvalues are divided by the sum of the eigenvalues (that is, the trace of

E

8: $fail$ – NagError *Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6

Error Indicators and Warnings

NE_2_INT_ARG_LE: On entry, $n = 〈value〉$ while $ndim = 〈value〉$ . These arguments must satisfy $n > ndim$ .
NE_2_INT_ARG_LT: On entry, $tdx = 〈value〉$ while $ndim = 〈value〉$ . These arguments must satisfy $tdx \geq ndim$ .
NE_ALLOC_FAIL: Dynamic memory allocation failed.
NE_BAD_PARAM: On entry, argument roots had an illegal value.
NE_EIGVAL: The computation of eigenvalues or eigenvectors has failed.
NE_INT_ARG_LT: On entry, $ndim = 〈value〉$ .
Constraint: $ndim \geq 1$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NEG_ELEMENT: At least one element of the array d < 0.0.
Constraint: $d [i - 1] \geq 0.0$ , for $i = 1, 2, \dots, n \times (n - 1) / 2$ .
NE_NONZERO_EIGVALS: There are fewer than ndim eigenvalues greater than zero. Try a smaller number of dimensions (ndim) or use non-metric scaling (nag_mv_ordinal_multidimscale (g03fcc)).
NE_REALARR: On entry, $d [〈value〉] = 〈value〉$ .
Constraint: $d [i - 1] \geq 0.0$ , for $i = 1, 2, \dots, n \times (n - 1) / 2$ .

7

Accuracy

Alternative, non-metric, methods of scaling are provided by nag_mv_ordinal_multidimscale (g03fcc).

The relationship between principal coordinates and principal components (see nag_mv_ordinal_multidimscale (g03fcc)), is discussed in Krzanowski (1990) and Gower (1966).

8

Parallelism and Performance

nag_mv_prin_coord_analysis (g03fac) is not threaded in any implementation.

9

Further Comments

None.

10

Example

The data, given by Krzanowski (1990), are dissimilarities between water vole populations in Europe. The first two principal coordinates are computed.

NAG Library Function Document

nag_mv_prin_coord_analysis (g03fac)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

1

Purpose

2

Specification

3

Description

4

References

5

Arguments

6

Error Indicators and Warnings

7

Accuracy

8

Parallelism and Performance

9

Further Comments

10

Example

10.1

Program Text

10.2

Program Data

10.3

Program Results