NAG Library Function Document

nag_tabulate_stats (g11bac)

 Contents

    1  Purpose
    7  Accuracy

1
Purpose

nag_tabulate_stats (g11bac) computes a table from a set of classification factors using a selected statistic.

2
Specification

#include <nag.h>
#include <nagg11.h>
void nag_tabulate_stats (Nag_TableStats stat, Nag_TableUpdate update,
                         Nag_Weightstype weight, Integer n, Integer nfac,
                         const Integer sf[], const Integer lfac[],
                         const Integer factor[], Integer tdf,
                         const double y[], const double wt[],
                         double table[], Integer maxt, Integer *ncells,
                         Integer *ndim, Integer idim[], Integer count[],
                         double comm_ar[], NagError *fail)

3
Description

A dataset may include both classification variables and general variables. The classification variables, known as factors, take a small number of values known as levels. For example, the factor sex would have the levels male and female. These can be coded as 1 and 2 respectively. Given several factors, a multi-way table can be constructed such that each cell of the table represents one level from each factor. For example, the two factors sex and habitat, habitat having three levels: inner-city, suburban and rural, define the 2 by 3 contingency table:
            Habitat
Sex         Inner-city  Suburban  Rural
Male
Female
For each cell statistics can be computed. If a third variable in the dataset was age, then for each cell the average age could be computed:
            Habitat
Sex         Inner-city  Suburban  Rural
Male        25.5        30.3      35.6
Female      23.2        29.1      30.4
That is, the average age over all observations for males living in rural areas is 35.6. Other statistics can also be computed: the number of observations, the total, the variance, the largest value and the smallest value.
nag_tabulate_stats (g11bac) computes a table for one of the selected statistics. The factors have to be coded with levels 1, 2, .... Weights can be used to eliminate values from the calculations, e.g., if they represent ‘missing values’. There is also the facility to update an existing table with the addition of new observations.
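The tabulation described above can be illustrated with a minimal sketch in plain C. It does not call the NAG Library; the helper name cell_means and its argument list are hypothetical, it handles exactly two factors, and it ignores weights. It only shows how levels coded 1, 2, ... select a cell in which a total and a count are accumulated before the mean is formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library: unweighted mean of
   age[] within each cell of the sex-by-habitat table.
   sex[i] is coded 1..nsex and habitat[i] is coded 1..nhab.
   mean[] and count[] must each hold nsex*nhab elements; the habitat
   index changes fastest. Empty cells are left with mean 0.0. */
static void cell_means(size_t n, const int sex[], const int habitat[],
                       int nsex, int nhab, const double age[],
                       double mean[], int count[])
{
    int ncells = nsex * nhab;
    for (int c = 0; c < ncells; ++c) {
        mean[c] = 0.0;
        count[c] = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        assert(sex[i] >= 1 && sex[i] <= nsex);
        assert(habitat[i] >= 1 && habitat[i] <= nhab);
        int c = (sex[i] - 1) * nhab + (habitat[i] - 1);
        mean[c] += age[i];      /* running total for the cell */
        count[c] += 1;
    }
    for (int c = 0; c < ncells; ++c) {
        if (count[c] > 0)
            mean[c] /= count[c]; /* convert total to mean */
    }
}
```

With sex coded 1 = male, 2 = female and habitat coded 1 = inner-city, 2 = suburban, 3 = rural, the mean for males in rural areas would be found in mean[2].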

4
References

John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin
Kendall M G and Stuart A (1969) The Advanced Theory of Statistics (Volume 1) (3rd Edition) Griffin
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

5
Arguments

1:     stat Nag_TableStats Input
On entry: indicates which statistic is to be computed for the table cells.
stat=Nag_TableStatsNObs
The number of observations for each cell.
stat=Nag_TableStatsTotal
The total for the variable in y for each cell.
stat=Nag_TableStatsAv
The average (mean) for the variable in y for each cell.
stat=Nag_TableStatsVar
The variance for the variable in y for each cell.
stat=Nag_TableStatsLarge
The largest value for the variable in y for each cell.
stat=Nag_TableStatsSmall
The smallest value for the variable in y for each cell.
Constraint: stat=Nag_TableStatsNObs, Nag_TableStatsTotal, Nag_TableStatsAv, Nag_TableStatsVar, Nag_TableStatsLarge or Nag_TableStatsSmall.
2:     update Nag_TableUpdate Input
On entry: indicates if an existing table is to be updated by further observations.
update=Nag_TableUpdateI
The table cells will be initialized to zero before tabulations take place.
update=Nag_TableUpdateU
The table input in table will be updated. The arguments ncells, table, count and comm_ar must remain unchanged from the previous call to nag_tabulate_stats (g11bac).
Constraint: update=Nag_TableUpdateI or Nag_TableUpdateU.
3:     weight Nag_Weightstype Input
On entry: indicates if weights are to be used.
weight=Nag_NoWeights
Weights are not used and unit weights are assumed.
weight=Nag_Weights or Nag_Weightsvar
Weights are used and must be supplied in wt. The two options differ only when the variance is computed.
weight=Nag_Weights
The divisor for the variance is the sum of the weights minus one. This is useful if the weights represent the frequency of the observed values.
weight=Nag_Weightsvar
The divisor for the variance is the number of observations with nonzero weights minus one.
If stat=Nag_TableStatsTotal or Nag_TableStatsAv, the weighted total or mean is computed respectively.
If stat=Nag_TableStatsNObs, Nag_TableStatsLarge or Nag_TableStatsSmall, the only effect of weights is to eliminate values with zero weights from the computations.
Constraint: weight=Nag_NoWeights, Nag_Weightsvar or Nag_Weights.
4:     n Integer Input
On entry: the number of observations.
Constraint: n ≥ 2.
5:     nfac Integer Input
On entry: the number of classifying factors in factor.
Constraint: nfac ≥ 1.
6:     sf[nfac] const Integer Input
On entry: indicates which factors in factor are to be used in the tabulation.
If sf[i-1] > 0, the ith factor in factor is included in the tabulation.
Note that if sf[i-1] ≤ 0 for i = 1, 2, ..., nfac then the statistic for the whole sample is calculated and returned in a 1 by 1 table.
7:     lfac[nfac] const Integer Input
On entry: the number of levels of the classifying factors in factor.
Constraint: if sf[i-1] > 0, lfac[i-1] ≥ 2, for i = 1, 2, ..., nfac.
8:     factor[n×tdf] const Integer Input
On entry: the nfac coded classification factors for the n observations.
Constraint: 1 ≤ factor[(i-1)×tdf+j-1] ≤ lfac[j-1], for i = 1, 2, ..., n and j = 1, 2, ..., nfac.
9:     tdf Integer Input
On entry: the stride separating matrix column elements in the array factor.
Constraint: tdf ≥ nfac.
10:   y[n] const double Input
On entry: the variable to be tabulated.
If stat=Nag_TableStatsNObs, y is not referenced.
11:   wt[n] const double Input
On entry: if weight=Nag_Weights or Nag_Weightsvar, wt must contain the n weights. Otherwise wt is not referenced and can be set to null, (double *)0.
Constraint: if weight=Nag_Weights or Nag_Weightsvar, wt[i-1] ≥ 0.0, for i = 1, 2, ..., n.
12:   table[maxt] double Input/Output
On entry: if update=Nag_TableUpdateU, table must be unchanged from the previous call to nag_tabulate_stats (g11bac), otherwise table need not be set.
On exit: the computed table. The ncells cells of the table are stored so that for any two factors the index relating to the factor referred to later in lfac and factor changes faster. For further details see Section 9.
13:   maxt Integer Input
On entry: the maximum size of the table to be computed.
Constraint: maxt ≥ product of the levels of the factors included in the tabulation.
14:   ncells Integer *Input/Output
On entry: if update=Nag_TableUpdateU, ncells must be unchanged from the previous call to nag_tabulate_stats (g11bac), otherwise ncells need not be set.
On exit: the number of cells in the table.
15:   ndim Integer * Output
On exit: the number of factors defining the table.
16:   idim[nfac] Integer Output
On exit: the first ndim elements contain the number of levels for the factors defining the table.
17:   count[maxt] Integer Input/Output
On entry: if update=Nag_TableUpdateU, count must be unchanged from the previous call to nag_tabulate_stats (g11bac), otherwise count need not be set.
On exit: a table containing the number of observations contributing to each cell of the table, stored identically to table. Note that if stat=Nag_TableStatsNObs this is the same as is returned in table.
18:   comm_ar[*] double Input/Output
On entry: if update=Nag_TableUpdateU, comm_ar must be unchanged from the previous call to nag_tabulate_stats (g11bac), otherwise comm_ar need not be set.
On exit: if stat=Nag_TableStatsAv or Nag_TableStatsVar, the first ncells values hold the table containing the sum of the weights for the observations contributing to each cell, stored identically to table. If stat=Nag_TableStatsVar, then the second set of ncells values holds the table of cell means. Otherwise comm_ar is not referenced.
19:   fail NagError *Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
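The constraint on maxt depends on sf and lfac together, so it can help to work out the required table size before allocating table and count. The following is a minimal sketch in plain C; the helper name required_cells is hypothetical and not part of the NAG Library, it uses int and long in place of the library's Integer type, and it does not guard against overflow of the product.

```c
#include <assert.h>

/* Hypothetical helper, not a NAG routine: number of cells in the table
   defined by the factors with sf[i] > 0, i.e. the product of their levels.
   Returns 1 when no factor is selected (the 1 by 1 table for the whole
   sample), or -1 if a selected factor has fewer than 2 levels. */
static long required_cells(int nfac, const int sf[], const int lfac[])
{
    assert(nfac >= 1);
    long cells = 1;
    for (int i = 0; i < nfac; ++i) {
        if (sf[i] > 0) {
            if (lfac[i] < 2)
                return -1;      /* violates the constraint on lfac */
            cells *= lfac[i];
        }
    }
    return cells;
}
```

A caller would then choose maxt to be at least the returned value.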

6
Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, tdf=value while nfac=value. These arguments must satisfy tdf ≥ nfac.
NE_2_INT_ARRAY_CONS
On entry, sf[value] = value while lfac[0] = value.
Constraint: if sf[i] > 0, lfac[i] ≥ 2, for i = 0, 1, ..., nfac - 1.
NE_2D_1D_INT_ARRAYS_CONS
On entry, factor[value×tdf+value] = value while lfac[0] = value.
Constraint: factor[i×tdf+j] ≤ lfac[j], for i = 0, 1, ..., n - 1 and j = 0, 1, ..., nfac - 1.
NE_2D_INT_ARRAY_CONS
On entry, factor[value×tdf+value] = value.
Constraint: factor[i×tdf+j] ≥ 1, for i = 0, 1, ..., n - 1 and j = 0, 1, ..., nfac - 1.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument stat had an illegal value.
On entry, argument update had an illegal value.
On entry, argument weight had an illegal value.
NE_G11BA_CHANGED
update=Nag_TableUpdateU and at least one of ncells, table, comm_ar or count has been changed since the previous call to nag_tabulate_stats (g11bac).
NE_INT_ARG_LT
On entry, n=value.
Constraint: n ≥ 2.
On entry, nfac=value.
Constraint: nfac ≥ 1.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_MAXT
The maximum size of the table to be computed, maxt, is too small.
NE_REAL_ARRAY_CONS
On entry, wt[value] = value.
Constraint: if weight=Nag_Weights or Nag_Weightsvar, wt[i] ≥ 0.0.
NE_VAR_DIV
stat=Nag_TableStatsVar and the divisor for the variance ≤ 0.0.
NE_WT_ARGS
The wt array argument must not be NULL when the weight argument indicates weights.

7
Accuracy

Accuracy is only a consideration when stat=Nag_TableStatsVar. In this case a one-pass algorithm is used, as described by West (1979).
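To give a feel for what a one-pass weighted update looks like, the sketch below follows the general form of the updating scheme in West (1979) for a single cell. It is written in plain C as an illustration only and is not the library's internal code; the type CellAcc and the functions acc_init, acc_update and acc_variance are hypothetical names. The flag freq_weights chooses between the two divisors described for the weight argument.

```c
#include <assert.h>

/* Running state for one table cell (illustrative layout, not NAG's). */
typedef struct {
    double sumw;  /* sum of weights seen so far */
    double mean;  /* running weighted mean */
    double s;     /* running weighted sum of squared deviations */
    long   nnz;   /* number of observations with nonzero weight */
} CellAcc;

static void acc_init(CellAcc *a)
{
    a->sumw = 0.0;
    a->mean = 0.0;
    a->s = 0.0;
    a->nnz = 0;
}

/* Fold one observation x with weight w >= 0 into the running state. */
static void acc_update(CellAcc *a, double x, double w)
{
    assert(w >= 0.0);
    if (w == 0.0)
        return;                 /* zero weight: observation is ignored */
    double temp  = a->sumw + w;
    double delta = x - a->mean;
    double r     = delta * w / temp;
    a->mean += r;
    a->s    += a->sumw * delta * r;
    a->sumw  = temp;
    a->nnz  += 1;
}

/* Variance with divisor (sum of weights - 1) if freq_weights is nonzero,
   otherwise (number of nonzero-weight observations - 1).
   Returns 0 on success, -1 if the divisor is not positive. */
static int acc_variance(const CellAcc *a, int freq_weights, double *var)
{
    double div = freq_weights ? a->sumw - 1.0 : (double)(a->nnz - 1);
    if (div <= 0.0)
        return -1;
    *var = a->s / div;
    return 0;
}
```

Because the state is updated one observation at a time, the same structure can absorb further observations later, which is the idea behind updating an existing table.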

8
Parallelism and Performance

nag_tabulate_stats (g11bac) is not threaded in any implementation.

9
Further Comments

The tables created by nag_tabulate_stats (g11bac) and stored in table, count and, depending on stat, also in comm_ar are stored in the following way. Let there be $n$ factors defining the table, with factor $k$ having $l_k$ levels; then the cell defined by the levels $i_1, i_2, \ldots, i_n$ of the factors is stored in the $m$th cell given by:
\[ m = 1 + \sum_{k=1}^{n} (i_k - 1)\, c_k , \]
where $c_j = \prod_{k=j+1}^{n} l_k$, for $j = 1, 2, \ldots, n-1$, and $c_n = 1$.
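The storage formula above can be evaluated without forming the c values explicitly, by walking the factors from last to first and accumulating the running product of levels. The following is a minimal sketch in plain C; the helper name cell_position is hypothetical, and int and long stand in for the library's Integer type. The returned position is 1-based, so the corresponding C array element is table[m - 1].

```c
#include <assert.h>

/* Hypothetical helper: 1-based storage position m of the cell whose
   levels are lev[0..ndim-1] (each coded from 1) in a table where factor k
   has idim[k] levels. The last factor varies fastest. */
static long cell_position(int ndim, const int idim[], const int lev[])
{
    long m = 1;
    long c = 1;                 /* c_k, starting from c_n = 1 */
    for (int k = ndim - 1; k >= 0; --k) {
        assert(lev[k] >= 1 && lev[k] <= idim[k]);
        m += (long)(lev[k] - 1) * c;
        c *= idim[k];           /* becomes c_(k-1) for the next factor */
    }
    return m;
}
```

For a 3 by 6 table, the cell with levels (2, 1) is at position 7, immediately after the six cells for level 1 of the first factor.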

10
Example

The data, given by John and Quenouille (1977), is for a 3 by 6 factorial experiment in 3 blocks of 18 units. The data is input in the order: blocks, factor with 3 levels, factor with 6 levels, yield. The 3 by 6 table of treatment means for yield over blocks is computed and printed.

10.1
Program Text

Program Text (g11bace.c)

10.2
Program Data

Program Data (g11bace.d)

10.3
Program Results

Program Results (g11bace.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017