void	nag_robust_m_estim_1var (Nag_SigmaSimulEst sigma_est, Integer n, const double x[], Nag_PsiFun psifun, double c, double h1, double h2, double h3, double dchi, double theta, double sigma, Integer maxit, double tol, double rs[], Integer nit, double sorted_x[], NagError fail)

3

Description

The data consists of a sample of size

n

, denoted by

x_{1}, x_{2}, \dots, x_{n}

, drawn from a random variable

X

The

x_{i}

are assumed to be independent with an unknown distribution function of the form

F ((x_{i} - θ) / σ)

where

θ

is a location argument, and

σ

is a scale argument.

M

-estimators of

θ

and

σ

are given by the solution to the following system of equations:

\sum_{i = 1}^{n} ψ ((x_{i} - \hat{θ}) / \hat{σ}) = 0

(1)

\sum_{i = 1}^{n} χ ((x_{i} - \hat{θ}) / \hat{σ}) = (n - 1) β

(2)

where

ψ

and

χ

are given functions, and

β

is a constant, such that

\hat{σ}

is an unbiased estimator when

x_{i}

, for

i = 1, 2, \dots, n

, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for

\hat{θ}

using an assigned value of

σ = σ_{c}

The values of

ψ (\frac{x_{i} - \hat{θ}}{\hat{σ}}) \hat{σ}

are known as the Winsorized residuals.

The following functions are available for

ψ

and

χ

in nag_robust_m_estim_1var (g07dbc):

(a)

Null Weights

ψ (t)

= t

χ (t) = \frac{t^{2}}{2}

Use of these null functions leads to the mean and standard deviation of the data.

(b)

Huber's Function

$ψ (t)$	$= \max (- c, \min (c, t))$		$χ (t) = \frac{{‖t‖}^{2}}{2} ‖t‖ \leq d$
			$χ (t) = \frac{d^{2}}{2} ‖t‖ > d$

(c)

Hampel's Piecewise Linear Function

$ψ_{h_{1}, h_{2}, h_{3}} (t)$	$= - ψ_{h_{1}, h_{2}, h_{3}} (- t)$
	$= t$	$0 \leq t \leq h_{1}$	$χ (t) = \frac{{\|t\|}^{2}}{2} \|t\| \leq d$
	$= h_{1}$	$h_{1} \leq t \leq h_{2}$
	$= h_{1} (h_{3} - t) / (h_{3} - h_{2})$	$h_{2} \leq t \leq h_{3}$	$χ (t) = \frac{d^{2}}{2} \|t\| > d$
	$= 0$	$t > h_{3}$

(d)

Andrew's Sine Wave Function

$ψ (t)$	$= \sin t$	$- π \leq t \leq π$	$χ (t) = \frac{{\|t\|}^{2}}{2} \|t\| \leq d$
	$= 0$	otherwise	$χ (t) = \frac{d^{2}}{2} \|t\| > d$

(e)

Tukey's Bi-weight

$ψ (t)$	$= t {(1 - t^{2})}^{2}$	$\|t\| \leq 1$	$χ (t) = \frac{{\|t\|}^{2}}{2} \|t\| \leq d$
	$= 0$	otherwise	$χ (t) = \frac{d^{2}}{2} \|t\| > d$

where

c

h_{1}

h_{2}

h_{3}

and

d

are constants.

Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:

{\hat{σ}}_{k} = \sqrt{\frac{1}{β (n - 1)} (\sum_{i = 1}^{n} χ (\frac{x_{i} - {\hat{θ}}_{k - 1}}{{\hat{σ}}_{k - 1}})) {\hat{σ}}_{k - 1}^{2}}

and

{\hat{θ}}_{k} = {\hat{θ}}_{k - 1} + \frac{1}{n} \sum_{i = 1}^{n} ψ (\frac{x_{i} - {\hat{θ}}_{k - 1}}{{\hat{σ}}_{k}}) {\hat{σ}}_{k}

{\hat{σ}}_{k} = σ_{c}, if ​ σ ​ is fixed.

The initial values for

\hat{θ}

and

\hat{σ}

may either be user-supplied or calculated within nag_robust_m_estim_1var (g07dbc) as the sample median and an estimate of

σ

based on the median absolute deviation respectively.

nag_robust_m_estim_1var (g07dbc) is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

4

References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley

Huber P J (1981) Robust Statistics Wiley

Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

5

Arguments

1: $sigma_est$ – Nag_SigmaSimulEstInput

On entry: the value assigned to sigma_est determines whether

\hat{σ}

is to be simultaneously estimated.

$sigma_est = Nag_SigmaBypas$: The estimation of $\hat{σ}$ is bypassed and sigma is set equal to $σ_{c}$ ;
$sigma_est = Nag_SigmaSimul$: $\hat{σ}$ is estimated simultaneously.

Constraint:

sigma_est = Nag_SigmaBypas

Nag_SigmaSimul

2: $n$ – IntegerInput

On entry: the number of observations,

n

Constraint:

n > 1

3: $x [n]$ – const doubleInput

On entry: the vector of observations,

x_{1}, x_{2}, \dots, x_{n}

4: $psifun$ – Nag_PsiFunInput

On entry: which

ψ

function is to be used.

$psifun = Nag_Lsq$

ψ (t) = t .

$psifun = Nag_HuberFun$

Huber's function.

$psifun = Nag_HampelFun$

Hampel's piecewise linear function.

$psifun = Nag_AndrewFun$

Andrew's sine wave.

$psifun = Nag_TukeyFun$

Tukey's bi-weight.

Constraint:

psifun = Nag_Lsq

Nag_HuberFun

Nag_HampelFun

Nag_AndrewFun

Nag_TukeyFun

5: $c$ – doubleInput

On entry: must specify the argument,

c

, of Huber's

ψ

function, if

psifun = Nag_HuberFun

. c is not referenced if

psifun \neq Nag_HuberFun

Constraint:

c > 0.0

psifun = Nag_HuberFun

6: $h1$ – doubleInput

7: $h2$ – doubleInput

8: $h3$ – doubleInput

On entry: h1, h2, and h3 must specify the arguments

h_{1}

h_{2}

, and

h_{3}

, of Hampel's piecewise linear

ψ

function, if

psifun = Nag_HampelFun

. h1, h2, and h3 are not referenced if

psifun \neq Nag_HampelFun

Constraint:

0 \leq h1 \leq h2 \leq h3

and

h3 > 0.0

psifun = Nag_HampelFun

9: $dchi$ – doubleInput

On entry: the argument,

d

, of the

χ

function. dchi is not referenced if

psifun = Nag_Lsq

Constraint:

dchi > 0.0

psifun \neq Nag_Lsq

10: $theta$ – double *Input/Output

On entry: if

sigma > 0

then theta must be set to the required starting value of the estimation of the location argument

\hat{θ}

. A reasonable initial value for

\hat{θ}

will often be the sample mean or median.

On exit: the

M

-estimate of the location argument,

\hat{θ}

11: $sigma$ – double *Input/Output

The role of sigma depends on the value assigned to sigma_est (see above) as follows.

sigma_est = Nag_SigmaSimul

On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of

\hat{θ}

and

\hat{σ}

. If

sigma \leq 0.0

then nag_robust_m_estim_1var (g07dbc) will determine the starting points of

\hat{θ}

and

\hat{σ}

. Otherwise the value assigned to sigma will be taken as the starting point for

\hat{σ}

, and theta must be assigned a value before entry, see above.

sigma_est = Nag_SigmaBypas

On entry: sigma must be assigned a value which determines the value of

σ_{c}

, which is held fixed during the iterations, and the starting value for the calculation of

\hat{θ}

. If

sigma \leq 0

, then nag_robust_m_estim_1var (g07dbc) will determine the value of

σ_{c}

as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for

\hat{θ}

. Otherwise, the value assigned to sigma will be taken as the value of

σ_{c}

and theta must be assigned a relevant value before entry, see above.

On exit: sigma contains the

M

– estimate of the scale argument,

\hat{σ}

, if

sigma_est = Nag_SigmaSimul

on entry, otherwise sigma will contain the initial fixed value

σ_{c}

12: $maxit$ – IntegerInput

On entry: the maximum number of iterations that should be used during the estimation.

Suggested value: p

maxit = 50

Constraint:

maxit > 0

13: $tol$ – doubleInput

On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than

tol \times \max (1.0, σ_{k - 1})

Constraint:

tol > 0.0

14: $rs [n]$ – doubleOutput

On exit: the Winsorized residuals.

15: $nit$ – Integer *Output

On exit: the number of iterations that were used during the estimation.

16: $sorted_x [n]$ – doubleOutput

On exit: if

sigma \leq 0.0

on entry, sorted_x will contain the

n

observations in ascending order.

17: $fail$ – NagError *Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6

Error Indicators and Warnings

NE_2_REAL_ENUM_ARG_CONS: On entry, $h1 = 〈value〉$ , $h2 = 〈value〉$ and $psifun = 〈value〉$ . These arguments must satisfy $h1 \leq h2$ , $psifun = Nag_HampelFun$ .

On entry, $h1 = 〈value〉$ , $h3 = 〈value〉$ and $psifun = 〈value〉$ . These arguments must satisfy $h1 \leq h3$ , $psifun = Nag_HampelFun$ .

On entry, $h2 = 〈value〉$ , $h3 = 〈value〉$ and $psifun = 〈value〉$ . These arguments must satisfy $h2 \leq h3$ , $psifun = Nag_HampelFun$ .
NE_3_REAL_ENUM_ARG_CONS: On entry, $h1 = 〈value〉$ , $h2 = 〈value〉$ , $h3 = 〈value〉$ , $psifun = 〈value〉$ . These arguments must satisfy $h1 = h2$ = $h3 \neq 0.0$ , $psifun = Nag_HampelFun$ .
NE_ALL_ELEMENTS_EQUAL: On entry, all the values in the array x must not be equal.
NE_BAD_PARAM: On entry, argument psifun had an illegal value.

On entry, argument sigma_est had an illegal value.
NE_ESTIM_SIGMA_ZERO: The estimated value of sigma was $\leq 0.0$ during an iteration.
NE_INT_ARG_LE: On entry, $maxit = 〈value〉$ .
Constraint: $maxit > 0$ .

On entry, $n = 〈value〉$ .
Constraint: $n > 1$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARG_LE: On entry, tol must not be less than or equal to 0.0: $tol = 〈value〉$ .
NE_REAL_ENUM_ARG_CONS: On entry, $c = 〈value〉$ , $psifun = 〈value〉$ . These arguments must satisfy $c > 0.0$ , $psifun = Nag_HuberFun$ .

On entry, $dchi = 〈value〉$ , $psifun = 〈value〉$ . These arguments must satisfy $dchi > 0.0$ , $psifun \neq Nag_Lsq$ .

On entry, $h1 = 〈value〉$ , $psifun = 〈value〉$ . These arguments must satisfy $h1 \geq 0.0$ , $psifun = Nag_HampelFun$ .
NE_TOO_MANY: Too many iterations ( $〈value〉$ ).
NE_WINS_RES_ZERO: The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the $sigma_est = Nag_SigmaBypas$ option with a redescending $ψ$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of $σ$ is too small, then the standardized residuals $\frac{x_{i} - {\hat{θ}}_{k}}{σ_{c}}$ , will be large and all the residuals may fall into the region for which $ψ (t) = 0$ . This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of $σ_{c}$ or with $sigma_est = Nag_SigmaSimul$ .

7

Accuracy

On successful exit the accuracy of the results is related to the value of TOL, see Section 4.

8

Parallelism and Performance

nag_robust_m_estim_1var (g07dbc) is not threaded in any implementation.

9

Further Comments

When you supply the initial values, care has to be taken over the choice of the initial value of

σ

. If too small a value of

σ

is chosen then initial values of the standardized residuals

\frac{x_{i} - {\hat{θ}}_{k}}{σ}

will be large. If the redescending

ψ

functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).

10

Example

The following program reads in a set of data consisting of eleven observations of a variable

X

For this example, Hampels's Piecewise Linear Function is used (

psifun = Nag_HampelFun

), values for

h_{1}

h_{2}

and

h_{3}

along with

d

for the

χ

function, being read from the data file.

Using the following starting values various estimates of

θ

and

σ

are calculated and printed along with the number of iterations used:

(a)	nag_robust_m_estim_1var (g07dbc) determines the starting values, $σ$ is estimated simultaneously.
(b)	You supply the starting values, $σ$ is estimated simultaneously.
(c)	nag_robust_m_estim_1var (g07dbc) determines the starting values, $σ$ is fixed.
(d)	You supply the starting values, $σ$ is fixed.

NAG Library Function Document

nag_robust_m_estim_1var (g07dbc)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

1

Purpose

2

Specification

3

Description

4

References

5

Arguments

6

Error Indicators and Warnings

7

Accuracy

8

Parallelism and Performance

9

Further Comments

10

Example

10.1

Program Text

10.2

Program Data

10.3

Program Results