17.6.2.2 Algorithms (Cox Proportional Hazard Regression)

Let $t_i\,\!$,for i = 1, 2, ?, n, be the failure time or censored time for the ith observation with the vector of p covariates $Z_j(j=1,2,\ldots ,p)$. It is assumed that the failure and censored mechanisms are independent. The hazard function, $\lambda (z,t)\,\!$ , is the probability that an individual with covariates z fails at time t given that the individual survived up to time t. In the Cox proportional model is of the form:

$\lambda (z,t)=\lambda _0(t)\exp (z^{T}\beta +\omega )\,\!$

where $\lambda _0\,\!$ is the base-line hazard function, an unspecified function of time, $\beta \,\!$ is a vector of unknown parameters and $\omega\,\!$ is a known offset.

Assuming there are ties in the failure time giving $n_d < n\,\!$ distinct failure times, $t_{(1)} < t_{(2)} < ?< t_{(nd)}$ , such that $d_i\,\!$ individuals fail at $t_{(i)}\,\!$ , it follows that the marginal likelihood for $\beta$ is well approximated by:

 $L=\prod_{i=1}^{n_d}\frac{\exp (s_i^{T}\beta +\omega _i)}{[\sum_{l\in R(t_{(1)})}\exp (z_i^{T}\beta +\omega _i)]^{d_{i}}}$ (1)

where $s_i\,\!$ is the sum of covariate of individuals observed fail at $t_{(i)}\,\!$ and $R(t_{(i)})\,\!$ is the set of individuals at risk just prior to $t_{(i)}\,\!$ , that is it is all the individuals that fail or censored at time $t_{(i)}$ along with all individuals survived beyond the time $t_{(i)}\,\!$ . The MLE (maximum likelihood estimates) of $\beta\,\!$, given by$\hat \beta\,\!$, are obtained by maximizing (1) using a Newton-Raphson iteration technique that includes step having and utilizes the first and second partial derivatives of (1) which are given by (2) and (3) below:

 $U_j(\beta )=\frac{\partial Ln(L)}{\partial \beta _j}=\sum_{i=1}^{n_d}[s_{ji}-d_i\alpha _{ji}(\beta )]=0$ (2)

for j = 1, 2, ?, p, where $s_{ji}\,\!$ is the jth element in the vector $s_i\,\!$ and

$\alpha _{ji}(\beta )=\frac{\sum_{l\in R(t_{(1)})}z_{jl}\exp (z_l^{T}\beta +\omega _l)}{\sum_{l\in R(t_{(1)})}\exp (z_l^{T}\beta +\omega _l)}$

Similarly,

 $I_{hj}(\beta )=-\frac{\partial ^2Ln(L)}{\partial \beta _h\partial \beta _j}=\sum_{i=1}^{n_d}d_i\gamma _{hji}$ (3)

where $\gamma _{hji}=\frac{\sum_{l\in R(t_{(1)})}z_{hl}z_{jl}\exp (z_l^{T}\beta +\omega _l)}{\sum_{l\in R(t_{(1)})}\exp (z_l^{T}\beta +\omega _l)}-\alpha _{hi}(\beta )\alpha _{ji}(\beta )$ h, j = 1, ? p.

$U_j(\beta )\,\!$ is the jth component of a score vector $I_{hi}(\beta )\,\!$ is the (h, j) element of the observed information matrix $I(\beta )\,\!$ whose inverse $I(\beta )^{-1}=I_{hi}(\beta )^{-1}\,\!$ gives the variance-covariance matrix of $\beta\,\!$.

It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time ,then one or more of the $\beta _j^{\prime }s$ will be infinite.

If $\lambda _0(t)\,\!$ varies across $\nu\,\!$ strata, where the number of individuals in the kth stratum is $n_k\,\!$, k = 1, ?, $\nu\,\!$, with $n=\sum_{k=1}^\nu n_k$ , then rather than maximizing (1) to obtain $\hat \beta\,\!$, the following marginal likelihood is maximized:

 $L=\prod_{k=1}^\nu L_k$ (4)

where $L_k\,\!$ is the contribution to likelihood for the $n_k\,\!$ observations in the kth stratum treated as a single sample in (1). When strata are concluded the covariate coefficients are constant across strata but there is a different base-line hazard function $\lambda _0(t)\,\!$.

The base-line survival function associated with a failure time $t_{(i)}\,\!$ , is estimated as

$exp(-\hat H(t_{(i)}))$ ,

where $\hat H(t_{(i)})=\sum_{t(j)\leq t(i)}(\frac{d_i}{\sum_{l\in R(t_{(j)})}\exp (z_l^T\hat \beta +\omega _l)})$

and $d_i\,\!$ is the number of failures at time $t_{(i)}\,\!$ . The residual of the lth observation is computed as:

$r(t_l)=\hat H(t_l)\exp (-z_l^T\hat \beta +\omega _l)$

where $\hat H(t_l)=\hat H(t_{(i)}),t_{(i)}\leq t_l .

The deviance is defined as $-2^*\,\!$ (logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with appropriate $\chi ^2\,\!$-distribution; or, the asymptotic normality of the parameter estimates can be used to form z-test by dividing the estimates by their standard errors or the score function for the model under the null hypothesis can be used to form z-test.