Analysis of the Left Censored Data from the Generalized Exponential Distribution
Sharmishtha Mitra & Debasis Kundu
Department of Mathematics & Statistics
Indian Institute of Technology Kanpur
The generalized exponential distribution proposed by Gupta and Kundu (1999) is an important lifetime distribution in survival analysis. In this paper, we consider maximum likelihood estimation of the parameters of the generalized exponential distribution when the data are left censored. We obtain the maximum likelihood estimators of the unknown parameters and the Fisher information matrix. Simulation studies are carried out to observe the performance of the estimators for small samples.
KEYWORDS: Fisher Information, generalized exponential distribution, left censoring, maximum likelihood estimator.
1. INTRODUCTION
The generalized exponential (GE) distribution (Gupta and Kundu; 1999) has the cumulative distribution function (CDF)
$$F(x; \alpha, \lambda) = \left(1 - e^{-\lambda x}\right)^{\alpha}; \quad \alpha, \lambda > 0, \quad \text{for } x > 0,$$
with the corresponding probability density function (PDF) given by
$$f(x; \alpha, \lambda) = \alpha \lambda e^{-\lambda x} \left(1 - e^{-\lambda x}\right)^{\alpha - 1}, \quad \text{for } x > 0.$$
Here $\alpha$ and $\lambda$ are the shape and scale parameters respectively. The GE distribution with shape parameter $\alpha$ and scale parameter $\lambda$ will be denoted by $GE(\alpha, \lambda)$. It is known that the shape of the PDF of the two-parameter GE distribution is very similar to the corresponding shapes of the gamma or Weibull distributions. It has been observed in Gupta and Kundu (1999) that the two-parameter $GE(\alpha, \lambda)$ can be used quite effectively in analyzing many lifetime data, particularly in place of the two-parameter gamma or two-parameter Weibull distributions. The two-parameter $GE(\alpha, \lambda)$ can have an increasing or a decreasing failure rate depending on the shape parameter. The readers are referred to Raqab (2002), Raqab and Ahsanullah (2001), Zheng (2003) and the references cited there for some recent developments on the GE distribution.
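The definitions above translate directly into code. The following is a minimal sketch, assuming only the CDF and PDF stated above; the function names `ge_cdf`, `ge_pdf` and `ge_hazard` are our own illustrative choices and do not come from the paper.

```python
import math

def ge_cdf(x, alpha, lam):
    """CDF of GE(alpha, lam): (1 - exp(-lam * x)) ** alpha for x > 0."""
    if x <= 0:
        return 0.0
    # -expm1(-t) equals 1 - exp(-t) and stays accurate for small t.
    return (-math.expm1(-lam * x)) ** alpha

def ge_pdf(x, alpha, lam):
    """PDF of GE(alpha, lam) for x > 0."""
    if x <= 0:
        return 0.0
    base = -math.expm1(-lam * x)
    return alpha * lam * math.exp(-lam * x) * base ** (alpha - 1.0)

def ge_hazard(x, alpha, lam):
    """Failure (hazard) rate f(x) / (1 - F(x))."""
    return ge_pdf(x, alpha, lam) / (1.0 - ge_cdf(x, alpha, lam))

if __name__ == "__main__":
    # The failure rate at two time points for three shape values.
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, ge_hazard(0.5, alpha, 1.0), ge_hazard(2.0, alpha, 1.0))
```

For $\alpha = 1$ the distribution reduces to the exponential distribution with a constant failure rate $\lambda$, which gives a quick sanity check of the sketch.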
Several papers have already appeared on the estimation of the parameters of the GE distribution for the complete sample case (see, for example, the review article of Gupta and Kundu (2006b)), but not much attention has been paid to the case of censored samples. The main aim of this paper is to consider the statistical analysis of the unknown parameters when the data are left censored from a GE distribution. We obtain the maximum likelihood estimators (MLEs) of the unknown parameters of the GE distribution for left censored data. It is observed that the MLEs cannot be obtained in explicit form and that the MLE of the scale parameter can be obtained by solving a non-linear equation. We propose a simple iterative scheme to solve the non-linear equation. Once the MLE of the scale parameter is obtained, the MLE of the shape parameter can be obtained in explicit form. We have also obtained the explicit expression of the Fisher information matrix, and it has been used to construct the asymptotic confidence intervals of the unknown parameters. An extensive simulation study has been carried out to observe the behavior of the proposed methods for different sample sizes and for different parameter values, and it is observed that the performances of the proposed methods are quite satisfactory.
Left-censored data arise widely in survival analysis and reliability theory. For example, in medical studies patients are subject to regular examinations. Discovery of a condition only tells us that the onset of sickness fell in the period since the previous examination and nothing about the exact date of the attack. Thus the time elapsed since onset has been left censored. Similarly, we have to handle left-censored data when estimating functions of exact policy duration without knowing the exact date of policy entry, or when estimating functions of exact age without knowing the exact date of birth. A study on the "Patterns of Health Insurance Coverage among Rural and Urban Children" (Coburn, McBride and Ziller, 2001) faces this problem due to the incidence of a higher proportion of rural children whose spells were "left censored" in the sample (i.e., those children who entered the sample uninsured), and who remained uninsured throughout the sample. Yet another study (Danzon, Nicholson and Pereira, 2004), which used data on over 900 firms for the period 1988-2000 to estimate the effect on phase-specific (phases 1, 2 and 3) biotech and pharmaceutical R&D success rates of a firm's overall experience, its experience in the relevant therapeutic category, the diversification of its experience across categories, the industry's experience in the category, and alliances with large and small firms, saw that the data suffered from left censoring. This occurred, for example, when a phase 2 trial was initiated for a particular indication where there was no information on the phase 1 trial. Applications can also be found in econometric models, for example, for the joint determination of wages and turnover. Here, after the derivation of the corresponding likelihood function, an appropriate dataset is used for estimation. For a model that is designed for a comprehensive matched employer-employee panel dataset with fairly detailed information on wages, tenure, experience and a range of other covariates, the raw dataset may contain both completed and uncompleted job spells. A job duration might be incomplete because the beginning of the job spell is not observed, which is an incidence of left censoring (Bagger, 2005). For some further examples, one may refer to Balakrishnan (1989), Balakrishnan and Varadan (1991) and Lee et al. (1980), among others.
The rest of the paper is organized as follows. In Section 2 we derive the maximum likelihood estimators of the parameters of $GE(\alpha, \lambda)$ in the presence of left censoring. In Section 3, we provide the complete enumeration of the Fisher information matrix and discuss certain issues on the limiting Fisher information matrix. Simulation results and discussions are provided in Section 4.
2. MAXIMUM LIKELIHOOD ESTIMATION
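In this section, the maximum likelihood estimators of the parameters of $GE(\alpha, \lambda)$ are derived in the presence of left censored observations. Let $X_{(r+1)}, \ldots, X_{(n)}$ be the last $n - r$ order statistics from a random sample of size $n$ following the $GE(\alpha, \lambda)$ distribution. Then the joint probability density function of $X_{(r+1)}, \ldots, X_{(n)}$ is given by
$$f(x_{(r+1)}, \ldots, x_{(n)}; \alpha, \lambda) = \frac{n!}{r!} \left[F(x_{(r+1)})\right]^{r} \prod_{i=r+1}^{n} f(x_{(i)}) = \frac{n!}{r!} \, \alpha^{n-r} \lambda^{n-r} \left(1 - e^{-\lambda x_{(r+1)}}\right)^{\alpha r} e^{-\lambda \sum_{i=r+1}^{n} x_{(i)}} \prod_{i=r+1}^{n} \left(1 - e^{-\lambda x_{(i)}}\right)^{\alpha - 1}. \tag{2.1}$$
Then the log likelihood function, denoted by $L(x_{(r+1)}, \ldots, x_{(n)}; \alpha, \lambda)$ (or simply $L(\alpha, \lambda)$), is
$$L(\alpha, \lambda) = \ln n! - \ln r! + (n - r) \ln \alpha + (n - r) \ln \lambda + \alpha r \ln\left(1 - e^{-\lambda x_{(r+1)}}\right) + (\alpha - 1) \sum_{i=r+1}^{n} \ln\left(1 - e^{-\lambda x_{(i)}}\right) - \lambda \sum_{i=r+1}^{n} x_{(i)}. \tag{2.2}$$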
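As an illustration, the log likelihood of a left censored sample can be evaluated as follows. This is a minimal sketch, assuming the likelihood is that of the $n - r$ largest order statistics with the $r$ smallest observations censored; the function name `left_censored_loglik` and its argument layout are hypothetical choices of ours.

```python
import math

def left_censored_loglik(alpha, lam, observed, r):
    """Log likelihood of the n - r largest order statistics from GE(alpha, lam).

    observed : the n - r uncensored values (any order)
    r        : number of smallest observations that were left censored
    """
    xs = sorted(observed)
    m = len(xs)                      # m = n - r
    n = m + r
    # ln(1 - exp(-lam * x)) for each observed value, computed stably.
    logs = [math.log(-math.expm1(-lam * x)) for x in xs]
    const = math.lgamma(n + 1) - math.lgamma(r + 1)   # ln n! - ln r!
    return (const
            + m * math.log(alpha)
            + m * math.log(lam)
            + alpha * r * logs[0]    # contribution of the r censored values
            + (alpha - 1.0) * sum(logs)
            - lam * sum(xs))

if __name__ == "__main__":
    # Hypothetical data: 4 observed values, 2 censored below the smallest one.
    print(left_censored_loglik(1.5, 1.0, [0.6, 0.9, 1.4, 2.3], r=2))
```

With `r = 0` the expression reduces to the log of the joint density of a complete ordered sample, which is a convenient check.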
The normal equations for deriving the maximum likelihood estimators become
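$$\frac{\partial L}{\partial \alpha} = \frac{n - r}{\alpha} + r \ln\left(1 - e^{-\lambda x_{(r+1)}}\right) + \sum_{i=r+1}^{n} \ln\left(1 - e^{-\lambda x_{(i)}}\right) = 0 \tag{2.3}$$
and
$$\frac{\partial L}{\partial \lambda} = \frac{n - r}{\lambda} + \alpha r \, \frac{x_{(r+1)} e^{-\lambda x_{(r+1)}}}{1 - e^{-\lambda x_{(r+1)}}} + (\alpha - 1) \sum_{i=r+1}^{n} \frac{x_{(i)} e^{-\lambda x_{(i)}}}{1 - e^{-\lambda x_{(i)}}} - \sum_{i=r+1}^{n} x_{(i)} = 0. \tag{2.4}$$
From (2.3), we obtain the maximum likelihood estimator of $\alpha$ as a function of $\lambda$, say $\hat{\alpha}(\lambda)$, where
$$\hat{\alpha}(\lambda) = -\frac{n - r}{r \ln\left(1 - e^{-\lambda x_{(r+1)}}\right) + \sum_{i=r+1}^{n} \ln\left(1 - e^{-\lambda x_{(i)}}\right)}. \tag{2.5}$$
Putting $\hat{\alpha}(\lambda)$ in (2.2) we obtain the profile log-likelihood of $\lambda$ as
$$g(\lambda) = L(\hat{\alpha}(\lambda), \lambda) = k - (n - r) \ln\left[-r \ln\left(1 - e^{-\lambda x_{(r+1)}}\right) - \sum_{i=r+1}^{n} \ln\left(1 - e^{-\lambda x_{(i)}}\right)\right] + (n - r) \ln \lambda - \lambda \sum_{i=r+1}^{n} x_{(i)} - \sum_{i=r+1}^{n} \ln\left(1 - e^{-\lambda x_{(i)}}\right), \tag{2.6}$$
where $k$ in (2.6) is a constant independent of $\lambda$. Thus, the maximum likelihood estimator of $\lambda$, say $\hat{\lambda}_{MLE}$, can be obtained by maximizing (2.6) with respect to $\lambda$. The maximizing $\lambda$ can be obtained (Gupta and Kundu; 1999b) from the fixed point solution of
$$h(\lambda) = \lambda, \tag{2.7}$$
where $h(\lambda)$ is obtained from the fact that $\partial g(\lambda) / \partial \lambda = 0$ and is given by
$$h(\lambda) = \left[ \frac{r \, \dfrac{x_{(r+1)} e^{-\lambda x_{(r+1)}}}{1 - e^{-\lambda x_{(r+1)}}} + \sum_{i=r+1}^{n} \dfrac{x_{(i)} e^{-\lambda x_{(i)}}}{1 - e^{-\lambda x_{(i)}}}}{r \ln\left(1 - e^{-\lambda x_{(r+1)}}\right) + \sum_{i=r+1}^{n} \ln\left(1 - e^{-\lambda x_{(i)}}\right)} + \frac{1}{n - r} \sum_{i=r+1}^{n} \frac{x_{(i)}}{1 - e^{-\lambda x_{(i)}}} \right]^{-1}. \tag{2.8}$$
We apply an iterative procedure to find the solution of (2.7). Once we obtain $\hat{\lambda}_{MLE}$, the maximum likelihood estimator of $\alpha$, say $\hat{\alpha}_{MLE}$, can be obtained from (2.5) as $\hat{\alpha}_{MLE} = \hat{\alpha}(\hat{\lambda}_{MLE})$.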
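The iterative procedure can be sketched as a plain fixed point iteration $\lambda_{j+1} = h(\lambda_j)$. The code below is a minimal sketch under that assumption; the starting value, tolerance and iteration cap, as well as the names `alpha_hat`, `h` and `fit_left_censored`, are illustrative choices of ours and are not specified in the paper.

```python
import math

def _log_one_minus_exp(lam, x):
    """ln(1 - exp(-lam * x)), computed stably."""
    return math.log(-math.expm1(-lam * x))

def alpha_hat(lam, xs, r):
    """Equation (2.5): MLE of alpha for a fixed lam (xs sorted ascending)."""
    logs = [_log_one_minus_exp(lam, x) for x in xs]
    return -len(xs) / (r * logs[0] + sum(logs))

def h(lam, xs, r):
    """Equation (2.8); xs holds the n - r observed values, sorted ascending."""
    m = len(xs)
    # w_i = x_i exp(-lam x_i) / (1 - exp(-lam x_i))
    w = [x * math.exp(-lam * x) / (-math.expm1(-lam * x)) for x in xs]
    logs = [_log_one_minus_exp(lam, x) for x in xs]
    first = (r * w[0] + sum(w)) / (r * logs[0] + sum(logs))
    second = sum(x / (-math.expm1(-lam * x)) for x in xs) / m
    return 1.0 / (first + second)

def fit_left_censored(observed, r, lam0=1.0, tol=1e-8, max_iter=1000):
    """Iterate lam <- h(lam) until successive values differ by less than tol."""
    xs = sorted(observed)
    lam = lam0
    for _ in range(max_iter):
        new_lam = h(lam, xs, r)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return alpha_hat(lam, xs, r), lam

if __name__ == "__main__":
    # Hypothetical data: 6 observed values, 2 censored below the smallest one.
    data = [0.55, 0.8, 1.1, 1.5, 2.0, 3.1]
    print(fit_left_censored(data, r=2))
```

A plain iteration of this kind is not guaranteed to converge for every data set, so in practice the returned pair should be checked against the normal equations (2.3) and (2.4).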
3. APPROXIMATE AND LIMITING FISHER INFORMATION
3.1 APPROXIMATE FISHER INFORMATION MATRIX
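In this sub-section, we first obtain the approximate Fisher information matrix of the unknown parameters of the GE distribution when the data are left censored, which can be used to construct asymptotic confidence intervals. The Fisher information matrix $I(\alpha, \lambda)$ can be written as follows:
$$I(\alpha, \lambda) = -\frac{1}{n} \begin{bmatrix} E\left(\dfrac{\partial^2 L}{\partial \alpha^2}\right) & E\left(\dfrac{\partial^2 L}{\partial \alpha \, \partial \lambda}\right) \\ E\left(\dfrac{\partial^2 L}{\partial \lambda \, \partial \alpha}\right) & E\left(\dfrac{\partial^2 L}{\partial \lambda^2}\right) \end{bmatrix}. \tag{3.1}$$
Note that the elements of the Fisher information matrix can be written as
$$E\left(\frac{\partial^2 L}{\partial \alpha^2}\right) = -\frac{n - r}{\alpha^2},$$
$$E\left(\frac{\partial^2 L}{\partial \alpha \, \partial \lambda}\right) = E\left(\frac{\partial^2 L}{\partial \lambda \, \partial \alpha}\right) = E\left[ r \, \frac{X_{(r+1)} e^{-\lambda X_{(r+1)}}}{1 - e^{-\lambda X_{(r+1)}}} + \sum_{i=r+1}^{n} \frac{X_{(i)} e^{-\lambda X_{(i)}}}{1 - e^{-\lambda X_{(i)}}} \right], \tag{3.2}$$
$$E\left(\frac{\partial^2 L}{\partial \lambda^2}\right) = -\frac{n - r}{\lambda^2} - E\left[ \alpha r \, \frac{X_{(r+1)}^2 e^{-\lambda X_{(r+1)}}}{\left(1 - e^{-\lambda X_{(r+1)}}\right)^2} + (\alpha - 1) \sum_{i=r+1}^{n} \frac{X_{(i)}^2 e^{-\lambda X_{(i)}}}{\left(1 - e^{-\lambda X_{(i)}}\right)^2} \right]. \tag{3.3}$$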
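Thus to compute (3.2) and (3.3) we are required to obtain explicit expressions of expectations of the forms
$$E\left[\frac{X_{(i)} e^{-\lambda X_{(i)}}}{1 - e^{-\lambda X_{(i)}}}\right] \quad \text{and} \quad E\left[\frac{X_{(i)}^2 e^{-\lambda X_{(i)}}}{\left(1 - e^{-\lambda X_{(i)}}\right)^2}\right], \quad \text{for } i = r+1, \ldots, n.$$
Note that the density of the $i$-th order statistic from a random sample of size $n$ following the $GE(\alpha, \lambda)$ distribution is
$$f_{X_{(i)}}(x) = C_{n,i} \, \alpha \lambda e^{-\lambda x} \left(1 - e^{-\lambda x}\right)^{\alpha i - 1} \left[1 - \left(1 - e^{-\lambda x}\right)^{\alpha}\right]^{n - i}, \quad x > 0, \qquad C_{n,i} = \frac{n!}{(i-1)! \, (n-i)!}.$$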
((nk？？2ni？nink？？？((2C?：?：ni,niknkl？？？？？2(( (1)(1)，？？？(???，?，kl；kl，，00(?(? 1nkl？？？1(( ln~yydy?0
1~？？！(( ; for 0 (3.4)nkl2？？((()nkl
2 ~？？！(( ; for 0 (3.5)nkl3？？((()nkl
(3.4) and (3.5) are obtained using the fact that
For , in，
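The expectations entering (3.2) and (3.3) can also be evaluated numerically, which gives a cross-check on any series expressions. The sketch below is our own numerical alternative, not the paper's method: it uses the fact that $F(X_{(i)})$ follows a Beta$(i, n-i+1)$ distribution and applies a midpoint rule on $(0, 1)$. The function names and the default number of integration steps are assumptions.

```python
import math

def ge_quantile(u, alpha, lam):
    """Inverse CDF of GE(alpha, lam)."""
    return -math.log1p(-(u ** (1.0 / alpha))) / lam

def ratio1(x, lam):
    """x exp(-lam x) / (1 - exp(-lam x))."""
    return x * math.exp(-lam * x) / (-math.expm1(-lam * x))

def ratio2(x, lam):
    """x^2 exp(-lam x) / (1 - exp(-lam x))^2."""
    return x * x * math.exp(-lam * x) / math.expm1(-lam * x) ** 2

def order_stat_expectation(g, i, n, alpha, lam, steps=20000):
    """E[g(X_(i))] via the midpoint rule in u = F(x), U_(i) ~ Beta(i, n-i+1)."""
    log_c = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1)
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) / steps
        log_w = log_c + (i - 1) * math.log(u) + (n - i) * math.log1p(-u)
        total += g(ge_quantile(u, alpha, lam)) * math.exp(log_w)
    return total / steps

def fisher_information(alpha, lam, n, r, steps=20000):
    """Matrix (3.1), with the expectations in (3.2)-(3.3) done numerically."""
    idx = range(r + 1, n + 1)
    e1 = [order_stat_expectation(lambda x: ratio1(x, lam), i, n, alpha, lam, steps)
          for i in idx]
    e2 = [order_stat_expectation(lambda x: ratio2(x, lam), i, n, alpha, lam, steps)
          for i in idx]
    d_aa = -(n - r) / alpha ** 2
    d_al = r * e1[0] + sum(e1)
    d_ll = -(n - r) / lam ** 2 - alpha * r * e2[0] - (alpha - 1.0) * sum(e2)
    return [[-d_aa / n, -d_al / n], [-d_al / n, -d_ll / n]]

if __name__ == "__main__":
    print(fisher_information(alpha=1.0, lam=1.0, n=15, r=3, steps=5000))
```

For $\alpha = \lambda = 1$ and no censoring, the off-diagonal entry should equal $-(\pi^2/6 - 1)$, which follows from a term-by-term integration of $x e^{-2x} / (1 - e^{-x})$.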
3.2 LIMITING FISHER INFORMATION MATRIX
In this sub-section we explore the asymptotic efficiency and hence attempt to obtain the limiting information matrix when $r/n$ converges to, say, $p$, which lies in $(0, 1)$. For observations left censored at the time point $T$, it has been observed by Gupta, Gupta and Sankaran (2004) that the limiting Fisher information matrix can be written as
$$I(\alpha, \lambda) = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}, \tag{3.6}$$
where the elements $b_{ij}$ are expressed in terms of $\theta = (\alpha, \lambda)$ and the reversed hazard function $r(x, \theta) = f(x; \theta) / F(x; \theta)$. Moreover, it is also known, see Zheng and Gastwirth (2000), that for the location and scale family, the Fisher information matrices for Type-I and Type-II censored data (both left and right censored) are asymptotically equivalent. It is also mentioned by Zheng and Gastwirth (2000) that for the general case (not for the location and scale family) the asymptotic Fisher information matrices for Type-II censored data (both left and right) are not very easy to obtain. Unfortunately, the GE family does not belong to the location and scale family and we could not obtain the explicit expression for the limiting Fisher information matrix in this case. Numerically, we have studied the limiting behavior of the Fisher information matrix by taking $n = 5000$ (assuming it is very large) and comparing the result with those for different small samples and different values of $p$. The numerical results are reported in Section 4.
4. NUMERICAL RESULTS AND DISCUSSIONS
In this section we report extensive simulation results for different sample sizes, for different parameter values and for different censored proportions. We mainly observe the performance of the proposed MLEs and of the confidence intervals based on the asymptotic distribution of the MLEs. The performance of the MLEs is assessed by their mean squared errors (MSEs), and the performance of the confidence intervals by their coverage percentages (CPs).
We begin with the generation of the random sample. Note that, if $U$ is a random variable following a Uniform distribution in $[0, 1]$, then $X = -\frac{1}{\lambda} \ln\left(1 - U^{1/\alpha}\right)$ follows $GE(\alpha, \lambda)$. Using the uniform random number generator, the generation of the GE random deviate is immediate. We consider different sample sizes ranging from small to large. Since $\lambda$ is the scale parameter and the MLE is scale invariant, without loss of generality, we take $\lambda = 1$ in all our computations and consider different values of $\alpha$. We report the average relative estimates and the average relative MSEs over 1000 replications for different cases.
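The generation step can be sketched as follows, assuming the inverse transformation stated above and left censoring of the $r$ smallest observations. The helper names `ge_sample` and `left_censor` are hypothetical, and `rng` is any object with a `random()` method returning uniform deviates in $[0, 1)$.

```python
import math
import random

def ge_sample(n, alpha, lam, rng):
    """Draw n GE(alpha, lam) deviates via X = -(1/lam) ln(1 - U^(1/alpha))."""
    return [-math.log1p(-(rng.random() ** (1.0 / alpha))) / lam
            for _ in range(n)]

def left_censor(sample, r):
    """Drop the r smallest values, keeping the n - r largest order statistics."""
    return sorted(sample)[r:]

if __name__ == "__main__":
    rng = random.Random(2024)        # fixed seed, chosen arbitrarily
    full = ge_sample(15, alpha=2.0, lam=1.0, rng=rng)
    observed = left_censor(full, r=3)
    print(len(observed), observed[0])
```

Only the retained values and the count `r` are needed afterwards, since the likelihood (2.1) depends on the censored observations only through $x_{(r+1)}$.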
We compute the maximum likelihood estimates when both the parameters are unknown. $\hat{\lambda}$ can be obtained from the fixed point solution of (2.7) and $\hat{\alpha}(\hat{\lambda})$ can be obtained from (2.5). We consider the sample sizes $n = 15, 20, 50, 100$ and the shape parameter values $\alpha = 0.25, 0.5, 1.0, 2.0$ and $2.5$. For left censoring, we leave out the first 10% and 20% of the data for each of the above combinations of $n$ and $\alpha$. Throughout, we consider $\lambda = 1$, and for each combination of $n$ and $\alpha$ we generate a sample of size $n$ from $GE(\alpha, 1)$ and estimate $\alpha$ and $\lambda$ from the left censored data. We report the average values of $\hat{\alpha}/\alpha$, called the relative estimates, and of $\hat{\lambda}$ (which is also its relative estimate since the true parameter is $\lambda = 1$), together with the corresponding average MSEs. All reported results are based on 1000 replications. Furthermore, using the asymptotic covariance matrix we obtain the average lower and upper confidence limits of the estimates of both the shape and the scale parameters and also report the estimated coverage probability, computed as the proportion of the 1000 replications in which the estimated confidence interval contains the true parameter value. The results corresponding to the shape parameter $\alpha$, for various sample sizes, are reported in Tables 1-4 and the results for the scale parameter $\lambda$ are presented in Tables 5-8.
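The confidence limits and coverage proportion described above can be sketched as follows. As a simplifying assumption, this sketch uses the observed information (the negative Hessian of (2.2) at the estimates) in place of the expected values in (3.2) and (3.3), and a 95% normal quantile; neither choice is stated in the text, and the function names are hypothetical.

```python
import math

def observed_information(alpha, lam, observed, r):
    """Negative second derivatives of the log likelihood (2.2).

    Returns (i_aa, i_al, i_ll) for the parameter order (alpha, lam).
    """
    xs = sorted(observed)
    m = len(xs)
    w = [x * math.exp(-lam * x) / (-math.expm1(-lam * x)) for x in xs]
    v = [x * x * math.exp(-lam * x) / math.expm1(-lam * x) ** 2 for x in xs]
    i_aa = m / alpha ** 2
    i_al = -(r * w[0] + sum(w))
    i_ll = m / lam ** 2 + alpha * r * v[0] + (alpha - 1.0) * sum(v)
    return i_aa, i_al, i_ll

def wald_intervals(alpha, lam, info, z=1.96):
    """Normal-approximation intervals from the inverse of a 2 x 2 information."""
    i_aa, i_al, i_ll = info
    det = i_aa * i_ll - i_al ** 2
    se_alpha = math.sqrt(i_ll / det)
    se_lam = math.sqrt(i_aa / det)
    return ((alpha - z * se_alpha, alpha + z * se_alpha),
            (lam - z * se_lam, lam + z * se_lam))

def coverage(intervals, true_value):
    """Proportion of (lower, upper) intervals that contain true_value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return hits / len(intervals)

if __name__ == "__main__":
    info = (4.0, 0.0, 1.0)           # hypothetical information values
    print(wald_intervals(1.0, 2.0, info))
```

In a simulation, `wald_intervals` would be called once per replication at the fitted values, and `coverage` applied to the collected intervals for each parameter.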
Table 1. Average relative estimates, average relative MSEs, confidence limits and coverage probability of $\alpha$ when $\lambda$ is unknown ($n = 15$)

$\alpha$   No. of observations   Average relative   Average relative   Average   Average   Coverage
           in left censoring     estimate           MSE                LCL       UCL       probability
0.25       3                     1.2133             0.2896             0.1025    0.5041    0.9650
           2                     1.2090             0.2816             0.1128    0.4917    0.9630
0.50       3                     1.2833             0.5381             0.1563    1.1270    0.9750
           2                     1.2591             0.4321             0.1863    1.0728    0.9750
1.00       3                     1.3991             0.9235             0.1245    2.6736    0.9700
           2                     1.3503             1.0461             0.2184    2.4822    0.9750
2.00       3                     1.8132             2.2276             0.0000    8.5207    0.9580
           2                     1.4857             1.4957             0.0000    6.0154    0.9680
2.50       3                     1.7451             3.7330             0.0000    9.8770    0.9620
           2                     1.4940             3.0090             0.0000    7.8995    0.9470