ROBUST ESTIMATION BASED ON THE UNIVARIATE GENERALIZED T (GT) DISTRIBUTION

 

 

Olcay Arslan and Ali İ. Genç

 

 

Department of Mathematics

Çukurova University

01330, Balcalı, Adana
Turkey

 

Key Words : Univariate GT Distribution; Box and Tiao (BT) Distribution; Generalized Gamma (GG) Distribution; Maximum Likelihood Estimation; Robustness; EM Algorithm

 

ABSTRACT

 

     In this paper, we consider the univariate generalized t (GT) distribution introduced by McDonald and Newey [1]. We derive the maximum likelihood estimators, which can be regarded as alternative redescending M-estimators for the location and scale parameters of a univariate data set. We give an iteratively reweighting (IR) algorithm to compute the location and scale estimates and show that this algorithm can be identified as an EM algorithm. We give some examples to illustrate the performance of the location and scale estimators based on the GT distribution.

 

 

1. INTRODUCTION

 

     McDonald and Newey [1] introduced the univariate GT distribution as an alternative to the normal and t distributions for modeling errors in regression. They used the GT distribution to develop a robust, partially adaptive estimation procedure. This procedure includes least squares, LAD (Least Absolute Deviation), $L_p$ and several other estimators as special cases.

     Statistical distributions of returns on financial instruments such as stocks, bonds or options play a central role in much of the financial literature. The GT family applies to symmetric, fat-tailed distributions and therefore adjusts for the leptokurtosis of the nonnormal security returns. McDonald and Nelson [2] and Butler, McDonald, Nelson and White [3] applied the GT-based partially adaptive estimation procedure for the robust estimation of the market model. Partially adaptive estimates of ARMA time series models based on the GT distribution were given in McDonald [4], and applications of the GT to U.S. stock index returns were presented in Bollerslev, Engle and Nelson [5, pp. 3017-3027].

     Despite its usefulness in statistical economics, the GT distribution has not received much attention in the statistical literature. Although it has been widely used as a robust modeling alternative to the normal distribution, its theoretical properties have not been studied much. Recently, Arslan and Genç [6] investigated the existence and uniqueness of the solutions to the maximum likelihood estimating equations of the GT distribution.

     One of the main objectives of this paper is to use the GT distribution to find estimates for the location and scale parameters of a univariate data set. We model the data set with the GT distribution with known shape parameters and unknown location and scale parameters. The maximum likelihood estimators for the location and scale parameters of the GT distribution then provide alternative robust estimators for the location and scale of a univariate data set. The maximum likelihood estimating equations can be viewed as a set of redescending M-estimating equations.

     Like most robust estimates, location and scale estimates based on the GT distribution cannot be computed explicitly, so numerical methods have to be used to find them. Besides some fast-converging algorithms, an IR algorithm can be obtained directly from the estimating equations. Furthermore, this IR algorithm can easily be identified as an instance of the well-known EM algorithm.

     The paper is organized as follows. Section 2 introduces the GT distribution. Section 3 derives the maximum likelihood estimating equations. Section 4 gives an IR algorithm to compute the maximum likelihood estimates and describes its relation to the EM algorithm. Section 5 gives examples based on four different data sets to compare the performance of the GT M-estimators with that of other robust location and scale estimators. The paper ends with a conclusions section.

 

2. THE GENERALIZED T (GT) DISTRIBUTION

 

     Arslan and Genç [6] show that the GT distribution is a scale mixture of the BT and GG distributions. They obtained the GT distribution as the ratio of two independent random variables. Their result can be summarized as follows.

     Let the random variable U have a BT(u; p) distribution with the shape parameter $p$ and let the random variable T have a GG(t; p/2,1, q) distribution with the shape parameters $p/2$, 1 and $q$. Assume that U and T are independent. Then the random variable

X=                                              (1)

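has a GT distribution with the density function

$f(x;\mu,\sigma,p,q) = \dfrac{p}{2\sigma q^{1/p} B(1/p,\,q)} \left(1 + \dfrac{|x-\mu|^{p}}{q\sigma^{p}}\right)^{-(q+1/p)}$,     (2)

where B(.) denotes the beta function, $\sigma > 0$ is a scale parameter, $\mu \in \mathbb{R}$ is a location parameter and $p > 0$, $q > 0$ are shape parameters.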

     The details of this result can be found in [6].

     As we vary the parameters p and q we obtain densities with very different tail behavior. Larger values of $p$ and $q$ are associated with thinner tails of the density. Similarly, smaller values of $p$ and $q$ correspond to thicker tails. It can be easily shown that the density function (2) is symmetric and unimodal.

     The GT family includes several important distributions as special or limiting cases [1]. For example, for the case $p = 2$ we get the usual t distribution with $2q$ degrees of freedom. In this case the location and scale parameters are $\mu$ and $\sigma/\sqrt{2}$, respectively. The density function of the GT distribution approaches the BT density and the uniform density on $(\mu-\sigma,\ \mu+\sigma)$ as $q \to \infty$ and $p \to \infty$, respectively.
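The t special case can be checked numerically. The following is a minimal sketch, assuming the McDonald and Newey form of the GT density, $f(x) = p\,[2\sigma q^{1/p}B(1/p,q)]^{-1}(1+|x-\mu|^p/(q\sigma^p))^{-(q+1/p)}$; the function names `gt_pdf` and `t_pdf` are illustrative and not taken from the paper.

```python
import math


def gt_pdf(x, mu=0.0, sigma=1.0, p=2.0, q=1.0):
    """GT density in the (assumed) McDonald-Newey parametrization."""
    # log B(1/p, q) through log-gamma for numerical stability
    log_beta = math.lgamma(1.0 / p) + math.lgamma(q) - math.lgamma(1.0 / p + q)
    z = abs(x - mu) / sigma
    log_f = (math.log(p) - math.log(2.0 * sigma) - math.log(q) / p - log_beta
             - (q + 1.0 / p) * math.log1p(z ** p / q))
    return math.exp(log_f)


def t_pdf(x, nu, loc=0.0, scale=1.0):
    """Student t density with nu degrees of freedom, location and scale."""
    z = (x - loc) / scale
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_c - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


# With p = 2 the GT density should match a t density with 2q degrees of
# freedom and scale sigma / sqrt(2).
q, sigma = 1.5, 2.0
grid = [-5.0, -1.0, 0.0, 0.3, 2.0, 7.5]
max_gap = max(abs(gt_pdf(x, 0.0, sigma, 2.0, q)
                  - t_pdf(x, 2.0 * q, 0.0, sigma / math.sqrt(2.0)))
              for x in grid)
print("largest pointwise difference:", max_gap)
```

The comparison is done on the log scale first so that small values of q or large values of |x| do not underflow.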

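     The moments of the GT distribution can be obtained easily. Since the density of the GT distribution is symmetric about $\mu$, the odd-ordered moments of $X-\mu$ are zero. The even-ordered moments are

$E\left[(X-\mu)^{2k}\right] = \sigma^{2k}\, q^{2k/p}\, \dfrac{B\left((2k+1)/p,\ q-2k/p\right)}{B(1/p,\ q)}$,                          (3)

if $pq > 2k$. For the standard GT density ($\mu = 0$, $\sigma = 1$) the expected value is E(X)=0 and the variance is

$\mathrm{Var}(X) = q^{2/p}\, \dfrac{B\left(3/p,\ q-2/p\right)}{B(1/p,\ q)}$,                              (4)

if $pq > 2$ [1].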
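The variance formula can be sanity-checked against direct numerical integration. This is a sketch under the assumption that the standard GT density is $p\,[2q^{1/p}B(1/p,q)]^{-1}(1+|x|^p/q)^{-(q+1/p)}$ and that the variance is $q^{2/p}B(3/p,\,q-2/p)/B(1/p,\,q)$ for $pq>2$; the helper names and the integration grid are illustrative choices.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gt_pdf_std(x, p, q):
    """Standard GT density (mu = 0, sigma = 1), assumed parametrization."""
    return math.exp(math.log(p) - math.log(2.0) - math.log(q) / p
                    - log_beta(1.0 / p, q)
                    - (q + 1.0 / p) * math.log1p(abs(x) ** p / q))


def gt_variance_std(p, q):
    """Closed-form variance of the standard GT density; needs p*q > 2."""
    if p * q <= 2.0:
        raise ValueError("the variance exists only when p*q > 2")
    return q ** (2.0 / p) * math.exp(log_beta(3.0 / p, q - 2.0 / p)
                                     - log_beta(1.0 / p, q))


def second_moment_midpoint(p, q, half_width=40.0, step=0.01):
    """Midpoint-rule approximation of the integral of x^2 f(x)."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        total += x * x * gt_pdf_std(x, p, q) * step
    return total


p, q = 3.0, 4.0
closed_form = gt_variance_std(p, q)
numeric = second_moment_midpoint(p, q)
print("closed form:", closed_form, " numerical:", numeric)
```

The shape values p = 3, q = 4 are chosen only because the tails are thin enough for a finite integration range to be adequate.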

 

3. MAXIMUM LIKELIHOOD ESTIMATION

 

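     Let $x_1, \ldots, x_n$ be a data set in $\mathbb{R}$. Suppose we model $x_1, \ldots, x_n$ with the GT distribution with known shape parameters and unknown location and scale parameters. We would like to find estimates for the location and scale parameters $\mu$ and $\sigma$ of the GT distribution. The likelihood function up to a scalar constant is

$L(\mu,\sigma) = \sigma^{-n} \prod_{i=1}^{n} \left(1 + \dfrac{|x_i-\mu|^{p}}{q\sigma^{p}}\right)^{-(q+1/p)}$,                      (5)

and the corresponding log-likelihood function is

$\ell(\mu,\sigma) = -n\log\sigma - \left(q+\dfrac{1}{p}\right) \sum_{i=1}^{n} \log\left(1 + \dfrac{|x_i-\mu|^{p}}{q\sigma^{p}}\right)$.               (6)

Taking derivatives of (6) with respect to $\mu$ and $\sigma$, and setting them to zero, gives the following estimating equations:

$\sum_{i=1}^{n} \dfrac{(pq+1)\,|x_i-\mu|^{p-1}\,\mathrm{sign}(x_i-\mu)}{q\sigma^{p} + |x_i-\mu|^{p}} = 0$,          (7)

$\sum_{i=1}^{n} \dfrac{(pq+1)\,|x_i-\mu|^{p}}{q\sigma^{p} + |x_i-\mu|^{p}} = n$.              (8)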

Note that  exists for . (When p<1, it can be defined only at the points .)

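The equations (7) and (8) can be rewritten as

$\mu = \dfrac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$,                                            (9)

$\sigma^{2} = \dfrac{1}{n} \sum_{i=1}^{n} w_i (x_i-\mu)^{2}$,                                     (10)

where

$w_i = w(s_i), \quad s_i = \dfrac{(x_i-\mu)^{2}}{\sigma^{2}}, \quad w(s) = \dfrac{(pq+1)\, s^{p/2-1}}{q + s^{p/2}}$.                                    (11)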

Here the weight function $w(s)$, where $s = (x-\mu)^{2}/\sigma^{2}$, is a nonnegative, decreasing function of s, so that outlying observations receive very small weights and their effect on the estimators is reduced. It can be easily seen that . The equations given in (9) and (10) can be considered as redescending M-estimators for $\mu$ and $\sigma$.
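The redescending behavior can be illustrated numerically. The sketch below assumes the weight function $w(s) = (pq+1)s^{p/2-1}/(q+s^{p/2})$ with $s$ the squared standardized residual, which is the form obtained by differentiating the GT log-likelihood; `gt_weight` and `gt_psi` are illustrative names. For $p < 2$ the weight is unbounded at $s = 0$, so the functions are evaluated only at $s > 0$.

```python
def gt_weight(s, p, q):
    """Assumed GT weight w(s) = (pq + 1) s^(p/2 - 1) / (q + s^(p/2)), s > 0."""
    return (p * q + 1.0) * s ** (p / 2.0 - 1.0) / (q + s ** (p / 2.0))


def gt_psi(z, p, q):
    """Score function psi(z) = z * w(z^2) of a standardized residual z != 0."""
    return z * gt_weight(z * z, p, q)


p, q = 1.6, 1.0
for z in (0.5, 1.0, 2.0, 5.0, 20.0, 100.0):
    # psi first grows with z and then falls back toward zero
    print(f"z = {z:6.1f}   w(z^2) = {gt_weight(z * z, p, q):8.4f}   "
          f"psi(z) = {gt_psi(z, p, q):7.4f}")
```

Because psi(z) behaves like (pq+1)/z for large z, a gross outlier contributes almost nothing to the location equation, which is the defining property of a redescending M-estimator.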

     Let , where m is the number of coincident observations. Then  and  are sufficient conditions for the likelihood function of the location and scale parameters of the GT distribution to have a unique maximum in the parameter space $\mathbb{R} \times \mathbb{R}^{+}$ (see [6]).

     If  and  do not hold, the uniqueness of the maximum cannot be guaranteed, and the likelihood function may have other critical points: local maxima or saddle points. However, it never has local minima (see [6]).

     In the location-only case (scale parameter fixed), the likelihood function always has a local maximum in the parameter space $\mathbb{R}$. In particular, if the parameter q is chosen small enough, it will have n local maxima, each close to one sample point, and n-1 local minima. The likelihood function will have cusps at each observation when  (see [6]).

     In the scale-only case (location parameter fixed), the necessary and sufficient condition for the likelihood function to have a unique maximum in the parameter space  is , where  (see [6]).

 

4. COMPUTATION OF THE ESTIMATES

 

4.1. The IR Algorithm

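     The estimating equations (9) and (10) give no closed-form expressions for $\mu$ and $\sigma$, since the weights $w_i$ depend on the parameters to be estimated. Thus a numerical method is needed to compute the estimates. Newton-Raphson type algorithms can be used to solve the estimating equations (9) and (10). However, one simple and natural choice is the following IR algorithm suggested by equations (9) and (10):

$\mu^{(k+1)} = \dfrac{\sum_{i=1}^{n} w_i^{(k)} x_i}{\sum_{i=1}^{n} w_i^{(k)}}$,                                          (12)

$(\sigma^{2})^{(k+1)} = \dfrac{1}{n} \sum_{i=1}^{n} w_i^{(k)} \left(x_i - \mu^{(k+1)}\right)^{2}$,                                   (13)

where $w_i^{(k)} = w(s_i^{(k)})$ with $w$ as in (11), $s_i^{(k)} = (x_i-\mu^{(k)})^{2}/(\sigma^{2})^{(k)}$, and $k$ is the iteration number.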
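The iteration can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the weight $w(s) = (pq+1)s^{p/2-1}/(q+s^{p/2})$ with $s_i = (x_i-\mu)^2/\sigma^2$, starts from the median and the squared median absolute deviation (a starting rule chosen here, not given in the paper), and guards against $s = 0$ with a small floor. The data are deterministic normal scores standing in for a contaminated normal sample.

```python
import statistics


def gt_weight(s, p, q):
    """Assumed GT weight w(s) = (pq + 1) s^(p/2 - 1) / (q + s^(p/2)), s > 0."""
    return (p * q + 1.0) * s ** (p / 2.0 - 1.0) / (q + s ** (p / 2.0))


def gt_ir(x, p, q, tol=1e-10, max_iter=1000):
    """Iteratively reweighted location/scale estimates (mu, sigma^2, iterations)."""
    n = len(x)
    mu = statistics.median(x)
    mad = statistics.median([abs(xi - mu) for xi in x])
    sigma2 = mad * mad if mad > 0 else 1.0  # hypothetical starting rule
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # weights from the current estimates; the floor avoids 0 ** negative
        w = [gt_weight(max((xi - mu) ** 2 / sigma2, 1e-12), p, q) for xi in x]
        mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        sigma2_new = sum(wi * (xi - mu_new) ** 2 for wi, xi in zip(w, x)) / n
        done = abs(mu_new - mu) + abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if done:
            break
    return mu, sigma2, iteration


# Deterministic stand-in for 20 N(0,1) and 5 N(10,1) observations.
nd = statistics.NormalDist()
main = [nd.inv_cdf((i + 0.5) / 20) for i in range(20)]
outliers = [10.0 + nd.inv_cdf((j + 0.5) / 5) for j in range(5)]
sample = main + outliers

mu_hat, sigma2_hat, n_iter = gt_ir(sample, p=1.6, q=1.0)
print("sample mean:", statistics.fmean(sample))
print("GT(p=1.6, q=1) location:", mu_hat, " scale:", sigma2_hat,
      " iterations:", n_iter)
```

The location update uses the weights from the previous iterate, and the scale update uses the freshly updated location; other orderings are possible and this one is an assumption of the sketch.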

 

4.2. Relation Between the IR and the EM Algorithms

     The EM algorithm (Dempster, Laird and Rubin [7]) is an iterative computation method to find a maximum likelihood estimate when data can be conveniently viewed as incomplete.

     Suppose we observe X directly but regard T in the representation (1) as missing. Thus the data set $x_1, \ldots, x_n$ can be viewed as incomplete, with the missing information $t_1, \ldots, t_n$, so that $(x_i, t_i)$, $i = 1, \ldots, n$, will be regarded as the complete data. The joint density of the random variables X and T is then the density of the complete data. That is,

,                    (14)

where .

The complete data log-likelihood function (ignoring the constant) is

                           

                                          .      (15)

     In the E-step of the EM algorithm we take the conditional expectation of (15) given x,and . This conditional expectation is

|.                (16)

     In the M-step of the algorithm, we maximize | with respect to  and . The solution of the equations  yields the equations given in (12) and (13). Therefore we have obtained that the IR algorithm derived from the estimating equations (9) and (10) is an EM algorithm.
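One practical consequence of the EM interpretation is that the log-likelihood should not decrease from one iteration to the next. The sketch below checks this for the case $p = 2$, where the iteration coincides with the familiar EM algorithm for the t distribution; it assumes the log-likelihood $-n\log\sigma - (q+1/p)\sum\log(1+|x_i-\mu|^p/(q\sigma^p))$ and the weight $w(s) = (pq+1)s^{p/2-1}/(q+s^{p/2})$. The data values and function names are illustrative.

```python
import math
import statistics


def gt_loglik(x, mu, sigma2, p, q):
    """GT log-likelihood up to an additive constant (assumed form)."""
    sigma_p = sigma2 ** (p / 2.0)
    return (-0.5 * len(x) * math.log(sigma2)
            - (q + 1.0 / p) * sum(math.log1p(abs(xi - mu) ** p / (q * sigma_p))
                                  for xi in x))


def ir_step(x, mu, sigma2, p, q):
    """One reweighting step: new location, then new scale."""
    w = []
    for xi in x:
        s = max((xi - mu) ** 2 / sigma2, 1e-12)  # floor avoids s = 0
        w.append((p * q + 1.0) * s ** (p / 2.0 - 1.0) / (q + s ** (p / 2.0)))
    mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    sigma2_new = sum(wi * (xi - mu_new) ** 2 for wi, xi in zip(w, x)) / len(x)
    return mu_new, sigma2_new


x = [-1.3, -0.4, 0.1, 0.6, 1.2, 9.5, 10.4]  # illustrative data with two outliers
p, q = 2.0, 1.0
mu, sigma2 = statistics.median(x), 1.0
trace = [gt_loglik(x, mu, sigma2, p, q)]
for _ in range(30):
    mu, sigma2 = ir_step(x, mu, sigma2, p, q)
    trace.append(gt_loglik(x, mu, sigma2, p, q))

print("log-likelihood at start:", trace[0])
print("log-likelihood at end:  ", trace[-1])
```

Tracking the log-likelihood in this way is also a cheap diagnostic for other values of p, where the monotonicity is not verified here.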

 

5. EXAMPLES

 

     To assess the performance of the estimators based on the GT distribution, we used two artificial samples and two real data sets. For comparison, we modeled each data set by the T and GT distributions. Table 1 and Table 2 show the location and scale estimates obtained from the T and GT distributions with known shape parameters. The estimates for the GT distribution were calculated using the IR algorithm given above. For the T distribution a similar algorithm, labeled Algorithm I, is given in [8]. Note that the algorithm given in this paper reduces to the algorithm given in [8] when we set p=2. The convergence behavior of this algorithm is under investigation. In our limited experience, its convergence rate is very similar to that of the algorithm given in [8].

     Sample 1 consists of 20 normal N(0,1) and 5 normal N(10,1) random numbers. Sample 2 consists of 20 normal N(0,1) and 10 normal N(10,1) random numbers. The real data sets are from Cushney & Peebles (1905) and Rosner (1977), which are often used in the literature to illustrate various robust estimators of location. The Cushney & Peebles data consist of the differences in excess hours of sleep of 10 patients under the influence of two different drugs (Fig. 1). The Rosner data consist of 10 monthly diastolic blood pressure measurements (Fig. 2).
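Samples with the same structure can be generated as follows. This is only a sketch: the seed is arbitrary, so the draws are not the ones used in the paper, and the summary shown is the ordinary sample mean and the maximum likelihood variance under a normal model, which illustrates how strongly the contaminating group pulls the non-robust estimates.

```python
import random
import statistics


def contaminated_sample(n_main, n_out, rng):
    """n_main draws from N(0,1) followed by n_out draws from N(10,1)."""
    return ([rng.gauss(0.0, 1.0) for _ in range(n_main)]
            + [rng.gauss(10.0, 1.0) for _ in range(n_out)])


rng = random.Random(2003)  # arbitrary seed, chosen here for reproducibility
sample1 = contaminated_sample(20, 5, rng)
sample2 = contaminated_sample(20, 10, rng)

for name, sample in (("Sample 1", sample1), ("Sample 2", sample2)):
    mean = statistics.fmean(sample)
    var_mle = statistics.pvariance(sample)  # divides by n, as the normal MLE does
    print(f"{name}: n = {len(sample)}, mean = {mean:.4f}, variance = {var_mle:.4f}")
```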

 

Table 1. Location and scale estimates of two samples generated from the normal distribution (mve= minimum volume ellipsoid estimate for the univariate data).

 

                           Sample 1                 Sample 2
Distribution          Location      Scale      Location      Scale
Normal                  1.8074    15.3355        3.2414    24.0842
Cauchy                   .0603      .9658         .0560     1.7597
T()                      .1118     1.9399         .7609     8.4400
T()                      .1045      .5076        -.0385     0.6938
GT(p=1.6,q=1)            .0943     2.7244         .2130     7.5065
GT(p=1.6,q=.3)           .1234      .9466        -.0508     1.3253
GT(p=2.6,q=.4)           .0302     1.9747         .0896     3.6265
GT(p=2.8,q=.3)           .0235     1.6004         .0330     2.5647
GT(p=2.8,q=.2)           .0530     1.0989        -.0410     1.4602
GT(p=2.9,q=.1)           .1482      .5441        -.0403      .6521
mve                     -.0848     1.1542        -.1597      .9720

 

Table 2. Location and scale estimates of two real data sets (mve= minimum volume ellipsoid estimate for the univariate data).

 

                  Cushney & Peebles              Rosner

Distribution

Normal

1.5800

1.3616

82.2000

232.3600