Estimation Procedures for Multinomial Models with Overdispersion

Date

2008-01-01

Department

Mathematics and Statistics

Program

Statistics

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

The phenomenon of overdispersion arises when categorical or count data exhibit more variability than the assumed model predicts. Multinomial data commonly exhibit this phenomenon when the counts arise from correlated or clustered observations. There are several ways of analyzing overdispersed multinomial data. Likelihood approaches include the Dirichlet-Multinomial distribution of Mosimann (1962) and the Finite-Mixture distribution of Morel and Nagaraj (1993) and Neerchal and Morel (1998). Moment-based approaches, such as the Generalized Estimating Equations (GEE) of Liang and Zeger (1986) and Zeger and Liang (1986), and the Quasi-Likelihood estimation of Wedderburn (1974), are also available. The likelihood methods require full knowledge of the data-generating process and are computationally intensive. The Dirichlet-Multinomial and Finite-Mixture likelihood models and their computational challenges are discussed in detail by Neerchal and Morel (2005). The moment-based approaches assume knowledge of only the moment structure of the data and are easy to compute, but they are less efficient than their maximum likelihood counterparts. The GEE and Quasi-Likelihood approaches require correct specification of only the first two moments: the first moment is used to obtain parameter estimates, and the variance-covariance structure is then used to estimate the corresponding standard errors. The standard errors are computed either by inflating the standard errors obtained under the assumption of independence by an estimated dispersion factor, as in Quasi-Likelihood estimation, or by using a robust (empirical) variance estimator, as in the GEE methodology. Although these adjustments account for overdispersion, we show that they tend to produce inflated standard errors. In this thesis, we present extensions of GEE that incorporate higher-order moment structures.
By introducing this additional information into the estimation procedure, we obtain estimates and standard errors that account for the overdispersion without overly inflating the standard errors. In particular, we apply this method to multinomial data and show that the resulting estimator is more efficient than the standard GEE and Quasi-Likelihood approaches while retaining the computational appeal of the moment-based approaches. In addition, we provide a generalized form of this estimator that can model multiple sources of correlation. We derive the asymptotic properties of the estimators and present several results on their efficiency. Finally, we demonstrate these methods through simulation studies and several practical examples.
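To make the overdispersion mechanism and the Quasi-Likelihood standard-error inflation concrete, here is a minimal Python sketch. It is not the estimator developed in the thesis. It assumes equal cluster sizes m and a Dirichlet-Multinomial data-generating process, for which each count satisfies Var(Y_j) = m p_j (1 - p_j) [1 + (m - 1) rho] with rho = 1 / (alpha0 + 1) and alpha0 the sum of the Dirichlet parameters. The function names and the Pearson-based dispersion estimate X^2 / ((N - 1)(K - 1)) are illustrative choices made here, not taken from the thesis.

```python
import math
import random


def dm_inflation_factor(m, alpha0):
    """Theoretical variance inflation 1 + (m - 1) * rho for Dirichlet-Multinomial counts.

    rho = 1 / (alpha0 + 1) is the intra-cluster correlation; alpha0 = sum of Dirichlet params.
    """
    rho = 1.0 / (alpha0 + 1.0)
    return 1.0 + (m - 1) * rho


def sample_dm(m, alpha, rng):
    """Draw one cluster of size m: p ~ Dirichlet(alpha), then counts ~ Multinomial(m, p)."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]  # Dirichlet draw via normalized gammas
    total = sum(g)
    p = [x / total for x in g]
    counts = [0] * len(alpha)
    for j in rng.choices(range(len(alpha)), weights=p, k=m):
        counts[j] += 1
    return counts


def pearson_dispersion(clusters, m):
    """Illustrative Pearson dispersion estimate phi_hat = X^2 / ((N - 1)(K - 1)).

    Also returns the pooled category proportions p_hat used to compute X^2.
    """
    n_clusters, k = len(clusters), len(clusters[0])
    p_hat = [sum(c[j] for c in clusters) / (n_clusters * m) for j in range(k)]
    x2 = sum((c[j] - m * p_hat[j]) ** 2 / (m * p_hat[j])
             for c in clusters for j in range(k))
    return x2 / ((n_clusters - 1) * (k - 1)), p_hat


def quasi_likelihood_se(p_hat, n_clusters, m, phi):
    """Independence (multinomial) standard errors of p_hat, and the same scaled by sqrt(phi)."""
    naive = [math.sqrt(p * (1 - p) / (n_clusters * m)) for p in p_hat]
    return naive, [math.sqrt(phi) * s for s in naive]


if __name__ == "__main__":
    rng = random.Random(2008)
    alpha, m, n_clusters = [2.0, 3.0, 5.0], 10, 2000  # hypothetical simulation settings
    data = [sample_dm(m, alpha, rng) for _ in range(n_clusters)]
    phi, p_hat = pearson_dispersion(data, m)
    naive, inflated = quasi_likelihood_se(p_hat, n_clusters, m, phi)
    print(f"theoretical inflation: {dm_inflation_factor(m, sum(alpha)):.3f}")
    print(f"Pearson phi_hat:       {phi:.3f}")
    print("independence SE:", [round(s, 4) for s in naive])
    print("QL-inflated SE: ", [round(s, 4) for s in inflated])
```

Under this simulated process, phi_hat should land near the theoretical factor 1 + (m - 1)/(alpha0 + 1), and the Quasi-Likelihood standard errors are simply the independence standard errors multiplied by sqrt(phi_hat). The higher-order-moment GEE extensions described in the abstract are not implemented here.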