A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes
Outcome-dependent sampling designs are commonly used in economics, market research and epidemiological studies. The case-control design is a classic example of outcome-dependent sampling, in which exposure information is collected on subjects conditional on their disease status. In many situations the outcome has multiple categories rather than a simple dichotomy. For example, in a case-control study the "cases" may be sub-classified by disease progression or by other histological and morphological characteristics of the disease. In this note, we investigate the consequences of fitting prospective multivariate generalized linear models to such multiple-category outcome data while ignoring the retrospective nature of the sampling design. We first provide a set of necessary and sufficient conditions on the link functions under which prospective and retrospective inference for the parameters of interest are equivalent. We show that for categorical outcomes, prospective-retrospective equivalence does not hold beyond the generalized multinomial logit link. We then derive an approximate expression for the bias incurred when link functions outside this class are used. Most popular models for ordinal responses fall outside the multiplicative intercept class, so one should be cautious when performing a naive prospective analysis of such data, as the bias could be substantial. We illustrate the extent of the bias through a real data example based on the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial conducted by the National Cancer Institute. Simulations based on the real study show that the bias approximations work well in practice.
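The contrast between the multinomial logit link and ordinal links such as the cumulative logit can be illustrated numerically. The sketch below is not the authors' code. It relies on the standard fact that when selection depends only on the outcome, a subject in category k is retained with probability s_k, so the sampled conditional probabilities satisfy P*(Y=k | x) proportional to s_k P(Y=k | x). It assumes a hypothetical three-category outcome in which category 0 plays the role of controls sampled at rate 0.1 and both case subtypes are fully retained. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multinomial_logit_probs(x, alphas, betas):
    # Baseline-category logit: log P(Y=k|x)/P(Y=0|x) = alpha_k + beta_k * x, k >= 1
    eta = np.concatenate(([0.0], np.asarray(alphas, float) + np.asarray(betas, float) * x))
    w = np.exp(eta - eta.max())  # subtract max for numerical stability
    return w / w.sum()

def cumulative_logit_probs(x, cutpoints, beta):
    # Cumulative logit: logit P(Y<=j|x) = theta_j - beta * x, j = 0..K-2
    cum = sigmoid(np.asarray(cutpoints, float) - beta * x)
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)  # category probabilities from cumulative ones

def sampled_probs(p, s):
    # Outcome-dependent sampling: P*(Y=k | x) is proportional to s[k] * P(Y=k | x)
    q = np.asarray(s, float) * p
    return q / q.sum()

def local_slope(f, x, h=1e-5):
    # Central finite-difference approximation to df/dx
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical sampling rates: controls (category 0) kept at 10%, cases kept fully
s = [0.1, 1.0, 1.0]

# Multinomial logit: slopes are unchanged, intercepts shift by log(s_k / s_0)
alphas, betas = [-1.0, -2.0], [0.5, 1.2]

def mn_log_ratio(x, k):
    q = sampled_probs(multinomial_logit_probs(x, alphas, betas), s)
    return np.log(q[k] / q[0])

# Cumulative logit: the sampled logit of P*(Y <= 1 | x) no longer has slope -beta
cutpoints, beta = [-1.0, 1.0], 1.0

def cum_logit_sampled(x, j=1):
    q = sampled_probs(cumulative_logit_probs(x, cutpoints, beta), s)
    c = q[: j + 1].sum()
    return np.log(c / (1.0 - c))

print(local_slope(lambda t: mn_log_ratio(t, 1), 0.0),
      local_slope(lambda t: mn_log_ratio(t, 2), 0.0),
      local_slope(cum_logit_sampled, 0.0))
```

With these illustrative values the multinomial log-ratio slopes stay at 0.5 and 1.2 regardless of the sampling rates, while the local slope of the sampled cumulative logit at x = 0 is roughly -0.77 rather than -1. The sketch compares population-level conditional probabilities only; it does not fit models to data or reproduce the bias approximation or the PLCO analysis.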
Year of publication: 2009
Authors: Mukherjee, Bhramar; Liu, Ivy
Published in: Journal of Multivariate Analysis. - Elsevier, ISSN 0047-259X. - Vol. 100 (2009), no. 3, pp. 459-472
Publisher: Elsevier
Keywords: 62F12; 62H20; 62H05; Choice-based sampling; Colorectal adenoma; Cumulative logit; Link function; Model mis-specification; Ordered response
Similar items by person
- Graphical diagnostics to check model misspecification for the proportional odds regression model. Liu, Ivy (2009)
- Fitting stratified proportional odds models by amalgamating conditional likelihoods. Mukherjee, Bhramar (2008)
- Mukherjee, Bhramar (2007)