Order selection in finite mixtures of linear regressions
Finite mixture models can adequately model population heterogeneity when this heterogeneity arises from a finite number of relatively homogeneous clusters; market segmentation is an example of such a situation. However, order selection in mixture models, i.e. selecting the correct number of components, has not yet been satisfactorily resolved. Existing simulation results in the literature do not fully agree with one another, and the performance of different selection methods appears to depend on the type of model and on the parameter values. Furthermore, most existing results are based on simulations in which the true generating model is identical to one of the models in the candidate set. To partly fill this gap, we carried out a (relatively) large simulation study for finite mixture models of normal linear regressions. We included several types of model (mis)specification to study the robustness of 18 order selection methods. We also compared the performance of these selection methods based on unpenalized and penalized estimates of the model parameters. The results indicate that order selection based on penalized estimates greatly improves the success rates of all order selection methods. 
The most successful methods were <InlineEquation ID="IEq1"> <EquationSource Format="TEX">$$MDL2$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">MDL</mi> <mn>2</mn> </mrow> </math> </EquationSource> </InlineEquation>, <InlineEquation ID="IEq2"> <EquationSource Format="TEX">$$MRC$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">MRC</mi> </mrow> </math> </EquationSource> </InlineEquation>, <InlineEquation ID="IEq3"> <EquationSource Format="TEX">$$MRC_k$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <msub> <mrow> <mi mathvariant="italic">MRC</mi> </mrow> <mrow> <mi>k</mi> </mrow> </msub> </mrow> </math> </EquationSource> </InlineEquation>, <InlineEquation ID="IEq4"> <EquationSource Format="TEX">$$ICL$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">ICL</mi> </mrow> </math> </EquationSource> </InlineEquation>–<InlineEquation ID="IEq5"> <EquationSource Format="TEX">$$BIC$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">BIC</mi> </mrow> </math> </EquationSource> </InlineEquation>, <InlineEquation ID="IEq6"> <EquationSource Format="TEX">$$ICL$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">ICL</mi> </mrow> </math> </EquationSource> </InlineEquation>, <InlineEquation ID="IEq7"> <EquationSource Format="TEX">$$CAIC$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">CAIC</mi> </mrow> </math> </EquationSource> </InlineEquation>, <InlineEquation ID="IEq8"> <EquationSource Format="TEX">$$BIC$$</EquationSource> 
<EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">BIC</mi> </mrow> </math> </EquationSource> </InlineEquation> and <InlineEquation ID="IEq9"> <EquationSource Format="TEX">$$CLC$$</EquationSource> <EquationSource Format="MATHML"> <math xmlns:xlink="http://www.w3.org/1999/xlink"> <mrow> <mi mathvariant="italic">CLC</mi> </mrow> </math> </EquationSource> </InlineEquation>, but no single method performed consistently well, let alone best, across all types of model (mis)specification. Copyright Springer-Verlag Berlin Heidelberg 2014
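As a rough illustration of how penalized-likelihood criteria such as BIC and CAIC choose the number of components, the sketch below scores each candidate order K from its maximized log-likelihood and picks the minimizer. It assumes the standard parameter count for a K-component mixture of normal linear regressions with p coefficients per component (Kp coefficients, K error variances, K − 1 mixing proportions). The log-likelihood values, `n`, `p`, and the function names are hypothetical, and the implementations of these criteria (and of the other selection methods) used in the study may differ.

```python
import math
from typing import Callable, Dict

def n_params(k: int, p: int) -> int:
    """Free parameters of a hypothetical K-component mixture of normal linear
    regressions with p coefficients (incl. intercept) per component:
    k*p coefficients + k error variances + (k - 1) mixing proportions."""
    return k * p + k + (k - 1)

def bic(loglik: float, k: int, p: int, n: int) -> float:
    # BIC = -2 log L + d log n  (smaller is better)
    return -2.0 * loglik + n_params(k, p) * math.log(n)

def caic(loglik: float, k: int, p: int, n: int) -> float:
    # CAIC = -2 log L + d (log n + 1)  (slightly heavier penalty than BIC)
    return -2.0 * loglik + n_params(k, p) * (math.log(n) + 1.0)

Criterion = Callable[[float, int, int, int], float]

def select_order(logliks: Dict[int, float], p: int, n: int,
                 criterion: Criterion) -> int:
    """Return the candidate K that minimizes the given criterion."""
    return min(logliks, key=lambda k: criterion(logliks[k], k, p, n))

# Hypothetical maximized log-likelihoods for K = 1..4 (illustrative only)
logliks = {1: -1520.4, 2: -1431.7, 3: -1424.9, 4: -1419.8}
n, p = 500, 3
print(select_order(logliks, p, n, bic))
print(select_order(logliks, p, n, caic))
```

With these illustrative numbers both criteria pick K = 2: the gain in log-likelihood from a third component is too small to offset the extra penalty. Entropy-based criteria such as ICL or CLC would additionally need the posterior component-membership probabilities, which this sketch does not compute.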