Does the Choice of Model or Benchmark Affect Inference in Measuring Mutual Fund Performance?
We address the practical question of whether investors and researchers are likely to make invalid inferences about fund manager performance when using the wrong model and/or benchmark. We consider three well-known models, those of Jensen (1968), Treynor and Mazuy (1966), and Henriksson and Merton (1981), and two commonly used timing benchmarks, the S&P 500 index and the CRSP value-weighted index. Although prior studies recognize the possibility of model and benchmark misspecification, the existing literature does not explore empirically the existence, magnitude, and significance, if any, of potential inferential errors. Based on Monte Carlo simulations calibrated to real mutual fund data, we find that: (1) model misspecification results in severely biased measures of both selectivity and timing ability, especially for extreme (good and bad) performers; (2) biases in measures of overall performance are nonetheless economically insignificant; (3) benchmark misspecification results in qualitatively similar difficulties, with the addition that overall performance can also be biased; and (4) model and benchmark misspecification do not appreciably alter the power to detect ability and to distinguish a good fund from a bad fund. These results are robust to alternative asset pricing specifications, alternative simulation schemes, and variation in the length and periodicity of the simulated return series. The use of daily fund returns amplifies our conclusions about the biases induced by model misspecification. Moreover, the biases we identify appear to be difficult to correct by using standard model selection criteria and misspecification tests. If the benchmark is known but the timing model is not, investors should use measures of overall performance to evaluate funds and managers.
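To make the idea of model misspecification concrete, the following is a minimal sketch, not the authors' simulation design. It assumes fund excess returns are generated by a Treynor-Mazuy-type quadratic timing model and then fits three textbook regression forms to the same data: a Jensen-type linear regression, a Treynor-Mazuy-type regression with a squared benchmark term, and a Henriksson-Merton-type regression with a max(0, -r_m) term. The function names and all parameter values (return moments, timing coefficient, noise level, sample length) are hypothetical placeholders and are not calibrated to mutual fund data.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients for y = X @ b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mean_alphas(n_sims=500, n_obs=120, alpha=0.0, beta=1.0, gamma=2.0, seed=0):
    """Average estimated intercept under each fitted model.

    The data-generating model is assumed to be quadratic (Treynor-Mazuy type).
    All default values are illustrative assumptions, not calibrated estimates.
    """
    rng = np.random.default_rng(seed)
    sums = {"jensen": 0.0, "treynor_mazuy": 0.0, "henriksson_merton": 0.0}
    for _ in range(n_sims):
        rm = rng.normal(0.005, 0.045, n_obs)  # hypothetical benchmark excess returns
        eps = rng.normal(0.0, 0.01, n_obs)    # hypothetical idiosyncratic noise
        rp = alpha + beta * rm + gamma * rm**2 + eps  # simulated fund excess returns
        ones = np.ones(n_obs)
        designs = {
            # Linear regression on the benchmark only (no timing term).
            "jensen": np.column_stack([ones, rm]),
            # Adds a squared benchmark term (matches the assumed true model).
            "treynor_mazuy": np.column_stack([ones, rm, rm**2]),
            # Adds a put-like term max(0, -rm) as the timing regressor.
            "henriksson_merton": np.column_stack([ones, rm, np.maximum(0.0, -rm)]),
        }
        for name, X in designs.items():
            sums[name] += ols(rp, X)[0]  # accumulate the estimated intercept
    return {name: total / n_sims for name, total in sums.items()}


if __name__ == "__main__":
    for name, a in mean_alphas().items():
        print(f"{name:>18}: mean estimated alpha = {a:.5f}")
```

Under these assumptions the true selectivity parameter is zero, so any systematic departure of the average intercept from zero in the Jensen-type or Henriksson-Merton-type fits reflects the mismatch between the fitted and the data-generating model, while the correctly specified quadratic fit should average close to zero. The sketch says nothing about the magnitudes reported in the paper, which depend on the authors' calibration.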