Variable Selection for Large Unbalanced Datasets Using Non-Standard Optimisation of Information Criteria and Variable Reduction Methods
We consider forecasting key macroeconomic variables using many predictors extracted from the Eurostat PEEIs (Principal European Economic Indicators) dataset. To avoid the curse of dimensionality, we rely on model selection and model reduction. For model selection, we use heuristic optimisation of information criteria, including simulated annealing, genetic algorithms, MC³ and sequential testing. For model reduction, we employ principal components, partial least squares and Bayesian shrinkage regression. We discuss the problem of unbalanced datasets and suggest potential solutions. The predictive performance of each method is evaluated in a pseudo out-of-sample exercise relative to a univariate AR(1) benchmark. The results provide evidence that these methods can be useful in forecasting, and they are particularly encouraging for quarterly consumption and GDP growth and for monthly industrial production growth and inflation.
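As a rough illustration of what "heuristic optimisation of an information criterion" involves, the Python sketch below uses simulated annealing to search over subsets of candidate predictors for the OLS regression with the lowest BIC. It is not the authors' implementation. The choice of BIC, the single-flip neighbourhood, the geometric cooling schedule, the parameter values and the helper names `bic` and `anneal_subset` are all illustrative assumptions. The sketch also assumes a balanced, fully observed predictor matrix, so it does not address the unbalanced-data problem discussed in the paper.

```python
import math
import random

import numpy as np


def bic(y, X, subset):
    """BIC of an OLS regression of y on an intercept plus the columns of X in `subset`."""
    n = len(y)
    cols = [np.ones(n)] + [X[:, j] for j in sorted(subset)]
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    k = Z.shape[1]  # number of estimated coefficients, including the intercept
    return n * math.log(rss / n) + k * math.log(n)


def anneal_subset(y, X, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Hypothetical simulated-annealing search for the BIC-minimising predictor subset."""
    rng = random.Random(seed)
    p = X.shape[1]
    current = set()                      # start from the intercept-only model
    current_ic = bic(y, X, current)
    best, best_ic = set(current), current_ic
    temp = t0
    for _ in range(n_iter):
        j = rng.randrange(p)
        proposal = current ^ {j}         # flip the inclusion of predictor j
        prop_ic = bic(y, X, proposal)
        delta = prop_ic - current_ic
        # Always accept improvements; accept deteriorations with probability exp(-delta/T).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_ic = proposal, prop_ic
            if current_ic < best_ic:
                best, best_ic = set(current), current_ic
        temp *= cooling                  # geometric cooling schedule
    return best, best_ic


if __name__ == "__main__":
    # Simulated data: only predictors 0 and 3 enter the true model.
    data_rng = np.random.default_rng(1)
    X = data_rng.standard_normal((200, 10))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * data_rng.standard_normal(200)
    selected, ic = anneal_subset(y, X)
    print(sorted(selected), round(ic, 2))
```

On the simulated data the selected subset should include predictors 0 and 3; a noise predictor can occasionally enter if it happens to lower the BIC in this sample. Allowing occasional uphill moves early in the search, while the temperature is high, is what lets annealing escape the local optima that a purely greedy add/drop search can get stuck in.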