Forecasting with a Real-Time Data Set for Macroeconomists
This paper discusses how forecasts may be affected by the use of real-time data rather than latest-available data. The key issue is this: in the literature on forecasting models, new models are built and judged by the results they yield on the data set available to the model developer, but those are not the data that were available to a forecaster in real time. How much difference does the vintage of the data make for such forecasts? We explore this question with a variety of exercises. We find that real-time data matter for some forecasting issues but not for others: they matter for choosing lag length in a univariate context, and they may matter considerably for short-horizon forecasts, though they are less important for longer-horizon forecasts. Preliminary evidence suggests that the span, or number, of forecast observations used to evaluate models may also be critical: standard measures of forecast accuracy can be sensitive to the data vintage when they are computed over the short spans (5 years of quarterly data) that researchers sometimes use for forecast evaluation. The differences between using real-time and latest-available data may depend on what is used as the "actual," or realized, value, and we explore several alternatives. Perhaps most important, we show that measures of forecast error, such as root-mean-squared error and mean absolute error, can be deceptively lower when computed with latest-available data rather than real-time data. Thus, developing a model using latest-available data is questionable; model development may be much better if it is based on real-time data.
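To make the point about the choice of "actual" concrete, the sketch below scores one fixed set of forecasts with root-mean-squared error and mean absolute error against two different realizations of the same quarters: a first-release value and a later, revised vintage. All numbers are made up for illustration and are not the paper's data or procedure. The series names (`first_release`, `latest_vintage`) are hypothetical. The sketch only shows that identical forecasts can receive different accuracy scores depending on which vintage is treated as the realization.

```python
import math

def rmse(forecasts, actuals):
    """Root-mean-squared error of forecasts against realized values."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(forecasts, actuals):
    """Mean absolute error of forecasts against realized values."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical one-step-ahead forecasts of quarterly output growth
# (percent, annualized). Illustrative numbers only.
forecasts = [2.0, 3.0, 1.5, 2.5]

# Hypothetical realizations for the same four quarters: the value as first
# released, and the value as it appears in a later, revised data vintage.
first_release = [1.0, 3.5, 2.5, 2.0]
latest_vintage = [2.5, 2.5, 2.0, 3.0]

for label, actuals in [("first release", first_release),
                       ("latest vintage", latest_vintage)]:
    print(f"{label:>14}: RMSE {rmse(forecasts, actuals):.3f}, "
          f"MAE {mae(forecasts, actuals):.3f}")
```

Here the same forecasts look more accurate when judged against the revised vintage than against the first release. Real comparisons would also re-estimate the model on each vintage, which this sketch deliberately leaves out.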