This section gives three real-data examples. The data for all three were taken from textbooks on ARIMA or other forecasting methods, and were presumably included in those books at least partly because they cast those methods in a favorable light. Yet in all three cases I found regression models that fitted the data substantially better than the original textbook methods did, by the criterion of adjusted root mean square error. This figure is called the standard error of estimate in many regression packages, so we shall denote it SEE. SEE avoids giving an unfair advantage to models with more terms--an important point since our models all have more terms than the competing models. The regression models are also more interpretable than the competing models.
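To make the criterion concrete, here is a minimal Python sketch of SEE, assuming the usual definition (residual sum of squares divided by residual degrees of freedom, then square-rooted); regression packages differ slightly in naming but generally use this divisor:

```python
import math

def see(residuals, n_params):
    """Standard error of estimate: sqrt(SSE / (n - p)), where p is the
    number of fitted parameters including the constant.  Dividing by
    n - p rather than n is what keeps models with more terms from
    gaining an unfair advantage."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - n_params))

# Toy illustration with made-up residuals from a two-parameter model:
see([1.0, -2.0, 1.5, -0.5], 2)
```

The divisor n - p is the same adjustment that distinguishes SEE from the raw root mean square error.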

To avoid the charge that I engaged in too much post hoc model selection, I defined a "standard" model which included TIME, TIME*TIME, TIME*TIME*TIME, and the averaged lagged terms A1-A6 described earlier, plus whatever seasonal or cyclical terms were used in the competing ARIMA models. This standard model fitted each of these three data sets better than any ARIMA method, even though in each case up to 30 ARIMA models were applied to the same data. I then proceeded to engage in the same general sort of post hoc model selection that is commonly done in any area of model-building. That post hoc process is described below separately for each data set.
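As a concrete sketch of the standard model's design matrix, here is one way to build the polynomial and averaged-lag predictors. The exact definition of A1-A6 is given earlier in the article; the sketch below assumes Ak is the average of the k most recent lagged observations, which is one plausible reading:

```python
def standard_predictors(y, n_avg=6):
    """Build predictor rows for the 'standard' model: TIME, TIME^2,
    TIME^3, and averaged lagged terms A1..A6.
    Assumption (labeled as such): Ak = mean of y[t-1] .. y[t-k].
    Rows start at t = n_avg so that every lag exists."""
    rows = []
    for t in range(n_avg, len(y)):
        time = float(t + 1)                     # TIME is 1-based
        a_terms = [sum(y[t - k:t]) / k for k in range(1, n_avg + 1)]
        rows.append([time, time ** 2, time ** 3] + a_terms)
    return rows
```

Any seasonal or cyclical terms used by the competing ARIMA model would be appended as additional columns.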

Housing Permits

The first example is Case 4 from Pankratz (1983). It concerns the number of permits for housing construction issued in the US for each of the 84 quarters from 1947 through 1967. Pankratz comments, "This is an especially challenging series to model" (p. 369), though it gave me no trouble. Surprisingly, Pankratz found no substantial seasonal effect in this series, and I confirmed that finding.

Of the several ARIMA models Pankratz tried, the one with the smallest SEE included two AR terms and two MA terms plus a constant, and showed SEE = 6.72. After observing SEE = 6.22 with my standard model, I removed the quadratic and cubic terms because they contributed little. I then applied stepwise regression with default options to a model with a constant, TIME, and 10 lagged terms B1-B10. The stepwise program selected a model with a constant, TIME, and B1, B3, B4, B6, B7, B10, and showed SEE = 5.40, well below either previous value. The stepwise program did not seem to capitalize on chance substantially more than Pankratz did, especially since the absolute values of t for the six lagged terms in the stepwise model were all over 3.0.
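For readers who want the mechanics, here is a simplified forward-selection sketch that greedily adds whichever candidate term most reduces SEE. It is a stand-in for the stepwise program mentioned above, not a reproduction of its exact options:

```python
import numpy as np

def forward_stepwise(X, y, names):
    """Greedy forward selection: repeatedly add the candidate column
    that most reduces SEE = sqrt(SSE / (n - p)), with p counting the
    constant, and stop when no candidate improves it."""
    n = len(y)
    chosen, best_see = [], np.inf
    while True:
        trial, trial_see = None, best_see
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            p = A.shape[1]
            if n - p <= 0:
                continue
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((y - A @ beta) ** 2))
            cur_see = np.sqrt(sse / (n - p))
            if cur_see < trial_see - 1e-10:   # require a real improvement
                trial_see, trial = cur_see, j
        if trial is None:
            break
        chosen.append(trial)
        best_see = trial_see
    return [names[j] for j in chosen], best_see
```

Real stepwise programs typically use entry and removal F-to-enter thresholds rather than raw SEE improvement, but the greedy structure is the same.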

The presence of several long-lag variables in this model indicates that the model is in effect fitting local slopes. For instance, a model which gave B1 a positive weight (as this model did) and B10 a negative weight (as it also did) would be doing something rather similar to fitting a straight line to those two points and projecting it to the right to make the forecast.
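The arithmetic behind that "local slope" reading can be made explicit. A hypothetical sketch (the fitted weights in the actual model were of course not exactly these):

```python
def two_point_forecast(b1, b10):
    """Linear extrapolation through the lag-1 and lag-10 values:
    slope = (b1 - b10) / 9, projected one step past the lag-1 point.
    Algebraically this equals (10/9)*b1 - (1/9)*b10 -- a positive
    weight on B1 and a negative weight on B10, the same sign pattern
    the fitted model showed."""
    slope = (b1 - b10) / 9.0
    return b1 + slope
```

So a regression that weights B1 positively and B10 negatively is, in effect, estimating the recent slope of the series and projecting it forward.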

College Enrollment

The second example is Case 14 from Pankratz (1983). It concerns the number of students enrolled in colleges nationwide for the 54 semesters from fall 1954 through spring 1981. After much analysis, Pankratz ended up differencing twice and then fitting a model with 3 MA terms plus a seasonal term to allow for the fact that college enrollments are usually higher in the fall semester than in the spring. This model showed SEE = 31.44 (in thousands of students). My standard model, with the same seasonal term Pankratz used, showed SEE = 26.72.
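For readers unfamiliar with differencing: differencing twice means replacing the series by its successive changes, and then by the changes of those changes. A minimal sketch:

```python
def difference(y, times=1):
    """Apply first differencing the given number of times.
    Pankratz differenced this series twice before fitting MA terms."""
    for _ in range(times):
        y = [b - a for a, b in zip(y, y[1:])]
    return y
```

Twice-differencing a quadratic trend leaves a constant, which is why differencing competes with polynomial TIME terms as a way of handling trend.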

The inclusion of linear, quadratic, and cubic terms for TIME may seem like overkill for a small data set, but in the various models I tried with this data set, even the cubic term remained highly significant; in the final model described below it showed t = -5.484, df = 42, p = .000002. Due partly to the large semester effect, the cubic pattern is hardly visible in this data set until one looks for it, but then it becomes clear: the series rises visibly to about 1965, falls to about 1970, then rises again.

I also added 5 lagged variables B1 through B5. Testing them as a set yielded p = .0054. After some experimentation I dropped B1, B2, and B4, leaving B3 and B5. The t's for these variables were respectively -3.74 and -2.72, and testing these two variables as a set yielded p = .00030. These results seem stronger than could easily be explained by chance selection of two variables from five, since there are only 10 ways of selecting two items from five. The final model showed SEE = 25.03, well below the ARIMA value of 31.44.
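Testing a set of variables jointly is done with the incremental F test comparing the model with and without the set. A minimal sketch of the statistic (a p-value then comes from referring F to an F distribution with q and df_full degrees of freedom, e.g. via scipy.stats.f.sf):

```python
def nested_f(sse_reduced, sse_full, q, df_full):
    """Incremental F statistic for q added predictors:
    F = ((SSE_reduced - SSE_full) / q) / (SSE_full / df_full),
    where df_full is the residual df of the larger model."""
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)
```

The numbers here are purely illustrative; the p-values quoted above came from the actual enrollment regressions.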


Sales

The third example uses a 41-case series on sales of an unspecified product. The data are from Gilchrist (1976, p. 64), who had not applied ARIMA to these data. I fitted a large number of ARIMA models to this series. For those familiar with ARIMA terminology, I fitted AR1, AR2, AR3, MA1, MA2, and MA3 models, plus ARMA11, ARMA12, ARMA21, and ARMA22 models. Because of the strong downward trend in the data, differencing seemed called for within the ARIMA framework, so I fitted each of these 10 models with 0, 1, and 2 levels of differencing, making 30 models altogether. For several of these models the ARIMA program refused to run, or warned that the results might be unreliable, but among the others the lowest SEE I found was .145, for an MA2 model with two levels of differencing. My standard model yielded SEE = .086. I was unable to improve on this model.
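The grid of 30 specifications can be enumerated mechanically. A sketch (each (p, d, q) order could then be handed to an ARIMA routine, e.g. statsmodels' ARIMA(y, order=order).fit(); I name that package only as an illustration, not as the program actually used):

```python
from itertools import product

# The ten (p, q) structures named in the text:
pq = [(1, 0), (2, 0), (3, 0),          # AR1, AR2, AR3
      (0, 1), (0, 2), (0, 3),          # MA1, MA2, MA3
      (1, 1), (1, 2), (2, 1), (2, 2)]  # ARMA11, ARMA12, ARMA21, ARMA22

# Each combined with 0, 1, or 2 levels of differencing:
orders = [(p, d, q) for (p, q), d in product(pq, (0, 1, 2))]
len(orders)  # 10 structures x 3 differencing levels = 30 models
```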

Again the nonlinear polynomial terms were highly significant. The cubic nature of the curve is clear once it is pointed out; the trend falls steeply, then less steeply, then more steeply again. This series also illustrates the substantive reasonableness of using polynomial terms. If you had to project the next observation in a series like this (that falls steeply, then less steeply, then more steeply again), would you rather fit a straight line to the entire series, or take advantage of the fact that the last part of the curve is falling even more steeply than the curve as a whole? A polynomial essentially does the latter.
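The point can be illustrated numerically. The sketch below uses made-up data generated from an exact cubic with the same shape (steep fall, slower fall, steep fall again), not the Gilchrist series itself:

```python
import numpy as np

# Illustrative series from an exact cubic: falls steeply, then less
# steeply, then steeply again (an assumption for the demonstration).
t = np.arange(12, dtype=float)
y = 5.0 - 0.02 * (t - 6.0) ** 3 - 0.1 * t

line_fc = np.polyval(np.polyfit(t, y, 1), 12.0)   # straight line over whole series
cubic_fc = np.polyval(np.polyfit(t, y, 3), 12.0)  # cubic tracks the late steep fall
# The cubic projects the steepening fall; the straight line averages
# it away and forecasts too high.
```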


Discussion

In all three examples regression models fitted the data substantially better than ARIMA models did, even though two of the data sets had been taken from an ARIMA textbook. These were not the best of many examples I tried; they are nearly the only examples I have tried so far.

In the housing-permit example the best model used both short-lag and long-lag terms. In the college-enrollment and sales examples the best models used both polynomial terms and long-lag terms. In the sales example A-terms worked best among the lagged terms, while in the housing and college examples B-terms worked best. It may be that each observation in the sales example included more random error, so that averaging across lags helped; moreover, A-terms are especially recommended for small sample sizes, and the sales data set was the smallest of the three.


References

Box, George E. P., and Gwilym M. Jenkins (1976). Time Series Analysis: Forecasting and Control. Oakland, CA: Holden-Day.

Darlington, Richard (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161-182.

Darlington, Richard (1990). Regression and Linear Models. New York: McGraw-Hill.

Gilchrist, Warren (1976). Statistical Forecasting. New York: Wiley.

Pankratz, Alan (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. New York: Wiley.