To avoid the charge that I engaged in too much post hoc model selection, I defined a "standard" model which included TIME, TIME*TIME, TIME*TIME*TIME, and the averaged lagged terms A1-A6 described earlier, plus whatever seasonal or cyclical terms were used in the competing ARIMA models. This standard model fitted each of these three data sets better than any ARIMA method, even though in each case up to 30 ARIMA models were applied to the same data. I then proceeded to engage in the same general sort of post hoc model selection that is commonly done in any area of model-building. That post hoc process is described below separately for each data set.
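
As a concrete illustration, a design matrix of this "standard" form might be built as follows. This is only a sketch on synthetic data: in particular, constructing each averaged-lag term Ak as the mean of the first k lags is an assumption, since the A-terms are named but not defined in this passage.

```python
import numpy as np

def build_standard_design(y, n_avg_lags=6):
    """Design matrix with TIME, TIME^2, TIME^3 and averaged-lag terms
    A1..A6.  Here Ak is taken to be the mean of the first k lags of y
    -- an assumed construction for illustration only."""
    n = len(y)
    t0 = n_avg_lags                       # first usable observation
    time = np.arange(t0, n, dtype=float)
    cols = [np.ones(n - t0), time, time**2, time**3]
    for k in range(1, n_avg_lags + 1):
        # Ak[t] = mean of y[t-1], ..., y[t-k]
        cols.append(np.array([y[t - k:t].mean() for t in range(t0, n)]))
    return np.column_stack(cols), y[t0:]

# Synthetic series with trend plus noise, just to exercise the code.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=60)) + 0.01 * np.arange(60)**2
X, yy = build_standard_design(y)
beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
resid = yy - X @ beta
see = np.sqrt(resid @ resid / (len(yy) - X.shape[1]))  # standard error of estimate
```

The SEE computed on the last line is the quantity used throughout this section to compare models.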

Of the several ARIMA models Pankratz tried, the one with the smallest
SEE included two AR terms and two MA terms plus a constant, and showed
SEE = 6.72. After observing SEE = 6.22 with my standard model, I
removed the quadratic and cubic terms because they contributed little. I
then applied stepwise regression with default options to a model with a
constant, TIME, and 10 lagged terms B1-B10. The stepwise program selected
a model with a constant, TIME, and B1, B3, B4, B6, B7, and B10, and showed
SEE = 5.40, well below either previous value. The stepwise program did not
seem to capitalize on chance substantially more than Pankratz did, especially
since the absolute values of *t* for the six lagged terms in the stepwise model
were all over 3.0.
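
A minimal forward-selection sketch of this procedure is shown below, on synthetic autoregressive data. The constant and TIME are always retained, the candidates are the lags B1-B10, and the F-to-enter threshold of 4.0 is only an assumed stand-in for the "default options" of a real stepwise program (which would typically also drop terms).

```python
import numpy as np

def sse(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def forward_stepwise(y, max_lag=10, f_to_enter=4.0):
    """Forward stepwise over lagged terms B1..B10, with a constant and
    TIME always included.  A sketch only: real stepwise programs also
    remove previously entered terms."""
    n = len(y)
    t0 = max_lag
    yy = y[t0:]
    base = [np.ones(n - t0), np.arange(t0, n, dtype=float)]
    lags = {k: y[t0 - k:n - k] for k in range(1, max_lag + 1)}  # Bk = y lagged k
    chosen = []
    while True:
        s0 = sse(np.column_stack(base + [lags[j] for j in chosen]), yy)
        best, best_f = None, f_to_enter
        for k in lags:
            if k in chosen:
                continue
            X1 = np.column_stack(base + [lags[j] for j in chosen] + [lags[k]])
            s1 = sse(X1, yy)
            f = (s0 - s1) / (s1 / (len(yy) - X1.shape[1]))  # partial F to enter
            if f > best_f:
                best, best_f = k, f
        if best is None:
            return sorted(chosen)
        chosen.append(best)

# Demo on a synthetic AR(1) series, where lag 1 should be selected.
rng = np.random.default_rng(2)
e = rng.normal(size=120)
y = np.empty(120)
y[0] = e[0]
for i in range(1, 120):
    y[i] = 0.8 * y[i - 1] + e[i]
picked = forward_stepwise(y)
```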

The presence of several long-lag variables in this model indicates that the model is in effect fitting local slopes. For instance, a model which gave B1 a positive weight (as this model did) and B10 a negative weight (as it also did) would be doing something rather similar to fitting a straight line to those two points and projecting it to the right to make the forecast.
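
The arithmetic behind that reading can be checked directly: projecting a line through the lag-10 and lag-1 observations one step ahead is identical to taking a fixed positive weight on B1 and a fixed negative weight on B10. The values below are hypothetical, chosen only to verify the identity.

```python
# A line through the lag-10 and lag-1 observations, projected one step
# ahead, equals (10/9)*B1 - (1/9)*B10 -- a positive weight on B1 and a
# negative weight on B10, as in the fitted model.
b1, b10 = 14.0, 5.0            # hypothetical lagged values y[t-1], y[t-10]
slope = (b1 - b10) / 9.0       # rise over the 9 steps between them
line_forecast = b1 + slope     # project the line one step, to time t
weighted = (10 / 9) * b1 - (1 / 9) * b10
assert abs(line_forecast - weighted) < 1e-9
```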

The inclusion of linear, quadratic, and cubic terms for TIME may seem
like overkill for a small data set, but in the various models I tried with this
data set, even the cubic term remained highly significant; in the final model
described below it showed *t* = -5.484, df = 42, *p* = .000002. Partly
because of the large semester effect, the cubic pattern is hard to see in this
data set until one looks for it, but once noticed it is clear: the series
rises visibly to about 1965, then falls to about 1970, then rises again.
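
Such a cubic-term *t* statistic can be reproduced with ordinary least-squares machinery, as sketched below. The series is synthetic, shaped only to have a rise-fall-rise pattern like the one described here; it is not the enrollment data.

```python
import numpy as np

# Fit constant + TIME + TIME^2 + TIME^3 and compute the t statistic of
# the cubic coefficient, the quantity reported for the enrollment model.
rng = np.random.default_rng(1)
t = np.arange(50, dtype=float)
y = 0.002 * (t - 15) * (t - 30) * (t - 45) + rng.normal(size=50)  # rise-fall-rise
X = np.column_stack([np.ones_like(t), t, t**2, t**3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = len(y) - X.shape[1]
s2 = resid @ resid / df                   # residual variance
cov = s2 * np.linalg.inv(X.T @ X)         # covariance matrix of beta-hat
t_cubic = beta[3] / np.sqrt(cov[3, 3])    # t for the TIME^3 term
```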

I also added five lagged variables, B1 through B5. Testing them as a
set yielded *p* = .0054. After some experimentation I dropped B1, B2, and
B4, leaving B3 and B5. The *t*'s for these variables were respectively -3.74
and -2.72, and testing these two variables as a set yielded *p* = .00030. These
results seem stronger than could easily be explained by chance selection of two
variables from five; the number of ways of selecting two items from five is only
10. The final model showed SEE = 25.03, well below the ARIMA value of
31.44.
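
Testing several variables "as a set" is the incremental (partial) F-test: compare the SSE of the model with and without the set. A minimal sketch on synthetic data follows; the variable names and coefficients are illustrative, not taken from any of the three data sets.

```python
import numpy as np

def partial_f(X_reduced, X_full, y):
    """Partial F for the extra columns of X_full, tested as a set:
    F = ((SSE_r - SSE_f) / q) / (SSE_f / df_f)."""
    def sse(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ b
        return float(r @ r)
    q = X_full.shape[1] - X_reduced.shape[1]      # number of tested variables
    df_f = len(y) - X_full.shape[1]               # residual df of full model
    s_r, s_f = sse(X_reduced), sse(X_full)
    return ((s_r - s_f) / q) / (s_f / df_f)

# Demo: test x2 and x3 jointly, given a constant and x1.
rng = np.random.default_rng(3)
n = 60
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 1.5 * x2 + 1.5 * x3 + rng.normal(size=n)
Xr = np.column_stack([np.ones(n), x1])
Xf = np.column_stack([np.ones(n), x1, x2, x3])
F = partial_f(Xr, Xf, y)
```

Converting F to a *p* value requires an F-distribution tail function, e.g. `scipy.stats.f.sf(F, q, df_f)`.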

Again the nonlinear polynomial terms were highly significant. The cubic nature of the curve is clear once it is pointed out; the trend falls steeply, then less steeply, then more steeply again. This series also illustrates the substantive reasonableness of using polynomial terms. If you had to project the next observation in a series like this (that falls steeply, then less steeply, then more steeply again), would you rather fit a straight line to the entire series, or take advantage of the fact that the last part of the curve is falling even more steeply than the curve as a whole? A polynomial essentially does the latter.
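
That rhetorical question can be made concrete. On a synthetic series with the same falls-steeply, less-steeply, more-steeply shape (not any of the data sets discussed here), a cubic's one-step projection tracks the re-steepening tail while a straight line fit to the whole series does not:

```python
import numpy as np

t = np.arange(30, dtype=float)
y = -((t - 15)**3) / 100.0 - t   # falls steeply, then less steeply, then more

def fit_forecast(degree):
    """Fit a degree-d polynomial in TIME to all but the last point,
    then project it one step to forecast the final point."""
    X = np.column_stack([t[:-1]**d for d in range(degree + 1)])
    b, *_ = np.linalg.lstsq(X, y[:-1], rcond=None)
    return sum(b[d] * t[-1]**d for d in range(degree + 1))

err_line  = abs(fit_forecast(1) - y[-1])   # straight line to the whole series
err_cubic = abs(fit_forecast(3) - y[-1])   # cubic follows the steepening tail
```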

In the housing-permit example the best model used both short-lag and long-lag terms. In the college-enrollment and sales examples the best models used both polynomial terms and long-lag terms. In the sales example the A-terms worked best as lag terms, while in the housing and college examples the B-terms worked best. It may be that each observation in the sales example included more random error; A-terms are also especially recommended for small sample sizes, and the sales data set was the smallest of the three.

Darlington, Richard (1968). Multiple regression in psychological research and
practice. *Psychological Bulletin*, 69, 161-182.

Darlington, Richard (1990). *Regression and Linear Models*. New York:
McGraw-Hill.

Gilchrist, Warren (1976). *Statistical Forecasting*. New York: Wiley.

Pankratz, Alan (1983). *Forecasting with Univariate Box-Jenkins Models:
Concepts and Cases*. New York: Wiley.