Automating operations with Minitab for time series

Do you find yourself running the same tests over and over? If so, would you like to give your mouse a rest? With Minitab you can automate operations to save time. There are several ways to do this, from the quick and easy cut-and-paste method to the more powerful method of using a local macro.

How does this work? Almost all operations in Minitab can be carried out with session commands. In fact, when you complete a dialog box and click OK, Minitab generates session commands that contain all the information you selected in it. You can use those session commands as is or modify them if you wish, then load them in one step, and Minitab will run the entire analysis.

Suppose you collect data weekly and generate three different graphs from that data. Ordinarily, you would have to fill out the dialog boxes for all three graphs every week, which means a lot of mouse clicks. Instead, you could load the script that generated those graphs in one quick step.

This article includes some simple examples of how to automate time series operations in Minitab.

Minitab Time Series

This manual covers the concepts, application and execution of time series analysis in Minitab version 15.

BASIC CONCEPTS OF TIME SERIES

1.1 INTRODUCTION

Every institution, be it the family, the company or the government, has to make plans for the future if it is to survive and progress. Today, many institutions need to know the future behavior of certain phenomena in order to plan, forecast or prevent.

Rational planning requires anticipating future events that are likely to occur. Forecasting, in turn, is often based on what has happened in the past. Thus, there is a type of statistical inference that is made about the future of some variable or set of variables based on past events. The most important technique for making inferences about the future based on what happened in the past is time series analysis.

There are countless applications that can be cited in different areas of knowledge, such as in economics, physics, geophysics, chemistry, electricity, demography, marketing, telecommunications, transportation, etc.

Time series	Examples
1. Economic series	Prices of an article; unemployment rates; inflation rate; price index, etc.
2. Physical series	Meteorology; amount of rainfall; maximum daily temperature; wind speed (wind energy); solar energy, etc.
3. Geophysical series	Seismology series
4. Demographic series	Population growth rates; birth rate, mortality; results of population censuses
5. Marketing series	Demand series, expenses, offers
6. Telecommunication series	Signal analysis
7. Transport series	Traffic series

One of the problems that time series analysis tries to solve is prediction. That is, given a series {x(t_1), …, x(t_n)}, our objectives are to describe the behavior of the series, investigate the mechanism that generates it, and look for possible time patterns that help reduce uncertainty about the future.

From now on, we will study how to build a model to explain the structure and forecast the evolution of a variable that we observe over time. The variables of interest can be macroeconomic (consumer price index, electricity demand, series of exports or imports, etc.), microeconomic (sales of a company, stock in a warehouse, advertising expenses of a sector), physical (wind speed in a wind power station, temperature in a process, flow of a river, concentration of a polluting agent in the atmosphere), or social (number of births, marriages, deaths, or votes for a political party).

1.2 DEFINITION OF TIME SERIES

In many areas of knowledge, observations of interest are obtained at successive instants of time, for example every hour for 24 hours, monthly, quarterly or semi-annually, or are recorded continuously by some instrument.

A time series is a set of measurements of a certain phenomenon or experiment recorded sequentially in time. These observations are denoted by {x(t_1), x(t_2), …, x(t_n)} = {x(t): t ∈ T ⊆ R}, with x(t_i) the value of the variable x at the instant t_i. If T = Z the time series is said to be discrete, and if T = R it is said to be continuous. When t_{i+1} - t_i = k for all i = 1, …, n-1, the series is said to be equispaced; otherwise it is non-equispaced.

From now on, we will work with discrete, equally spaced time series, in which case we will assume, without loss of generality, that {x(t_1), x(t_2), …, x(t_n)} = {x(1), x(2), …, x(n)}.
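The definition above can be illustrated with a minimal Python sketch. The function names `is_equispaced` and `reindex` and the sample times are assumptions made for illustration only; they are not part of Minitab or of the original note.

```python
def is_equispaced(times, tol=1e-9):
    """Return True when consecutive observation times differ by a constant k."""
    if len(times) < 2:
        return True
    k = times[1] - times[0]
    return all(abs((b - a) - k) <= tol for a, b in zip(times, times[1:]))


def reindex(values):
    """Relabel an equispaced series as x(1), ..., x(n)."""
    return {t: x for t, x in enumerate(values, start=1)}


hourly = [0.0, 1.0, 2.0, 3.0]     # hypothetical observation times, k = 1
irregular = [0.0, 1.0, 2.5, 3.0]  # hypothetical observation times, unequal gaps

print(is_equispaced(hourly))
print(is_equispaced(irregular))
print(reindex([398, 352, 283]))
```

Once a series is known to be equispaced, the actual time stamps can be dropped and the observations simply numbered 1 to n, which is the convention used in the rest of the note.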

1.3 FIRST STEP WHEN ANALYZING ANY TIME SERIES

The first step in time series analysis is to plot the series. This allows us to detect the essential components of the series.

The graph of the series will allow:

a) Detect outliers: an outlier is a point in the series that lies beyond the normal range. It is an observation that corresponds either to abnormal behavior of the phenomenon (with no bearing on the future) or to a measurement error.

Whether a given point is an outlier must be determined from outside the data, using knowledge of the phenomenon. If it is found to be one, it must be omitted or replaced with another value before the series is analyzed.

For example, in a study of daily production in a factory, the following situation occurred, see figure 1.1:

The two circled points seem to correspond to abnormal behavior of the series. When these two points were investigated, it was found that they corresponded to two days of work stoppage, which naturally affected production on those days. The problem was solved by removing the observations and interpolating.
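A minimal sketch of the "remove and interpolate" step is shown below. It assumes linear interpolation between the nearest valid neighbors, which is only one possible choice; the production figures and the function name `interpolate_outliers` are hypothetical.

```python
def interpolate_outliers(series, outlier_positions):
    """Replace flagged interior points by linear interpolation
    between the nearest unflagged neighbors (0-based positions)."""
    flagged = set(outlier_positions)
    cleaned = list(series)
    for i in sorted(flagged):
        left = i - 1
        while left in flagged:
            left -= 1
        right = i + 1
        while right in flagged:
            right += 1
        if left < 0 or right >= len(series):
            raise ValueError("cannot interpolate at the ends of the series")
        weight = (i - left) / (right - left)
        cleaned[i] = series[left] + weight * (series[right] - series[left])
    return cleaned


# Hypothetical daily production with two abnormal days (positions 3 and 6).
production = [100, 102, 101, 5, 103, 104, 3, 106]
cleaned = interpolate_outliers(production, [3, 6])
print(cleaned)
```

The positions to replace are passed in explicitly because, as noted above, deciding that a point is an outlier requires knowledge of the phenomenon rather than a purely mechanical rule.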

b) Detect trend: the trend represents the predominant behavior of the series. It can be loosely defined as the change in the mean over a period (see figure 1.2).

c) Seasonal variation: seasonal variation represents a periodic movement of the time series. The unit length of the period is generally less than one year. It can be a quarter, a month or a day, etc. (see figure 1.3).

Mathematically, we can say that the series shows seasonal variation if there exists a number s such that x(t) = x(t + k × s) for every integer k.
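The condition can be checked directly on a purely seasonal component. The sketch below is illustrative only: the helper `is_periodic` and the quarterly pattern are assumptions, and a real series containing trend and noise would satisfy the equality only approximately.

```python
def is_periodic(series, s):
    """True when x(t) == x(t + s) for every t where both terms exist.
    Holding for one step of s implies it holds for every multiple k * s."""
    return all(series[t] == series[t + s] for t in range(len(series) - s))


pattern = [10, 20, 30, 15]  # hypothetical quarterly seasonal pattern
series = pattern * 3        # three years of a purely seasonal series

print(is_periodic(series, 4))
print(is_periodic(series, 3))
```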

The main forces that cause seasonal variation are weather conditions, which affect, for example:

- ice cream sales in winter
- wool sales in summer
- fruit exports in March

All these phenomena show seasonal behavior (annual, weekly, etc.).

d) Irregular Variations (Random Component): Irregular (random) movements represent all types of movements in a time series other than trend, seasonal variations and cyclical fluctuations.

2. CLASSIC TIME SERIES MODELS

2.1 DECOMPOSITION MODELS

A classic model for a time series assumes that a series x(1), …, x(n) can be expressed as the sum or product of three components: trend, seasonality, and a random error term.

Three time series models are generally accepted as good approximations of the true relationships among the components of the observed data. These are:

Additive: X(t) = T(t) + E(t) + A(t)   (1)
Multiplicative: X(t) = T(t) × E(t) × A(t)   (2)
Mixed: X(t) = T(t) × E(t) + A(t)   (3)

Where:

X(t): series observed at time t
T(t): trend component
E(t): seasonal component
A(t): random (accidental) component

A usual assumption is that A(t) is a random component, or white noise, with zero mean and constant variance.

An additive model (1) is suitable, for example, when E(t) does not depend on the other components, such as T(t). If, on the contrary, seasonality varies with the trend, the most suitable model is the multiplicative model (2). Model (2) can clearly be transformed into an additive one by taking logarithms. The problem that arises is to model the components of the series properly.
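The effect of taking logarithms can be verified numerically. In the sketch below all component values are hypothetical and strictly positive (logarithms require it); the point is only that log X(t) equals log T(t) + log E(t) + log A(t), so the multiplicative model becomes additive on the log scale.

```python
import math

trend = [100.0, 110.0, 120.0, 130.0]  # T(t), hypothetical
seasonal = [0.8, 1.2, 0.9, 1.1]       # E(t), hypothetical seasonal factors
noise = [1.01, 0.99, 1.02, 0.98]      # A(t), hypothetical random factors

# Multiplicative model: X(t) = T(t) * E(t) * A(t)
x = [t * e * a for t, e, a in zip(trend, seasonal, noise)]

# On the log scale the same series is a sum of three components.
log_x = [math.log(v) for v in x]
log_sum = [math.log(t) + math.log(e) + math.log(a)
           for t, e, a in zip(trend, seasonal, noise)]

max_gap = max(abs(u - v) for u, v in zip(log_x, log_sum))
print(max_gap)  # floating-point rounding error only
```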

Figure 2.1 illustrates possible patterns that could be followed by series represented by models (1), (2) and (3).

2.2 ESTIMATING THE TREND

We will assume here that the seasonal component E(t) is not present and that the additive model is adequate, that is:

X(t) = T(t) + A(t), where A(t) is white noise.

There are several methods to estimate T(t). The most widely used are:

- Fit a function of time, such as a polynomial, an exponential, or another smooth function of t.
- Smooth (or filter) the values of the series.
- Use differences.
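The third method, differencing, is not developed further in this note, so a brief sketch may help. It assumes the usual first difference x(t) - x(t-1); the series values are hypothetical. Differencing once removes a linear trend, and differencing twice removes a quadratic one.

```python
def difference(series):
    """First difference: x(t) - x(t-1) for consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]


linear = [10 + 3 * t for t in range(1, 9)]  # hypothetical trend a + b*t with b = 3
quadratic = [t * t for t in range(1, 7)]    # hypothetical quadratic trend

print(difference(linear))                 # constant, equal to the slope
print(difference(difference(quadratic)))  # constant after two differences
```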

2.2.1 FITTING A FUNCTION

The following graphs illustrate some of the shapes of these curves.

Note:

The trend curve must cover a relatively long period to be a good representation of the long-term trend. Linear and exponential trends are applicable in the short term, since, for example, a long-term S-curve may look like a straight line over a restricted period of time.

In Figure 2.2 both curves (straight and Gompertz) fit well but the projections diverge greatly in the long run.

Example 1: Table 2.1 shows quarterly data for housing units started in the United States from the third quarter of 1964 to the second quarter of 1972. (It should be noted that for trend analysis the period considered should be longer. However, since the main purpose is to illustrate the decomposition method and the techniques for making inferences from the decomposed elements, the shortness of the data need not concern us.)

Table 2.1: New housing units started in the United States from the third quarter of 1964 to the second quarter of 1972 (in thousands of units).

Year	I	II	III	IV	Annual total
1964			398	352
1965	283	454	392	345	1,474
1966	274	392	290	210	1,166
1967	218	382	382	340	1,322
1968	298	452	423	372	1,545
1969	336	468	387	309	1,500
1970	264	399	408	396	1,467
1971	389	604	579	513	2,085
1972	510	661

Let t index each of the 32 quarters from 1964 to 1972, that is, t = 1 for the third quarter of 1964, t = 2 for the fourth quarter, and so on. The domain of t is therefore the set of integers from 1 to 32 inclusive. Let T(t) be quarterly housing starts. The values of t and T(t) are given in Table 2.2. The values of a and b in the trend line

T(t) = a + bt

are calculated from the data in Table 2.1, which gives the following figures.

Table 2.2: Calculation of the trend in housing started in the United States from the third quarter of 1964 to the second quarter of 1972

Year: quarter	t	T(t)	Trend
1964: 3	1	398	291.73
4	2	352	298.07
1965: 1	3	283	304.41
2	4	454	310.75
3	5	392	317.09
4	6	345	323.43
1966: 1	7	274	329.77
2	8	392	336.11
3	9	290	342.45
4	10	210	348.79
1967: 1	11	218	355.13
2	12	382	361.47
3	13	382	367.81
4	14	340	374.15
1968: 1	15	298	380.49
2	16	452	386.83
3	17	423	393.17
4	18	372	399.51
1969: 1	19	336	405.85
2	20	468	412.19
3	21	387	418.53
4	22	309	424.87
1970: 1	23	264	431.21
2	24	399	437.55
3	25	408	443.89
4	26	396	450.23
1971: 1	27	389	456.57
2	28	604	462.91
3	29	579	469.25
4	30	513	475.59
1972: 1	31	510	481.93
2	32	661	488.27

So the trend line is

T(t) = 285.39 + 6.34 × t

Figure 2.3 shows the trend line fitted to the quarterly data in Table 2.2. The dashed line after 1972 represents projections (see section 3 Predictions).
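For readers who want to check the trend line outside Minitab, the following Python sketch fits a straight line to the 32 quarterly values of Table 2.1. It assumes the line was obtained by ordinary least squares, which the note does not state explicitly; the function name `fit_line` is an illustrative choice. The fitted coefficients should land close to the values quoted above, with small differences attributable to rounding.

```python
# Quarterly housing starts from Table 2.1, 1964 Q3 to 1972 Q2 (thousands of units).
starts = [
    398, 352,
    283, 454, 392, 345,
    274, 392, 290, 210,
    218, 382, 382, 340,
    298, 452, 423, 372,
    336, 468, 387, 309,
    264, 399, 408, 396,
    389, 604, 579, 513,
    510, 661,
]


def fit_line(y):
    """Ordinary least squares fit of y = a + b*t for t = 1, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


a, b = fit_line(starts)
print(f"T(t) = {a:.2f} + {b:.2f} t")

# Trend values for each quarter, comparable to the last column of Table 2.2.
trend = [a + b * t for t in range(1, len(starts) + 1)]
```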

Development in Minitab:

Open Minitab. Copy the data to the Minitab worksheet. Select: Stat → Time Series → Trend Analysis.

In the Trend Analysis window, click the variable to select it, leave the Model Type as Linear and click OK.

Minitab displays the following graph, which, as we can see, is similar to the one presented in the exercise.

To obtain four graphs in a single window, select the Graphs… option.

Click Four in one.

Click OK.

Minitab displays the following graph.

2.2.2 SMOOTHING. LINEAR FILTERS

One way to visualize the trend is by smoothing the series. The central idea is to define, from the observed series, a new series that smooths out the non-trend effects (seasonality, random effects), so that we can determine the direction of the trend (see figure 2.4).

What we do is use a linear expression that transforms the series X(t) into a smoothed series Z(t): Z(t) = F(X(t)), t = 1, …, n

such that F(X(t)) = T(t). The function F is called a linear filter. The most widely used linear filter is the moving average.

2.2.2.1 MOVING AVERAGES

The goal is to remove the seasonal and accidental components from the series. For a monthly series with annual seasonality (s = 12), the smoothed series is obtained as

Z(t) = (1/12) [½ X(t-6) + X(t-5) + … + X(t+5) + ½ X(t+6)]   (1)

For a quarterly series with annual seasonality (s = 4), the smoothed series is given by

Z(t) = (1/4) [½ X(t-2) + X(t-1) + X(t) + X(t+1) + ½ X(t+2)]   (2)

This procedure is called a finite symmetric filter.
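The sketch below illustrates why such a filter removes seasonality. It assumes the usual centered moving average for an even period s, with weight 1/2 on the two end terms and weight 1 on the terms in between, all divided by s. The trend and seasonal values are hypothetical, and `centered_ma` is an illustrative name.

```python
def centered_ma(series, s):
    """Centered moving average for an even period s:
    half weight on the two end terms, full weight in between, divided by s."""
    half = s // 2
    out = []
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        out.append(total / s)
    return out


trend = [100 + 2 * t for t in range(16)]  # hypothetical linear trend T(t)
seasonal = [5, -3, -4, 2] * 4             # hypothetical E(t), sums to 0 over each year
series = [tr + se for tr, se in zip(trend, seasonal)]

smoothed = centered_ma(series, 4)
print(smoothed)
print(trend[2:-2])
```

Because the weights are symmetric and each window spans exactly one full seasonal cycle, the seasonal terms cancel and a linear trend passes through unchanged. The first and last s/2 observations have no smoothed value.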

Note: smoothing is useful when the series shows many sudden changes and irregular movements.

Example 2: Using the data from Example 1, a moving average is calculated by adding the values for a certain number of successive periods and then dividing the sum by the number of periods covered. Since this is a quarterly series, formula (2) is used.

Table 2.3: Calculation of the four-quarter centered moving average of housing starts in the United States, third quarter of 1964 to second quarter of 1972 (in thousands of units)

Year: quarter	Original data Y	Four-quarter moving total	Four-quarter moving average	Four-quarter centered moving average
(1)	(2)	(3)	(4)	(5)
1964: 3	398
4	352
1965: 1	283	1,487	372	371
2	454	1,481	370	369
3	392	1,474	369	367
4	345	1,465	366	359
1966: 1	274	1,403	351	338
2	392	1,301	325	308
3	290	1,166	292	285
4	210	1,110	278	276
1967: 1	218	1,100	275	287
2	382	1,192	298	314
3	382	1,322	331	341
4	340	1,402	351	359
1968: 1	298	1,472	368	373
2	452	1,513	378	382
3	423	1,545	386	391
4	372	1,583	396	398
1969: 1	336	1,599	400	395
2	468	1,563	391	383
3	387	1,500	375	366
4	309	1,428	357	348
1970: 1	264	1,359	340	342
2	399	1,380	345	356
3	408	1,467	367	382
4	396	1,592	398	424
1971: 1	389	1,797	449	471
2	604	1,968	492	507
3	579	2,085	521	536
4	513	2,206	552	559
1972: 1	510	2,263	566
2	661

In Table 2.3, for example, the four-quarter moving average for the first quarter of 1965 is obtained by adding the values of the third and fourth quarters of 1964 and the first and second quarters of 1965 and then dividing the sum by 4. The average for the second quarter of 1965 is obtained by adding the value of the fourth quarter of 1964 to those of the first, second and third quarters of 1965 and then dividing the sum by 4. Therefore, for each successive average, the earliest quarter is dropped and the next one is added.

Column 4 of Table 2.3 shows the four-quarter moving averages obtained from the data on housing starts for 1964 to 1972. The moving average does not eliminate the very marked fluctuations in the series, but it substantially reduces the amplitude of the variations of the original data.

If an odd number of periods enters the calculation of a moving average, the process is easier, since the number of periods before and after the period for which the average is calculated is the same. If the number of periods is even, as in this example, the same number of periods cannot be used before and after a specified period. The moving average therefore falls halfway between two consecutive periods and does not correspond to any period. This problem can be solved by calculating a centered moving average, which is obtained by taking a two-period moving average of the moving averages already obtained. The first centered moving average is the average of the first two four-quarter moving averages, the second centered moving average is the average of the second and third four-quarter moving averages, and so on. In this way, there is an equal number of periods before and after the specified period for which the centered moving average is being calculated. The centered moving averages are shown in column 5 of Table 2.3.
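The construction of columns 3, 4 and 5 can be retraced with a short Python sketch. It follows the steps described above (moving total, moving average, then the average of consecutive moving averages) and uses the data of Table 2.1; the variable names are illustrative.

```python
# Quarterly housing starts from Table 2.1, 1964 Q3 to 1972 Q2 (thousands of units).
starts = [
    398, 352,
    283, 454, 392, 345,
    274, 392, 290, 210,
    218, 382, 382, 340,
    298, 452, 423, 372,
    336, 468, 387, 309,
    264, 399, 408, 396,
    389, 604, 579, 513,
    510, 661,
]

# Column 3: four-quarter moving totals.
moving_totals = [sum(starts[i : i + 4]) for i in range(len(starts) - 3)]

# Column 4: four-quarter moving averages.
moving_avgs = [total / 4 for total in moving_totals]

# Column 5: centered moving averages, the mean of each pair of
# consecutive four-quarter moving averages.
centered = [(m1 + m2) / 2 for m1, m2 in zip(moving_avgs, moving_avgs[1:])]

# First table row with values, 1965 Q1.
print(moving_totals[0], moving_avgs[0], centered[0])
```

Table 2.3 shows these quantities rounded to whole units, so the unrounded values printed here can differ from the table in the decimals.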

According to formula 2, the calculation would be as follows:
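As a worked instance, and assuming formula (2) is the usual centered four-quarter average with half weights on the two end terms, the value for the first quarter of 1965 (t = 3) can be written out as:

```latex
\begin{aligned}
Z(3) &= \frac{1}{4}\left[\tfrac{1}{2}X(1) + X(2) + X(3) + X(4) + \tfrac{1}{2}X(5)\right] \\
     &= \frac{1}{4}\left[\tfrac{1}{2}(398) + 352 + 283 + 454 + \tfrac{1}{2}(392)\right] \\
     &= \frac{1484}{4} = 371
\end{aligned}
```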

This value corresponds to the Centered Moving Average shown in column 5.

Figure 2.5 graphically shows the adjustment through the moving average, according to Table 2.3, where the black segment represents the original series and the blue segment the smoothed series.

Development in Minitab:

Open Minitab. Copy the data to the Minitab worksheet.

Select: Stat → Time Series → Moving Average…

Click the variable that contains the time series to select it and enter the MA length.

In this case it is equal to 4 (4 quarters per year). Click OK.

Minitab displays the graph with the moving average.

Summary

A time series is a set of measurements of a certain phenomenon or experiment recorded sequentially in time, for example every hour, monthly, quarterly, semi-annually, etc. In this note we worked with discrete, equally spaced time series, in which case it is assumed that {x(t_1), x(t_2), …, x(t_n)} = {x(1), x(2), …, x(n)}. Given its introductory nature, the note was restricted to univariate time series.

When analyzing a time series, the first thing to do is plot the series. This allows us to detect its essential components: outliers, trend, seasonal variation, and irregular variations (the random component).

A classic time series model can be expressed as the sum or product of three components: trend, seasonality, and a random error term. There are three time series models. These are:

Additive: X(t) = T(t) + E(t) + A(t)
Multiplicative: X(t) = T(t) × E(t) × A(t)
Mixed: X(t) = T(t) × E(t) + A(t)

In order to obtain a model, it is necessary to estimate the trend and the seasonality. To estimate the trend, it is assumed that the seasonal component is not present. The estimate is obtained by fitting a function of time, such as a polynomial, or by smoothing the series with moving averages. To estimate seasonality, it is necessary to have decided on the model to be used (mixed or additive). Once the trend and seasonality have been estimated, we are able to predict.

The methods reviewed in this note are descriptive in nature, so judgment and knowledge of the phenomenon play an important role in model selection.

The classic methods have the disadvantage that they do not adapt over time, which means that the estimation process must be started again whenever a new observation becomes available.

Team consisting of:

Ing. Gerardo Valdes Fuentes

Ing. Rosa Isela Meléndez López

Lic. José Luis Chávez Dávila

Ing. Renato Elmer Vázquez García

Master in Administration and Leadership.

Northeast Autonomous University.

Bibliography:

Statistics for Administrators, Richard I. Levin & David S. Rubin. Prentice Hall.