
Automating operations with minitab for time series


Have you ever found yourself running the same tests more than once? If so, would you like to give your mouse a rest? With MINITAB you can easily automate operations to save time. There are several ways to do this, from the quick and easy cut-and-paste method to the more powerful method of using a local macro.

How does this work? Almost all operations in MINITAB can be carried out with session commands. In fact, when you complete a dialog box and click OK, MINITAB generates session commands that contain all the information you selected in it. You can use those session commands as they are, or modify them if you wish, then load them in one step and MINITAB will run the entire analysis.

Suppose you collect data weekly and generate three different graphs from that data. Normally, every week you would have to fill out the dialog boxes for all three charts, which means a lot of mouse clicks. Instead, you could load the script that generated those charts in one quick step.

This article includes some simple examples of how to automate Time Series operations in MINITAB.

Minitab Time Series.

This manual covers the concepts, application and execution of time series analysis in Minitab version 15.

BASIC CONCEPTS OF TIME SERIES

1.1 INTRODUCTION

Every institution, be it the family, the company or the government, has to make plans for the future if it is to survive and progress. Today, many institutions need to know the future behavior of certain phenomena in order to plan, forecast or prevent.

Rational planning requires anticipating future events that are likely to occur. Forecasting, in turn, is often based on what has happened in the past. Thus, there is a type of statistical inference that is made about the future of some variable or set of variables based on past events. The most important technique for making inferences about the future based on what happened in the past is time series analysis.

Countless applications can be cited in different areas of knowledge, such as economics, physics, geophysics, chemistry, electricity, demography, marketing, telecommunications and transportation.

Examples of time series

1. Economic series: prices of an article, unemployment rates, inflation rate, price index, etc.
2. Physical series: meteorology, rainfall, maximum daily temperature, wind speed (wind energy), solar energy, etc.
3. Geophysical series: seismology series.
4. Demographic series: population growth rates, birth rate, mortality, results of population censuses.
5. Marketing series: demand, expenses, offers.
6. Telecommunication series: signal analysis.
7. Transport series: traffic series.

One of the problems that time series analysis tries to solve is prediction. That is, given a series {x(t_1), …, x(t_n)}, our objectives are to describe the behavior of the series, to investigate the mechanism that generates it, and to look for time patterns that help reduce the uncertainty of the future.

From now on, we will study how to build a model to explain the structure and forecast the evolution of a variable that we observe over time. The variables of interest can be macroeconomic (consumer price index, electricity demand, series of exports or imports, etc.), microeconomic (sales of a company, stock in a warehouse, advertising expenses of a sector), physical (wind speed at a wind power station, temperature in a process, flow of a river, atmospheric concentration of a pollutant), or social (number of births, marriages, deaths, or votes for a political party).

1.2 DEFINITION OF TIME SERIES

In many areas of knowledge, observations of interest are obtained at successive instants of time, for example, every hour over 24 hours, monthly, quarterly, semi-annually, or recorded continuously by some piece of equipment.

A time series is a set of measurements of a certain phenomenon or experiment recorded sequentially in time. These observations will be denoted by {x(t_1), x(t_2), …, x(t_n)} = {x(t): t ∈ T ⊆ R}, where x(t_i) is the value of the variable x at the instant t_i. If T = Z the time series is said to be discrete, and if T = R the time series is said to be continuous. When t_(i+1) - t_i = k for all i = 1, …, n-1, the series is said to be equispaced; otherwise it is non-equispaced.

From now on, we will work with discrete, equally spaced time series, in which case we will assume, without loss of generality, that {x(t_1), x(t_2), …, x(t_n)} = {x(1), x(2), …, x(n)}.
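The equispaced condition and the relabeling to x(1), …, x(n) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample instants are hypothetical and are not part of the original manual.

```python
def is_equispaced(times, tol=1e-9):
    """Return True when consecutive observation times differ by a constant k."""
    if len(times) < 2:
        return True
    k = times[1] - times[0]
    return all(abs((later - earlier) - k) <= tol
               for earlier, later in zip(times, times[1:]))


def reindex(values):
    """Relabel an equispaced series as x(1), ..., x(n)."""
    return {t: x for t, x in enumerate(values, start=1)}


quarterly = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical equally spaced instants
irregular = [0.0, 0.25, 0.6, 0.75]       # hypothetical gaps of different length
print(is_equispaced(quarterly), is_equispaced(irregular))
```

Once a series passes this check, only the order of the observations matters, which is why the time labels can be replaced by 1, …, n.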

1.3 FIRST STEP WHEN ANALYZING ANY TIME SERIES

The first step in time series analysis is to plot the series. This allows us to detect the essential components of the series.

The graph of the series will allow:

a) Detect outliers: an outlier is a point in the series that falls outside the normal range. It is an observation that corresponds to an abnormal behavior of the phenomenon (one that is not expected to recur) or to a measurement error.

Whether a given point is an outlier must be determined using knowledge from outside the series. If it is found to be one, it must be omitted or replaced with another value before analyzing the series.

For example, in a study of daily production in a factory, the following situation occurred (see figure 1.1):

The two circled points seem to correspond to an abnormal behavior of the series. When these two points were investigated, it was found that they corresponded to two days of work stoppage, which naturally affected production on those days. The problem was solved by removing the observations and interpolating.
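The "remove and interpolate" step can be sketched as follows. This is a minimal Python illustration that assumes linear interpolation between the nearest valid neighbours; the manual does not say which interpolation was used, and the production figures below are invented for the example.

```python
def interpolate_outliers(series, outlier_positions):
    """Replace flagged observations by linear interpolation between the
    nearest unflagged neighbours on each side (0-based positions)."""
    flagged = set(outlier_positions)
    cleaned = list(series)
    for i in sorted(flagged):
        left = i - 1
        while left in flagged:
            left -= 1
        right = i + 1
        while right in flagged:
            right += 1
        if left < 0 or right >= len(series):
            raise ValueError("cannot interpolate an outlier at the edge of the series")
        weight = (i - left) / (right - left)
        cleaned[i] = series[left] + weight * (series[right] - series[left])
    return cleaned


# Hypothetical daily production with two stoppage days at positions 2 and 5.
production = [100, 102, 20, 104, 106, 15, 110]
print(interpolate_outliers(production, [2, 5]))
```

Deciding which positions to flag remains a judgment made from outside the data, as the text explains; the code only performs the replacement.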

b) Detect trend: the trend represents the predominant behavior of the series. It can be loosely defined as the change in the mean over a period (see figure 1.2).

c) Detect seasonal variation: seasonal variation represents a periodic movement of the time series. The length of the period is generally less than one year. It can be a quarter, a month, a day, etc. (see figure 1.3).

Mathematically, we can say that the series shows seasonal variation if there exists a number s such that x(t) = x(t + k × s) for every integer k.

The main forces that cause seasonal variation are weather conditions, for example:

- ice cream sales in winter
- wool sales in summer
- fruit exports in March

All these phenomena show a seasonal behavior (annual, weekly, etc.).

d) Detect irregular variations (random component): irregular (random) movements represent all types of movements in a time series other than trend, seasonal variations and cyclical fluctuations.

2. CLASSIC TIME SERIES MODELS

2.1 DECOMPOSITION MODELS

A classic model for a time series assumes that a series x(1), …, x(n) can be expressed as the sum or product of three components: trend, seasonality, and a random error term.

Three time series models are generally accepted as good approximations of the true relationships between the components of the observed data. These are:

(1) Additive: X(t) = T(t) + E(t) + A(t)
(2) Multiplicative: X(t) = T(t) × E(t) × A(t)
(3) Mixed: X(t) = T(t) × E(t) + A(t)

Where:

- X(t): series observed at time t
- T(t): trend component
- E(t): seasonal component
- A(t): random (accidental) component

A usual assumption is that A(t) is a random component or white noise with zero mean and constant variance.

An additive model (1) is suitable, for example, when E(t) does not depend on the other components, such as T(t). If, on the contrary, seasonality varies with the trend, the most suitable model is the multiplicative model (2). Model (2) can clearly be transformed into an additive model by taking logarithms. The remaining problem is to model the components of the series properly.
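The logarithmic transformation mentioned above can be written out explicitly. The step below assumes that every component is strictly positive, which the text does not state but which the logarithm requires:

```latex
X(t) = T(t)\,E(t)\,A(t)
\quad\Longrightarrow\quad
\log X(t) = \log T(t) + \log E(t) + \log A(t)
```

The transformed series log X(t) then has the additive form of model (1), with log T(t), log E(t) and log A(t) playing the roles of trend, seasonal and random components.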

Figure 2.1 illustrates possible patterns that could be followed by series represented by models (1), (2) and (3).

2.2 ESTIMATING THE TREND

We will assume here that the seasonal component E(t) is not present and that the additive model is adequate, that is:

X(t) = T(t) + A(t), where A(t) is white noise.

There are several methods to estimate T(t). The most widely used are:

1. Fit a function of time, such as a polynomial, an exponential, or another smooth function of t.
2. Smooth (or filter) the values in the series.
3. Use differences.

2.2.1 FITTING A FUNCTION
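Of the three methods, differencing is the only one the manual does not develop further, so a short Python sketch may help. It assumes the usual definition of a difference, x(t) - x(t - 1); the two short series are hypothetical.

```python
def difference(series, lag=1):
    """Return the differenced series x(t) - x(t - lag)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


linear = [10 + 2 * t for t in range(1, 7)]   # hypothetical linear trend
quadratic = [t * t for t in range(1, 7)]     # hypothetical quadratic trend

print(difference(linear))                    # one difference of a linear trend
print(difference(difference(quadratic)))     # two differences of a quadratic trend
```

Differencing once turns a linear trend into a constant, and differencing twice does the same for a quadratic trend, which is why differences are used to remove a trend rather than to estimate it.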

The following graphs illustrate some of the shapes of these curves.

Notes:

1. The trend curve must cover a relatively long period to be a good representation of the long-term trend.
2. Linear and exponential trends are applicable in the short term, since, for example, a long-term S curve may appear to be a straight line over a restricted period of time.

In Figure 2.2 both curves (straight line and Gompertz) fit well, but their projections diverge greatly in the long run.

Example 1: Table 2.1 shows quarterly data for housing units started in the United States from the third quarter of 1964 to the second quarter of 1972. (A trend analysis should normally cover a longer period. However, since the main purpose here is to illustrate the decomposition method and the techniques for drawing inferences from the decomposed elements, the shortness of the data need not concern us.)

Table 2.1: New housing units started in the United States from the third quarter of 1964 to the second quarter of 1972 (in thousands of units).

Year     I     II    III    IV    Annual total
1964     -     -     398    352   -
1965    283   454    392    345   1,474
1966    274   392    290    210   1,166
1967    218   382    382    340   1,322
1968    298   452    423    372   1,545
1969    336   468    387    309   1,500
1970    264   399    408    396   1,467
1971    389   604    579    513   2,085
1972    510   661     -      -    -

Let t index the 32 quarters from the third quarter of 1964 to the second quarter of 1972, that is, t = 1 for the third quarter of 1964, t = 2 for the fourth quarter, and so on. The domain of t is therefore the set of integers from 1 to 32 inclusive. Let T(t) be quarterly housing starts. The values of t and T(t) are given in Table 2.2. To calculate the values of a and b in the trend line

T(t) = a + b × t

the following figures are obtained from the data in Table 2.1.

Table 2.2: Calculation of the trend in housing started in the United States from the third quarter of 1964 to the second quarter of 1972

Year: quarter    t    T(t)    Trend
1964:3           1    398     291.73
1964:4           2    352     298.07
1965:1           3    283     304.41
1965:2           4    454     310.75
1965:3           5    392     317.09
1965:4           6    345     323.43
1966:1           7    274     329.77
1966:2           8    392     336.11
1966:3           9    290     342.45
1966:4          10    210     348.79
1967:1          11    218     355.13
1967:2          12    382     361.47
1967:3          13    382     367.81
1967:4          14    340     374.15
1968:1          15    298     380.49
1968:2          16    452     386.83
1968:3          17    423     393.17
1968:4          18    372     399.51
1969:1          19    336     405.85
1969:2          20    468     412.19
1969:3          21    387     418.53
1969:4          22    309     424.87
1970:1          23    264     431.21
1970:2          24    399     437.55
1970:3          25    408     443.89
1970:4          26    396     450.23
1971:1          27    389     456.57
1971:2          28    604     462.91
1971:3          29    579     469.25
1971:4          30    513     475.59
1972:1          31    510     481.93
1972:2          32    661     488.27

So the trend line is

T(t) = 285.39 + 6.34 × t

Figure 2.3 shows the trend line fitted to the quarterly data in Table 2.2. The dashed line after 1972 represents projections (see section 3, Predictions).
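The coefficients of the trend line can be checked outside Minitab. The sketch below assumes that a and b are the ordinary least-squares estimates, which is the usual way to fit such a line, although the manual does not show the intermediate sums; the function name is illustrative.

```python
# Housing starts from Table 2.1, 1964:III to 1972:II (thousands of units).
housing_starts = [
    398, 352,
    283, 454, 392, 345,
    274, 392, 290, 210,
    218, 382, 382, 340,
    298, 452, 423, 372,
    336, 468, 387, 309,
    264, 399, 408, 396,
    389, 604, 579, 513,
    510, 661,
]


def fit_linear_trend(y):
    """Least-squares estimates (a, b) of T(t) = a + b*t for t = 1, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


a, b = fit_linear_trend(housing_starts)
print(f"T(t) = {a:.2f} + {b:.2f} t")
print(f"Projection for t = 33: {a + b * 33:.1f}")
```

The slope agrees with the 6.34 reported above. The intercept comes out near 285.3 rather than 285.39; the small gap is what one would expect if the intercept in the text was computed from the slope already rounded to two decimals.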

Development in Minitab:

1. Open Minitab and copy the data to the Minitab worksheet.
2. Select Stat > Time Series > Trend Analysis.
3. In the Trend Analysis window, click the variable to select it, leave Model Type as Linear, and click OK.
4. Minitab displays the trend analysis graph, which is similar to the one presented in the exercise.
5. To obtain four graphs in a single window, select the Graphs… option, click Four in one, and click OK. Minitab displays the resulting graph.

2.2.2 SMOOTHING. LINEAR FILTERS

One way to visualize the trend is to smooth the series. The central idea is to define, from the observed series, a new series that smooths out the non-trend effects (seasonality, random effects), so that we can determine the direction of the trend (see figure 2.4).

To do this we use a linear expression that transforms the series X(t) into a smoothed series Z(t): Z(t) = F(X(t)), t = 1, …, n

such that F(X(t)) = T(t). The function F is called a linear filter. The most widely used linear filter is the moving average.

2.2.2.1 MOVING AVERAGES

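The goal is to remove the seasonal and accidental components from the series. For a monthly series with annual seasonality (s = 12), the smoothed series is obtained as

Z(t) = (1/12) × [0.5 × X(t-6) + X(t-5) + … + X(t+5) + 0.5 × X(t+6)]    (1)

For a quarterly series with annual seasonality (s = 4), the smoothed series is given by

Z(t) = (1/4) × [0.5 × X(t-2) + X(t-1) + X(t) + X(t+1) + 0.5 × X(t+2)]    (2)

This procedure is called a finite symmetric filter.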

Note: smoothing is useful when the series shows many sudden changes or irregular movements.

Example 2: Using the data from Example 1, a moving average is calculated by adding the values for a certain number of successive periods and then dividing the sum by the number of periods covered. Since this is a quarterly series, formula (2) is used.
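A centered moving average can be expressed as a set of symmetric weights applied to a sliding window. The Python sketch below assumes the usual convention of half weights on the two end points when the period s is even; the function names are illustrative and do not correspond to Minitab commands.

```python
def centered_ma_weights(s):
    """Weights of the centered moving average for a seasonal period s."""
    if s % 2 == 1:
        return [1.0 / s] * s
    # Even period: s + 1 terms, with half weight on the two end points.
    return [0.5 / s] + [1.0 / s] * (s - 1) + [0.5 / s]


def apply_filter(series, weights):
    """Apply a symmetric filter; positions without a full window stay None."""
    half = len(weights) // 2
    smoothed = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        smoothed[t] = sum(w * x for w, x in zip(weights, window))
    return smoothed


# First five quarterly values of Example 1 (1964:III to 1965:III).
first_quarters = [398, 352, 283, 454, 392]
print(apply_filter(first_quarters, centered_ma_weights(4)))
```

Only the middle observation has two neighbours on each side, so it is the only position that receives a smoothed value; this is why a centered filter always loses observations at both ends of the series.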

Table 2.3: Calculation of the four-quarter centered moving average of housing starts in the US, third quarter 1964 to second quarter 1972 (in thousands of units)

Columns: (1) year: quarter; (2) original data Y; (3) four-quarter moving total; (4) four-quarter moving average; (5) four-quarter centered moving average.

(1)       (2)    (3)      (4)    (5)
1964:3    398     -        -      -
1964:4    352     -        -      -
1965:1    283    1,487    372    371
1965:2    454    1,481    370    369
1965:3    392    1,474    369    367
1965:4    345    1,465    366    359
1966:1    274    1,403    351    338
1966:2    392    1,301    325    308
1966:3    290    1,166    292    285
1966:4    210    1,110    278    276
1967:1    218    1,100    275    287
1967:2    382    1,192    298    314
1967:3    382    1,322    331    341
1967:4    340    1,402    351    359
1968:1    298    1,472    368    373
1968:2    452    1,513    378    382
1968:3    423    1,545    386    391
1968:4    372    1,583    396    398
1969:1    336    1,599    400    395
1969:2    468    1,563    391    383
1969:3    387    1,500    375    366
1969:4    309    1,428    357    348
1970:1    264    1,359    340    342
1970:2    399    1,380    345    356
1970:3    408    1,467    367    382
1970:4    396    1,592    398    424
1971:1    389    1,797    449    471
1971:2    604    1,968    492    507
1971:3    579    2,085    521    536
1971:4    513    2,206    552    559
1972:1    510    2,263    566     -
1972:2    661     -        -      -

In Table 2.3, for example, the four-quarter moving average for the first quarter of 1965 is obtained by adding the values of the third and fourth quarters of 1964 and the first and second quarters of 1965 and then dividing the sum by 4. The average for the second quarter of 1965 is obtained by adding the value of the fourth quarter of 1964 to those of the first, second and third quarters of 1965 and then dividing the sum by 4. Therefore, for each successive average, the earliest quarter is dropped and the next one is added.

Column 4 of Table 2.3 shows the four-quarter moving averages obtained from the data on housing starts for 1964 to 1972. The moving average does not eliminate the very marked fluctuations in the series, but it substantially reduces the amplitude of the variations of the original data.

If an odd number of periods enters the calculation of a moving average, the process is easier, since the number of periods before and after the period for which the average is calculated is the same. If the number of periods is even, as in this example, you cannot use the same number of periods before and after a specified period. The moving average therefore falls halfway between two consecutive periods and does not correspond to any single period. This problem can be solved by calculating a centered moving average, which is obtained by taking a two-term moving average of the moving averages already computed. The first centered moving average is the average of the first two four-quarter moving averages, the second centered moving average is the average of the second and third four-quarter moving averages, and so on. In this way, there is an equal number of periods before and after the period for which the centered moving average is being calculated. The centered moving averages are shown in column 5 of Table 2.3.

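According to formula (2), the calculation for the first quarter of 1965, for example, would be as follows:

Z(3) = (398 + 2 × 352 + 2 × 283 + 2 × 454 + 392) / 8 = 2,968 / 8 = 371

This value corresponds to the centered moving average shown in column 5.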
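The two-step procedure described above (four-quarter averages first, then the average of each consecutive pair) can be reproduced with a short Python sketch. The variable names are illustrative; the data are those of Table 2.1.

```python
# Housing starts from Table 2.1, 1964:III to 1972:II (thousands of units).
housing_starts = [
    398, 352,
    283, 454, 392, 345,
    274, 392, 290, 210,
    218, 382, 382, 340,
    298, 452, 423, 372,
    336, 468, 387, 309,
    264, 399, 408, 396,
    389, 604, 579, 513,
    510, 661,
]


def moving_totals(y, window=4):
    """Sum of every run of `window` consecutive observations."""
    return [sum(y[i:i + window]) for i in range(len(y) - window + 1)]


totals = moving_totals(housing_starts)                 # column 3 of Table 2.3
averages = [total / 4 for total in totals]             # column 4
centered = [(first + second) / 2                       # column 5
            for first, second in zip(averages, averages[1:])]

# The first entry of each list lines up with 1965:1 (t = 3) in Table 2.3.
for total, average, center in zip(totals[:3], averages[:3], centered[:3]):
    print(total, round(average), round(center))
```

With 32 observations there are 29 four-quarter totals and 28 centered averages, which matches the blank cells at the top and bottom of Table 2.3.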

Figure 2.5 shows the fit obtained with the moving average, according to Table 2.3, where the black line represents the original series and the blue line the smoothed series.

Development in Minitab:

1. Open Minitab and copy the data to the Minitab worksheet.
2. Select Stat > Time Series > Moving Average…
3. Click the variable that contains the time series to select it and enter the MA length. In this case it is equal to 4 (4 quarters per year). Click OK.
4. Minitab displays the graph with the moving average.

Summary

A time series is a set of measurements of a certain phenomenon or experiment recorded sequentially in time, for example, every hour, monthly, quarterly, semi-annually, etc. In this note we worked with discrete, equally spaced time series, in which case it is assumed that {x(t_1), x(t_2), …, x(t_n)} = {x(1), x(2), …, x(n)}. Because of its introductory nature, the note was restricted to univariate time series.

When analyzing a time series, the first thing to do is to plot the series. This allows us to detect its essential components: outliers, trend, seasonal variation, and irregular variations (the random component).

A classic time series model can be expressed as the sum or product of three components: trend, seasonality, and a random error term. There are three time series models:

(1) Additive: X(t) = T(t) + E(t) + A(t)
(2) Multiplicative: X(t) = T(t) × E(t) × A(t)
(3) Mixed: X(t) = T(t) × E(t) + A(t)

To obtain a model, it is necessary to estimate the trend and the seasonality. To estimate the trend, it is assumed that the seasonal component is not present. The estimate is obtained by fitting a function of time, such as a polynomial, or by smoothing the series with moving averages. To estimate seasonality, the model to be used (mixed or additive) must already have been chosen. Once the trend and seasonality have been estimated, we are able to predict.

The methods reviewed in this note are descriptive in nature, so judgment and knowledge of the phenomenon play an important role in model selection.

The classic methods have the disadvantage that they do not adapt over time, which implies that the estimation process must be started again whenever a new observation becomes available.

Team consisting of:

Ing. Gerardo Valdes Fuentes

Ing. Rosa Isela Meléndez López

Lic. José Luis Chávez Dávila

Ing. Renato Elmer Vázquez García

Master in Administration and Leadership.

Northeast Autonomous University.

Bibliography:

Levin, Richard I. & Rubin, David S. Statistics for Administrators. Prentice Hall.
