Article Writing Question
Read chapter 10 of the textbook.
Write a one-page summary of Chapter 10 (Times New Roman, 12 pt font, single-spaced).
CHAPTER 10 Time Series Analysis and Forecasting

LEARNING OBJECTIVES

After completing this chapter, you will be able to:
Explain time series analysis.
Identify analytical techniques used to develop forecasting models.

One of the primary goals of data mining and other analytics is to make forecasts or predictions. For instance, Nina, the sales manager at GB, may want to forecast the impact on sales and profitability for her division if she increases the price of off-road bicycles. Although Nina’s question is seemingly straightforward, many variables will come into play to affect sales and profitability. As one example, a price change will also change the contribution margin per bicycle and, thus, the profit. This is an easy calculation. But, how do we know how many bicycles customers will buy at the higher price? Further, how will the change in bicycle purchases affect companion sales that usually occur when a person buys a bicycle; for example, a helmet? Also, will increasing the price of off-road bikes in Nina’s division affect sales in the western U.S. or in Europe? Will the tweets and other comments on social networks that result from the price change affect sales of the bikes? How will market forces influence bicycle pricing? Will competitors move in to steal customers from GB?

As you can see, what appears to be a fairly straightforward question is, in fact, very complex. To address this complexity and to forecast sales in Nina’s division with any degree of accuracy, we need to employ both forecasting and predictive modeling methods. This chapter will focus on the analytical techniques Nina employs to forecast sales and profitability. Chapter 11 will focus on the predictive models that answer questions regarding customer decisions to purchase, or not to purchase, bicycles.

Before we go deeper into this chapter, there are two fundamental terms you need to understand: forecasting and prediction.
Linguistically, both terms pertain to foretelling what will happen in the future. However, academics distinguish between the two. A forecast is an estimation of the value of a variable in the future. For example, what is the forecast of sales for next month? What is the weather forecast for Los Angeles for tomorrow? Forecasting usually involves time series analysis, which we discuss in the next section. In contrast, prediction is used to uncover and understand relationships between and among variables in order to predict what will happen to a target variable. For example, which customer is likely to buy a bicycle? Which customer is likely to move to a competitor? Prediction is concerned with individual outcomes, whereas forecasting focuses on the macro level over time.

Although we have differentiated between forecasting and prediction, in business and industry these terms are loosely interpreted and are often used interchangeably. Although this textbook explains the techniques for both, we, too, use these terms interchangeably.

This chapter discusses different methods of forecasting to address various business questions. To optimize your forecast results, it is important for you to understand which tool to use for which job. In predictive analytics not every model will fit every forecasting question. Therefore, in this chapter you will learn which models fit time series analysis, a powerful forecasting technique. In the following chapter, we will describe predictive data models and their uses.

TIME SERIES ANALYSIS

A time series is a sequence of values of an attribute, or variable, measured at equidistant intervals of time. Figure 10-1 illustrates a time series of the number of bikes sold per day over a period of 10 days. Note that the time intervals are constant: one day. In addition, the time axis is arranged in increasing chronological order, which replicates our perception of time: It moves forward.
FIGURE 10-1: TIME SERIES FOR NUMBER OF SALES PER DAY

Time series analysis is a technique that analysts use to (a) uncover any implicit structure (that is, patterns or trends) in the data and (b) model that structure to make forecasts. The assumption is that the future, at least in the short term, will continue the structure of the past. This technique is useful for forecasting values such as sales quantities, airline passenger volume, economic metrics, and traffic volume. The goal of time series analysis, then, is to uncover the structure of the past and then use that structure to forecast the future.

The time axis sort order of a time series is increasing; that is, day 1, then day 2, then day 3, and so on. This order should be maintained during any forecasting undertaking because reordering the time axis breaks the natural progression of time. This observation is especially important when time is designated in combinations of years, months, quarters, weeks, days, hours, minutes, and seconds. Here is an example of the sort order for a time series using time values in 12-hour format: 10:59:00 AM, 11:59:00 AM, 12:59:00 PM, 1:59:00 PM, 2:59:00 PM…

The simplest way to report findings of a time series is to report the average, which analysts refer to as the mean. In Figure 10-1, the mean number of bikes sold per day is 39.6. This kind of statistical metric is a good first attempt, but it does not reveal any trends or patterns in the data. If we were to base our inventory replenishment on an average of 39.6 bikes, then we could run out of bicycles on some days and have too many on hand on other days. Finding trends and patterns in the data is important because they can inform decision making (in this case, stocking of inventory) that leads to improved operations and better planning.

Let’s talk a bit more about time series and how they are classified. The example illustrated in Figure 10-1 is called a univariate time series.
The term “univariate” refers to the fact that there is a single variable (in this case, the number of bikes sold) that varies over time. Other examples of univariate time series are interest rate, global temperature, inventory stock value, and population. In contrast, a multivariate time series is one in which multiple variables change over time, and we want to model the interactions among them. For example, we might want to measure temperature and carbon dioxide concentration (two variables) over the earth’s history (time). In another example, GB might want to measure the number of bicycles manufactured and the number of employees required to produce those bikes over time to determine trends in manufacturing efficiencies and to forecast future labor force requirements.

We can use time series analysis to report additional statistical measures such as median, mode, maximum, minimum, and standard deviation. Similar to the example of average bikes sold per day, all of these measures are single-value summaries of the dynamic data series. Although they convey overall statistics, they don’t provide a detailed understanding of the data.

In time series analysis we strive to understand the underlying structure within the data. We need a more nuanced technique than simply finding the mean. Instead, we attempt to unravel variations, trends, and seasonality to help predict future values of the variable.

FIGURE 10-2: SAMPLE TIME SERIES

Examine the time series in Figure 10-2. The chart reveals an obvious trend; namely, sales are increasing. It also indicates that sales are seasonal, meaning that sales quantities rise and fall periodically during the year. In addition, the seasonal variation is increasing over time; that is, the disparity between the low sales periods and the high sales periods is growing. A final observation is that the seasonality exhibits a degree of randomness; that is, it is not identical for every year.
Randomness by definition cannot be forecasted with certainty. Consequently, for time series and other forecasts, we need to recognize that the data are partly unpredictable. We do so by providing a prediction interval for the forecasted value. As an example, we forecast tomorrow’s temperature to be 80 degrees Fahrenheit based on past temperature data. We provide a prediction interval of 78-82 degrees with 95% probability. Put simply, we are saying there is a 95% probability that the actual temperature will fall between 78 and 82 degrees.

Let’s examine time series analysis in a bit more detail. The first goal of time series analysis is to discover patterns and trends. The second is to create the best possible forecast from those observations. Identifying trends helps us to understand the dependencies between future data and observations of the past. Identifying those dependencies enables us to forecast future trends.

Uncovering the structure thus reduces to uncovering the components of the time series. If we decompose the time series of total number of bikes sold into its components (trend, seasonality, randomness, and cycles) and if we can identify a structure in those components, then we can make an accurate forecast.

Number of bikes sold = trend + seasonal + random + cycle

Trend

In Chapter 6 we defined trend as the tendency of the mean of the data to increase, decrease, or stay the same over time. Trend is often referred to as the direction of the data change. For instance, if values are trending upward over time, then the trend is “growing,” “increasing,” or “positive.” If the opposite is true, then the trend is “shrinking,” “declining,” or “negative.” The trend in Figure 10-3 is positive: On average, sales of bicycles are increasing over time. This positive trend is emphasized on the chart by the addition of a trend line.
The trend line is computed using one of the following methods:
Average: We compute the mean of all of the data points, and we then plot the mean as the trend line. This approach is overly simplistic, but it is a starting point.
Semi-average: We split the data points into two segments and take the average of each segment. We then draw a line between the two averages, thereby creating a trend line.
Moving average: As the name suggests, instead of using just two averages (as in semi-average above), a moving average is the local average over several periods, or time intervals. The number of periods can be three or five or seven or more. As an example, using a three-period moving average, we would compute the average of periods 1, 2, and 3 and then plot that value. Next, we would compute the average of 2, 3, and 4 and plot that value. We would then compute the average of 3, 4, and 5, and so on. Finally, we would connect the moving averages to create the trend line.
Least squares fit: We fit a line to the data points in such a way that the sum of the squared deviations (that is, the squared differences between the line and the data points) is as small as possible. Squaring the deviations eliminates the negative numbers, which otherwise could offset the positive numbers. The line with the smallest sum of the squares is the closest-fitting line and is shown as a dotted line in Figure 10-3.

FIGURE 10-3: TREND LINE

Seasonality

Now that we have identified the overall sales trend, we want to determine whether the data reveal any seasonality. Seasonality refers to a pattern of regular periodic fluctuations in the data over time. These patterns are considered “seasons.” Seasons can be any period of time such as months, quarters, calendar seasons, and weeks. Alternatively, they can be a specific time period such as the holiday season. Identifying seasonality is a basic function of trend analysis.
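As a brief aside on the trend-line methods listed above, the following is a minimal Python sketch of the semi-average, moving average, and least squares calculations. The function names and the `sales` values are hypothetical, made up for illustration; they are not the data behind Figure 10-3. Periods are numbered 1, 2, 3, and so on, as in the text.

```python
from statistics import mean


def moving_average(values, window=3):
    """Local average over each run of `window` consecutive periods."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def semi_average(values):
    """Averages of the first and second halves of the series.

    With an odd number of points the middle point is left out (one common convention).
    """
    half = len(values) // 2
    return mean(values[:half]), mean(values[-half:])


def least_squares_line(values):
    """(slope, intercept) of the line with the smallest sum of squared deviations."""
    periods = range(1, len(values) + 1)
    x_bar, y_bar = mean(periods), mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(periods, values))
    sxx = sum((x - x_bar) ** 2 for x in periods)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical daily bike sales, invented for this sketch.
sales = [30, 34, 32, 38, 41, 39, 45, 44, 48, 50]

print("3-period moving averages:", moving_average(sales))
print("semi-averages:", semi_average(sales))
print("least squares slope and intercept:", least_squares_line(sales))
```

A positive slope from `least_squares_line` corresponds to what the text calls a positive trend. The moving average returns fewer points than the original series because the first full window needs three periods of data.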
Figure 10-4 illustrates a time series that displays distinct seasonality. The values exhibit definite highs and lows over time. The pattern repeats regularly: Sales are lowest each November and January and highest in the summer months. The figure also reveals a positive trend, meaning the average value is increasing. Average sales in 2012 are greater than in 2011, sales in 2013 are greater than in 2012, and so on.

FIGURE 10-4: SEASONALITY

Residual

To better understand the structure of a time series, we can remove (subtract) the trend component and the seasonal component from the original data series. The remaining data are called the residual or irregular data. These data are the random, unpredictable part of the dataset. Irregular data cannot be avoided in real-world scenarios. As long as the value of the irregular component is not too large, then the prediction of the other components can be effectively utilized in forecasting.

Figure 10-5 illustrates a time series decomposed into three components. The chart on the top presents the observed data. The second chart is the trend, which was estimated using one of the techniques discussed above. Then the trend data were removed from the original series, leaving the seasonal component. Finally, both the trend and the seasonal data were subtracted from the original data. The remaining data display the irregular or random component of the dataset.

Trend and seasonality are not always sufficient to unravel time series data. After subtracting these elements, the resulting randomness in the data may still be too high to enable accurate forecasting or to offer insights into the data.

FIGURE 10-5: IRREGULAR DATA (WESSA, P. 2015, FREE STATISTICS SOFTWARE, OFFICE FOR RESEARCH DEVELOPMENT AND EDUCATION, VERSION 1.1.23-R7, WWW.WESSA.NET)

Cycle

Another data pattern that we can identify using trend analysis is a cycle. A cycle is a pattern that displays highs and lows outside or in addition to the seasonal highs and lows.
In contrast to the fixed period of seasonality, the length of a cycle does not need to be constant. Some cycles last longer than others. Nevertheless, the cyclic pattern should be easy to identify (although hard to forecast).

FIGURE 10-6: CYCLES IN THE HOUSING INDUSTRY

Figure 10-6 charts the cyclic nature of housing sales over time. The figure reveals a definite seasonality: House sales are greater during the spring and summer months than during the fall and winter. In addition, the real estate market experiences cycles of boom and bust. The periods of these cycles vary. The difference between seasonality and cycles is that seasonality involves short-term, regular highs and lows, whereas cycles are long-term, uneven highs and lows.

Examine the growth and shrinkage patterns in Figure 10-6. From 1991-2006 housing sales grew to a peak (high) followed by a steep decline in 2007 during the housing “bust.” The period from 2007-2012 displays a trough (low) during a national recession, followed by an ongoing recovery.

FORECASTING USING EXPONENTIAL SMOOTHING

To decompose a time series into its components (trends, seasonality, cycles, and randomness) we use mathematical methods such as exponential smoothing. Smoothing produces flattened data; that is, data without all of the highs and lows. We can also employ this technique to make forecasts. We will focus on forecasting using seasonality, trend, and randomness. Cyclic forecasting is beyond the scope of this book.

In exponential smoothing we assign weights to data points. The essence of exponential smoothing is to value older data points with exponentially decreasing weights. In other words, we assign less credibility to the older data than to the newer data. The assumption is that more recent data more accurately forecast the future than older data. By comparison, in the simple moving average method we weight all data points in the window equally.

There are three types of exponential smoothing techniques: single, double, and triple.
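To make the contrast between the two weighting schemes concrete, here is a small Python sketch. It assumes a smoothing factor called `alpha` (the parameter introduced for single exponential smoothing) and uses the standard result that a point `age` periods old receives weight `alpha * (1 - alpha) ** age`. The function names and the value `alpha = 0.5` are illustrative choices, not taken from the chapter.

```python
def exponential_weights(alpha, n_points):
    """Weights for the newest point first: alpha, alpha*(1-alpha), alpha*(1-alpha)**2, ..."""
    return [alpha * (1 - alpha) ** age for age in range(n_points)]


def moving_average_weights(window):
    """A simple moving average gives every point in the window the same weight."""
    return [1 / window] * window


alpha = 0.5  # assumed value, chosen only to keep the numbers easy to read

for age, weight in enumerate(exponential_weights(alpha, 5)):
    print(f"point {age} periods old: weight {weight:.4f}")

print("moving average weights:", moving_average_weights(5))
```

Each exponential weight is the previous one multiplied by `1 - alpha`, so a larger `alpha` makes older points fade faster, while the moving average treats a five-period-old point exactly like the newest one.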
We examine all three below.

Single exponential smoothing uses a data smoothing parameter referred to as α (alpha). This parameter, also called the smoothing factor or damping factor, determines how much influence the older data in the time series have on current values. All α values vary between 0 and 1. The higher the smoothing factor α (closer to 1), the faster the older data points decay in terms of influence on the forecast. A smaller α provides more smoothing because it gives the most recent data less weight and lets older data retain more influence. A larger α provides less smoothing because it gives recent data a higher weight, resulting in a forecast that more closely resembles current data than older data.

The α smoothing equation is represented below. The variable $\hat{y}_x$ is the forecasted value of y at time x, $y_x$ is the observed value at time x, and α is the smoothing factor.

$$\hat{y}_x = \alpha\, y_x + (1 - \alpha)\, \hat{y}_{x-1}$$

The reason this technique is called single exponential smoothing is that it uses a single damping factor. It is an exponential smoothing because older data points are assigned exponentially smaller weights. We can expand the equation by substituting the value for $\hat{y}_{x-1}$:

$$\hat{y}_{x-1} = \alpha\, y_{x-1} + (1 - \alpha)\, \hat{y}_{x-2}$$

Combining the two equations we get

$$\hat{y}_x = \alpha\, y_x + \alpha (1 - \alpha)\, y_{x-1} + (1 - \alpha)^2\, \hat{y}_{x-2}$$

You can see that points older in time are multiplied by exponentially smaller values; here, the square of $(1 - \alpha)$, a number below 1. Thus, the influence of older data points decreases exponentially. The rate of decay depends on the choice of α.

Single exponential smoothing is an effective technique for working with data that are purely random with no trends or seasonality. It should not be used for data that have an inherent trend or seasonality.

In double exponential smoothing a second parameter β (beta) is added to α. The β parameter, called the trend smoothing factor, functions similarly to the α factor in that (a) it varies between 0 and 1 and (b) values closer to 1 give more weight to recent data.
In other words, the recent trends become more important than trends further in the past. Double exponential smoothing is utilized for data that exhibit trends but not seasonality.

Triple exponential smoothing provides a means for decomposing data that have both trends and seasonality. It introduces a third parameter γ (gamma), which is the seasonal smoothing factor. In the triple exponential smoothing forecast equation, m is the number of time periods into the future we want to forecast, and l is the length of the season. In cases where γ = 0, triple exponential smoothing simplifies to double exponential smoothing. If γ = 0 and β = 0, then it simplifies to single exponential smoothing. These equations taken together are called the Holt-Winters technique for time series analysis.

We present an example of the results of the Holt-Winters technique in Figure 10-7. The actual values are charted in black, and triple exponential smoothed values are charted in red.

FIGURE 10-7: HOLT-WINTERS METHOD (WESSA, P. 2015, FREE STATISTICS SOFTWARE, OFFICE FOR RESEARCH DEVELOPMENT AND EDUCATION, VERSION 1.1.23-R7, WWW.WESSA.NET)

Figure 10-8 charts the future forecast of this same time series. This chart seems to capture the trend and seasonality quite well. Remember, however, that the irregular component of a time series cannot be forecast accurately. Therefore, a probability range is provided for the forecast. In Figure 10-8 the blue lines are the upper and lower bounds of the 95% prediction interval. In other words, the likelihood that the actual value will fall between the blue lines is 95%.

The further into the future we run the model, the more uncertain the results become, and the larger the range of the predicted value(s). At some point the error could be as large as the value itself, as illustrated in Figure 10-9. As the error intervals grow quite large, the forecast becomes less useful or not useful at all.

FIGURE 10-8: HOLT-WINTERS FORECAST (WESSA, P. 2015, FREE STATISTICS SOFTWARE, OFFICE FOR RESEARCH DEVELOPMENT AND EDUCATION, VERSION 1.1.23-R7, WWW.WESSA.NET)

FIGURE 10-9: LARGE ERROR FOR FUTURE FORECAST (WESSA, P. 2015, FREE STATISTICS SOFTWARE, OFFICE FOR RESEARCH DEVELOPMENT AND EDUCATION, VERSION 1.1.23-R7, WWW.WESSA.NET)

Exponential time series smoothing analysis provides a relatively easy-to-understand approach for short-term forecasting. In addition, the Holt-Winters technique is available as an algorithm in many predictive analysis software tools, enabling users to perform the analysis without becoming bogged down in the math.

Analytics in Practice 10-1: Trend Analysis and Canadian Beer

The Canadian government publishes statistics on its websites on commerce, demographics, and pretty much anything you would like to know about the country. We were interested in exploring data regarding beer sales. The line chart in Figure 10-10 illustrates beer sales by month from June 2014 through July 2018 based on public data we scraped from YCharts. The chart clearly indicates seasonality: Sales increase in December and during the short summer months between June and August. Our Canadian colleague has suggested that the spike in December is due to holiday celebrations and family gatherings.

FIGURE 10-10: CANADIAN BEER SALES

We wanted to forecast sales of beer in Canada into the future months. Therefore, we created a triple exponential smoothing model, which is represented in Figure 10-11. The green trend line indicates forecasted sales values through July 2020, and the blue bars show actual sales through July 2018. The trend line and the bars are relatively close together, which indicates that triple exponential smoothing was able to fit the data fairly accurately. An enterprising beer distributor might plan their sales promotions based on this forecast.
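For readers who want to see how a triple exponential smoothing forecast of this kind could be computed, the following is a minimal Python sketch. It makes several assumptions: it uses the additive form of Holt-Winters (the chapter does not say whether the additive or multiplicative form was used), it initializes level, trend, and seasonal values from the first two seasons in one simple way among several, and the function name and the `quarterly_sales` numbers are invented for illustration. It is not the software or the data behind Figure 10-11.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing; returns `horizon` forecasts past the data.

    alpha, beta, gamma are the level, trend, and seasonal smoothing factors (0 to 1).
    """
    n, l = len(series), season_length
    if n < 2 * l:
        raise ValueError("need at least two full seasons of data")

    first, second = series[:l], series[l:2 * l]
    # Assumed initialization: trend is the average per-period change between seasons.
    trend = (sum(second) - sum(first)) / (l * l)
    # The mean of the first season is treated as the level at that season's midpoint.
    mid_level, mid = sum(first) / l, (l - 1) / 2
    # Seasonal effect = first-season value minus the initial trend line at that period.
    seasonal = [first[i] - (mid_level + (i - mid) * trend) for i in range(l)]
    # Level at the last period of the first season.
    level = mid_level + (l - 1 - mid) * trend

    for t in range(l, n):
        y, s = series[t], seasonal[t % l]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % l] = gamma * (y - level) + (1 - gamma) * s

    # Forecast m periods ahead: last level, plus m trend steps, plus the seasonal effect.
    return [level + m * trend + seasonal[(n - 1 + m) % l] for m in range(1, horizon + 1)]


# Hypothetical quarterly sales with an upward trend and a repeating seasonal pattern.
quarterly_sales = [120, 150, 180, 140, 130, 162, 191, 149, 141, 170, 204, 158]

forecast = holt_winters_additive(quarterly_sales, season_length=4,
                                 alpha=0.4, beta=0.2, gamma=0.3, horizon=4)
print([round(value, 1) for value in forecast])
```

In practice the three smoothing factors are usually chosen by the software to minimize forecast error on the historical data, and a prediction interval would be reported alongside each forecast value.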
FIGURE 10-11: TRIPLE EXPONENTIAL SMOOTHING – CANADIAN BEER SALES

Now that you understand time series analysis and exponential smoothing techniques for forecasting, you are ready to prepare a forecast of sales and profit for Nina. Remember, however, that there are also some unanswered questions about the intended price increase: questions about who will buy at the higher price and whether other sales will be affected by the price change. In the following chapter we will learn how to answer those questions by making predictions.

SUMMARY

In this chapter we discussed the use of time series analysis to identify patterns, trends, and seasonality as well as the modeling techniques that enable us to separate the random values in a time series from those we can explain. We examined a real-life example of forecasting for Canadian beer sales in addition to the forecasting issues presented by the employees at GB.