How do you describe the importance of data in analytics? Can one think of analytics without data? Explain.
Your analysis should take a three-paragraph format: define, explain in detail, then present an actual example drawn from research. Your paper must provide an in-depth analysis of all the topics presented:
> How do you describe the importance of data in analytics?
> Can one think of analytics without data? Explain.
> Where does the data for business analytics come from?
> What are the sources and the nature of that incoming data?
> What are the most common metrics that make for analytics-ready data?
> Why is the original/raw data not readily usable by analytics tasks?
> How do you visualize the data?
The paper should be 7-8 pages in APA format with an introduction and a conclusion, and must include a minimum of 8 peer-reviewed citations.
Part I • Introduction to Analytics and AI
Chapter 3 • Nature of Data, Statistical Modeling, and Visualization
LEARNING OBJECTIVES
■■ Understand the nature of data as they relate to business intelligence (BI) and analytics
■■ Learn the methods used to make real-world data analytics ready
■■ Describe statistical modeling and its relationship to business analytics
■■ Learn about descriptive and inferential statistics
■■ Define business reporting and understand its historical evolution
■■ Understand the importance of data/information visualization
■■ Learn different types of visualization techniques
■■ Appreciate the value that visual analytics brings to business analytics
■■ Know the capabilities and limitations of dashboards
In the age of Big Data and business analytics in which we are living, the importance of data is undeniable. Newly coined phrases such as “data are the oil,” “data are the new bacon,” “data are the new currency,” and “data are the king” further stress the renewed importance of data. But the type of data we are talking about is obviously not just any data. The “garbage in, garbage out” (GIGO) principle applies to today’s Big Data phenomenon more so than to any data definition that we have had in the past. To live up to their promise, value proposition, and ability to turn into insight, data have to be carefully created/identified, collected, integrated, cleaned, transformed, and properly contextualized for use in accurate and timely decision making.
Data are the main theme of this chapter. Accordingly, the chapter starts with a description of the nature of data: what they are, what different types and forms they can come in, and how they can be preprocessed and made ready for analytics. The first few sections of the chapter are dedicated to a deep yet necessary understanding and processing of data. The next few sections describe the statistical methods used to prepare data as input to produce both descriptive and inferential measures. Following the statistics sections are sections on reporting and visualization. A report is a communication artifact prepared with the specific intention of converting data into information and knowledge and relaying that information in an easily understandable/digestible format. Today, these reports are visually oriented, often using colors and graphical icons that collectively look like a dashboard to enhance the information content. Therefore, the latter part of the chapter is dedicated to subsections that present the design, implementation, and best practices regarding information visualization, storytelling, and information dashboards.
This chapter has the following sections:
3.1 Opening Vignette: SiriusXM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing
3.2 Nature of Data
3.3 Simple Taxonomy of Data
3.4 Art and Science of Data Preprocessing
3.5 Statistical Modeling for Business Analytics
3.6 Regression Modeling for Inferential Statistics
3.7 Business Reporting
3.8 Data Visualization
3.9 Different Types of Charts and Graphs
3.10 Emergence of Visual Analytics
3.11 Information Dashboards
3.1 OPENING VIGNETTE: SiriusXM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing
SiriusXM Radio is a satellite radio powerhouse, the largest radio company in the world with $3.8 billion in annual revenues and a wide range of hugely popular music, sports, news, talk, and entertainment stations. The company, which began broadcasting in 2001 with 50,000 subscribers, had 18.8 million subscribers in 2009, and today has nearly 29 million.
Much of SiriusXM’s growth to date is rooted in creative arrangements with automobile manufacturers; today, nearly 70 percent of new cars are SiriusXM enabled. Yet the company’s reach extends far beyond car radios in the United States to a worldwide presence on the Internet, on smartphones, and through other services and distribution channels, including SONOS, JetBlue, and Dish.
BUSINESS CHALLENGE
Despite these remarkable successes, changes in customer demographics, technology, and a competitive landscape over the past few years have posed a new series of business challenges and opportunities for SiriusXM. Here are some notable ones:
• As its market penetration among new cars increased, the demographics of its buyers changed, skewing toward younger people with less discretionary income. How could SiriusXM reach this new demographic?
• As new cars become used cars and change hands, how could SiriusXM identify, engage, and convert second owners to paying customers?
• With its acquisition of the connected vehicle business from Agero, the leading provider of telematics in the U.S. car market, SiriusXM gained the ability to deliver its service via satellite and wireless networks. How could it successfully use this acquisition to capture new revenue streams?
PROPOSED SOLUTION: SHIFTING THE VISION TOWARD DATA-DRIVEN MARKETING
SiriusXM recognized that to address these challenges, it would need to become a high-performance, data-driven marketing organization. The company began making that shift by establishing three fundamental tenets. First, personalized interactions, not mass marketing, would rule the day. The company quickly understood that to conduct more personalized marketing, it would have to draw on history and past interactions as well as on a keen understanding of the consumer’s place in the subscription life cycle.
Second, to gain that understanding, information technology (IT) and its external technology partners would need the ability to deliver integrated data, advanced analytics, integrated marketing platforms, and multichannel delivery systems.
And third, the company could not achieve its business goals without an integrated and consistent point of view across the company. Most important, the technology and business sides of SiriusXM would have to become true partners to best address the challenges involved in becoming a high-performance marketing organization that draws on data-driven insights to speak directly with consumers in strikingly relevant ways.
Those data-driven insights, for example, would enable the company to differentiate between consumers, owners, drivers, listeners, and account holders. The insights would help SiriusXM to understand what other vehicles and services are part of each household and create new opportunities for engagement. In addition, by constructing a coherent and reliable 360-degree view of all its consumers, SiriusXM could ensure that all messaging in all campaigns and interactions would be tailored, relevant, and consistent across all channels. The important bonus is that more tailored and effective marketing is typically more cost-efficient.
IMPLEMENTATION: CREATING AND FOLLOWING THE PATH TO HIGH-PERFORMANCE MARKETING
At the time of its decision to become a high-performance marketing company, SiriusXM was working with a third-party marketing platform that did not have the capacity to support SiriusXM’s ambitions. The company then made an important, forward-thinking decision to bring its marketing capabilities in-house—and then carefully plotted what it would need to do to make the transition successfully.
1. Improve data cleanliness through improved master data management and governance. Although the company was understandably impatient to put ideas into action, data hygiene was a necessary first step to create a reliable window into consumer behavior.
2. Bring marketing analytics in-house and expand the data warehouse to enable scale and fully support integrated marketing analytics.
3. Develop new segmentation and scoring models to run in databases, eliminating latency and data duplication.
4. Extend the integrated data warehouse to include marketing data and scoring, leveraging in-database analytics.
5. Adopt a marketing platform for campaign development.
6. Bring all of its capability together to deliver real-time offer management across all marketing channels: call center, mobile, Web, and in-app.
Completing those steps meant finding the right technology partner. SiriusXM chose Teradata because its strengths were a powerful match for the project and company. Teradata offered the ability to:
• Consolidate data sources with an integrated data warehouse (IDW), advanced analytics, and powerful marketing applications.
• Solve data-latency issues.
• Significantly reduce data movement across multiple databases and applications.
• Seamlessly interact with applications and modules for all marketing areas.
• Scale and perform at very high levels for running campaigns and analytics in-database.
• Conduct real-time communications with customers.
• Provide operational support, either via the cloud or on premise.
This partnership has enabled SiriusXM to move smoothly and swiftly along its road map, and the company is now in the midst of a transformational, five-year process. After establishing its strong data governance process, SiriusXM began by implementing its IDW, which allowed the company to quickly and reliably operationalize insights throughout the organization.
Next, the company implemented Customer Interaction Manager, part of the Teradata Integrated Marketing Cloud, which enables real-time, dialog-based customer interaction across the full spectrum of digital and traditional communication channels. SiriusXM also will incorporate the Teradata Digital Messaging Center.
Together, the suite of capabilities allows SiriusXM to handle direct communications across multiple channels. This evolution will enable real-time offers, marketing messages, and recommendations based on previous behavior.
In addition to streamlining the way it executes and optimizes outbound marketing activities, SiriusXM is also taking control of its internal marketing operations with the implementation of Marketing Resource Management, also part of the Teradata Integrated Marketing Cloud. The solution will allow SiriusXM to streamline workflow, optimize marketing resources, and drive efficiency through every penny of its marketing budget.
RESULTS: REAPING THE BENEFITS
As SiriusXM continues its evolution into a high-performance marketing organization, it already is benefiting from its thoughtfully executed strategy. Household-level consumer insights and a complete view of marketing touch strategy with each consumer enable SiriusXM to create more targeted offers at the household, consumer, and device levels. By bringing the data and marketing analytics capabilities in-house, SiriusXM achieved the following:
• Campaign results in near real time rather than four days, resulting in massive reductions in cycle times for campaigns and the analysts who support them.
• Closed-loop visibility, allowing the analysts to support multistage dialogs and in-campaign modifications to increase campaign effectiveness.
• Real-time modeling and scoring to increase marketing intelligence and sharpen campaign offers and responses at the speed of their business.
Finally, SiriusXM’s experience has reinforced the idea that high-performance marketing is a constantly evolving concept. The company has implemented both the processes and the technology that give it the capacity for continued and flexible growth.
QUESTIONS FOR THE OPENING VIGNETTE
1. What does SiriusXM do? In what type of market does it conduct its business?
2. What were its challenges? Comment on both technology and data-related challenges.
3. What were the proposed solutions?
4. How did the company implement the proposed solutions? Did it face any implementation challenges?
5. What were the results and benefits? Were they worth the effort/investment?
6. Can you think of other companies facing similar challenges that can potentially benefit from similar data-driven marketing solutions?
WHAT WE CAN LEARN FROM THIS VIGNETTE
Striving to thrive in a fast-changing competitive industry, SiriusXM realized the need for a new and improved marketing infrastructure (one that relies on data and analytics) to effectively communicate its value proposition to its existing and potential customers. As is the case in any industry, success or mere survival in entertainment depends on intelligently sensing the changing trends (likes and dislikes) and putting together the right messages and policies to win new customers while retaining the existing ones. The key is to create and manage successful marketing campaigns that resonate with the target population of customers and have a close feedback loop to adjust and modify the message to optimize the outcome. In the end, it was all about the precision in the way that SiriusXM conducted business: being proactive about the changing nature of the clientele and creating and transmitting the right products and services in a timely manner using a fact-based/data-driven holistic marketing strategy. Source identification, source creation, access and collection, integration, cleaning, transformation, storage, and processing of relevant data played a critical role in SiriusXM’s success in designing and implementing a marketing analytics strategy, as is the case in any analytically savvy, successful company today, regardless of the industry in which it participates.
Sources: C. Quinn, “Data-Driven Marketing at SiriusXM,” Teradata Articles & News, 2016. http://bigdata.teradata.com/US/Articles-News/Data-Driven-Marketing-At-SiriusXM/ (accessed August 2016); “SiriusXM Attracts and Engages a New Generation of Radio Consumers.” http://assets.teradata.com/resourceCenter/downloads/CaseStudies/EB8597.pdf?processed=1.
3.2 NATURE OF DATA
Data are the main ingredient for any BI, data science, and business analytics initiative. In fact, they can be viewed as the raw material for what popular decision technologies produce: information, insight, and knowledge. Without data, none of these technologies could exist and be popularized. Traditionally, analytics models were built using expert knowledge and experience coupled with very little or no data at all; those were the old days, however, and now data are of the essence. Once perceived as a big challenge to collect, store, and manage, data today are widely considered among the most valuable assets of an organization, with the potential to create invaluable insight to better understand customers, competitors, and the business processes.
Data can be small or very large. They can be structured (nicely organized for computers to process), or they can be unstructured (e.g., text that is created for humans and hence not readily understandable/consumable by computers). Data can come in small batches continuously or can pour in all at once as a large batch. These are some of the characteristics that define the inherent nature of today’s data, which we often call Big Data. Even though these characteristics of data make them more challenging to process and consume, they also make the data more valuable because the characteristics enrich them beyond their conventional limits, allowing for the discovery of new and novel knowledge. Traditional ways of manually collecting data (via either surveys or human-entered business transactions) have mostly given way to modern-day data collection mechanisms that use Internet and/or sensor/radio frequency identification (RFID)–based computerized networks. These automated data collection systems are not only enabling us to collect larger volumes of data but also enhancing the data quality and integrity. Figure 3.1 illustrates a typical analytics continuum: data to analytics to actionable information.
Although their value proposition is undeniable, to live up to their promise, data must comply with some basic usability and quality metrics. Not all data are useful for all tasks, obviously. That is, data must match with (have the coverage of the specifics for) the task for which they are intended to be used. Even for a specific task, the relevant data on hand need to comply with the quality and quantity requirements. Essentially, data have to be analytics ready. So what does it mean to make data analytics ready? In addition to being relevant to the problem at hand and meeting the quality/quantity requirements, the data also must have a certain structure in place with key fields/variables with properly normalized values. Furthermore, there must be an organization-wide agreed-on definition for common variables and subject matters (sometimes also called master data management), such as how to define a customer (what characteristics of customers are used to produce a holistic enough representation for analytics) and where in the business process the customer-related information is captured, validated, stored, and updated.
FIGURE 3.1 A Data to Knowledge Continuum. [Figure not reproduced. Recoverable labels: Business Process (ERP, CRM, SCM); Internet/Social Media; Machines/Internet of Things; Cloud Storage and Computing (Data Storage, Analytics, Data Protection); Patterns, Trends, Knowledge; Applications; End Users.]
Sometimes the representation of the data depends on the type of analytics being employed. Predictive algorithms generally require a flat file with a target variable, so making data analytics ready for prediction means that data sets must be transformed into a flat-file format and made ready for ingestion into those predictive algorithms. It is also imperative to match the data to the needs and wants of a specific predictive algorithm and/or a software tool. For instance, neural network algorithms require all input variables to be numerically represented (even the nominal variables need to be converted into pseudo binary numeric variables), whereas decision tree algorithms do not require such numerical transformation; they can easily and natively handle a mix of nominal and numeric variables.
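The conversion of nominal variables into pseudo binary numeric variables mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the procedure of any particular analytics tool; the function name `one_hot_encode` and the sample customer records (fields `age`, `plan`, `churned`) are hypothetical and chosen only for the example.

```python
def one_hot_encode(records, nominal_fields):
    """Replace each nominal field with 0/1 indicator columns,
    one column per category observed in the data."""
    # Collect the observed categories for each nominal field (sorted for stable column order)
    categories = {
        field: sorted({rec[field] for rec in records})
        for field in nominal_fields
    }
    encoded = []
    for rec in records:
        # Copy the numeric fields unchanged
        row = {k: v for k, v in rec.items() if k not in nominal_fields}
        # Add one binary indicator per category
        for field in nominal_fields:
            for cat in categories[field]:
                row[f"{field}={cat}"] = 1 if rec[field] == cat else 0
        encoded.append(row)
    return encoded


# Hypothetical flat file: one row per customer, with "churned" as the target variable
customers = [
    {"age": 34, "plan": "premium", "churned": 0},
    {"age": 52, "plan": "basic", "churned": 1},
    {"age": 27, "plan": "trial", "churned": 1},
]

flat = one_hot_encode(customers, ["plan"])
for row in flat:
    print(row)
```

After encoding, every input column is numeric, which is the form a neural network needs; a decision tree could have used the original `plan` column directly.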
Analytics projects that overlook data-related tasks (some of the most critical steps) often end up with the wrong answer for the right problem, and these unintentionally created, seemingly good answers could lead to inaccurate and untimely decisions. Following are some of the characteristics (metrics) that define the readiness level of data for an analytics study (Delen, 2015; Kock, McQueen, & Corner, 1997).
• Data source reliability. This term refers to the originality and appropriateness of the storage medium where the data are obtained—answering the question of “Do we have the right confidence and belief in this data source?” If at all possible, one should always look for the original source/creator of the data to eliminate/mitigate the possibilities of data misrepresentation and data transformation caused by the mishandling of the data as they moved from the source to destination through one or more steps and stops along the way. Every move of the data creates a chance to unintentionally drop or reformat data items, which limits the integrity and perhaps true accuracy of the data set.
• Data content accuracy. This means that data are correct and are a good match for the analytics problem—answering the question of “Do we have the right data for the job?” The data should represent what was intended or defined by the original source of the data. For example, the customer’s contact information recorded within a database should be the same as what the customer said it was. Data accuracy will be covered in more detail in the following subsection.
• Data accessibility. This term means that the data are easily and readily obtainable, answering the question of “Can we easily get to the data when we need to?” Access to data can be tricky, especially if they are stored in more than one location and storage medium and need to be merged/transformed while accessing and obtaining them. As the traditional relational database management systems give way to (or coexist with) a new generation of data storage mediums such as data lakes and Hadoop infrastructure, the importance/criticality of data accessibility is also increasing.
• Data security and data privacy. Data security means that the data are secured to allow only those people who have the authority and the need to access them and to prevent anyone else from reaching them. The increasing popularity of educational degree and certificate programs in Information Assurance is evidence of the criticality and the increasing urgency of this data quality metric. Any organization that maintains health records for individual patients must have systems in place that not only safeguard the data from unauthorized access (which is mandated by federal laws such as the Health Insurance Portability and Accountability Act [HIPAA]) but also accurately identify each patient to allow proper and timely access to records by authorized users (Annas, 2003).
• Data richness. This means that all required data elements are included in the data set. In essence, richness (or comprehensiveness) means that the available variables portray a rich enough dimensionality of the underlying subject matter for an accurate and worthy analytics study. It also means that the information content is complete (or near complete) to build a predictive and/or prescriptive analytics model.
• Data consistency. This means that the data are accurately collected and combined/merged. Consistent data represent the dimensional information (variables of interest) coming from potentially disparate sources but pertaining to the same subject. If the data integration/merging is not done properly, some of the variables of different subjects could appear in the same record, with two different patient records mixed up, for instance; this could happen while merging the demographic and clinical test result data records.
• Data currency/data timeliness. This means that the data should be up-to-date (or as recent/new as they need to be) for a given analytics model. It also means that the data are recorded at or near the time of the event or observation so that the time delay–related misrepresentation (incorrectly remembering and encoding) of the data is prevented. Because accurate analytics relies on accurate and timely data, an essential characteristic of analytics-ready data is the timely creation of, and access to, data elements.
• Data granularity. This requires that the variables and data values be defined at the lowest (or as low as required) level of detail for the intended use of the data. If the data are aggregated, they might not contain the level of detail needed for an analytics algorithm to learn how to discern different records/cases from one another. For example, in a medical setting, numerical values for laboratory results should be recorded to the appropriate decimal place as required for the meaningful interpretation of test results and proper use of those values within an analytics algorithm. Similarly, in the collection of demographic data, data elements should be defined at a granular level to determine the differences in outcomes of care among various subpopulations. One thing to remember is that data that have been aggregated cannot be disaggregated (without access to the original source), whereas granular data can easily be aggregated.
• Data validity. This is the term used to describe a match/mismatch between the actual and expected data values of a given variable. As part of data definition, the acceptable values or value ranges for each data element must be defined. For example, a valid data definition related to gender would include three values: male, female, and unknown.
• Data relevancy. This means that the variables in the data set are all relevant to the study being conducted. Relevancy is not a dichotomous measure (whether a variable is relevant or not); rather, it has a spectrum of relevancy from least relevant to most relevant. Based on the analytics algorithms being used, one can choose to include only the most relevant information (i.e., variables) or, if the algorithm is capable enough to sort them out, can choose to include all the relevant ones regardless of their levels. One thing that analytics studies should avoid is including totally irrelevant data into the model building because this could contaminate the information for the algorithm, resulting in inaccurate and misleading results.
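Several of the metrics listed above lend themselves to simple automated checks. The sketch below is one possible way to test a record for richness (required elements present), validity (values within the defined set or range), and currency (recorded recently enough). The field names, the age range of 0 to 120, and the 365-day threshold are assumptions made for illustration; only the three gender values come from the text.

```python
from datetime import date

# Hypothetical data definition: acceptable values or value ranges per data element
DEFINITION = {
    "gender": {"allowed": {"male", "female", "unknown"}},  # values taken from the validity example
    "age": {"min": 0, "max": 120},                         # assumed range
}
REQUIRED_FIELDS = ["patient_id", "gender", "age", "recorded_on"]  # assumed required elements
MAX_AGE_DAYS = 365  # assumed currency threshold


def check_record(record, today):
    """Return a list of readiness problems found in one record."""
    problems = []

    # Richness: every required data element is present
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing {field}")

    # Validity: actual values match the acceptable values or ranges
    gender = record.get("gender")
    if gender is not None and gender not in DEFINITION["gender"]["allowed"]:
        problems.append(f"invalid gender: {gender}")
    age = record.get("age")
    if age is not None and not (DEFINITION["age"]["min"] <= age <= DEFINITION["age"]["max"]):
        problems.append(f"age out of range: {age}")

    # Currency: the record is recent enough for the intended model
    recorded_on = record.get("recorded_on")
    if recorded_on is not None and (today - recorded_on).days > MAX_AGE_DAYS:
        problems.append("stale record")

    return problems


today = date(2024, 6, 1)
good = {"patient_id": 1, "gender": "female", "age": 41, "recorded_on": date(2024, 5, 1)}
bad = {"patient_id": 2, "gender": "F", "age": 140, "recorded_on": date(2020, 1, 1)}

print(check_record(good, today))
print(check_record(bad, today))
```

Checks such as source reliability, relevancy, and security depend on context and judgment and are not captured by a per-record test like this one.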
The characteristics listed above are perhaps the most prevalent metrics to track; true data quality and analytics readiness for a specific application domain would require placing different levels of emphasis on these metric dimensions and perhaps adding more specific ones to this collection. The following section delves into the nature of data from a taxonomical perspective to list and define different data types as they relate to different analytics projects.
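Returning to the granularity metric, the point that aggregated data cannot be disaggregated can be shown with a small worked example. The readings and patient identifiers below are invented for illustration, and the mean is used as one possible aggregate.

```python
from statistics import mean

# Hypothetical granular lab results: (patient_id, glucose reading in mg/dL)
readings = [
    ("p1", 92.4), ("p1", 101.7), ("p1", 88.9),
    ("p2", 140.2), ("p2", 135.8),
]


def aggregate_mean(rows):
    """Roll granular readings up to one mean value per patient."""
    grouped = {}
    for patient_id, value in rows:
        grouped.setdefault(patient_id, []).append(value)
    return {pid: round(mean(vals), 1) for pid, vals in grouped.items()}


# Granular data can easily be aggregated
per_patient = aggregate_mean(readings)
print(per_patient)

# A different set of readings yields the same aggregate for p2, so the
# original values cannot be recovered from the mean alone
other = aggregate_mean([("p2", 138.0), ("p2", 138.0)])
print(other["p2"] == per_patient["p2"])
```

Because two different sets of readings produce the same mean, storing only the aggregate discards detail that an analytics algorithm might need.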
SECTION 3.2 REVIEW QUESTIONS
1. How do you describe the importance of data in analytics? Can we think of analytics without data?
2. Considering the new and broad definition of business analytics, what are the main inputs and outputs to the analytics continuum?
3. Where do the data for business analytics come from?
4. In your opinion, what are the top three data-related challenges for better analytics?
5. What are the most common metrics that make for analytics-ready data?
3.3 SIMPLE TAXONOMY OF DATA
The term data (datum in singular form) refers to a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences. Data can consist of numbers, letters, words, images, voice recordings, and so on, as measurements of a set of variables (characteristics of the subject or event that we are interested in studying). Data are often viewed as the lowest level of abstraction from which information and then knowledge is derived.
At the highest level of abstraction, one can classify data as structured and unstructured (or semistructured). Unstructured data/semistructured data are composed of any combination of textual, imagery, voice, and Web content. Unstructured/semistructured data will be covered in more detail in the text mining and Web mining chapter. Structured data are what data mining algorithms use and can be classified as categorical or numeric. The categorical data