November 30, 2022

Choose a case study either from page 2 of the attached ‘CASE STUDY’ document or another case study found on the web regarding Sports Analytics (i.e how Analytics is used in Physical The

Attached Files:

Analytic Summary.pptx (359.25 KB)
Theory and Methods Basics.pptx (363.3 KB)
example Case+Study+Analysis_+Analyzing+Player+Behavior+In+The+NFL+Runningback.pdf (251.757 KB)
example Sports+analytics+case+study+project.pdf (386.891 KB)

YOU ARE NOT CREATING A MODEL – YOU ARE COMMUNICATING HOW YOU WOULD TACKLE A PROBLEM USING THE ANALYTICS STEPS LISTED IN THE "FINAL PROJECT" DOCUMENT. THIS IS THEORY-BASED.

1. Watch the video: https://365datascience.com/career-advice/expert-interviews/data-use-cases-sports-analytics/
2. Watch the video to write a case study/report: https://www.youtube.com/watch?v=beGwPSAwD4o&t=2s
3. Choose a case study either from page 2 of the attached "CASE STUDY" document or another case study found on the web regarding Sports Analytics (e.g., how Analytics is used in Physical Therapy; see Exercise Analytics). You should look for applications where you can apply analytics and machine learning knowledge to improve some aspect of the Sports data lifecycle (see the Introduction to Sports Analytics course). Feel free to speak with me to talk through your idea if you pull it from the Internet or another class. Other ideas:
https://www.samford.edu/sports-analytics/case-studies
https://towardsdatascience.com/scope-of-analytics-in-sports-world-37ed09c39860
https://www.digitaldividedata.com/clients/case-studies/sports-analytics
4. Create a solid project on how sports analytics is used in your specific use case. You should speak to "How data science is applied in the broad field of Sports."
Data:
a. You can create fake data in Excel (we create fake data in class at times to understand the topic we are discussing) and add it as a snapshot to the presentation.
b. You can take a screenshot of data on the web that supports how data is being used in your area. Look for, or think of, examples from your use case that tie in topics you have learned in the class.
5. The presentation should be submitted as a Word document, PDF, or PPTX.
6. Two examples of student submissions are attached.
7. Your work should be original, or you will be penalized. You can reference articles/quotes, but ensure the reader knows you are referencing them.

The Endgame: Presentation

Use Analytic Plan to Guide Final Presentation

Source: EMC^2 Dell DSA Module

Components of an Analytic Plan

Example – Retail Banking

Discovery Business Problem Framed

How do we identify churn/no churn for a customer?

Initial Hypothesis

Transaction volume and type are key predictors of churn rates.

Data & Scope

7 months of customer account history

Model Planning – Analytic Technique

Logistic regression to identify most influential factors predicting churn.

Results & Key Findings

Once customers stop using their accounts for gas and groceries, they will soon erode their accounts and churn.

If customers use their debit card fewer than 6 times per month, they will leave the bank within 55 days.

Business Impact

If we can target customers who are high-risk for churn, we can reduce customer attrition by 22%. This would save $2.3 million in lost customer revenue and avoid $1.1 million in new customer acquisition costs each year.
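The logistic-regression step in this example can be sketched in a few lines. This is a minimal illustration, assuming a single hypothetical predictor (debit-card uses per month) and made-up churn labels; it is not the bank's data or model. It fits the two coefficients by plain batch gradient descent on the log-loss using only the standard library, whereas in practice a statistics package (for example R's glm with a binomial family) would do the fitting.

```python
import math

# Hypothetical data: debit-card uses per month, and whether the customer churned (1) or stayed (0).
uses = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16]
churned = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

def sigmoid(z):
    """Logistic function: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.01, epochs=20000):
    """Fit P(churn) = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # derivative of the log-loss w.r.t. the linear predictor
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

b0, b1 = fit_logistic(uses, churned)
for x in (2, 6, 12):
    print(f"{x:>2} uses/month -> estimated P(churn) = {sigmoid(b0 + b1 * x):.2f}")
```

A negative b1 is what the initial hypothesis predicts: the less a customer uses the card, the higher the estimated churn probability.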

Key Aspects of Final Presentation Material

Source: EMC^2 Dell DSA Module

Develop Core Material you can use to Deliver Presentations to 2 Main Audiences

Source: EMC^2 Dell DSA Module


Understanding the Strength of a Relationship

One primary goal of analytic methods is understanding the relationships between variables, for example:

Player performance and team performance

Performance in a given year related to the performance in the following year

Linear Relationships: Correlation Coefficient

For each subject, we measure two variables

Baseball player: number of hits (X) and number of runs scored (Y) in a season

A simple way to gain insight is a scatterplot

(X,Y) is plotted for each subject

In some cases the pattern is vague.

In other cases the relationship is strong: the value of X almost completely determines the value of Y.

When the relationship is weak, the value of X gives only a general indication of the value of Y.

We want measures that reduce the properties of such a relationship to a single number that is useful as a simple summary of the relationship between the variables.

Linear Relationships: Correlation Coefficients

Scatterplots

Strong relationship:

the value of X almost completely determines the value of Y

Weak relationship:

the value of X gives, at best, a general indication of the value of Y

Quantifying Association

To describe the relationship between two continuous variables, use:

Correlation analysis

Measures strength and direction of the linear relationship between two variables

Regression analysis

Concerns prediction or estimation of outcome variable, based on value of another variable (or variables)

Correlation Analysis

Plot the data (or have a computer do so)

Visually inspect the relationship between two continuous variables

Is there a linear relationship (correlation)?

Are there outliers?

Are the distributions skewed?
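Plotting remains the main check, but two of the questions above can also be given a rough numeric companion. This is a sketch under stated assumptions: the data are hypothetical (the last value is deliberately extreme), |z| > 2 is only a common rule of thumb for flagging points worth a closer look, and skewness uses the adjusted Fisher-Pearson sample formula.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: distance from the mean in sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness: > 0 means a longer right tail."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

runs = [55, 60, 70, 72, 80, 85, 90, 140]  # hypothetical; last value deliberately extreme
flagged = [v for v, z in zip(runs, z_scores(runs)) if abs(z) > 2]
print("possible outliers:", flagged)
print(f"skewness: {skewness(runs):.2f}")
```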

Correlation Coefficient

Measures the strength and direction of the linear relationship between two variables X and Y

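Population correlation coefficient: $\rho = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$

Sample correlation coefficient: $r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$

(obtained by plugging in sample estimates)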
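The sample formula translates directly into code. A minimal sketch with hypothetical hits/runs data; the function name pearson_r is illustrative, and it raises an error rather than dividing by zero when either variable is constant.

```python
import math

def pearson_r(x, y):
    """Sample correlation: sum of cross-deviations over the root of the product of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical season totals (illustrative, not real data).
hits = [120, 135, 150, 160, 172, 180, 195, 205]
runs = [55, 60, 70, 72, 80, 85, 90, 98]
print(f"r = {pearson_r(hits, runs):.3f}")
```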

Correlation Coefficient

The correlation coefficient, r, takes values between -1 and +1

-1: Perfect negative linear relationship

0: No linear relationship

+1: Perfect positive linear relationship

Correlation Coefficient

Plot standardized Y versus standardized X

Observe an ellipse (elongated circle)

Correlation is the slope of the least-squares line fitted to the standardized data; the narrower the ellipse, the closer r is to -1 or +1

Correlation Notes

Other names for r

Pearson correlation coefficient

Product-moment correlation coefficient

Characteristics of r

Measures *linear* association

The value of r is independent of units used to measure the variables

The value of r is sensitive to outliers

r² tells us what proportion of the variation in Y is explained by the linear relationship with X

Examples of Correlation Coefficient

Perfect positive correlation, r = 1

Perfect negative correlation, r = -1

Imperfect positive correlation, 0 < r < 1

Imperfect negative correlation, -1 < r <0

No correlation, r approx. 0

Theory and Methods

You’ve prepared your data: what’s next?

What kind of analysis do you need? Which model is more appropriate for it? …

Datasets

Training set: a set of examples used for learning, where the target value is known.

Validation set: a set of examples used to tune the architecture of a classifier and estimate the error.

Test set: used only to assess the performance of a classifier. It is never used during the training process, so the error on the test set provides an unbiased estimate of the generalization error.
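A minimal sketch of building the three sets: shuffle once with a fixed seed, then cut the records into disjoint pieces. The 70/15/15 proportions and the 100 stand-in "game records" are assumptions for illustration; the slides do not prescribe a ratio.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is reproducible
    n = len(rows)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

games = list(range(100))  # stand-ins for 100 hypothetical game records
train, val, test = train_val_test_split(games)
print(len(train), len(val), len(test))
```

Because the sets are disjoint, the test records never influence training or tuning, which is what keeps the test-set error an unbiased estimate.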

Machine Learning

To learn: to get knowledge of by study, experience, or being taught.

Types of Learning

Supervised

Unsupervised

Supervised Learning

Training data includes both the input and the desired results.

For some examples the correct results (targets) are known and are given as input to the model during the learning process.

The construction of a proper training, validation and test set is crucial.

These methods are usually fast and accurate.

Models must be able to generalize: give the correct results when new data are given as input without the target being known a priori.
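Supervised learning in miniature: the model is trained on examples whose targets are known and is then asked to label a new example. The sketch uses a nearest-centroid classifier, a deliberately simple technique chosen for illustration (it is not one of the methods named in these slides). The features (yards per carry, yards after contact per carry) and the "high"/"low" labels are hypothetical, and the features are left unscaled for brevity.

```python
import math

def fit_centroids(X, y):
    """Training: average the feature vectors of each class whose target is known."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Generalization: assign a new, unlabeled example to the closest class centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical labeled games: (yards per carry, yards after contact per carry) -> rushing output.
X_train = [[5.1, 3.0], [5.4, 3.2], [4.9, 2.8], [3.6, 1.5], [3.9, 1.7], [3.4, 1.2]]
y_train = ["high", "high", "high", "low", "low", "low"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [5.0, 2.9]), predict(centroids, [3.5, 1.4]))
```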

Unsupervised Learning

The model is not provided with the correct results during the training.

Can be used to cluster the input data into classes on the basis of their statistical properties only.

Cluster significance and labeling.

The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes
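K-means, listed below among the analytical methods, is a common example of clustering without targets. A minimal sketch of Lloyd's algorithm in the standard library, assuming two made-up, well-separated groups of 2-D points; labels are attached only afterwards, as the slide describes.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center recomputation."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # New center = mean of assigned points; keep the old center if a cluster is empty.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, clusters

# Two hypothetical groups of observations described by two made-up features.
group_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.2, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.2, 8.6)]
centers, clusters = kmeans(group_a + group_b, k=2)
print(centers)
```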

Theory and Methods

Examine analytic needs and select an appropriate technique based on the business objective and initial hypothesis, and on the data’s structure and volume

Apply some of the more commonly used methods in Analytics solutions

Explain the algorithms and the technical foundations for the commonly used methods

Explain the environment (use case) in which each technique can provide the most value

Use appropriate diagnostic methods to validate the models created

Use R to fit, score and evaluate models

Analytical Methods

Categorization (unsupervised):

K-Means clustering

Association Rules

Regression:

Linear

Logistic

Classification (supervised):

Naïve Bayesian classifier

Decision Trees

Time Series Analysis

Text Analysis

Where “R” we?

We previously reviewed R skills and basic statistics. The guides below are available to assist you.

“R Cookbook” textbook

“SimpleR – Using R for Introductory Statistics”

R cheat sheet

You can use R to:

Generate summary statistics to investigate a data set

Visualize Data

Perform statistical tests to analyze data and evaluate models

Objectives to know for each method

Prominent use cases for the method

Algorithms to implement the method

Diagnostics that are most commonly used to evaluate the effectiveness of the method

The reasons to choose (+) and cautions (-) (where the method is most and least effective)

What kind of Problem do I need to Solve? How do I Solve it?

Applying the Data Analytic Lifecycle

In a typical data analytics problem, you would already have gone through:

Phase 1 – Discovery – have the problem framed

Phase 2 – Data Preparation – have the data prepared

Now you need to plan the model and determine the method to be used.

Phase 3 – Model Planning

How do people generally solve this problem with the kind of data and resources I have?

Does this work well enough? Or do I need to come up with something new?

What are related or analogous problems? How are they solved? Can I do that?


Case Study Analysis: Analyzing Player Behavior In

The NFL Runningback

Simmione Sauls 12-2-21 HKSP-464-01

Introduction

The National Football League (NFL) has existed since 1920. Since the league's start, NFL management, teams, and players have sought a competitive advantage over their opponents. With the help of different analytical tools and data sources, teams around the league have found ways to gain the upper hand on their competitors. Over the years, player behavior has remained one of the harder areas on which to collect information. Because of the fast pace and rigorous contact involved in the game, it has never been easy to analyze the behavior of a football player (such as the effect of fatigue and stress over the course of a game, speed/agility during plays, or force taken/given during contact), and there has always been room for improvement in analysis tools. Today, AI technology can help perform analyses and calculations on things that are too difficult for the human eye to see.

Phase of Analytics

The analysis of player behavior would be classified as predictive analytics. When investigating player behavior, teams try to figure out what factors make a difference in a player's performance. Learning and understanding the factors that affect the athlete can help predict when the player will potentially be at his best and worst, as well as help identify factors/behaviors that can either help or hurt the performance of the player. The main question that I am trying to answer by studying these types of analytics is: "What factor is most associated with a running back rushing for 100 yards in a game?"

Data Acquisition

For this analysis, data will be taken from three sources. These sources consist of the three pieces of equipment that every player must wear: helmet, shoulder pads, and cleats. Data will be collected by speed and force detection devices placed in each piece of equipment. These devices will be placed in the equipment of running backs across the league in order to see what plays the biggest part in their ability to rush for 100 yards or more in a game.

Variable Selection

Aside from physical attributes, there are several variables that factor in when measuring rushing success as a football player. Of these, the three most important are speed, strength, and control. Speed is self-explanatory: it is simply how fast the ball carrier is moving once they have the ball, and it is measured by their 40-yard dash time. Strength is more complex; instead of measuring weight-room strength, it measures how much force a player can deliver/receive while running the ball, and it could be measured by a player's weight. Lastly, control measures how many yards a player averages per carry, as well as a player's ability to keep moving after contact and to control the ball during contact.

Exploratory Data Analysis

For this analysis, I figured the best way to present and analyze the dataset was with a scatter plot. Scatter plots not only give you a vast amount of data in one place, but they can also show relationships between variables. In my opinion, scatter plots are best used when you have a lot of numerical data to analyze, because they can show you the correlation between two variables and whether that correlation is positive or negative. See Dataset for examples.

Model Selection

The ultimate goal of the study is not only to see what factors play into running backs having successful rushing games, but also to see which attributes have a stronger correlation with rushing success. I chose to use a regression model for this problem because it is the most efficient and simple way of seeing the relationships between the y and x variables, and it gives you the values that you need with just a few clicks. My Y variable will be the average number of yards produced in a game, and my x variables will be the speed, strength, and control numbers that I collected.
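A minimal sketch of the multiple regression described above, using NumPy's least-squares solver on simulated data. Everything about the data is an assumption for illustration: the sample size, the predictor distributions, and the "true" effects used to generate the yards are invented, so the printed coefficients say nothing about real running backs.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 40  # hypothetical number of running-back games

# Simulated predictors (assumed units): 40-yard time (s), body weight (lb), yards per carry.
speed = rng.normal(4.50, 0.08, n)
strength = rng.normal(215, 10, n)
control = rng.normal(4.5, 0.5, n)

# Simulated outcome: rushing yards per game from invented effects plus random noise.
yards = 300 - 60 * speed + 0.2 * strength + 15 * control + rng.normal(0, 8, n)

# Design matrix with an intercept column; ordinary least squares.
X = np.column_stack([np.ones(n), speed, strength, control])
coef, *_ = np.linalg.lstsq(X, yards, rcond=None)

for name, b in zip(["intercept", "speed", "strength", "control"], coef):
    print(f"{name:>9}: {b:8.3f}")
```

Because the predictors sit on very different scales, the raw coefficients are not directly comparable; standardizing the predictors first is one common way to compare their relative influence.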

Evaluation Metrics

I would evaluate the predictions of my regression model by paying attention to my p-values. By examining these, I can see which of my variables show the strongest evidence of a relationship with rushing yards, as well as which variables show little to no relationship at all. Fortunately for my study, all of my variables had some kind of correlation, which means that each variable plays a part in the total yards gained per game.
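The p-values mentioned above come from each coefficient's t-statistic (estimate divided by its standard error). A sketch on the same kind of simulated data as before (all values invented for illustration). As a stated simplification, the two-sided p-values below use a normal approximation via math.erfc instead of the exact t distribution; with 36 residual degrees of freedom the exact values would be slightly larger. R² is printed alongside as an overall measure of fit.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1)
n = 40  # hypothetical number of running-back games

# Simulated data (assumed units and effects, illustration only).
speed = rng.normal(4.50, 0.08, n)
strength = rng.normal(215, 10, n)
control = rng.normal(4.5, 0.5, n)
yards = 300 - 60 * speed + 0.2 * strength + 15 * control + rng.normal(0, 8, n)

X = np.column_stack([np.ones(n), speed, strength, control])
coef, *_ = np.linalg.lstsq(X, yards, rcond=None)

# Standard errors from the residual variance and (X'X)^-1.
resid = yards - X @ coef
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se

# Normal approximation to the two-sided p-value (assumption; exact test uses t with dof).
p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]

r2 = 1 - (resid @ resid) / np.sum((yards - yards.mean()) ** 2)
for name, t, p in zip(["intercept", "speed", "strength", "control"], t_stats, p_values):
    print(f"{name:>9}: t = {t:6.2f}, approx. p = {p:.4f}")
print(f"R^2 = {r2:.3f}")
```

A small p-value indicates evidence that a coefficient differs from zero, given the model; it does not by itself measure how strong the relationship is.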

Testing

In order to fully test the model, I would need a sample group of NFL running backs who are known to produce a lot of yards each game, as well as access to their equipment. I will also need the sensor devices to track speed and force in the equipment. The players would have to do nothing out of their ordinary routine except make sure the sensors are in their pads and cleats at all times during practices and games. This will help collect unbiased data; if the test is not well monitored, some of the recorded data could come only from games where the player played well.

Conclusion

In conclusion, I believe that the observation and analysis of player behavior is very necessary in this new era of sports. The speed, strength, and control that a running back has during the course of a game can be constantly changing for reasons that we can't physically see and therefore can't measure accurately. However, by analyzing the tangible factors, we are able to see which physical traits play the biggest role. Of the three variables that I measured, control while carrying the football correlated most strongly with averaging 100 or more yards a game.


Final Project-Case Study

Zakai Meghoo-Peddie

Professor Phillips

HKSP 464: Sports Analytics

Morehouse College

12/05/2021


Abstract:

We see numerous representations in the sports world of what it means to be the favorite team or club to win it all. One of the most common is the home-field advantage: the perceived built-in edge that the team playing at home will have for the contest. This advantage is non-specific and can be broken down into various sectors, such as the outcome of a match, points or goals scored, defensive performance, and individual player performance. The home-field advantage raises questions and assumptions about whether the team in this position has a genuine competitive advantage or whether the theory is merely a psychological notion that holds little to no value. This study presents empirical evidence and research to explain the home-field advantage in soccer, if any. Methods used in this study include qualitative and quantitative indicators and descriptors that provide an analysis of home-field advantage statistics from various leagues at different competitive levels across the globe.

Figure 1


The home-field advantage has been a topic of discussion nearly since the inception of organized soccer in England in 1863. Since sports media began, nearly every game has featured buzzwords or mentions of the term "home team" that reflect some impact, whether negative or positive, that being the home team might have. In most cases, fan engagement is expected to be more positive, impactful, and desirable in a match where a home team is present than in a meeting at a neutral location. Factors that play a role in fan engagement include geographic and economic attributes. These factors suggest something of an advantage: matches at home tend to have more local spectators, ticket sales to team fans tend to be higher, and merchandise is often purchased at a higher rate when teams play home matches. All these factors indicate a huge boost in support for the home team through fan engagement, but they don't point to a clear advantage in home teams' overall match success. If anything, they seem more beneficial to a team from a sales and reputation perspective.

For a long time, a commonly held belief about the home-field advantage was that fans, through their support, elevated athletic performance and thus contributed to the overall home advantage. However, this theory is only partially correct. It is the referee's job to maintain order and act as a disciplinarian on the field, and just like the players, she or he has the potential to change the outcome of the match. In soccer, the referee can effectively dictate the match by calling fouls and penalties, adding time to the match, and issuing red and yellow cards. It is rare for referees' decisions to be completely impartial, because human bias will always be present. According to data compiled from 128 Italian Serie A soccer games, referees were found to statistically favor home teams. In each game analyzed, referees awarded, on average, 2.69 yellow cards, .27 red cards, and .17 penalties per team.


However, the home team's average was 2.65 yellow cards, .23 red cards, and .2 penalties, while the away team averaged 2.73 yellow cards, .31 red cards, and .15 penalties (Scoppa and Schwartz, 2014). These statistics are intriguing because any one of those refereeing decisions has the potential to alter the outcome of a match. For example, when a player receives a red card, their team must play the remainder of the match with one less player. Also, when a team is awarded a penalty kick, a goal results nearly 76.8% of the time (Hawerchuk, 2010).

Furthermore, another study found that when adding stoppage or injury time, referees' decisions tend to go the way of the home team. In the Bundesliga (the German soccer league) and Major League Soccer (MLS), when the home team is losing by a goal, extra time is, on average, 12% longer than when it is winning by a goal. More notably, this home-field bias can be illustrated by the team Stuttgart. The average stoppage time added when Stuttgart is winning is 2 minutes and 20 seconds, and when it is losing, it is 3 minutes and 41 seconds. The differential in stoppage time is 1 minute and 21 seconds, or about 58% (Bialik, 2014). While these numbers may appear insignificant, it is vital to remember that a goal can be scored in a matter of seconds. So, if a team is winning, it benefits from less stoppage time, but if it is losing, the more time, the greater its chance to score. The implications of these results are worrisome.
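The Stuttgart stoppage-time comparison reduces to simple arithmetic. A quick check using the two averages quoted from Bialik (2014); the helper name to_seconds is illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes:seconds duration to total seconds."""
    return 60 * minutes + seconds

winning = to_seconds(2, 20)  # average stoppage time added when Stuttgart was winning
losing = to_seconds(3, 41)   # average stoppage time added when Stuttgart was losing
diff = losing - winning

print(f"difference: {diff // 60} min {diff % 60} s")       # prints "difference: 1 min 21 s"
print(f"relative to the winning case: {diff / winning:.1%}")  # prints "relative to the winning case: 57.9%"
```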


Figure 2


Although it may not directly affect players and their performances, crowd pressure can strongly influence referees. A 1993 study by Professor Simo Salminen analyzed 56 matches in which a home team was present but the crowd was neutral, so fan support was equal. In this research, Salminen found that the home team did statistically perform better than the visiting team when its fans supported it; even when the crowd supported the other team, the home team still managed to score more goals (Salminen, 1993).

In addition, Alan Nevill conducted research in which he divided 40 referees into two groups of 20. He had them watch a replay of a match between Leicester City and Liverpool that took place at Anfield, Liverpool's home stadium. The first group watched the replay with audio, while the second group watched it in silence. Both groups watched the same game, and Nevill had them decide whether each of 47 different incidents was a foul. The referees who watched with audio, in which the crowd supported Liverpool, called about 15.5% fewer fouls against Liverpool than the silent group did (Ingle, 2013). Salminen's results contradict the idea that the home crowd affects players and provides a direct home-field advantage. Instead, Nevill's experiment is a clear indicator that crowd behavior can directly influence the decisions of referees, which can shift the outcome of a match.


The aforementioned findings demonstrate that home teams have an advantage, but they do not clearly define how much impact factors such as refereeing and fan engagement have on the outcome of a match. Essentially, how effective is home advantage? Statistics from the English Premier League seasons 2009 to 2013 show that teams' home win percentage was 47%, while their away win percentage was nearly 27%. Also, 15 out of 20 of these English
