Choose a case study either from page 2 of the attached ‘CASE STUDY’ document or another case study found on the web regarding Sports Analytics (i.e how Analytics is used in Physical The
Attached Files:
- Analytic Summary.pptx (359.25 KB)
- Theory and Methods Basics.pptx (363.3 KB)
- example Case+Study+Analysis_+Analyzing+Player+Behavior+In+The+NFL+Runningback.pdf (251.757 KB)
- example Sports+analytics+case+study+project.pdf (386.891 KB)
YOU ARE NOT CREATING A MODEL – YOU ARE COMMUNICATING HOW YOU WOULD TACKLE A PROBLEM USING THE ANALYTICS STEPS LISTED IN THE "FINAL PROJECT" DOCUMENT. THIS IS THEORY-BASED.
1. Watch the video: https://365datascience.com/career-advice/expert-interviews/data-use-cases-sports-analytics/
2. Watch the video to write a case study/report: https://www.youtube.com/watch?v=beGwPSAwD4o&t=2s
3. Choose a case study either from page 2 of the attached "CASE STUDY" document or another case study found on the web regarding Sports Analytics (i.e., how Analytics is used in Physical Therapy, see Exercise Analytics). You should look for applications where you can apply analytics and machine learning knowledge to improve some aspect of the Sports data lifecycle (see the Introduction to Sports Analytics course). Feel free to speak with me to talk through your idea if you pull it from the Internet or another class.
Other ideas:
https://www.samford.edu/sports-analytics/case-studies
https://towardsdatascience.com/scope-of-analytics-in-sports-world-37ed09c39860
https://www.digitaldividedata.com/clients/case-studies/sports-analytics
4. Create a solid project on how sports analytics is used in your specific use case. You should speak to "How data science is applied in the broad field of Sports."
Data:
a. You can create fake data in Excel (we create fake data in class at times to understand the topic we are discussing) and add it as a snapshot to the presentation.
b. You can take a screenshot of data on the web that supports how data is being used in your area.
Look for, or think of, examples from your use case that tie in topics you have learned in the class.
5. The presentation should be submitted as a Word document, PDF, or PPTX.
6. Two examples of student submissions are attached.
7. Your work should be original; otherwise you will be penalized. You can reference articles/quotes, but ensure the reader knows you're referencing these quotes/articles.
The Endgame: Presentation
Use Analytic Plan to Guide Final Presentation
Source: EMC^2 Dell DSA Module
Components of Analytic Plan
Example – Retail Banking
Discovery Business Problem Framed
How do we identify churn/no churn for a customer?
Initial Hypothesis
Transaction volume and type are key predictors of churn rates.
Data & Scope
7 months of customer account history
Model Planning – Analytic Technique
Logistic regression to identify most influential factors predicting churn.
Results & Key Findings
Once customers stop using their accounts for gas and groceries, they will soon
erode their accounts and churn.
If customers use their debit card fewer than 6 times per month, they will leave
the bank within 55 days.
Business Impact
If we can target customers who are high-risk for churn, we can reduce customer
attrition by 22%. This would save $2.3 million in lost customer revenue and avoid
$1.1 million in new customer acquisition costs each year.
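The Model Planning row above names logistic regression as the technique. The following is a minimal sketch of that idea, not the bank's actual model: the twelve customers, the single feature (monthly debit-card uses), and the function names `fit_logistic` and `churn_probability` are all made up for illustration, and Python is used here even though the course materials mention R.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# FAKE data: monthly debit-card uses, and 1 = churned, 0 = stayed
debit_uses = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20]
churned = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# Standardize the feature so gradient descent behaves well
mean = sum(debit_uses) / len(debit_uses)
sd = math.sqrt(sum((x - mean) ** 2 for x in debit_uses) / len(debit_uses))
z_uses = [(x - mean) / sd for x in debit_uses]

b0, b1 = fit_logistic(z_uses, churned)

def churn_probability(uses):
    """Predicted churn probability for a given number of monthly uses."""
    return sigmoid(b0 + b1 * (uses - mean) / sd)

print("slope on standardized debit uses:", round(b1, 2))
print("P(churn | 2 uses/month): ", round(churn_probability(2), 2))
print("P(churn | 15 uses/month):", round(churn_probability(15), 2))
```

A negative slope here would match the key finding in the example: the less often the card is used, the higher the estimated churn risk.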
Key Aspects of Final Presentation Material
Develop Core Material you can use to Deliver Presentations to 2 Main Audiences
Understanding the Strength of a Relationship
One primary goal of analytic methods… Understanding the relationships between variables.
Player performance and team performance
Performance in a given year related to the performance in the following year
Linear Relationships: Correlation Coefficient
For each subject, we measure two variables
Baseball player -> Number of hits (X) and number of runs scored (Y) in a season
A simple way to gain insight is a scatterplot
(X,Y) is plotted for each subject
In some cases the pattern is vague
In some cases the relationship is strong: the value of X almost completely determines the value of Y
In other cases the relationship is weak: the value of X gives only a general indication of the value of Y
Measures to reduce the properties of such a relationship to one number that is useful as a simple summary of the relationship between variables.
Linear Relationships: Correlation Coefficients
Scatterplots
Strong relationship:
the value of X almost completely determines the value of Y
Weak relationship:
the value of X giving, at best, a general indication regarding the value of Y
Quantifying Association
To describe the relationship between two continuous variables, use:
Correlation analysis
Measures strength and direction of the linear relationship between two variables
Regression analysis
Concerns prediction or estimation of outcome variable, based on value of another variable (or variables)
Correlation Analysis
Plot the data (or have a computer do so)
Visually inspect the relationship between two continuous variables
Is there a linear relationship (correlation)?
Are there outliers?
Are the distributions skewed?
Correlation Coefficient
Measures the strength and direction of the linear relationship between two variables X and Y
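Population correlation coefficient: $\rho = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$
Sample correlation coefficient: $r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
(obtained by plugging in sample estimates)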
Correlation Coefficient
The correlation coefficient, r, takes values between -1 and +1
-1: Perfect negative linear relationship
0: No linear relationship
+1: Perfect positive linear relationship
Correlation Coefficient
Plot standardized Y versus standardized X
Observe an ellipse (elongated circle)
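Correlation is the slope of the least-squares line of standardized Y on standardized X (the major axis of the ellipse itself always has slope +1 or -1)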
Correlation Notes
Other names for r
Pearson correlation coefficient
Product-moment correlation
Characteristics of r
Measures *linear* association
The value of r is independent of units used to measure the variables
The value of r is sensitive to outliers
r^2 tells us what proportion of variation in Y is explained by the linear relationship with X
Examples of Correlation Coefficient
Perfect positive correlation, r approx. 1
Perfect negative correlation, r approx. -1
Imperfect positive correlation, 0 < r < 1
Imperfect negative correlation, -1 < r < 0
No correlation, r approx. 0
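The sample correlation coefficient described in these slides can be computed directly from its definition. The sketch below assumes made-up season totals for six players; the `hits` and `runs` values and the function name `pearson_r` are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# FAKE season totals for six baseball players
hits = [120, 135, 150, 160, 175, 190]   # X
runs = [55, 60, 72, 70, 85, 95]         # Y

r = pearson_r(hits, runs)
print("r   =", round(r, 3))
print("r^2 =", round(r * r, 3))   # share of variation in Y explained by X

# Perfect positive and perfect negative linear relationships
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Rescaling either variable (for example, doubling every value in `hits`) leaves r unchanged, which is the unit-independence property listed in the notes above.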
Theory and Methods
You’ve prepared your data: what’s next?
What kind of analysis do you need? Which model is more appropriate for it? …
Datasets
Training set: a set of examples used for learning, where the target value is known.
Validation set: a set of examples used to tune the architecture of a classifier and estimate the error.
Test set: used only to assess the performance of a classifier. It is never used during the training process, so the error on the test set provides an unbiased estimate of the generalization error.
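One way to picture the three datasets is a single shuffled split. This is a minimal sketch under assumed proportions (60/20/20); the function name `split_dataset`, the proportions, and the fixed seed are choices made for illustration, not something the slides prescribe.

```python
import random

def split_dataset(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows once, then cut them into train / validation / test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]   # held out until the very end
    return train_set, validation_set, test_set

rows = list(range(100))   # stand-in for 100 labeled examples
train_set, validation_set, test_set = split_dataset(rows)
print(len(train_set), len(validation_set), len(test_set))
```

Every example lands in exactly one of the three sets, so nothing the model saw during training or tuning leaks into the final test estimate.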
Machine Learning
To learn: to get knowledge of by study, experience, or being taught.
Types of Learning
Supervised
Unsupervised
Supervised Learning
Training data includes both the input and the desired results.
For some examples the correct results (targets) are known and are given as input to the model during the learning process.
The construction of a proper training, validation and test set is crucial.
These methods are usually fast and accurate.
The model has to be able to generalize: give the correct results when new data are given as input, without knowing the target a priori.
Unsupervised Learning
The model is not provided with the correct results during the training.
Can be used to cluster the input data into classes on the basis of their statistical properties only.
Cluster significance and labeling.
The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes
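Clustering without target labels can be sketched with a small k-means loop. The six 2-D points below are fake (think of two standardized measurements per player), and starting from the first k points is a deliberate simplification; real implementations usually use a smarter initialization.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [points[i] for i in range(k)]   # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# FAKE data: two loose groups, no labels supplied
points = [(1.0, 1.0), (5.0, 5.2), (1.2, 0.8), (5.1, 4.9), (0.8, 1.1), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The algorithm only returns cluster numbers; deciding what each cluster means (the "significance and labeling" step above) is still up to the analyst.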
Theory and Methods
Examine analytic needs and select an appropriate technique based on the business objective, the initial hypothesis, and the data's structure and volume
Apply some of the more commonly used methods in Analytics solutions
Explain the algorithms and the technical foundations for the commonly used methods
Explain the environment (use case) in which each technique can provide the most value
Use appropriate diagnostic methods to validate the models created
Use R to fit, score and evaluate models
Analytical Methods
Categorization (unsupervised):
K-Means clustering
Association Rules
Regression:
Linear
Logistic
Classification (supervised):
Naïve Bayesian classifier
Decision Trees
Time Series Analysis
Text Analysis
Where “R” we?
We previously reviewed R skills and basic statistics. You have the below guides to assist you.
“R Cookbook” textbook
“SimpleR – Using R for Introductory Statistics”
R cheat sheet
You can use R to:
Generate summary statistics to investigate a data set
Visualize Data
Perform statistical tests to analyze data and evaluate models
Objectives to know for each method
Prominent use cases for the method
Algorithms to implement the method
Diagnostics that are most commonly used to evaluate the effectiveness of the method
The reasons to choose (+) and cautions (-) (where the method is most and least effective)
What kind of Problem do I need to Solve? How do I Solve it?
Applying the Data Analytic Lifecycle
In a typical Data Analytical Problem – you would have gone through:
Phase 1 – Discovery – have the problem framed
Phase 2 – Data Preparation – have the data prepared
Now you need to plan the model and determine the method to be used.
Phase 3 – Model Planning
How do people generally solve this problem with the kind of data and resources I have?
Does this work well enough? Or do I need to come up with something new?
What are related or analogous problems? How are they solved? Can I do that?
Case Study Analysis: Analyzing Player Behavior In The NFL Runningback
Simmione Sauls 12-2-21 HKSP-464-01
Introduction
The National Football League (NFL) has existed since 1920. NFL management, teams, and players have sought a competitive advantage over their opponents since the league's start. With the help of different analytical tools and data sources, teams around the league have found ways to gain the upper hand on their competitors. Over the years, player behavior has been one of the harder areas in which to collect information. Because of the fast pace and rigorous contact involved in the game, it has never been easy to analyze the behaviors of a football player (such as the effect of fatigue and stress during the course of a game, speed and agility during plays, and force taken or given during contact), and there has always been room for improvement in analysis tools. Today, AI technology can help perform the analyses and calculations that are too difficult for the human eye.
Phase of Analytics
The analysis of player behavior would be classified as predictive analytics. When investigating
player behavior, teams try to figure out what factors make a difference in a player's
performance. Learning and understanding the factors that affect the athlete can help
predict when the player will potentially be at his best and worst, as well as help identify
factors and behaviors that can either help or hurt the player's performance. The main
question that I am trying to answer by studying these types of analytics is “What factor is
most associated with rushing for 100 yards in a game?”
Data Acquisition
For this analysis, data will be taken from three sources. These sources consist of the three
pieces of equipment that every player must wear: helmet, shoulder pads, and cleats. Data will
be collected by speed and force detection devices placed in each piece of equipment. These
devices will be placed in the equipment of running backs across the league in order to see
what plays the biggest part in their ability to rush for 100 yards or more in a game.
Variable Selection
Aside from physical attributes, there are several variables that factor in when measuring
rushing success as a football player. Of these factors, the three most important are speed,
strength, and control. Speed is self-explanatory: it is simply how fast the ball carrier is moving
once they have the ball, and it is measured by their 40-yard dash time. Strength is more
complex: instead of measuring weight-room strength, it measures how much force a player
can deliver or receive while running the ball, and it could be measured by a player’s weight. Lastly,
control measures how many yards a player averages per carry, as well as a player's ability to keep
moving after contact and to control the ball during contact.
Exploratory Data Analysis
For this analysis, I figured the best way to present and analyze the dataset was with a scatter
plot. Scatter plots not only put a large amount of data in one place, but they can also show
relationships between variables. In my opinion, scatter plots are best used when you have a lot
of numerical data to analyze, because they can show you the correlation between two
variables and whether that correlation is positive or negative. See Dataset for
examples.
Model Selection
The ultimate goal of the study is not only to see what factors play into running backs having successful rushing games, but also to see which attributes have a stronger correlation with rushing success. I chose a regression model for this problem because it is the most efficient and simple way of seeing the relationships between the y and x variables, and it gives you the values that you need with just a few clicks. My y variable will be the average number of yards produced in a game, and my x variables will be the speed, strength, and control numbers that I collected.
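To make the regression idea concrete, here is a minimal sketch that fits a straight line to a single predictor. It is not the analysis from this case study: it uses only one x variable (yards per carry, as a stand-in for "control") instead of all three, the numbers are fake, and the function names `fit_line` and `r_squared` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# FAKE data for six running backs
yards_per_carry = [3.5, 3.9, 4.2, 4.6, 5.0, 5.4]   # x: "control"
yards_per_game = [58, 66, 75, 84, 96, 104]         # y: average rushing yards

slope, intercept = fit_line(yards_per_carry, yards_per_game)
r2 = r_squared(yards_per_carry, yards_per_game, slope, intercept)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("r^2:", round(r2, 3))
```

With all three predictors, the same idea extends to multiple regression, which spreadsheet tools and statistical packages fit in one step.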
Evaluation Metrics
I would evaluate the predictions of my regression model by paying attention to my p-values. By examining these, I can get a feel for which of my variables has the greatest correlation, as well as see which variables have little to no correlation at all. Fortunately for my study, all of my variables showed some kind of correlation, which means that each variable plays a part in total yards per game.
Testing
In order to fully test the model, I would need a sample group of NFL running backs who are known to produce a lot of yards each game, as well as access to their equipment. I would also need the sensor devices to track speed and force in the equipment. The players would have to do nothing outside their ordinary routine except make sure the sensors are in the pads and cleats at all times during practices and games. This will help collect unbiased data; if the test is not well monitored, the recorded data could come only from games in which the player played well.
Conclusion
In conclusion, I believe that the observation and analysis of player behaviors is necessary in this new era of sports. The speed, strength, and control that a running back has during the course of a game can be ever-changing due to reasons that we can’t physically see and therefore can’t measure accurately. However, by analyzing the tangible factors we are able to see what physical traits play the biggest role. Of the three variables that I measured, control while carrying the football was the most significant, correlating most strongly with averaging 100 or more yards a game.
Final Project-Case Study
Zakai Meghoo-Peddie
Professor Phillips
HKSP 464: Sports Analytics
Morehouse College
12/05/2021
Abstract:
We see numerous representations of what it means to be the favorite team and club to win
it all in the sports world. One of the most common is home-field
advantage, the perceived built-in edge that the team playing at home has in a
contest. This advantage is non-specific and can be broken down into various
components, such as the outcome of a match, points or goals scored, defensive performance, and
individual player performance. Home-field advantage raises the question of
whether the home team has a real competitive advantage or whether the theory is merely a
psychological notion that holds little to no value. This study presents empirical evidence and
research to explain home-field advantage in soccer, if any exists. Methods used in
this study include qualitative and quantitative indicators and descriptors that provide an
analysis of home-field advantage statistics from various leagues at different competitive levels
across the globe.
Figure 1
Home-field advantage has been a topic of discussion nearly since the
inception of organized soccer in England in 1863. Nearly every game since sports media began has
featured certain buzzwords or mentions of the "home team" that reflect some impact that being
a home team might have, whether negative or positive. In most cases, fan engagement is expected
to be more positive, impactful, and desirable in a match where a home team is present
than in a match at a neutral location. Factors that play a role in fan
engagement include geographic and economic attributes. These factors indicate somewhat of an
advantage, as matches at home tend to have more local spectators, ticket sales to team
fans tend to be higher, and merchandise is often purchased at a higher rate when teams have home
matches. All these factors indicate a huge boost in support for the home team through fan
engagement, but they do not point to a clear advantage in home teams' overall match success. If
anything, they seem more beneficial to a team from a sales and reputation perspective.
For a long time, it was commonly believed that fans' support
elevated athletic performance, thus contributing to the overall home advantage.
However, this theory is only partially correct. It is the referee's job to maintain order and act as a
disciplinarian on the field, and just like the players, she or he has the potential to change the
outcome of the match. In soccer, the referee is supposed to impartially dictate the match by calling fouls and
penalties, adding more time to the match, and issuing red and yellow cards. In practice, the
decisions of referees are rarely impartial because human bias will always be present.
According to data compiled from 128 Italian Serie A soccer games, referees were found to
statistically favor home teams. In each game that was analyzed, referees awarded, on average, 2.69
yellow cards, 0.27 red cards, and 0.17 penalties for each team.
However, the home team's average was 2.65 yellow cards, 0.23 red cards, and 0.20 penalties,
while the away team averaged 2.73 yellow cards, 0.31 red cards, and 0.15 penalties (Scoppa and
Schwartz, 2014). These statistics are intriguing because any of those decisions by the referee has
the potential to alter the outcome of a match. For example, when a player receives a red
card, their team must play the remainder of the match with one fewer player. Also, when a
team is awarded a penalty kick, a goal results about 76.8% of the time (Hawerchuk, 2010).
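The penalty figures above can be combined into a rough expected-goals comparison. This is only a back-of-the-envelope sketch: it assumes the quoted penalty averages are penalties awarded in favor of each team and that the 76.8% conversion rate applies equally to home and away sides, neither of which the sources quoted here confirm.

```python
PENALTY_CONVERSION = 0.768   # share of penalties scored (Hawerchuk, 2010)

# Average penalties per game as quoted from Scoppa and Schwartz (2014);
# ASSUMED here to be penalties awarded in favor of each side
penalties_per_game = {"home": 0.20, "away": 0.15}

expected_goals = {
    side: rate * PENALTY_CONVERSION
    for side, rate in penalties_per_game.items()
}
home_edge = expected_goals["home"] - expected_goals["away"]

for side, goals in expected_goals.items():
    print(side, "expected penalty goals per game:", round(goals, 4))
print("home edge per game:", round(home_edge, 4))
```

Under these assumptions the gap is small in any single match, but it accumulates over a full season of home games.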
Furthermore, another study found that when adding stoppage or injury time, the
referee's decision tends to go the way of the home team. In the Bundesliga (the German soccer
league) and Major League Soccer (MLS), when the home team is losing by a goal, added time is, on
average, 12% longer than when it is winning by a goal. A notable example of this
home-field advantage bias is the team Stuttgart. The average stoppage time added
when Stuttgart is winning is 2 minutes and 20 seconds, and when it is losing, it is 3 minutes and
41 seconds. The differential in stoppage time is 1 minute and 21 seconds, or 57% (Bialik, 2014).
While these numbers may appear insignificant, it is vital to remember that a goal can be scored in a
matter of seconds. So, if a team is winning, it is beneficial to have less stoppage time, but if it
is losing, the more time it has, the greater its chance to score. Additionally, the implications of
these results are worrisome.
Figure 2
Although it may not directly affect players and their performances, crowd pressure can
very much influence the referees. A 1993 study by Professor Simo Salimen analyzed 56
matches where there was a home team present but the crowd was neutral, so the fan support was
equal. In this research, Salimen found that the home team did statistically perform better than
the visiting team when its fans supported it; even when the crowd supported the
other team, the home team still managed to score more goals (Salimen, 1993).
In addition, Alan Nevill conducted research in which he divided 40 referees into two
groups of 20. He had them watch a replay of a match between Leicester City and Liverpool
that took place at Anfield, Liverpool’s home stadium. The first group watched
the replay with audio, while the second group watched it in silence. Both groups
of referees watched the same game, and Nevill had them decide whether 47 different incidents were
fouls or not. The first group, who watched with audio of the crowd supporting
Liverpool, called about 15.5% fewer fouls against Liverpool than the silent group of
referees (Ingle, 2013). The results of Salimen’s research contradict the idea that the home crowd
affects players and provides a direct home-field advantage. Instead, Nevill’s experiment is a clear
indicator that crowd behavior can directly influence the decisions of referees, which can shift the
outcome of a match.
The aforementioned findings demonstrate that home teams have an advantage, but they do
not clearly define how much impact factors such as referees and fan engagement can have on
the outcome of a match. Essentially, how effective is home advantage? Statistics
from the English Premier League seasons 2009 to 2013 show that teams' home win percentage
was 47% and their away win percentage was nearly 27%. Also, 15 out of 20 of these English