Data Distributions
The purpose of this assignment is to apply data distributions to discrete and continuous data and justify the selection of the distributions.
For this assignment, you will use the “Random Variables” data set. You will use SPSS to analyze the data set and address the questions presented. Findings should be presented in a Word document along with the SPSS outputs.
Part 1:
Identify if the following random variables are discrete or continuous.
1. Number of defective items in a shipment.
2. Height of males (in mm) who attend Grand Canyon University.
3. Yearly income among all people in the United States.
4. Whether or not a high school graduate is accepted into a college.
5. Time that it takes for a person to run a mile.
6. The number of emergency hospital visits that each person had in the last 12 months.
Part 2:
Let X be a random variable of the outcome after rolling a six-sided die that is not fair. In fact, the die is designed to never result in a 1 or 6, while the other outcomes (i.e., 2, 3, 4, and 5) are equally probable.
1. What is the PMF of X?
2. What is the CDF of X?
3. What is = ?
4. What is = ?
5. What is = ?
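Because the die never shows 1 or 6 and the four remaining faces are equally likely, each of 2, 3, 4, and 5 has probability 1/4, and the CDF is a step function that rises by 1/4 at each of those values. The Python sketch below illustrates this with exact fractions. The function names are illustrative only. Any probability of the form P(a < X ≤ b) can then be read off as cdf(b) - cdf(a).

```python
from fractions import Fraction

# Support of X: the die never shows 1 or 6, and faces 2-5 are equally likely.
SUPPORT = [2, 3, 4, 5]

def pmf(x):
    """P(X = x): 1/4 for x in {2, 3, 4, 5}, 0 for any other value."""
    return Fraction(1, len(SUPPORT)) if x in SUPPORT else Fraction(0)

def cdf(x):
    """F(x) = P(X <= x): sum of the PMF over support values not exceeding x."""
    return sum((pmf(k) for k in SUPPORT if k <= x), Fraction(0))

for x in range(0, 8):
    print(x, pmf(x), cdf(x))
```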
Part 3:
The data set provided consists of the following random variables:
1. BMI: The body mass index of a random set of people.
2. Distance: The distance (in feet) that a baseball player hit the ball.
3. Height: The height of males (in mm).
4. Income: The income (in dollars) of people in a large company.
5. Pass: The outcome when taking an exam (1=Pass; 0=Fail).
6. Wait Time: The time (in minutes) that it takes when waiting for the train.
Answer each question below. Use SPSS as needed and include the software outputs as part of the Word document you submit.
1. What is a Q-Q plot?
2. Given a set of realized values of a random variable, how can a Q-Q plot be used to assess the distribution of the random variable?
3. Using histograms and Q-Q plots (except for binomial), match each random variable to one of the following distributions: Binomial (with N=1, P=0.7), Chi-square (with d.f.=20), Exponential, Lognormal, Normal, and Uniform.
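A Q-Q plot pairs each sorted observation with the quantile that a candidate distribution predicts at that rank. If the points fall close to a straight line, the candidate is a plausible fit. Systematic curvature, such as a bend at one end, points to skewness, which is typical of exponential or lognormal data. The Python sketch below assumes a normal reference distribution fitted with the sample mean and standard deviation and uses (i - 0.5)/n plotting positions. To check a different candidate, swap in that distribution's inverse CDF. SPSS computes the plot itself, possibly with a different plotting-position formula.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair each sorted observation with the matching normal quantile.

    Plotting positions (i - 0.5) / n are one common convention; SPSS may
    use another (for example, Blom's formula), which shifts points slightly.
    """
    data = sorted(values)
    n = len(data)
    reference = NormalDist(mean(data), stdev(data))
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))

def qq_straightness(points):
    """Pearson correlation of the Q-Q pairs; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Plotting the returned pairs, with theoretical quantiles on the horizontal axis and observed values on the vertical axis, reproduces the usual Q-Q layout.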
APA format is not required, but solid academic writing is expected.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 2 Assignment
Simple Regression Analysis
The purpose of this assignment is to apply simple regression concepts, interpret simple regression analysis models, and justify business predictions based upon the analysis.
For this assignment, you will use the “Trucks” data set. You will use SPSS to analyze the data set and address the questions presented. Findings should be presented in a Word document along with the SPSS outputs.
The business characteristics of n = 250 U.S. trucking and delivery companies for calendar year 2011 were recorded. Among the characteristics studied were the number of drivers and the number of trucks (power units) each company employed.
Part 1:
Because the data consist of counts and the range of counts is large, a natural log transformation is usually applied to improve the linear model results. Apply a natural log transform to both variables and then plot Y = ln(Trucks) against X = ln(Drivers).
Is there a linear relationship? Justify your answer by providing the SPSS output as an illustration.
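The transformation replaces each count with its natural logarithm. This compresses the long right tail of company sizes so that a straight-line relationship becomes easier to see. Below is a minimal Python sketch, assuming the data are available as (drivers, trucks) pairs; the function name is hypothetical. In SPSS the same step is typically done through Transform > Compute Variable with the LN function.

```python
import math

def log_transform(pairs):
    """Map (drivers, trucks) counts to (ln(drivers), ln(trucks)).

    ln(0) is undefined, so non-positive counts raise an error instead of
    being dropped silently.
    """
    out = []
    for drivers, trucks in pairs:
        if drivers <= 0 or trucks <= 0:
            raise ValueError(f"non-positive count: {(drivers, trucks)}")
        out.append((math.log(drivers), math.log(trucks)))
    return out
```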
Part 2:
Build a simple linear model by regressing Y on X and testing whether or not a relationship exists between the number of drivers and the number of trucks. Address the following questions in your written response:
1. After fitting the model, plot the standardized residuals (on the vertical axis) vs. the standardized predictions (on the horizontal axis). Is there a pattern? How would you interpret the pattern or lack of pattern?
2. After fitting the model, derive the normal probability plot and interpret what the plot means.
3. What is the coefficient of determination, R², of the model? How would you interpret the R²?
4. What is the estimate of β1? How would you interpret the estimate of β1?
5. Is the estimate of β1 significantly different from 0? Assume α = 0.01.
6. What is a 95% confidence interval for β1? How would you interpret the 95% confidence interval for β1?
7. If a new trucking and delivery company with 4,900 drivers were to be formed, how many trucks would you expect the company to employ based on the model?
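Questions 1 and 3 through 7 all draw on the same least-squares fit of ln(Trucks) on ln(Drivers). The Python sketch below assumes the two transformed columns are plain lists. The function names are hypothetical, and the t critical value is passed in rather than computed, because the standard library has no t distribution. SPSS reports the corresponding quantities in its own Model Summary and Coefficients tables.

```python
import math

def fit_simple_ols(x, y, t_crit):
    """Least-squares fit of y = b0 + b1 * x (here x = ln(Drivers), y = ln(Trucks)).

    t_crit is the two-sided t critical value with n - 2 degrees of freedom,
    taken from a table or software (about 1.97 for a 95% interval when n = 250).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    mse = sse / (n - 2)              # residual variance estimate
    se_b1 = math.sqrt(mse / sxx)     # standard error of the slope
    return {
        "b0": b0,
        "b1": b1,
        "r2": 1 - sse / sst,                                 # share of variance in y explained
        "t_b1": b1 / se_b1,                                  # test statistic for H0: b1 = 0
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "z_resid": [r / math.sqrt(mse) for r in residuals],  # residuals scaled by root MSE
    }

def predict_trucks(model, drivers):
    """Back-transform a log-scale prediction to a truck count.

    exp(b0 + b1 * ln(drivers)) estimates the median, not the mean, number
    of trucks if the log-scale errors are roughly normal.
    """
    return math.exp(model["b0"] + model["b1"] * math.log(drivers))
```

For Question 5, compare |t_b1| with the t critical value at α = 0.01, or use the p-value SPSS reports. For Question 7, predict_trucks(model, 4900) gives the expected number of trucks on the original scale.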
APA format is not required, but solid academic writing is expected.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 3 Assignment
Visual Representation of Data With Excel and Tableau
The purpose of this assignment is to aggregate data using two techniques, one with Excel and one with Tableau. For this assignment, you will use the “Claims 2” dataset. Use Excel pivot tables and pivot charts for Part 1 and Tableau for Part 2. Part 3 is a compare/contrast summary of the experiences utilizing both tools.
Part 1:
Create a dashboard describing the data by age group (e.g., 21-30 yrs, 31-40 yrs, 41-50 yrs, 51-60 yrs, and 61-70 yrs). The dashboard should include the graphs and charts listed in the locations described. The dashboard should be a separate tab in Excel that only includes the five items below. A sample layout is provided below the dashboard description.
1. Top Left: Bar graph showing the average number of ER visits for each of the five age groups. Show the actual average values above each bar.
2. Middle Left: Bar graph showing the average number of procedures for each of the five age groups. Show the actual average values above each bar.
3. Bottom Left: Bar graph showing the average claim cost for each of the five age groups. Show the actual average values above each bar.
4. Top Right: Pie chart showing the percent of the total sum of all claim costs for each of the five age groups. Show the actual percent values of each slice.
5. Bottom Right: Line graph showing the percent of each age group that has one or more ER visits. Show the actual percent values of each group. To create this chart, first create a new calculated column, named “Has ER Visit,” that is equal to 1 when the patient has had one or more ER visits; otherwise 0. HINT: The average of a 0-1 column is a percent. Refer to the example in the resource “Visual Representation of Data Screenshot: Preview of the Excel Dashboard”.
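Each dashboard element is a per-group aggregate: a mean for the three bar graphs, a share of the grand total for the pie chart, and the mean of a 0-1 flag for the line graph. The Python sketch below shows that logic outside Excel and Tableau. It assumes integer ages and records with the hypothetical keys 'age', 'er_visits', 'procedures', and 'claim_cost'. The actual column names in the “Claims 2” data set may differ.

```python
from collections import defaultdict

# Age bins from the assignment; ages outside 21-70 are skipped.
AGE_GROUPS = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70)]

def age_group(age):
    for lo, hi in AGE_GROUPS:
        if lo <= age <= hi:
            return f"{lo}-{hi} yrs"
    return None

def summarize(records):
    """Aggregate hypothetical claim records into the five dashboard measures."""
    groups = defaultdict(list)
    for r in records:
        g = age_group(r["age"])
        if g is not None:
            groups[g].append(r)
    total_cost = sum(r["claim_cost"] for rows in groups.values() for r in rows)
    summary = {}
    for g, rows in groups.items():
        n = len(rows)
        # "Has ER Visit" flag: 1 if the patient had one or more ER visits, else 0.
        has_er = [1 if r["er_visits"] >= 1 else 0 for r in rows]
        group_cost = sum(r["claim_cost"] for r in rows)
        summary[g] = {
            "avg_er_visits": sum(r["er_visits"] for r in rows) / n,
            "avg_procedures": sum(r["procedures"] for r in rows) / n,
            "avg_claim_cost": group_cost / n,
            "pct_of_total_cost": 100 * group_cost / total_cost,
            # The mean of a 0-1 column is the share of 1s, i.e., a percent.
            "pct_has_er_visit": 100 * sum(has_er) / n,
        }
    return summary
```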
Part 2:
Create a dashboard describing the data by age group (e.g., 21-30 yrs, 31-40 yrs, 41-50 yrs, 51-60 yrs, and 61-70 yrs). The dashboard should include the graphs and charts listed in the locations described. The dashboard should be submitted as a Tableau file. A sample layout is provided in the resource, “Data Visualization With Tableau Screenshot: Preview of the Tableau Dashboard.”
1. Top Left: Bar graph showing the average number of ER visits for each of the five age groups. Show the actual average values above each bar.
2. Middle Left: Bar graph showing the average number of procedures for each of the five age groups. Show the actual average values above each bar.
3. Bottom Left: Bar graph showing the average claim cost for each of the five age groups. Show the actual average values above each bar.
4. Top Right: Pie chart showing the percent of the total sum of all claim costs for each of the five age groups. Show the actual percent values of each slice.
5. Bottom Right: Line graph showing the percent of each age group that has one or more ER visits. Show the actual percent values of each group. To create this chart, first create a new calculated column, named “Has ER Visit,” which is equal to 1 when the patient has had one or more ER visits; otherwise 0. HINT: The average of a 0-1 column is a percent.
Part 3:
Interpret the dashboard and the story it is attempting to tell users by writing a 250-word summary. In the summary, describe the merits of each of the charts used on the dashboard. For example, discuss why a pie chart might be more appropriate than a bar graph for highlighting the information you want key stakeholders to obtain by studying that content on the dashboard. Include specific discussion about why each specific chart is used to illustrate the data it represents. Compare and contrast the use of Excel and Tableau in data visualization. Include a specific discussion about the following in your summary.
1. Software ease of use.
2. Software visualization capabilities.
3. Software limitations.
4. Discussion of when each of these software programs is most appropriate for use.
APA format is not required, but solid academic writing is expected.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 4 Assignment
k-Nearest Neighbor Classification
The purpose of this assignment is to perform k-nearest neighbor classification, interpret the results, and analyze whether or not the information generated can be used to address a specific business problem.
For this assignment, you will use the “Adult Incomes” data set, provided in the topic resources.
ABC Survey Company collects data via surveys that it then sells to marketing departments. Marketing departments typically do not like missing data. Since survey takers typically do not like to answer questions regarding their salary, the one question usually missing from the survey results is, “Is your annual salary $50,000 or more?”
You are the analyst who has been tasked with finding a way to impute (i.e., fill in) the answer to the question, “Is your annual salary $50,000 or more?” This information can best be imputed based on how individuals answer other survey questions related to their marital status, educational level, occupation, and familial relationship status. If this important question can be accurately imputed, then the worth of the survey data provided by ABC Survey Company increases dramatically.
Question 1: Using only “Marital_Status,” “Education,” “Occupation,” and “Relationship” variables, find the number of neighbors (k) that minimizes the error rate. Use a range of k between 3 and 10. Include the “k Selection Error Log” output when submitting the answer.
Question 2: Using the same variables and the k selected in Question 1, rerun the nearest neighbor model using the feature selection option in the IBM SPSS Modeler. What is the set of variables that minimizes the error rate? Include the “Predictor Selection Error Log” output when submitting the answer.
Question 3: Using the value of k and the set of variables that minimizes the error rate, rerun the k-nearest neighbor model. What is the classification table? Include the pivot table output when submitting the answer.
Question 4: Consider the following individual: Marital_Status=Never-married, Education=Masters, Occupation=Sales, and Relationship=Not-in-family. Based on the k-nearest neighbor model from Question 3, how would this individual be classified? Provide the predicted income level (“>50K” or “<=50K”) and explain the process that you used to determine the income level. Include the table illustrating the data when submitting the answer.
Question 5: Describe the model building process you used to determine whether or not a particular survey taker earned an annual salary of $50,000 or more. Include discussion of the accuracy of the k-nearest neighbor model and how it can be used in practice to impute the answer to the question, “Is your annual salary $50,000 or more?”
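Questions 1 and 4 come down to two steps: scoring each candidate k by its error rate, then classifying a new record by a majority vote of its k nearest neighbors. The Python sketch below illustrates both on categorical data. It assumes each record is a tuple of the categorical answers and that distance is the number of fields on which two records disagree. This mismatch-count distance and the leave-one-out validation are swapped-in choices for illustration. IBM SPSS Modeler's KNN node encodes fields and partitions data its own way, so its error logs will not match this sketch exactly. The pure-Python loops are also only practical on a modest sample.

```python
from collections import Counter

def mismatch_distance(a, b):
    """Number of categorical fields on which two records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training records.

    sorted() is stable, so distance ties keep training order; vote ties go to
    the label encountered first (Counter.most_common ordering).
    """
    order = sorted(range(len(train_X)), key=lambda i: mismatch_distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_error_rate(X, y, k):
    """Leave-one-out error rate: predict each record from all of the others."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)

def select_k(X, y, k_values=range(3, 11)):
    """Return (best_k, {k: error_rate}); ties go to the smallest k."""
    rates = {k: loo_error_rate(X, y, k) for k in k_values}
    best = min(rates, key=rates.get)
    return best, rates
```

With X holding (Marital_Status, Education, Occupation, Relationship) tuples and y the income labels, knn_predict(X, y, ("Never-married", "Masters", "Sales", "Not-in-family"), best_k) mirrors the logic of Question 4.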
Submit the answers to Questions 1-5, including the specified screenshots and software outputs, in a Word document.
APA format is not required, but solid academic writing is expected.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 4 Assignment
Logistic Regression
The purpose of this assignment is to perform logistic regression, interpret the results, and analyze whether or not the information generated can be used to address a specific business problem.
For this assignment, you will use the “Adult Incomes” data set, provided in the topic resources.
The marketing department is interested in creating advertising directed primarily at high-income individuals, and it has come to you seeking very specific customer data. The director of marketing explains that individuals with large amounts of disposable income tend to purchase luxury items. Therefore, understanding which predictors are correlated with high income can be very useful for a marketing department because it can help the department tailor messages to the high-earning cohort. For example, individuals who earn capital gains tend to be high-income earners, and advertisements for luxury items can be targeted toward them on realty or investment websites.
As a member of the analytics team, you have been asked to determine a list of predictors and their relative impact on the likelihood of an individual being a high-income earner. Individuals earning more than $50,000 annually are considered high-income earners. In your summary, include discussion of how the marketing department can use your results to devise a smart advertising strategy.
Question 1: Partition the data to create a training data set (70%) and test data set (30%). With a cut-off of 0.5, run logistic regression with “Income” as the target and the following predictors: “Capital_Gain,” “Hours_Per_Week,” “Sex,” “Age,” and “Race.” Show the model summary and variables in the equation. Which probability is being modeled? Include the “Model Summary” and “Variables in the Equation” outputs when submitting the answer.
Question 2: Is “Race” a statistically significant predictor when modeling whether incomes are greater than $50,000 annually? Explain your answer. Use a 5% significance level.
Question 3: Rerun the model without “Race” while still using a cut-off of 0.5. Show the model summary and variables in the equation. Write the equation showing the probability as a function of the predictors. Interpret the meaning of the coefficients for “Age” and “Sex.” Include the “Model Summary” and “Variables in the Equation” outputs when submitting the answer.
Question 4: Given that approximately 26% of the individuals in the data have incomes greater than $50,000 annually, rerun the model in Question 3 with a cut-off of 0.26. Show the classification tables and percent correct for each predicted outcome (>50K and <=50K) for the training data and test data. Why is the percent that is correct usually lower when the test data are used? Include the “Training Classification Table” and “Test Classification Table” outputs when submitting the answer.
Question 5: Consider the following individual: Age=30, Sex=Female, Hours_Per_Week=40, Capital_Gain=$0. Based on the logistic model from Question 4, what is the probability of this individual earning more than $50,000 annually? What would be the predicted class for this individual? Explain your answer.
Question 6: Based on your analysis, what are the predictors that can determine whether or not an individual would be considered a high-income earner? Discuss how the marketing department can use this information in formulating its advertising strategy. Present your findings in the form of a 250-word executive summary that includes relevant data, charts, and tables to validate the conclusions presented.
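Questions 3 through 5 rest on the logistic equation p = 1 / (1 + e^(-z)), where z = b0 + b1·x1 + ... + bk·xk, combined with a cut-off rule that turns p into a predicted class. The Python sketch below shows the probability calculation, the cut-off classification, and the per-class percent correct reported in a classification table. The coefficient values are placeholders set to zero, not results. The 1 = Male, 0 = Female coding for Sex is an assumption; check the coding and reference category in your own output.

```python
import math

def logistic_probability(intercept, coefs, features):
    """P(income > 50K) = 1 / (1 + exp(-z)), with z = b0 + sum of b_j * x_j."""
    z = intercept + sum(b * features[name] for name, b in coefs.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # algebraically equal form that avoids overflow for very negative z
    return e / (1.0 + e)

def classify(prob, cutoff):
    """Predict '>50K' when the probability exceeds the cut-off."""
    return ">50K" if prob > cutoff else "<=50K"

def classification_table(probs, actual, cutoff):
    """Counts of (actual, predicted) pairs plus percent correct per actual class."""
    labels = (">50K", "<=50K")
    counts = {(a, p): 0 for a in labels for p in labels}
    for prob, act in zip(probs, actual):
        counts[(act, classify(prob, cutoff))] += 1
    pct_correct = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        pct_correct[a] = 100 * counts[(a, a)] / row_total if row_total else float("nan")
    return counts, pct_correct

# HYPOTHETICAL placeholder coefficients: replace with the B values from Question 4.
# Sex is assumed to be coded 1 = Male, 0 = Female; confirm against your output.
b0 = 0.0
b = {"Age": 0.0, "Sex": 0.0, "Hours_Per_Week": 0.0, "Capital_Gain": 0.0}
person = {"Age": 30, "Sex": 0, "Hours_Per_Week": 40, "Capital_Gain": 0}
p = logistic_probability(b0, b, person)
print(round(p, 3), classify(p, cutoff=0.26))
```

For Question 3, exp(B) for Age is the multiplicative change in the odds of earning more than $50,000 for each additional year, holding the other predictors fixed. exp(B) for Sex compares the odds for the two coded groups.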
Submit the answers to Questions 1-5 and the executive summary as Word documents.
Prepare this assignment according to the guidelines found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 4 Assignment
Clustering (Apriori Association Rules)
The purpose of this assignment is to perform cluster analysis and analyze clusters in a data set to determine whether or not the information generated can be used to address a specific business problem.
For this assignment, you will use the “Wholesale Customers” data set, provided in the topic resources. Most data categories are self-explanatory. Clarifying notes are as follows:
1. Fresh: Annual spending (in $1000s) on fresh products
2. Milk: Annual spending (in $1000s) on milk products
3. Grocery: Annual spending (in $1000s) on grocery products
4. Frozen: Annual spending (in $1000s) on frozen products
5. Detergents_Paper: Annual spending (in $1000s) on detergents and paper products
6. Delicatessen: Annual spending (in $1000s) on delicatessen products
7. Client_Type: Type of client – either HoReCa (Hotel/Restaurant/Café) or Retail
8. Region: Region of client – either Lisbon, Oporto, or Other
A wholesale distributor wants to understand the purchasing profiles of its clients. If a finite set of distinct profiles were defined, then a marketing strategy could be designed specifically for each set of similar clients. The company has compiled a data set that includes annual client spending on diverse product categories. You have been tasked with analyzing the data to determine what patterns emerge and how these patterns can be used to create specific client marketing profiles.
Use k-means clustering to explore and analyze the data set by using only the quantitative variables to cluster the clients.
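k-means alternates between two steps: assigning each client to the nearest cluster centroid, then moving each centroid to the mean of its assigned clients, until the assignments stop changing. The Python sketch below implements this loop (Lloyd's algorithm) on the six spending variables. Because categories such as Fresh and Delicatessen differ widely in scale, it also includes an optional z-score helper. Whether to standardize is an analysis choice. The function names and random initialization are illustrative assumptions and do not reproduce SPSS Modeler's K-Means node.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores; constant columns are left centered only."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return labels, centroids
```

Running kmeans for several values of k and comparing cluster sizes and centroid profiles corresponds to the choices Question 1 asks you to explain.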
Question 1: Explain the process you used to define the clusters, such as the number of clusters formed, the specific variables used, etc. Include the “Cluster Sizes” and “Predictor Importance” outputs when submitting the answer.
Question 2: Interpret the clusters with respect to the quantitative variables that were used in forming the clusters. Include the “Clusters” output when submitting the answer.
Question 3: Discuss whether there is a pattern in the clusters with respect to the qualitative variables (i.e., Client_Type or Region). Include the charts illustrating these patterns when submitting the answer.
Question 4: Provide an appropriate name for each cluster using any or all of the variables in the data set.
Question 5: Based on your analysis, what patterns emerged and how can these patterns be used to create specific client marketing profiles? Include discussion of the characteristics for each profile. Present your findings in the form of a 250-word executive summary that includes relevant data, charts, and tables to validate the conclusions presented.
Submit the answers to Questions 1-5 and the executive summary as Word documents.
Prepare this assignment according to the guidelines found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 4 Assignment
Methods Summary
In a 100-250 word paper, compare and contrast the three methods: k-NN, logistic regression, and clustering.
While APA style is not required for the body of this assignment, solid academic writing is expected, and documentation of sources should be presented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are required to submit this assignment to LopesWrite. A link to the LopesWrite technical support articles is located in Class Resources if you need assistance.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 5 Assignment
IBM Approach to Text Analytics
Read “About IBM SPSS Modeler Text Analytics,” view “Text Analytics in IBM SPSS Modeler 18.2,” located in the topic resources, and compare to section 5.5 in Chapter 5 of the textbook.
In 100-150 words, discuss whether the IBM approach is consistent with what is in the textbook. Provide examples to support your rationale.
While APA style is not required for the body of this assignment, solid academic writing is expected, and documentation of sources should be presented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 7 Assignment
Big Data Versus Enterprise Data
Utilize “Table 7.1: When to Use Which Platform-Hadoop Versus DW,” “Technology Insights 7.2: A Few Demystifying Facts About Hadoop,” and the “Coexistence of Hadoop and Data Warehouse” section in Chapter 7 of the textbook to complete this assignment.
In 150-200 words, address the following:
1. Discuss the challenges facing data warehousing and big data. Are we witnessing the end of the data warehousing era? Why or why not?
2. Describe the use cases for big data and Hadoop.
3. Describe the use cases for data warehousing and relational database management systems (DBMS).
4. Describe scenarios where Hadoop and relational DBMS can coexist.
While APA style is not required for the body of this assignment, solid academic writing is expected, and documentation of sources should be presented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are required to submit this assignment to LopesWrite. A link to the LopesWrite technical support articles is located in Class Resources if you need assistance.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 8 Assignment
Project Management Options
Review CRISP-DM in Chapter 4 of the textbook and the topic resources to complete this assignment.
Part 1
In 100-250 words, address the following:
1. Compare and contrast the CRISP-DM data mining process, PMI project management life cycle (PLC) models, and the software development life cycle (SDLC).
2. Justify why CRISP-DM is a superior choice for BI projects.
Part 2
Create a graphic image or visual model differentiating five key features of the three models. You can use PowerPoint, SmartArt in Word, or other software that can create a graphic.
General Requirements
Submit both parts to the assignment dropbox.
While APA style is not required for the body of this assignment, solid academic writing is expected, and documentation of sources should be presented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are not required to submit this assignment to LopesWrite.
MIS600 APPLIED ANALYTICS FOR BUSINESS
Week 8 Assignment
Public School Education Case Study
Review “Public School Education in Tacoma, Washington, Uses Microsoft Azure Machine Learning to Predict School Dropouts,” located in Chapter 8 of the textbook.
Conduct further research on the Microsoft platform containing Azure Machine Learning, Azure Data Factory, Azure SQL Database, and Power BI.
In a 100-250 word paper, discuss how this Microsoft platform can support the business analytics CRISP-DM model. Provide examples to support your position.
While APA style is not required for the body of this assignment, solid academic writing is expected, and documentation of sources should be presented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are required to submit this assignment to LopesWrite. A link to the LopesWrite technical support articles is located in Class Resources if you need assistance.