In this project, you will demonstrate your mastery of the following competencies:
- Implement statistical analysis using quantitative and qualitative variables
- Apply statistical techniques to address research problems
Scenario
You are a data analyst working for a real estate company based in Seattle. You have access to a large set of historical data that you can use to analyze patterns between different attributes of a house (such as square footage and number of bathrooms) and the house’s selling price. You have been asked to create different regression models that can be used to predict a house’s selling price based on different factors. These regression models will help your company set better prices when listing a home for a client. You will use the R programming language to perform the statistical analyses and then prepare a report of your findings. Since your report will be read by different stakeholders within your real estate company, you will need to interpret your findings and describe their practical implications.
Note: This data set has been “cleaned” for the purposes of this assignment.
Reference
Harlfoxem. (2016). House Sales in King County, USA [Data file]. Retrieved from https://www.kaggle.com/harlfoxem/housesalesprediction
Directions
- R Script: To complete the tasks listed below, open the Project One Jupyter Notebook link in the Assignment Information module. Your project contains the data set and a Jupyter Notebook. The Jupyter Notebook contains instructions and blank code blocks where you will write your R scripts. You will be asked to complete the following regression analyses:
- First Order Regression Model with Quantitative and Qualitative Variables
- Complete Second Order Multiple Regression Model with Quantitative Variables
- Nested Models F-Test
- Summary Report: Once you have completed all the steps in your R script, you will create a summary report to present your findings. Use the provided template to create your report. You must complete each of the following sections by answering all of the questions in each section.
- Introduction: Set the context for your scenario and the analyses you will be performing.
- First Order Regression Model with Quantitative and Qualitative Variables:
- Correlation analysis between the variables using data visualizations, correlation coefficients, and the correlation matrix
- Reporting results of the model by listing and interpreting various model statistics, including $R^2$ and adjusted $R^2$ ($R_a^2$)
- Evaluate the significance of the model by reporting parameter estimates and performing hypothesis testing for each estimate and the overall model.
- Use model equations to make predictions.
- Complete Second Order Model with Quantitative Variables:
- Correlation analysis between the variables using data visualizations, correlation coefficients, and the correlation matrix
- Reporting results of the model by listing and interpreting estimates of various model statistics, including $R^2$ and adjusted $R^2$ ($R_a^2$)
- Evaluate the significance of the model by reporting parameter estimates and performing hypothesis testing for each estimate and the overall model.
- Use model equations to make predictions.
- Nested Models F-Test:
- Reporting results of the model by listing and interpreting estimates of various model statistics
- Evaluate the significance of the model by reporting parameter estimates and performing hypothesis testing for each estimate and the overall model.
- Model Comparison: Evaluate whether the complete model is necessary by performing the nested models F-test.
- Conclusion: Summarize your findings and explain their practical implications.
What to Submit
To complete this project, you must submit the following:
R Script
Your Jupyter Notebook R script contains all the statistical analyses you completed for this project. Download your work as an HTML file. Review the file to make sure that every step and all your outputs are included. Submit the HTML file as part of your submission. Review the Jupyter Notebook in Codio Tutorial in the Supporting Materials section if you need help.
Summary Report
Use the provided template to create your summary report. The template contains guiding questions to help you complete each section. Be sure to remove these questions before submitting your report. Your summary report should be submitted as a 3- to 5-page Microsoft Word document. It should include an APA-style cover page and APA citations for any sources used. Use single spacing, 11-point Calibri font, and one-inch margins.
Supporting Materials
The following resource(s) may help support your work on the project:
Document: Jupyter Notebook in Codio Tutorial
This tutorial will help you become familiar with the Jupyter Notebooks interface. You will learn how to open, complete, save, and download your Jupyter Notebook for this project.
Shapiro Library: APA Style Guide
This guide will help you format your cover page and references according to APA style. You are not required to use external resources for this project. However, if you do use any resources, you must cite them in APA format.
MAT 303 Project One Summary Report
[Full Name]
[SNHU Email]
Southern New Hampshire University
Note: Replace the bracketed text on page one (the cover page) with your personal information.
1. Introduction
Discuss the statement of the problem in terms of the statistical analyses that are being performed. Be sure to address the following:
· What is the data set that you are exploring?
· How will your results be used?
· What type of analyses will you be running in this project?
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
2. Data Preparation
There are some important variables that are used in this project. Identify and explain these variables.
· What are the important variables in this data set?
· How many rows and columns are present in this data set?
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
3. Model #1 – First Order Regression Model with Quantitative and Qualitative Variables
Correlation Analysis
· Create the following scatterplots and include a copy of each in this section:
· Price (price) vs. the living area (sqft_living)
· Price (price) vs. the age of the home (age)
· Describe what trends, if any, exist for each scatterplot.
· Report the correlation coefficients between the following variables:
· Price (price) vs. the living area (sqft_living)
· Price (price) vs. the age of the home (age)
· Describe the strength and direction of each correlation coefficient.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
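For the Jupyter Notebook (not the report itself), the correlation step above might look like the following minimal sketch. It runs on synthetic stand-in data, because the project file is not reproduced here. The column names price, sqft_living, and age come from the labels in the questions above. The data frame name housing and all simulated values are assumptions made only for illustration.

```r
# Synthetic stand-in for the project data (assumption: the real data set has
# columns named price, sqft_living, and age, as the labels above suggest).
set.seed(303)
n <- 200
housing <- data.frame(
  sqft_living = round(runif(n, 800, 4000)),
  age         = sample(0:100, n, replace = TRUE)
)
housing$price <- 50000 + 250 * housing$sqft_living -
  3000 * housing$age + rnorm(n, sd = 60000)

# Scatterplots of the response against each predictor
plot(housing$sqft_living, housing$price,
     xlab = "Living area (sq ft)", ylab = "Price",
     main = "Price vs. living area")
plot(housing$age, housing$price,
     xlab = "Age of home (years)", ylab = "Price",
     main = "Price vs. age of home")

# Pearson correlation coefficients, pairwise and as a correlation matrix
r_living    <- cor(housing$price, housing$sqft_living)
r_age       <- cor(housing$price, housing$age)
corr_matrix <- cor(housing[, c("price", "sqft_living", "age")])
print(round(corr_matrix, 4))
```

The sign of each coefficient gives the direction of the linear relationship, and its absolute value (between 0 and 1) gives the strength.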
Reporting Results
· Write the general form of the multiple regression model using price as the response variable and living area, grade of the home, number of bathrooms, and view as predictor variables. Use $\beta_i$ (where $i = 1, 2, \ldots$) to represent the slope parameters for all predictor variables. Note: You will use the variables living area, grade of the home, and number of bathrooms as quantitative variables and view as a qualitative variable in this model. Use the equation editor to write the general form of the regression equation.
· Create the multiple regression model for price as the response variable with living area, grade of the home, number of bathrooms, and view as predictor variables. Write the model equation. Note: Use the equation editor to write the regression equation.
· What are the values of $R^2$ and $R_a^2$ for the model? Provide your interpretation of these statistics.
· Interpret the beta estimates for the living area and lake view.
· Obtain the residuals and fitted values to create the following plots. Include these plots and comment on the validity of assumptions. Include any tables for the values for residuals or the fitted values.
· Residuals against Fitted Values
· Normal Q-Q plot
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
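As a hedged illustration of the general form requested above: a qualitative variable with $k$ levels enters a first order model through $k - 1$ dummy (indicator) variables, with the remaining level serving as the base level. The number of levels of view is not stated here, so the sketch keeps $k$ generic. The symbols $x_1$, $x_2$, $x_3$, and $d_j$ are labels chosen for this illustration.

```latex
% x_1 = living area, x_2 = grade of the home, x_3 = number of bathrooms
% d_j = 1 if view is at level j (j = 1, ..., k-1), 0 otherwise; level k is the base level
\[
E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
       + \sum_{j=1}^{k-1} \beta_{3+j}\, d_j
\]
```

Under this coding, each dummy coefficient $\beta_{3+j}$ is the difference in mean price between view level $j$ and the base level, holding the quantitative predictors fixed.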
Evaluating Significance of Model
· Is the model significant at a 5% level of significance? Carry out the overall F-test by identifying the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of the test.
· Which terms are significant at a 5% level of significance? Carry out individual beta tests by identifying the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of each test.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
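In the notebook, the overall F-test and the individual t-tests described above can both be read from the model summary. The sketch below uses synthetic data. The column names grade, bathrooms, and view, the view levels ("Lake", "Road", "Other"), and the simulated coefficients are assumptions for illustration and may not match the project data.

```r
# Synthetic stand-in data; names and level labels are assumptions.
set.seed(303)
n <- 200
housing <- data.frame(
  sqft_living = round(runif(n, 800, 4000)),
  grade       = sample(5:11, n, replace = TRUE),
  bathrooms   = sample(1:4, n, replace = TRUE),
  view        = factor(sample(c("Lake", "Road", "Other"), n, replace = TRUE))
)
housing$price <- 20000 + 200 * housing$sqft_living + 30000 * housing$grade +
  15000 * housing$bathrooms + 80000 * (housing$view == "Lake") +
  rnorm(n, sd = 60000)

# First order model; lm() creates the dummy variables for the factor view
model1 <- lm(price ~ sqft_living + grade + bathrooms + view, data = housing)
fit <- summary(model1)

# Overall F-test. H0: all slope parameters are zero. Ha: at least one is not.
f <- fit$fstatistic   # named vector: value, numdf, dendf
overall_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Individual t-tests. H0: beta_i = 0. Ha: beta_i != 0, one test per term.
coef_table  <- fit$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]

print(coef_table)
cat("Overall F-test p-value:", format.pval(overall_p), "\n")
cat("Terms significant at the 5% level:",
    paste(significant, collapse = ", "), "\n")
```

A P-value below 0.05 leads to rejecting the corresponding null hypothesis at the 5% level of significance.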
Making Predictions Using Model
· What is the predicted price for a home that backs onto a lake and has a 2,150 sq ft living area, a grade of 7, and three bathrooms? Obtain 90% prediction and confidence intervals for the price of this home. Interpret each interval.
· What is the predicted price for a home that backs onto a road and has a 2,150 sq ft living area, a grade of 7, and three bathrooms? Obtain 90% prediction and confidence intervals for the price of this home. Interpret each interval.
· Why is the prediction interval wider than the confidence interval?
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
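In the notebook, both intervals can be obtained from predict() by changing the interval argument. The sketch below is fitted to synthetic data. The column names, the view levels, and the simulated coefficients are assumptions, so the printed numbers are not the project's answers.

```r
# Synthetic stand-in data; names and level labels are assumptions.
set.seed(303)
n <- 200
housing <- data.frame(
  sqft_living = round(runif(n, 800, 4000)),
  grade       = sample(5:11, n, replace = TRUE),
  bathrooms   = sample(1:4, n, replace = TRUE),
  view        = factor(sample(c("Lake", "Road", "Other"), n, replace = TRUE))
)
housing$price <- 20000 + 200 * housing$sqft_living + 30000 * housing$grade +
  15000 * housing$bathrooms + 80000 * (housing$view == "Lake") +
  rnorm(n, sd = 60000)

model1 <- lm(price ~ sqft_living + grade + bathrooms + view, data = housing)

# The home described in the first question: lake view, 2,150 sq ft,
# grade 7, three bathrooms
new_home <- data.frame(
  sqft_living = 2150, grade = 7, bathrooms = 3,
  view = factor("Lake", levels = levels(housing$view))
)

# 90% prediction interval: range for the price of one individual home
pi90 <- predict(model1, new_home, interval = "prediction", level = 0.90)
# 90% confidence interval: range for the mean price of all such homes
ci90 <- predict(model1, new_home, interval = "confidence", level = 0.90)

print(pi90)
print(ci90)

pi_width <- pi90[, "upr"] - pi90[, "lwr"]
ci_width <- ci90[, "upr"] - ci90[, "lwr"]
```

Both intervals are centered on the same fitted value. The prediction interval is wider because it includes the random error of an individual home in addition to the uncertainty in the estimated mean.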
4. Model #2 – Complete Second Order Regression Model with Quantitative Variables
Correlation Analysis
· Create scatterplots of:
· Price (price) vs. the age of appliances (appliance_age)
· Price (price) vs. the crime rate per 100,000 people (crime)
· Comment on each scatterplot. Is a second order model appropriate using these variables?
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
Reporting Results
· Write the general form of a complete second order model for price using age of appliances and crime rate per 100,000 people as predictors. Use $\beta_i$ (where $i = 1, 2, \ldots$) to represent the slope parameters for all predictor variables. Note: Use age of appliances and crime rate as quantitative variables in this model. Use the equation editor to write the general form of the regression equation.
· Create and write the equation for the complete second order regression model for price using age of appliances and crime rate per 100,000 people as predictors. Note: Use age of appliances and crime rate as quantitative variables in this model. Use the equation editor to write the regression equation.
· What are the values of $R^2$ and $R_a^2$ for the model? Provide your interpretation of these statistics.
· Obtain the residuals and fitted values to create the following plots. Include these plots and comment on the validity of assumptions. Include any tables for the values for residuals or the fitted values.
· Residuals against Fitted Values
· Normal Q-Q plot
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
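A complete second order model in two quantitative predictors contains both first order terms, their interaction, and both squared terms. In the notebook, fitting it and drawing the two diagnostic plots might look like the sketch below. The data are synthetic. The column names appliance_age and crime follow the labels used in this section, while the simulated coefficients are assumptions for illustration.

```r
# Synthetic stand-in data; simulated values are assumptions.
set.seed(303)
n <- 200
housing <- data.frame(
  appliance_age = sample(0:20, n, replace = TRUE),
  crime         = runif(n, 50, 400)
)
housing$price <- 600000 - 9000 * housing$appliance_age - 600 * housing$crime +
  5 * housing$appliance_age * housing$crime +
  150 * housing$appliance_age^2 + 0.5 * housing$crime^2 +
  rnorm(n, sd = 30000)

# Complete second order model: x1, x2, x1*x2, x1^2, x2^2.
# I() protects the squared terms so ^ is treated as arithmetic.
model2 <- lm(price ~ appliance_age + crime + appliance_age:crime +
               I(appliance_age^2) + I(crime^2), data = housing)

fit    <- summary(model2)
r2     <- fit$r.squared       # R-squared
adj_r2 <- fit$adj.r.squared   # adjusted R-squared
cat("R-squared:", round(r2, 4), " Adjusted R-squared:", round(adj_r2, 4), "\n")

# Residuals and fitted values for the diagnostic plots
res         <- resid(model2)
fitted_vals <- fitted(model2)

# Residuals against fitted values: look for curvature or a funnel shape
plot(fitted_vals, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals against Fitted Values")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points near the line support the normality assumption
qqnorm(res)
qqline(res)
```

Adjusted $R^2$ penalizes additional terms, so it is never larger than $R^2$ and is the fairer statistic when comparing models with different numbers of predictors.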
Evaluating Significance of Model
· Is the model significant at a 5% level of significance? Carry out the overall F-test by identifying the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of the test.
· Which terms are significant at a 5% level of significance? Carry out individual beta tests by identifying the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of each test.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
Making Predictions Using Model
· What is the predicted price for a home that has one-year-old appliances and is in an area that has a crime rate of 81.02 per 100,000 individuals? Obtain 90% prediction and confidence intervals for the price of this home. Interpret each interval.
· What is the predicted price for a home that has 15-year-old appliances and is in an area that has a crime rate of 200.50 per 100,000 individuals? Obtain 90% prediction and confidence intervals for the price of this home. Interpret each interval.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
5. Nested Models F-Test
Reporting Results
· Write the general form of a first order model for price using age of appliances and crime rate per 100,000 people as predictors. Include the interaction term between age of appliances and crime rate. Use $\beta_i$ (where $i = 1, 2, \ldots$) to represent the slope parameters for all predictor variables. Note: Use age of appliances and crime rate as quantitative variables in this model. Use the equation editor to write the general form of the regression equation.
· Create and write the equation for a first order regression model for price using age of appliances and crime rate per 100,000 people as predictors. Include the interaction term between age of appliances and crime rate. Note: Use age of appliances and crime rate as quantitative variables in this model. Use the equation editor to write the regression equation.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
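For reference, the general form of a first order model with an interaction term in two quantitative predictors can be written as follows. The symbols $x_1$ and $x_2$ are labels chosen for this illustration.

```latex
% x_1 = age of appliances, x_2 = crime rate per 100,000 people
\[
E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
\]
```

The interaction coefficient $\beta_3$ allows the effect of appliance age on mean price to change with the crime rate, and vice versa.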
Evaluating Significance of Model
· Is the model significant at a 5% level of significance? Carry out the overall F-test by identifying the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of the test.
· Which terms are significant at a 5% level of significance? Carry out individual beta tests by identifying the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of each test.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
Model Comparison
You will now compare this model with the complete second order model for price using age of appliances and crime rate per 100,000 people as predictors to test whether the quadratic (squared) terms contribute to predicting the prices of homes. The complete second order model is Model #2, which you created in this project.
· In general, what is a reduced and a complete model when comparing two models?
· Write the general form of the model that is the reduced model in this comparison.
· Write the general form of the model that is the complete model in this comparison.
· Run the nested model F-test at a 5% level of significance to evaluate if the quadratic (squared) terms are needed. Identify the null hypothesis, the alternative hypothesis, the P-value, and the conclusion of the test.
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
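In the notebook, the nested models F-test can be run by passing the reduced and complete models to anova(). The null hypothesis is that both squared-term coefficients are zero, and the alternative is that at least one is not. The sketch below uses synthetic data with assumed coefficients, so its output is illustrative only.

```r
# Synthetic stand-in data; simulated values are assumptions.
set.seed(303)
n <- 200
housing <- data.frame(
  appliance_age = sample(0:20, n, replace = TRUE),
  crime         = runif(n, 50, 400)
)
housing$price <- 600000 - 9000 * housing$appliance_age - 600 * housing$crime +
  5 * housing$appliance_age * housing$crime +
  150 * housing$appliance_age^2 + 0.5 * housing$crime^2 +
  rnorm(n, sd = 30000)

# Reduced model: first order terms plus the interaction
reduced <- lm(price ~ appliance_age + crime + appliance_age:crime,
              data = housing)
# Complete model: reduced model plus the two squared terms
complete <- lm(price ~ appliance_age + crime + appliance_age:crime +
                 I(appliance_age^2) + I(crime^2), data = housing)

# Nested models F-test: H0 says both squared-term coefficients are zero
nested    <- anova(reduced, complete)
print(nested)
f_stat    <- nested[["F"]][2]
p_value   <- nested[["Pr(>F)"]][2]
df_tested <- nested[["Df"]][2]   # number of parameters being tested

# Cross-check: F = ((SSE_R - SSE_C) / terms tested) / (SSE_C / error df of C)
sse_r <- sum(resid(reduced)^2)
sse_c <- sum(resid(complete)^2)
f_manual <- ((sse_r - sse_c) / df_tested) / (sse_c / df.residual(complete))

if (p_value < 0.05) {
  cat("Reject H0: at least one squared term contributes to the model.\n")
} else {
  cat("Fail to reject H0: the squared terms are not needed.\n")
}
```

The reduced model must contain only terms that also appear in the complete model. Otherwise the two models are not nested and this F-test does not apply.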
6. Conclusion
Describe the results of the statistical analyses clearly, using proper descriptions of statistical terms and concepts. Fully describe what these results mean for your scenario.
· Which model would you choose to predict house prices? Briefly summarize your findings in plain language.
· What is the practical importance of the analyses that were performed?
Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include R code in your report.
7. Citations
You were not required to use external resources for this report. If you did not use any resources, you should remove this entire section. However, if you did use any resources to help you with your interpretation, you must cite them. Use proper APA format for citations.
Insert references here in the following format:
Author's Last Name, First Initial. Middle Initial. (Year of Publication). Title of book: Subtitle of book, edition. Place of Publication: Publisher.