MAT240 Scatterplots in Excel Tutorial
I am working on this project for my MAT-240 class and in need of some assistant's. I have added some information to help better understand the assignment. Ref module 2 template.
Scenario
Smart businesses in all industries use data to provide an intuitive analysis of how they can get a competitive advantage. The real estate industry heavily uses linear regression to estimate home prices, as cost of housing is currently the largest expense for most families. Additionally, in order to help new homeowners and home sellers with important decisions, real estate professionals need to go beyond showing property inventory. They need to be well versed in the relationship between price, square footage, build year, location, and so many other factors that can help predict the business environment and provide the best advice to their clients.
Prompt
You have been recently hired as a junior analyst by D.M. Pan Real Estate Company. The sales team has tasked you with preparing a report that examines the relationship between the selling price of properties and their size in square feet. You have been provided with a Real Estate Data Spreadsheet spreadsheet that includes properties sold nationwide in recent years. The team has asked you to select a region, complete an initial analysis, and provide the report to the team.
Note: In the report you prepare for the sales team, the response variable (y) should be the listing price and the predictor variable (x) should be the square feet.
Specifically you must address the following rubric criteria, using the Module Two Assignment Template Word Document:
· Generate a Representative Sample of the Data
· Select a region and generate a simple random sample of 30 from the data.
· Report the mean, median, and standard deviation of the listing price and the square foot variables.
· Analyze Your Sample
· Discuss how the regional sample created is or is not reflective of the national market.
· Compare and contrast your sample with the population using the National Summary Statistics and Graphs Real Estate Data PDF document.
· Explain how you have made sure that the sample is random.
· Explain your methods to get a truly random sample.
· Generate Scatterplot
· Create a scatterplot of the x and y variables noted above and include a trend line and the regression equation
· Observe patterns
· Answer the following questions based on the scatterplot:
· Define x and y. Which variable is useful for making predictions?
· Is there an association between x and y? Describe the association you see in the scatter plot.
· What do you see as the shape (linear or nonlinear)?
· If you had a 1,800 square foot house, based on the regression equation in the graph, what price would you choose to list at?
· Do you see any potential outliers in the scatterplot?
· Why do you think the outliers appeared in the scatterplot you generated?
· What do they represent?
,
Simple Regression
In the regression equation, we are trying to ascertain the influence of the independent variable(s) on the dependent variable. In simple regression, we have a single dependent variable and a single independent variable while in multiple regression, we have one dependent variable and multiple independent variables. Our goal is to understand the influence of the independent variable(s) on the dependent variable.
Let’s think about rain (in inches) and corn production (in tons). In this case of simple regression, the rain is the independent variable (denoted by x) and the corn production is the dependent variable (denoted by y). Our objective is to determine the influence of rain on the corn production.
Let's imagine after computation, the equation of the regression line is:
y = 2 + 0.5 x
This means that if there is no rain, the corn production will be 2 tons. For every one inch of rain, the corn production increases by 0.5 tons.
In real life we may have variables such as rain, machinery, soil condition and quality of fertilizer as the independent variables. So, we are trying to ascertain the influence of these independent variables on the corn production.
Correlation
Correlation determines the degree of association between two variables. In statistical sense, the correlation analysis measures the degree of linear relationship between two variables. We compute the correlation coefficient (r) to measure the degree of linear relationship between two variables. Numerically, the correlation coefficient ranges from 1 to 1.
Correlation is not causation. High correlation between two variables does not imply that one of these variables “causes” the behavior of the other variable. Correlation is a cause and effect relationship.
Simple Regression vs. Correlation
Correlation analysis attempts to determine whether there is a linear relationship between two variables and how strong is that relationship while simple regression determines the influence of independent variable over the dependent variable.
Coefficient of determination R2
The Coefficient of determination is the proportion of the variation in the dependent variable explained by the regression model, and is a measure of the goodness of fit of the model. It can range from 0 to 1. This signifies how much known deviation has been removed from the regression line. We need to remove a high degree of variations from the equation of the regression line. Higher the value of R2, — better is the reliability of the regression line (for prediction and business decision making purposes).
Describing data with a simple regression equation
Graphically, we can draw a straight line on the graph so it passes through the cluster of points, as in Figure 1. Simple regression is a way of choosing the best straight line for this job.
Figure 1
This raises two problems: what is the best straight line, and how can we describe it when we have found it?
Let's deal first with describing a straight line. Any straight line can be described by an equation relating the y values to the x values. In general, we usually write,
y = mx + c
Here m and c are constants whose values tell us which of the infinite number of possible straight lines we are looking at. m (from French monter) tells us about the slope or gradient of the line. Positive m means the line slopes upwards to the right; negative m that it slopes downwards. High m values mean a steep slope, low values a shallow one. The value of c (from French couper) tells us about the intercept, i.e. where the line cuts the y axis: positive c means that when x is zero, y has a positive value, negative c means that when x is zero, y has a negative value. But for regression purposes, it's more convenient to use different symbols. We usually write:
y = a + bx
This is just the same equation with different names for the constants: a is the intercept, b is the gradient.
The problem of choosing the best straight line then comes down to finding the best values of a and b. We define "best" in the same way as we did when we explained why the mean is the best summary of a set of data: we choose the a and b values that give us the line such that the sum of squared deviations from the line, instead of from the average, is minimized. This is illustrated in Figure 2. The best line is called the regression line, and the equation describing it is called the regression equation.
Figure 2
Goodness of fit in regression
Having found the best straight line, the next question is how well it describes the data.
This is called the variance accounted for, symbolized by VAC or R2. Its square root is the Pearson product-moment correlation coefficient. R2 can vary from 0 (the points are completely random) to 1 (all the points lie exactly on the regression line); quite often it is reported as a percentage (e.g. 73% instead of 0.73). Two sets of data can have identical a and b values and very different R2 values, or vice versa.
Note carefully that a, b and R2 are all descriptive statistics. We have not said anything yet about significance tests. Given a set of paired x and y values, we can use Minitab to find the corresponding values of a, b and R2. It will also do some significance tests for us.
Problem: Suppose a vitamin and supplement supplier would like to investigate the relationship between the size of the order and the age of the customer who ordered it. The information will be used to target promotions to specific age groups. The following table shows the ages of seven random customers along with their more recent order sizes in dollars.
Age (in years) |
Order Size ($) |
41 |
54 |
26 |
30 |
34 |
22 |
54 |
63 |
29 |
15 |
49 |
25 |
38 |
85 |
Based on this information, (i) Find the equation of the regression line using the least square technique and interpret the result; (ii) Determine whether the slope of the regression line is significant at a 95% confidence; (iii) Compute the correlation coefficient and the coefficient of Determination.
You could try to solve this problem and if you have any questions, please send me an e-mail.
image1.jpeg
image2.jpeg
,
MAT 240 Scatterplots in Excel Tutorial
This tutorial will guide you through the steps necessary to create scatterplots using your data. It will also walk you through inserting a linear trend line and inserting the regression equation and the R-squared value on the chart.
1. Open your data set in Excel.
2. Select all the data for the two variables you are targeting (e.g., median listing price and median square feet).
Tip: Holding down the Ctrl key while selecting your data will allow you to select two columns of data that are not next to each other.
3. On the Insert tab, click the Recommended Charts button.
This will bring up the Insert Chart dialog box, which prompts you to select from the list of auto-generated charts. If Scatter is not one of the options, click the All Charts tab. Then select X Y (Scatter) and select the chart on the right side. Click OK.
4. With the new chart selected, go to the Chart Design tab and click the Add Chart Element button. In the drop-down menu, hover over Trendline and select More Trendline Options.
5. In the Format Trendline sidebar, select Linear. At the bottom, check the boxes for these options:
Display Equation on Chart Display R-Squared Value on Chart
6. Close the Format Trendline sidebar and move the equation and R-squared value to the side so that it is visible.
Tip: You can use an empty cell to have Excel calculate the square root of your R-squared value by selecting an empty cell in your sheet and using the =sqrt() function.
Type your R-squared value into the parenthesis and hit enter.
- MAT 240 Scatterplots in Excel Tutorial
,
MAT 240 Random Sampling in Excel Tutorial This tutorial will guide you though the steps necessary to collect a random sample of a data set to put on a new sheet.
1. Open your data set in Excel. Be sure the Analysis toolpak is enabled. Steps for how to do this are available on the Microsoft support site.
2. To find a random sample, you first need to insert the =rand() function an empty column next to your data. In the example being shown, it is column G. To do this, select the target cell and type in =rand() then press enter.
3. Double click the Fill handle (little square icon) at the bottom right side of the highlighted cell to copy the formula through to the bottom of the data set. This will copy this formula to each row
of data.
4. Sort your new column to rearrange the data into a random order. To do this, select the data within your column, then click the Sort & Filter button from the Home ribbon and choose Sort
Smallest to Largest.
5. A dialog box will open asking if you what you want to do. Select to Expand the selection and click Sort.
6. Capture your sample size by selecting the amount of rows you are sampling. A sample of 50 would mean you should select the first 50 rows of data.
a. By selecting only the first cell of data in the first column and dragging down, Excel will count the number of rows for you.
b. Once you have the correct number of rows, then drag to the right to highlight all the data in the appropriate number of rows.
7. Cut and paste this selected data set onto a new sheet and you will have your random sample separated from the main data set.
8. In the Descriptive statistics window, select input range field, then select all your numerical data
9. Then check the Summary Statistics box and click ok
10. You now should see a new sheet with just your descriptive statistics listed in a chart. Change the titles of the columns to their respective names from your data: median listing price, median dollars per square foot, median square feet. And remove any extraneous information that is not needed for this project.
- MAT 240 Random Sampling in Excel Tutorial
,
Selling Price Analysis for D.M. Pan National Real Estate Company 2
[ Note: To complete this template, replace the bracketed text with your own content. Remove this note before you submit your outline.]
Report: Selling Price and Area Analysis for D.M. Pan National Real Estate Company
[Michael Rickman]
Selling Price and Area Analysis for D.M. Pan National Real Estate Company 1
Southern New Hampshire University
Introduction
[Include in this section a brief overview, including the purpose of the report.]
Representative Data Sample
[Present your simple random sample of 30, including the region you selected for your sample. Then identify the mean, median, and standard deviation of the listing price and the square foot variables.]
Data Analysis
[Discuss how the regional sample created is reflective of the national market. Compare and contrast your regional sample with the national population using the National Statistics and Graphs document found in the Module Two Assignment Guidelines and Rubric.
Explain how you have made sure that the sample is random. Explain your methods to get a truly random sample.]
Scatterplot
[Insert a scatterplot graph of the sample using the x and y variables. Include a trend line and regression equation.]
The Pattern
[Based on your graph, define each variable, and explain which variable will be useful for making predictions and why.]
[Describe the association between x and y in the scatterplot and determine its shape. Identify any outliers you see in the graph and explain why these occur and what they represent.]
[If you had a 1,800 square foot house, based on the regression equation in the graph, what price would you choose to list at? Explain.]
,
MAT 240 Descriptive Statistics in Excel Tutorial This tutorial will guide you though the steps necessary to pull out the descriptive statistics of your data using the Analysis tookpak. If you do not have the Office 365 version of Excel, please download the latest version free using the Excel Download Tutorial.
1. Open your data set in Excel. Be sure the Analysis toolpak is enabled. Steps for how to do this are available on the Microsoft support site.
2. Navigate to the data tab and select the Data Analysis tools button
3. Select Descriptive Statistics from the analysis tools window that pops up
4. In the Descriptive statistics window, select input range field, then select all your numerical data
5. Then check the Summary Statistics box and click ok
6. You now should see a new sheet with just your descriptive statistics listed in a chart. Change the titles of the columns to their respective names from your data: median listing price, median dollars per square foot, median square feet. And remove any extraneous information that is not needed for this project.
- MAT 240 Descriptive Statistics in Excel Tutorial
Collepals.com Plagiarism Free Papers
Are you looking for custom essay writing service or even dissertation writing services? Just request for our write my paper service, and we'll match you with the best essay writer in your subject! With an exceptional team of professional academic experts in a wide range of subjects, we can guarantee you an unrivaled quality of custom-written papers.
Get ZERO PLAGIARISM, HUMAN WRITTEN ESSAYS
Why Hire Collepals.com writers to do your paper?
Quality- We are experienced and have access to ample research materials.
We write plagiarism Free Content
Confidential- We never share or sell your personal information to third parties.
Support-Chat with us today! We are always waiting to answer all your questions.
