Department of Economics, Columbia University
UN3412 Introduction to Econometrics, Spring 2024, for all sections
Problem Set 1 (Due on Feb. 8th at 10 am)
__________________________________________________________________________________________

“Calculator” was once a job description. This problem set gives you an opportunity to do some calculations on the relationship between smoking and lung cancer using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. You will calculate the regression “by hand” using formulas from class and the textbook. For these calculations, you may relive history and use long multiplication, long division, and tables of square roots and logarithms; or you may use an electronic calculator or a spreadsheet. For example, if you are using Excel, use the formulas to calculate the sample slope and sample intercept; do not use the regression function or any ready-made formulas. We will ask the same question in Problem Set #2, in which you will use Stata or R to answer these questions.

The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.

Observation #   Country          Cigarettes consumed per capita in 1930 (X)   Lung cancer deaths per million people in 1950 (Y)
1               Switzerland       530                                          250
2               Finland          1115                                          350
3               Great Britain    1145                                          465
4               Canada            510                                          150
5               Denmark           380                                          165

1. (28p) Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the lecture slides or textbook for the necessary formulas. (Note: if you use a spreadsheet, copy/paste a printout; do not use the built-in formulas.
You need to calculate it step-by-step.)
(a) (3p) The sample means of $X$ and $Y$, $\bar{X}$ and $\bar{Y}$.
(b) (3p) The sample standard deviations of $X$ and $Y$, $s_X$ and $s_Y$.
(c) (3p) The sample covariance of $X$ and $Y$, $s_{XY}$.
(d) (3p) The correlation coefficient, $r$, between $X$ and $Y$.
(e) (4p) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$.
(f) (4p) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.
(g) (4p) $\hat{Y}_i$, $i = 1, \ldots, n$, the predicted values for each country from the regression.
(h) (4p) $\hat{u}_i$, the OLS residual for each data point.

2. (10p) On graph paper or using a spreadsheet, graph the scatterplot of the five data points and the regression line. Be sure to label the axes, the data points, and the regression equation with the slope and intercept of the regression line.

3. (22p) Adult males are taller, on average, than adult females. Visiting two recent American Youth Soccer Organization (AYSO) under-12-years-old (U12) soccer matches on a Saturday, you do not observe an obvious difference in the height of boys and girls of that age. You suggest to your little sister that she collect data on height and gender of children in 4th to 6th grades as part of her science project. The accompanying table shows her findings.

Height of Young Boys and Girls, Grades 4-6, in inches
Group   Sample average $\bar{Y}$   Sample variance $s^2$   Sample size $n$
Boys    57.8                       3.9                     55
Girls   58.4                       4.2                     57

Here $\bar{Y}_{Boys}$ is the sample average height for boys, $n_{Boys}$ is the number of boys in the sample, and $s^2_{Boys}$ is the sample variance of height of boys; the girls' quantities are defined in the same way.
(a) (3p) Let your null hypothesis be that there is no difference in the height of females and males at this age level. Specify the alternative hypothesis.
(b) (3p) What is the unbiased estimate of the difference in height between boys and girls? Provide a formula and check the unbiasedness. Calculate the value of this estimate for the given sample.
(c) (4p) Derive the formula for the variance of the estimate from (b).
Calculate the estimate of the variance for the given sample.
(d) (4p) Create a statistic for testing the hypothesis in (a) using the Central Limit Theorem and the Law of Large Numbers.
(e) (4p) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the 1% level? Which critical value did you use? Why would this number be smaller if you had assumed a one-sided alternative hypothesis? What is the intuition behind this?
(f) (4p) Generate a 95% confidence interval for the difference in height.

4. (16p) Let $Y_1, Y_2, Y_3, Y_4$ be independently, identically distributed random variables from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{Y} = \frac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4)$ denote the average of these four random variables.
(a) (4p) What are the expected value and variance of $\bar{Y}$ in terms of $\mu$ and $\sigma^2$?
(b) (4p) Now, consider a different estimator of $\mu$: $\tilde{Y} = \frac{1}{8}Y_1 + \frac{1}{8}Y_2 + \frac{1}{4}Y_3 + \frac{1}{2}Y_4$. This is an example of a weighted average of the $Y_i$'s. Show that $\tilde{Y}$ is also an unbiased estimator of $\mu$. Find the variance of $\tilde{Y}$.
(c) (4p) Based on your answers to parts (a) and (b), which estimator of $\mu$ do you prefer, $\bar{Y}$ or $\tilde{Y}$? Why?
(d) (4p) Suppose $Y_1, Y_2, Y_3, Y_4$ follow a normal distribution with mean $\mu = 5$ and variance $\sigma^2 = 3$. What are the distributions of $\bar{Y}$ and $\tilde{Y}$?

5. (24p) You are given data on a simple random sample of two opposite-sex couples. That is, we have randomly sampled two couples from the population of opposite-sex couples. Let the heights of the two men be $M_1, M_2$, and let the heights of the two women be $W_1, W_2$. The population distribution of men's heights has mean $\mu_M$ and variance $\sigma_M^2$. The population distribution of women's heights has mean $\mu_W$ and variance $\sigma_W^2$. You want to estimate the average height of people in opposite-sex couples.
(a) (5p) What is the population mean height of people in opposite-sex couples?
(b) (5p) Consider the sample average $\bar{H} = \frac{1}{4}(M_1 + M_2 + W_1 + W_2)$. Show that this estimator is unbiased.
(c) (7p) Suppose that couples form assortatively on height so that within couples, the covariance between the heights of the man and the woman, $\mathrm{cov}(M, W)$, is 0.5. What is the sampling variance of $\bar{H}$?
(d) (7p) Suppose instead that couples form randomly so that the covariance between the heights of the man and the woman is zero. Suppose also that you know $\sigma_M^2$ and $\sigma_W^2$. You may now want to consider, as an estimator, a sample average that puts different weights on the men and the women in the sample. What is the weighted-average estimator that has the smallest sampling variance? [NB: A weighted average is a weighted sum of the observations such that the sum of the weights is equal to 1, so your estimator is $\tilde{H} = \gamma_M M_1 + \gamma_M M_2 + \gamma_W W_1 + \gamma_W W_2$, where $2\gamma_M + 2\gamma_W = 1$.]

The following questions will not be graded, and you do not need to submit your solutions. They are for you to practice, and their solutions will be discussed at the recitation this week:

6. [Practice question, not graded]

                        Rain (X=0)   No Rain (X=1)   Total
Long Commute (Y=0)      0.15         0.07            0.22
Short Commute (Y=1)     0.15         0.63            0.78
Total                   0.30         0.70            1.00

Using the random variables $X$ and $Y$ from Table 2.2 (given above), consider two new random variables $W = 3 + 6X$ and $V = 20 - 7Y$. Compute:
(a) $E(W)$ and $E(V)$.
(b) $\sigma_W^2$ and $\sigma_V^2$.
(c) $\sigma_{WV}$ and $\mathrm{Corr}(W, V)$.

7. [Practice question, not graded] SW 2.6. The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age US population, based on the 1990 US Census.

                            Unemployed (Y=0)   Employed (Y=1)   Total
Non-college grads (X=0)     0.045              0.709            0.754
College grads (X=1)         0.005              0.241            0.246
Total                       0.050              0.950            1.000

(a) Compute $E(Y)$.
(b) The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by $1 - E(Y)$.
(c) Calculate $E(Y \mid X=1)$ and $E(Y \mid X=0)$.
(d) Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
(e) A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
(f) Are educational achievement and employment status independent? Explain.

8. [Practice question, not graded] SW 2.14 [Hint: Use SW Appendix Table 1.] In a population, $E[Y] = 100$ and $\mathrm{Var}(Y) = 43$. Use the central limit theorem to answer the following questions:
(a) In a random sample of size $n = 100$, find $\Pr(\bar{Y} \le 101)$.
(b) In a random sample of size $n = 165$, find $\Pr(\bar{Y} > 98)$.
(c) In a random sample of size $n = 64$, find $\Pr(101 \le \bar{Y} \le 103)$.

9. [Practice question, not graded] SW 3.12. To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. A summary of the resulting monthly salaries is:

         Avg. Salary, $\bar{Y}$ (dollars)   Std. Dev. of Y (dollars)   n
Men      3100                               200                        100
Women    2900                               320                        64

(a) What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; and finally, use the p-value to answer the question.)
(b) Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.
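Question 1 asks for every quantity to be built up from the formulas rather than taken from a regression routine. One way to check hand or spreadsheet arithmetic is the short sketch below; the choice of Python is an assumption (the problem set itself mentions calculators, spreadsheets, Stata, and R), and the variable names are illustrative. It divides by $n - 1$ for the sample standard deviations and covariance, and computes $\hat{\beta}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$.

```python
# Sketch: "by hand" OLS for Question 1 using only textbook formulas
# (no regression routine). Data are the five countries in the Question 1 table.
from math import sqrt

countries = ["Switzerland", "Finland", "Great Britain", "Canada", "Denmark"]
X = [530, 1115, 1145, 510, 380]  # cigarettes consumed per capita, 1930
Y = [250, 350, 465, 150, 165]    # lung cancer deaths per million, 1950
n = len(X)

# (a) sample means
x_bar = sum(X) / n
y_bar = sum(Y) / n

# deviations from the sample means
dx = [x - x_bar for x in X]
dy = [y - y_bar for y in Y]

# (b) sample standard deviations, dividing by n - 1
s_x = sqrt(sum(d * d for d in dx) / (n - 1))
s_y = sqrt(sum(d * d for d in dy) / (n - 1))

# (c) sample covariance, dividing by n - 1
s_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

# (d) sample correlation coefficient
r = s_xy / (s_x * s_y)

# (e) OLS slope and (f) OLS intercept
beta1_hat = sum(a * b for a, b in zip(dx, dy)) / sum(d * d for d in dx)
beta0_hat = y_bar - beta1_hat * x_bar

# (g) predicted values and (h) residuals
y_hat = [beta0_hat + beta1_hat * x for x in X]
u_hat = [y - yh for y, yh in zip(Y, y_hat)]

for name, yh, u in zip(countries, y_hat, u_hat):
    print(f"{name:14s} Y_hat = {yh:8.3f}  u_hat = {u:8.3f}")
print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.4f}, r = {r:.4f}")
```

Because the regression includes an intercept, the OLS residuals should sum to zero up to rounding, which gives a quick check on the hand calculations.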
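Questions 3 and 9 both compare the means of two independent samples using $SE(\bar{Y}_1 - \bar{Y}_2) = \sqrt{s_1^2/n_1 + s_2^2/n_2}$. A minimal Python sketch of that large-sample calculation follows; the helper `diff_in_means` is hypothetical, and it assumes independent samples and the normal approximation. The default critical value 1.96 gives a two-sided 95% interval; passing `z_crit=2.58` corresponds to the two-sided 1% level.

```python
from math import sqrt


def diff_in_means(mean_1, var_1, n_1, mean_2, var_2, n_2, z_crit=1.96):
    """Hypothetical helper: difference of two independent sample means, its
    large-sample standard error, the t-statistic for a null of no difference,
    and a confidence interval using the normal critical value z_crit."""
    diff = mean_1 - mean_2
    se = sqrt(var_1 / n_1 + var_2 / n_2)
    t_stat = diff / se
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, t_stat, ci


# Question 3: the table reports sample variances s^2 directly (boys minus girls).
boys_girls = diff_in_means(57.8, 3.9, 55, 58.4, 4.2, 57)

# Question 9: the table reports standard deviations, so square them first (men minus women).
men_women = diff_in_means(3100, 200 ** 2, 100, 2900, 320 ** 2, 64)

print(boys_girls)
print(men_women)
```

Note the different inputs: the Question 3 table already gives variances, while the Question 9 table gives standard deviations.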
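For Question 8, the central limit theorem gives $\bar{Y} \approx N(\mu_Y, \sigma_Y^2/n)$, so each probability is a difference of standard normal CDF values at standardized cutoffs. Instead of SW Appendix Table 1, the sketch below evaluates the standard normal CDF with Python's `math.erf`, so its results may differ from table lookups in the last digit. The function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function (math.erf accepts +/-inf)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sample_mean_prob(mu, var, n, lower=float("-inf"), upper=float("inf")):
    """CLT approximation to Pr(lower <= Y_bar <= upper), Y_bar ~ N(mu, var / n)."""
    se = sqrt(var / n)
    return normal_cdf((upper - mu) / se) - normal_cdf((lower - mu) / se)


p_a = sample_mean_prob(100, 43, 100, upper=101)              # Pr(Y_bar <= 101), n = 100
p_b = sample_mean_prob(100, 43, 165, lower=98)               # Pr(Y_bar > 98), n = 165
p_c = sample_mean_prob(100, 43, 64, lower=101, upper=103)    # Pr(101 <= Y_bar <= 103), n = 64
print(p_a, p_b, p_c)
```

Since $\bar{Y}$ is treated as continuous, $\Pr(\bar{Y} > 98)$ and $\Pr(\bar{Y} \ge 98)$ coincide, which is why part (b) can reuse the same helper.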