Econometrics
Department of Economics, Columbia University
UN3412 Spring 2024
Problem Set 4
Introduction to Econometrics for all sections
(due on Feb. 29th at 10 am)
__________________________________________________________________________________________

Please make sure to select the page number for each question when you upload your solutions to Gradescope. Otherwise, it is hard to grade your answers, and you will lose 5 points.

1. (30p) Use the data in hprice1.dta to estimate the following model (the variables in the data set are described in Table 1 below):

\[ \text{price} = \beta_0 + \beta_1\,\text{sqrft} + \beta_2\,\text{bdrms} + u \]

where price = the (selling) price of the house (in 1000 dollars), sqrft = size of the house (square feet), and bdrms = number of bedrooms in the house.

(a) (3p) Write out the estimation result in equation form.

(b) (3p) What is the estimated increase in price for a house with one more bedroom, keeping square footage constant?

(c) (6p) What is the estimated increase in price for a house with an additional 1400-square-foot bedroom added? Compare this to your answer in (b).

(d) (6p) What percentage of the variation in price is explained by square footage and number of bedrooms? Compare your answer to the adjusted \(R^2\). Explain the difference.

(e) (6p) Consider the first house in the sample. Report the square footage and number of bedrooms for this house. Find the predicted selling price for this house from the OLS regression line.

(f) (6p) What is the actual selling price of the first house in the sample? Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house? Explain.

Table 1: DATA DESCRIPTION, FILE: hprice1.dta

Variable    Definition
price       House price, in $1000.
assess      Assessed value, in $1000.
bdrms       Average number of bedrooms.
lotsize     Size of lot in square feet.
sqft        Size of house in square feet.
colonial    = 1 if house is in Colonial style; = 0 otherwise.
lprice      Log(price)
lassess     Log(assess)
llotsize    Log(lotsize)
lsqft       Log(sqft)

2. (35p) In this exercise you will build on the investigation in problem set 3 regarding the factors influencing academic achievement at the high school level in various countries. Again, the theory driving this investigation is that student success is driven by some combination of resources and culture.

(a) (4p) Note that, in addition to being asked to supply call and response dialogue from your software for many of the question parts, you will also be asked to copy and paste your .do file (or .R script file or a substitute) at the end of the question. Open your software and start a log as desired. Read in the data. In problem set 3 a linear relationship between PISA scores and 2022 per capita GDP was used. Recall that the R-squared for the simple regression of test scores on GDP22 was over 50% and the coefficient had a t-statistic over 5. When the regional indicators were also included, the t-statistic on GDP22 was still over 3. Create a scatterplot, perhaps with a fitted line, with PISA on the vertical axis and GDP22 on the horizontal axis. Copy and paste your scatterplot.

(b) (4p) It seems as though the relationship between PISA scores and GDP is nonlinear. This is not surprising in that the GDP levels grow exponentially. Create a variable to hold the log of GDP22. Then run a regression of PISA scores on the log of GDP22 only. Copy and paste that regression call and response below. By how much is the R-squared higher than it was in the simple regression in problem set 3, which used the levels, not the logs, of GDP22? What is the change in the Root MSE (the SER)? How has the t-statistic on the coefficient changed?

(c) (4p) Store, i.e. create a name for, or otherwise get access to the location of, the predicted values for this regression. (In Stata, if you chose to name your predicted values something like “GDPcurve”, the command would be “predict GDPcurve”.)
Create a graph of your data and the predicted values from the regression. (In Stata a simple one might be “scatter pisa GDPcurve gdp22”.) Copy and paste your graph.

(d) (4p) Try creating the curve using the square and cube of GDP22. Create those two variables and run the regression. No need to report the regression below, but do answer this question below: Is the coefficient on the cubed variable significant at the 5% level? Drop the cubed variable and rerun the regression. Copy and paste that regression below. In your qualitative opinion, are the measures of fit much different from each other with or without the cubed variable?

(e) (4p) The common tendency is for heteroskedasticity-robust standard errors to be larger than those from the homoskedasticity-only formula. Using the homoskedasticity-only formula therefore tends to make the t-statistics, and often the F-statistics, bigger, which makes it easier to reject the null hypothesis. Rerun the regression in part (d) on a homoskedasticity-only basis. No need to report the regression below, but answer this question below: Focusing on the standard errors and the t-statistics for the two coefficients, does this usual pattern hold in this case?

(f) (4p) Return to working on the usual heteroskedasticity-robust basis. Let us use the log of GDP in the regressions going forward rather than the polynomial, in order to make use of the percentage change interpretation. Run a regression of PISA scores on the log of GDP and the regional indicator variables, using the UKUS variable as the base case; no need to paste those results. Do a linear hypothesis test to find out whether the indicator variables are still jointly significant. Paste the results of that test below. Are the indicators still jointly significant at the 5% level?

(g) (4p) To further explore the interaction of resources and culture, create five interaction variables, one for each of the indicators times logGDP. Add these five variables to your regression. Copy and paste the regression results below.
Examine the sign of the coefficient on the log of GDP. Has it reversed from our previous regressions? Examining the signs of the interaction terms, are there any regions (outside of the base case UKUS region) for which we would expect an increase in income to be associated with a lower PISA score? If we want to know whether logGDP is still statistically significant, we must do a linear restriction test that includes all variables that contain logGDP, six in this case. Do that test and print the results. Is logGDP significant at the 5% level?

(h) (4p) Examining the coefficients on the interaction terms, are there one or two regions that seem like they may be able to (or choose to) get more educational achievement from an increase in per capita income?

(i) (3p) Copy and paste your .do file (or R code) onto your answer sheet.

3. (35p) In this problem we will work towards understanding whether government-subsidized savings accounts help people save towards retirement, and if so, by how much. We’ll do this using the 401ksubs.dta dataset attached to this problem set.[1] This is a dataset of a cross-section of individuals and includes information on basic demographics, their income and wealth, and whether they participate in a 401(k) account.

(a) (7p) Start by running a naïve regression. Regress net total assets on the dummy variable indicating whether the respondent has a 401(k) account. Interpret the sign and magnitude of the coefficient. Can you give this estimate a causal interpretation? Why (not)?

(b) (7p) Now add in the dummy for eligibility for a 401(k) account and interpret the coefficient [Hint: Can you have a 401(k) account if you are not eligible?]. Does the coefficient on eligibility imply that being eligible for a 401(k) lowers savings? Why (not)? What omitted factors do you think are being picked up here?

(c) (7p) Now let’s drop eligibility from the regression, but let’s add in a set of controls.
Add in the dummy for IRA participation, age, age squared, family size, income, income squared, the male dummy, and the marriage dummy. Interpret five of the coefficients. How does the coefficient on p401k change? Now do you think you can interpret the coefficient on p401k as causal? Why (not)?

(d) (7p) Let’s explore the possibility that the controls matter differently for men than for women. Run a regression of net total assets on the dummy for 401(k) participation and then all the controls as well as their interactions with the male dummy. Interpret the coefficient on p401k and the interaction with the male dummy for two of the controls. Test whether all of the interactions of the controls with the male dummy are jointly significant. How does this change whether you think the coefficient on p401k is causal?

(e) (7p) Finally, let’s see whether 401(k) participation affects savings differentially for men vs. women. Run the regression from part (d) but also interact the 401(k) participation dummy with the male dummy. What does this regression imply is the effect on savings of 401(k) participation for women? For men? Test whether the effect for men equals 0. Test whether the effect for women equals 0. Test whether the effects are the same for men as for women.

[1] Incidentally, you can access this, and all the other datasets used in the Stock & Watson (https://fmwww.bc.edu/ecp/data/stockwatson/datasets.list.html) and Wooldridge (http://fmwww.bc.edu/ecp/data/wooldridge/datasets.list.html) textbooks, through Boston College. They even set up a nice Stata command, called bcuse, that lets you read them straight into Stata. To install it, type “ssc install bcuse” into Stata. Then you can load this dataset by typing “bcuse 401ksubs.dta, clear”.

The following questions will not be graded. They are for you to practice and may be discussed at the recitation:

1. SW Exercise 8.2 (See textbook for question wording)
2. SW Exercise 8.10 (See textbook for question wording)
3. SW Exercise 9.6 (See textbook for question wording)
4. SW Empirical Exercise 8.2 (AHE)
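
As a reminder of the mechanics behind Problem 1, parts (d) through (f), the sketch below fits a two-regressor OLS line and then computes R-squared, adjusted R-squared, and a residual (actual minus predicted). It is written in Python as a substitute for Stata or R and is only an illustration under stated assumptions: the eight observations are made up and are not taken from hprice1.dta, and the helper names solve and ols are hypothetical. In your own answers you would use your software's regression command instead.

```python
# Illustration only: the numbers below are made up and are NOT from hprice1.dta.
sqrft = [1400, 1600, 1850, 2100, 2400, 2600, 3000, 3300]   # house size, square feet
bdrms = [2, 3, 3, 4, 3, 4, 4, 5]                           # number of bedrooms
price = [210, 250, 265, 310, 330, 365, 400, 455]           # selling price, $1000s


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, columns):
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    rows = [[1.0] + [float(v) for v in obs] for obs in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


beta = ols(price, [sqrft, bdrms])
fitted = [beta[0] + beta[1] * s + beta[2] * b for s, b in zip(sqrft, bdrms)]
residuals = [p - f for p, f in zip(price, fitted)]   # actual minus predicted

n, k = len(price), 2                                 # observations, slope regressors
mean_price = sum(price) / n
ssr = sum(e ** 2 for e in residuals)                 # sum of squared residuals
tss = sum((p - mean_price) ** 2 for p in price)      # total sum of squares
r2 = 1 - ssr / tss
adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss       # degrees-of-freedom correction

print("coefficients:", beta)
print("R2:", r2, "adjusted R2:", adj_r2)
print("first house: predicted", fitted[0], "residual", residuals[0])
```

The adjusted R-squared multiplies SSR/TSS by (n - 1)/(n - k - 1), which exceeds 1 whenever there is at least one regressor, so the adjusted measure is below the unadjusted one whenever the fit is not perfect.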