EXERCISE 8 - ANSWER

Another thing that income affects is the amount of money that a household spends on its car(s). Regress the value of a household's vehicles on its monthly income to see if this is true here. Please show a graph.

The variables of interest here are stxcars (the value of a family's car(s)) and totminc (a family's total monthly income). These are again household-level variables, so the dataset should first be reduced to one observation per household.
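
A minimal sketch of that reduction, assuming the household identifier is named hhid (a hypothetical name here; substitute the identifier used in your data) and that the household-level variables are constant within a household:

* hhid is an assumed name for the household identifier
sort hhid
by hhid: keep if _n==1

This keeps the first record of each household, so each family is counted only once in the regression.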

This regression command should generate the following table:

regress stxcars totminc

  Source |       SS       df       MS                  Number of obs =     957
---------+------------------------------               F(  1,   955) = 2980.92
   Model |  4.8898e+09     1  4.8898e+09               Prob > F      =  0.0000
Residual |  1.5666e+09   955  1640367.58               R-squared     =  0.7574
---------+------------------------------               Adj R-squared =  0.7571
   Total |  6.4563e+09   956  6753504.01               Root MSE      =  1280.8

------------------------------------------------------------------------------
 stxcars |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .4792248   .0087774     54.598   0.000       .4619997      .49645
   _cons |  -734.5361    44.4826    -16.513   0.000      -821.8311   -647.2412
------------------------------------------------------------------------------

The t-statistic of 54.598 shows that the relationship between income and the value of a family's vehicle(s) is statistically significant. The coefficient implies that each additional rand of monthly income is associated with almost half a rand (about 0.48) more in the value of the family's car(s).
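
As a quick check on these numbers, the stored coefficients can be used right after the regress command. The income of 2,000 rand below is only an illustrative value, not one taken from the data:

* t-statistic by hand: coefficient divided by its standard error
display _b[totminc]/_se[totminc]
* predicted vehicle value at an illustrative monthly income of 2,000 rand
display _b[_cons] + _b[totminc]*2000

The first line should reproduce the t-statistic in the table, and the second works out to -734.5361 + .4792248*2000, or about 224 rand.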

Now generate the fitted values from the regression and graph them against the data with these commands:

predict carshat
graph stxcars carshat totminc, connect(.s) symbol(oi) ylabel xlabel

As the graph shows, nearly all of the data are bunched at the left of the picture, which suggests that outliers are distorting both the scale and the fit. To deal with this, drop any observation with an income above 50,000 rand per month and see whether the line fits better.
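
Before dropping anything, it can help to look at the observations involved. A sketch using standard commands, with the same 50,000 cutoff:

* distribution of income, including the largest values
summarize totminc, detail
* how many households lie above the cutoff, and what they look like
count if totminc>50000
list totminc stxcars if totminc>50000

Note that Stata treats missing values as larger than any number, so these conditions also pick up households with a missing totminc, if there are any. The drop itself is then: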

drop if totminc>50000

regress stxcars totminc

  Source |       SS       df       MS                  Number of obs =     956
---------+------------------------------               F(  1,   954) =  444.55
   Model |  23597708.4     1  23597708.4               Prob > F      =  0.0000
Residual |  50640301.6   954  53082.0771               R-squared     =  0.3179
---------+------------------------------               Adj R-squared =  0.3172
   Total |  74238009.9   955   77736.136               Root MSE      =  230.40

------------------------------------------------------------------------------
 stxcars |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .0618106   .0029316     21.084   0.000       .0560575    .0675637
   _cons |  -36.50336    9.00513     -4.054   0.000      -54.17551   -18.83121
------------------------------------------------------------------------------

This is a case where a single outlier in the graph had a tremendous effect on the regression: the slope falls from about 0.48 to about 0.06.

With that one observation removed, the regression results look worse on paper (a lower t-statistic and R-squared), but the estimates are more representative of the population as a whole.
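
One way to see how much that single household dominated the first regression is to compare the total sums of squares in the two tables. This is a rough back-of-the-envelope check using the printed figures:

* R-squared is Model SS divided by Total SS
display 4.8898e+09/6.4563e+09
display 23597708.4/74238009.9
* share of the original Total SS that disappears with the outlier
display 1 - 74238009.9/6.4563e+09

The first two lines reproduce the R-squared values of about 0.757 and 0.318. The last comes to roughly 0.99: nearly all of the variation in stxcars in the full sample was tied to one household, which is why the first fit looked so strong.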

Re-predict and re-graph to see the difference:

predict cars1hat
graph stxcars cars1hat totminc, connect(.s) symbol(oi) ylabel xlabel

The data are also more spread out across this picture. The new regression line is a much better fit for the majority of the population.

 
