The variables of interest here are stxcars (the value of a family's car or cars) and totminc (the family's total monthly income). These are again household-level variables, so, as before, reduce the dataset to one observation per household before running the regression.
This regression command should generate the following table:
regress stxcars totminc
  Source |       SS       df       MS                  Number of obs =     957
---------+------------------------------               F(  1,   955) = 2980.92
   Model |  4.8898e+09     1  4.8898e+09               Prob > F      =  0.0000
Residual |  1.5666e+09   955  1640367.58               R-squared     =  0.7574
---------+------------------------------               Adj R-squared =  0.7571
   Total |  6.4563e+09   956  6753504.01               Root MSE      =  1280.8

------------------------------------------------------------------------------
 stxcars |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .4792248   .0087774     54.598   0.000       .4619997      .49645
   _cons |  -734.5361    44.4826    -16.513   0.000      -821.8311   -647.2412
------------------------------------------------------------------------------
The t-statistic of 54.598 (p = 0.000) shows that income is a highly significant predictor of the value of a family's vehicle(s). The coefficient of about 0.48 means that each additional Rand of monthly income is associated with a car value almost half a Rand higher.
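As a check on the output, the t-statistic is the coefficient divided by its standard error. With a single regressor, the overall F statistic is the square of that t-statistic. Using the numbers from the table above:

$$
t = \frac{\hat\beta_{\text{totminc}}}{\widehat{\mathrm{se}}(\hat\beta_{\text{totminc}})} = \frac{0.4792248}{0.0087774} \approx 54.60,
\qquad
F(1,\,955) = t^2 \approx 54.598^2 \approx 2980.9
$$

This agrees with the reported F = 2980.92 up to rounding.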
Now compute the predicted (fitted) values from the regression and graph them together with the data using these commands:
predict carshat
graph stxcars carshat totminc, connect(.s) symbol(oi) ylabel xlabel
As one can see in this graph, nearly all of the data are bunched at the left of the picture, which suggests that outliers are distorting the fit. To deal with this, drop any observation whose total monthly income exceeds 50,000 Rand and see whether the line fits better.
drop if totminc>50000
regress stxcars totminc
  Source |       SS       df       MS                  Number of obs =     956
---------+------------------------------               F(  1,   954) =  444.55
   Model |  23597708.4     1  23597708.4               Prob > F      =  0.0000
Residual |  50640301.6   954  53082.0771               R-squared     =  0.3179
---------+------------------------------               Adj R-squared =  0.3172
   Total |  74238009.9   955   77736.136               Root MSE      =  230.40

------------------------------------------------------------------------------
 stxcars |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .0618106   .0029316     21.084   0.000       .0560575    .0675637
   _cons |  -36.50336    9.00513     -4.054   0.000      -54.17551   -18.83121
------------------------------------------------------------------------------
This is a case where a single outlier in the graph had a tremendous effect on the regression: dropping it cuts the income coefficient from 0.479 to 0.062.
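To see how much that one observation moved the line, compare the two fitted equations at an income of 10,000 Rand per month. This income is an arbitrary value chosen only for the arithmetic, not a figure taken from the data:

$$
\begin{aligned}
\text{all 957 observations:}\quad & \widehat{\text{stxcars}} = -734.5361 + 0.4792248 \times 10\,000 \approx 4057.71 \\
\text{outlier dropped:}\quad & \widehat{\text{stxcars}} = -36.50336 + 0.0618106 \times 10\,000 \approx 581.60
\end{aligned}
$$

For such a household, the predicted car value falls from roughly 4,058 Rand to roughly 582 Rand once the outlier is removed.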
Removing that one observation actually makes the regression results look worse (a lower t-statistic and R-squared), but the new estimates are far more representative of the population as a whole.
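The drop in R-squared can be traced directly in the two ANOVA tables, since R-squared is the model sum of squares divided by the total sum of squares:

$$
R^2 = \frac{\mathrm{SS}_{\text{Model}}}{\mathrm{SS}_{\text{Total}}}:
\qquad
\frac{4.8898\times10^{9}}{6.4563\times10^{9}} \approx 0.7574,
\qquad
\frac{23\,597\,708.4}{74\,238\,009.9} \approx 0.3179
$$

The total sum of squares itself shrinks from about 6.46 × 10^9 to about 7.42 × 10^7, roughly 1.1% of its original size. Most of the variation in car values in the full sample therefore came from the single extreme household, and the high original R-squared largely reflected the line passing near that one point.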
Re-predict and re-graph to see the difference:
predict cars1hat
graph stxcars cars1hat totminc, connect(.s) symbol(oi) ylabel xlabel
The data are also more spread out in this picture, and the fitted line now follows the bulk of the observations. This regression is a much better fit for the majority of the population.
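One statistic that does improve after the outlier is removed is the Root MSE. It is the square root of the residual mean square and measures the typical size of a residual in Rand:

$$
\text{Root MSE} = \sqrt{\mathrm{MS}_{\text{Residual}}}:
\qquad
\sqrt{1\,640\,367.58} \approx 1280.8,
\qquad
\sqrt{53\,082.0771} \approx 230.40
$$

The typical prediction error therefore falls from about 1,281 Rand to about 230 Rand, which is consistent with the second line fitting most households much more closely.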