The variables of interest here are mxtrent and rooms_to. Mxtrent is total monthly housing expenditure, and rooms_to is the number of rooms in a residence, a decent measure of its size. Remember the issue with household-level variables: their values are repeated on every record that shares a household id, so the regression should use only one observation per household id (hhid).
sort hhid
regress mxtrent rooms_to if hhid~=hhid[_n-1]
The results should be the following:
  Source |       SS       df       MS                  Number of obs =    1062
---------+------------------------------               F(  1,  1060) =  265.38
   Model |  47970249.7     1  47970249.7               Prob > F      =  0.0000
Residual |   191608318  1060  180762.564               R-squared     =  0.2002
---------+------------------------------               Adj R-squared =  0.1995
   Total |   239578567  1061  225804.493               Root MSE      =  425.16

------------------------------------------------------------------------------
 mxtrent |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
rooms_to |   84.86547   5.209539     16.290   0.000       74.64329    95.08765
   _cons |  -88.00493   25.06526     -3.511   0.000      -137.1881   -38.82178
------------------------------------------------------------------------------
Because the t-statistic on rooms_to (16.290) is far greater than 1.96 (roughly 2), the critical value at the 5% level, it is safe to say that the number of rooms in a residence has a statistically significant effect on its monthly cost.
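As a quick check on where that number comes from, the t-statistic in the output is the coefficient divided by its standard error. The notation below is ours, not Stata's; the figures are taken from the rooms_to row of the regression table:

```latex
% t-statistic for rooms_to: coefficient over standard error
\[
t_{\mathrm{rooms\_to}}
  = \frac{\hat{\beta}_{\mathrm{rooms\_to}}}{\mathrm{se}\bigl(\hat{\beta}_{\mathrm{rooms\_to}}\bigr)}
  = \frac{84.86547}{5.209539}
  \approx 16.290
\]
```

With only one regressor, the square of this value (about 265.38) is also the F statistic reported at the top of the output.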
To find out how much more an additional room would cost, you have to look at the "coefficient" for rooms_to.
In this case, each additional room is associated with an increase of about 84.87 Rand, on average, in the monthly cost of living in a residence.
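To make the interpretation concrete, the estimated coefficients can be written as a fitted line. This is only a sketch: the symbols are ours, and the four-room residence is a hypothetical example, not a case from the data:

```latex
% Fitted regression line from the reported coefficients
\[
\widehat{\mathrm{mxtrent}} = -88.00493 + 84.86547 \times \mathrm{rooms\_to}
\]
% Hypothetical example: predicted monthly cost for a four-room residence
\[
\widehat{\mathrm{mxtrent}}\,\big|_{\mathrm{rooms\_to}=4}
  = -88.00493 + 84.86547 \times 4
  \approx 251.46 \ \text{Rand}
\]
```

Moving from four rooms to five raises the predicted cost by exactly the slope, 84.87 Rand, to about 336.32 Rand.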
To graph the relationship between mxtrent and rooms_to, you first have to generate the fitted values of the regression with the predict command.
You can call your predicted variable by any name, but we chose the name renthat.
predict renthat
With the following graph command you should produce this graph:
sort hhid
graph mxtrent renthat rooms_to if hhid~=hhid[_n-1], connect(.s) symbol(Oi) ylabel xlabel
The graph shows a reasonably well-fitting line; however, it is very likely that the regression line would appear to fit better without the outliers.