This is a question of causality. At the outset, it is not obvious whether spending too little on health care makes someone stay ill longer, or whether people with more serious illnesses (ones that kept them sick for more days) spent more on health care. A good researcher would first develop a solid theory of causality and then test the resulting hypothesis. You can try both directions using a regression.
lookfor health care
216. mxhea float %9.0g health care exp.
The variable we are interested in for health care expenditure is mxhea. What about for days sick?
lookfor days sick
 14. resident  byte   %9.0g            12 :15 days out of last 30 days
133. still_si  byte   %9.0g            3a:still sick?
134. sick1_c   byte   %9.0g  sick1_c   4 :1st sick code
135. sick2_c   byte   %9.0g  sick2_c   5 :2nd sick code
136. days_sic  byte   %9.0g            6 :no of days sick
137. days_out  byte   %9.0g            7 :days out of work
288. stxhol    int    %9.0g            holidays.
Use the variable days_sic for number of days sick. Now regress one upon the other. First try regressing mxhea on days_sic.
Our hypothesis here is that number of days sick affects how much people spend on health care.
regress mxhea days_sic
  Source |       SS       df       MS                  Number of obs =     122
---------+------------------------------               F(  1,   120) =    0.60
   Model |  2079.63937     1  2079.63937               Prob > F      =  0.4397
Residual |  415232.303   120   3460.2692               R-squared     =  0.0050
---------+------------------------------               Adj R-squared = -0.0033
   Total |  417311.943   121  3448.85903               Root MSE      =  58.824

------------------------------------------------------------------------------
   mxhea |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
days_sic |  -.8727225   1.125737     -0.775   0.440      -3.101604    1.356159
   _cons |   31.51363   9.710739      3.245   0.002       12.28704    50.74021
------------------------------------------------------------------------------
From this table, we see that Stata gives us a coefficient for days_sic of -.8727225. Taken at face value, this says that for every additional day sick, someone spends 0.8727 rand less on health care. But wait, this is not a valid conclusion. The t statistic is insignificant: its absolute value, 0.775, is well below the rule-of-thumb cutoff of 2, the p-value is 0.440, and the 95% confidence interval (-3.10 to 1.36) includes zero. Therefore, this regression gives us no reason to believe that the number of days sick systematically affects the amount of money spent on health care, at least not in a linear fashion.
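The significance check can also be reproduced from the results Stata keeps in memory after a regression. The lines below are a minimal sketch, assuming they are run immediately after regress mxhea days_sic and that your Stata release provides the ttail() and invttail() functions (very old releases may not). The 2.5% tail probability is our choice, matching a two-sided test at the 5% level.

```stata
* Run right after: regress mxhea days_sic

* t statistic = coefficient / standard error
* (should match the t column of the table, about -0.775)
display _b[days_sic] / _se[days_sic]

* Exact two-sided 5% critical value for the residual degrees of freedom
* (120 here); the cutoff of 2 used in the text is a rounded version of it
display invttail(e(df_r), 0.025)

* Two-sided p-value; should match the P>|t| column (about 0.440)
display 2 * ttail(e(df_r), abs(_b[days_sic] / _se[days_sic]))
```

With 120 degrees of freedom the exact critical value is about 1.98, so comparing the absolute t statistic with 2 gives the same answer here.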
Now let's try our other hypothesis: the amount of money spent on health care affects the number of days someone is sick.
regress days_sic mxhea
  Source |       SS       df       MS                  Number of obs =     122
---------+------------------------------               F(  1,   120) =    0.60
   Model |  13.6070155     1  13.6070155               Prob > F      =  0.4397
Residual |    2716.852   120  22.6404333               R-squared     =  0.0050
---------+------------------------------               Adj R-squared = -0.0033
   Total |  2730.45902   121   22.565777               Root MSE      =  4.7582

------------------------------------------------------------------------------
days_sic |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   mxhea |  -.0057102   .0073657     -0.775   0.440      -.0202937    .0088733
   _cons |   7.357118   .4691279     15.683   0.000       6.428277    8.285958
------------------------------------------------------------------------------
Taken at face value, this table says that for every additional rand someone spends on health care, that person gets better .0057 days sooner. Put more usefully, for every 100 additional rand that someone spends on health care, he or she gets better about half a day (0.57 days) sooner. But wait, this is not a statistically significant result either, because the t statistic is insignificant again (its absolute value, 0.775, is below 2). Notice that the t statistic, F statistic and R-squared are identical to those of the first regression: with only one explanatory variable, swapping the two variables changes the coefficient but not the strength of the linear association. This regression therefore gives us no evidence of a systematic relationship running from health care spending to days sick either, at least not a linear one. There may be another kind of (nonlinear) relationship between these two variables, but we have failed to show that one exists here. In the next module, you will learn how to tackle some of these nonlinear relationships.
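Both tables report the same t statistic, F statistic and R-squared, and a quick check shows why. The lines below are a minimal sketch, assuming the same data set is still in memory. The slope values in the last line are typed in by hand from the two tables above.

```stata
* Correlation between the two variables; its square is the R-squared
* reported by both regressions (about 0.0050)
correlate mxhea days_sic
display r(rho)^2

* The product of the two slope coefficients from the tables
* gives the same number (about 0.0050)
display (-.8727225) * (-.0057102)
```

Because a one-variable regression depends only on this single correlation, running it in either direction cannot tell us which way the causality runs. That question has to be settled by theory.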