Research Question
Is a country's gross domestic product (GDP) per capita related to its citizens' average life expectancy at birth?
Variables
Quantitative Explanatory Variable: incomeperperson (representing annual income per person in US dollars)
Quantitative Response Variable: lifeexpectancy (representing the average number of years a newborn child would live)
Centered Explanatory Variable: cincomeperperson (the mean is very close to zero)
Centered Explanatory Variable without Two Extreme Outliers: nocincomeperperson (the mean is very close to zero)
Program
/* Start the data step */
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
DATA new; SET mydata.gapminder;
/* Assign label names for variables (the LABEL statement ends at the semicolon after the last label) */
LABEL incomeperperson="Income Per Person" /* "Income Per Person - 2010 Gross Domestic Product Per Capita in Constant 2000 US$" */
lifeexpectancy="Life Expectancy" /* "2011 Average Number of Years a Newborn Child Would Live" */
cincomeperperson="Centered Income Per Person"
nocincomeperperson="Centered Income Per Person without Outliers";
/* Set omitted values to missing */
IF incomeperperson=' ' THEN incomeperperson=.;
IF lifeexpectancy=' ' THEN lifeexpectancy=.;
/* Create a new variable called cincomeperperson to center the explanatory variable (incomeperperson) by subtracting the mean */
cincomeperperson=incomeperperson-8740.9655;
/* Create another new variable called nocincomeperperson to center the explanatory variable (incomeperperson) and remove two extreme outliers */
nocincomeperperson=incomeperperson-8509.72;
IF country='Equatorial Guinea' THEN nocincomeperperson=.;
ELSE IF country='Luxembourg' THEN nocincomeperperson=.;
PROC SORT; BY country;
/* Calculate the means of cincomeperperson and nocincomeperperson to check the centering. Means should be zero or very close to zero */
PROC MEANS; VAR cincomeperperson;
PROC MEANS; VAR nocincomeperperson;
RUN;
/* Test a linear regression model, with and without the two extreme outliers */
PROC GLM; MODEL lifeexpectancy=cincomeperperson /solution;
PROC GLM; MODEL lifeexpectancy=nocincomeperperson /solution;
RUN;
Output for Checking the Centering
The quantitative explanatory variable, incomeperperson, was centered by subtracting its mean. The resulting variable, cincomeperperson, has a mean very close to zero, which was checked with PROC MEANS.
To keep two extreme outliers (Equatorial Guinea and Luxembourg) from distorting the regression coefficients, a second centered variable, nocincomeperperson, was created with those two observations set to missing. Its mean is also very close to zero, again checked with PROC MEANS.
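The program above subtracts hard-coded means (8740.9655 and 8509.72). As a minimal sketch of an alternative, the mean can be computed from the data at run time, so the centering stays correct if the data set changes. The output data set name centered and the variable name cincomeperperson_chk are hypothetical names chosen for illustration; the sketch assumes the data set new created by the program above.

```sas
/* Sketch only: center incomeperperson on its sample mean without hard-coding it. */
/* CENTERED and CINCOMEPERPERSON_CHK are illustrative names, not from the original program. */
PROC SQL;
  CREATE TABLE centered AS
  SELECT *,
         incomeperperson - MEAN(incomeperperson) AS cincomeperperson_chk
  FROM new;
QUIT;

/* Check the centering: the mean should be zero or very close to zero. */
PROC MEANS DATA=centered N MEAN;
  VAR cincomeperperson_chk;
RUN;
```

PROC SQL writes a note to the log about remerging summary statistics with the original data; that is expected here, because the overall mean is attached to every row before the subtraction.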
Output and Result for the Linear Regression Model (With and Without Extreme Outliers)
Output with Outliers
This test included 176 countries, extreme outliers among them. The linear regression model indicated that income per person (Beta/Regression Coefficient=0.00055, p-value<.0001) was significantly and positively associated with life expectancy.
The r² (r-square) of 0.3618 indicates that income per person explains about 36% of the variability in life expectancy.
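To make the coefficient concrete, the fitted model with the centered predictor can be written out as below. The intercept is not reported in the text above, so it is left symbolic as b_0; because the predictor is centered, b_0 is the predicted life expectancy for a country at the mean income per person (8740.9655).

```latex
\widehat{\text{lifeexpectancy}} = b_0 + 0.00055 \times \text{cincomeperperson}

% Effect of a 1000 US dollar increase in income per person:
0.00055 \times 1000 = 0.55 \text{ years}

% Correlation implied by r-square in a one-predictor model (positive slope):
r = \sqrt{0.3618} \approx 0.60
```

In words: each additional 1000 US dollars of income per person is associated with roughly 0.55 more years of life expectancy, and the correlation between the two variables is about 0.60.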
Output without Two Extreme Outliers
This test included 174 countries after the two extreme outliers were removed. The linear regression model indicated that income per person (Beta/Regression Coefficient=0.00059, p-value<.0001) was significantly and positively associated with life expectancy.
The r² (r-square) of 0.3824 indicates that income per person explains about 38% of the variability in life expectancy.
The conclusion was essentially unchanged after the two extreme outliers were removed: the coefficient rose slightly from 0.00055 to 0.00059, and r² rose from 0.3618 to 0.3824.