#f/2.84E
gossipdepartement · 7 years ago
Nikon D7500 20.9MP DSLR Camera with AF-S DX NIKKOR 16-80mm f/2.8-4E ED VR Lens, Black
Class-leading image quality, ISO range, image processing and metering.
Large 3.2″ 922K-dot tilting LCD screen with touch functionality.
51-point AF system with 15 cross-type sensors and group-area AF paired with up to 8 fps continuous shooting capability.
4K Ultra HD and 1080p Full HD video with stereo sound,…
raghavagarwal-blog1 · 5 years ago
MULTIPLE REGRESSION MODEL
CODE:
import pandas
import seaborn
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

# any additional libraries would be imported here
data = pandas.read_csv('gapminder.csv', low_memory=False)

# convert to numeric format (non-numeric entries such as blanks become NaN)
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')

# listwise deletion of missing values (.copy() so new columns can be added safely)
sub1 = data[['urbanrate', 'alcconsumption', 'suicideper100th']].dropna().copy()

# first order (linear) scatterplot
scat1 = seaborn.regplot(x="alcconsumption", y="suicideper100th", scatter=True, data=sub1)
plt.xlabel('Alcohol Consumption')
plt.ylabel('Suicide Rates')
plt.show()

# center the explanatory variables
sub1['urbanrate_c'] = sub1['urbanrate'] - sub1['urbanrate'].mean()
sub1['alcconsumption_c'] = sub1['alcconsumption'] - sub1['alcconsumption'].mean()

# linear regression analysis: suicide rate on alcohol consumption
reg1 = smf.ols('suicideper100th ~ alcconsumption_c', data=sub1).fit()
print(reg1.summary())

# adding urban rate
reg2 = smf.ols('suicideper100th ~ alcconsumption_c + urbanrate_c', data=sub1).fit()
print(reg2.summary())

# Q-Q plot for normality
fig = sm.qqplot(reg2.resid, line='r')

# simple plot of residuals (new figure, so it is not drawn over the Q-Q plot)
stdres = pandas.DataFrame(reg2.resid_pearson)
plt.figure()
plt.plot(stdres, 'o', ls='None')
l = plt.axhline(y=0, color='r')
plt.ylabel('Standardized Residual')
plt.xlabel('Observation Number')

# leverage plot
fig3 = sm.graphics.influence_plot(reg2, size=8)
plt.show()
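The influence plot places each observation by its leverage (hat value) and its studentized residual. As a rough sketch of where leverage comes from, assuming only NumPy: the hat values are the diagonal of H = X (X'X)^-1 X', they sum to the number of model parameters, and a common rule of thumb (an assumption here, not something the post uses) flags points above 2p/n. The design matrix below is synthetic, not the gapminder data, and `hat_values` is a hypothetical helper name.

```python
import numpy as np

def hat_values(X):
    """Leverage of each row: the diagonal of the hat matrix H = X (X'X)^-1 X'."""
    # With the reduced QR decomposition X = QR, H = Q Q', so h_i is the
    # sum of squares of row i of Q (this avoids inverting X'X explicitly).
    q, _ = np.linalg.qr(X)
    return np.sum(q ** 2, axis=1)

# Synthetic design: intercept plus two roughly centred predictors (stand-ins
# for alcconsumption_c and urbanrate_c; illustrative values only)
rng = np.random.default_rng(42)
n = 181
x1 = rng.normal(0.0, 5.0, size=n)
x2 = rng.normal(0.0, 20.0, size=n)
x1[0] = 40.0  # one deliberately extreme value, to create a high-leverage point
X = np.column_stack([np.ones(n), x1, x2])

h = hat_values(X)
p = X.shape[1]            # number of estimated parameters
cutoff = 2 * p / n        # rule-of-thumb threshold (an assumption, not from the post)
flagged = np.flatnonzero(h > cutoff)

print(h.sum())            # equals p for a full-rank design, up to rounding
print(flagged)            # row indices above the cutoff
```

A point with high leverage is only influential if its residual is also large, which is why the influence plot shows both quantities together.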
OUTPUT:
                            OLS Regression Results
==============================================================================
Dep. Variable:        suicideper100th   R-squared:                       0.125
Model:                            OLS   Adj. R-squared:                  0.120
Method:                 Least Squares   F-statistic:                     25.63
Date:                Fri, 14 Jun 2019   Prob (F-statistic):           1.02e-06
Time:                        22:41:31   Log-Likelihood:                -578.31
No. Observations:                 181   AIC:                             1161.
Df Residuals:                     179   BIC:                             1167.
Df Model:                           1
Covariance Type:            nonrobust
====================================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            9.6950      0.442     21.958      0.000       8.824      10.566
alcconsumption_c     0.4537      0.090      5.062      0.000       0.277       0.631
==============================================================================
Omnibus:                       58.494   Durbin-Watson:                   2.045
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              139.216
Skew:                           1.423   Prob(JB):                     5.88e-31
Kurtosis:                       6.219   Cond. No.                         4.93
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results
==============================================================================
Dep. Variable:        suicideper100th   R-squared:                       0.177
Model:                            OLS   Adj. R-squared:                  0.168
Method:                 Least Squares   F-statistic:                     19.19
Date:                Fri, 14 Jun 2019   Prob (F-statistic):           2.84e-08
Time:                        22:27:09   Log-Likelihood:                -572.75
No. Observations:                 181   AIC:                             1152.
Df Residuals:                     178   BIC:                             1161.
Df Model:                           2
Covariance Type:            nonrobust
====================================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            9.6950      0.429     22.580      0.000       8.848      10.542
alcconsumption_c     0.5385      0.091      5.935      0.000       0.359       0.718
urbanrate_c         -0.0657      0.020     -3.359      0.001      -0.104      -0.027
==============================================================================
Omnibus:                       53.336   Durbin-Watson:                   2.065
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              117.432
Skew:                           1.330   Prob(JB):                     3.16e-26
Kurtosis:                       5.915   Cond. No.                         22.9
==============================================================================
SUMMARY:
From the scatter plot, the relationship between alcohol consumption and suicide rate appears roughly linear and positive. After adding urban rate as a second explanatory variable, both explanatory variables are significant: alcohol consumption (coef = 0.5385, p < 0.001) and urban rate (coef = -0.0657, p = 0.001). Because the association between alcohol consumption and suicide rate remains significant after adjusting for urban rate, urban rate does not appear to confound that association; urban rate itself is negatively associated with suicide rate. The residuals do not follow the Q-Q line closely (consistent with the skew of 1.330 and kurtosis of 5.915 in the table), so they are not normally distributed; together with the modest R-squared of 0.177, this suggests that some important explanatory variables are missing from the model. The leverage plot shows no points that are both outliers and high-leverage.
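Both regression tables report the same intercept, 9.6950, because every explanatory variable is mean-centred. With an intercept in the model, OLS residuals sum to zero, so mean(y) = b0 + b1·mean(x1) + ...; when every predictor has mean zero, the intercept equals the mean of the response. A minimal NumPy sketch on synthetic data (not the gapminder values) checks this for a one- and a two-predictor model; `ols_coefs` is a hypothetical helper.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] via numpy.linalg.lstsq."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Synthetic stand-ins for alcohol consumption, urban rate and suicide rate
# (illustrative values only, not the gapminder data)
rng = np.random.default_rng(0)
alc = rng.uniform(0, 20, size=181)
urban = rng.uniform(10, 100, size=181)
suicide = 5 + 0.5 * alc - 0.06 * urban + rng.normal(0, 3, size=181)

# Mean-centre each predictor, as the post does for alcconsumption_c and urbanrate_c
alc_c = alc - alc.mean()
urban_c = urban - urban.mean()

one = ols_coefs(suicide, alc_c)           # model 1: alcohol only
two = ols_coefs(suicide, alc_c, urban_c)  # model 2: alcohol + urban rate

# Both intercepts equal the mean of the response, up to floating-point error
print(one[0], two[0], suicide.mean())
```

Centring changes only the intercept and its interpretation (the expected response at average values of the predictors); the slopes are the same as with uncentred predictors.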
panjinkhoma-blog · 7 years ago
Data Analysis Tools Week 4
ANOVA
I chose to work with the gapminder dataset
Python Code
import numpy
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
import seaborn
import matplotlib.pyplot as plt

# Load gapminder dataset and read blank values as NaN
data = pd.read_csv("gapminder.csv", low_memory=False, na_values=" ")

# Bug fix for display formats to avoid run time errors
pd.set_option("display.float_format", lambda x: "%f" % x)

# Show number of rows and columns
print(len(data))          # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

# Convert the variables used below to numeric
data["incomeperperson"] = pd.to_numeric(data["incomeperperson"], errors="coerce")
data["lifeexpectancy"] = pd.to_numeric(data["lifeexpectancy"], errors="coerce")
data["co2emissions"] = pd.to_numeric(data["co2emissions"], errors="coerce")

# To run ANOVA I will take grouped life expectancy as the explanatory (X) variable
# and income per person as the response variable.
# Create a new variable by splitting lifeexpectancy into 2 groups: (25, 55] and (55, 100]
data["life"] = pd.cut(data.lifeexpectancy, [25, 55, 100], labels=["short", "long"])

print("Counts (Frequencies) for life")
c1 = data["life"].value_counts().sort_index(ascending=True)
print(c1)

model1 = smf.ols(formula='incomeperperson ~ C(life)', data=data).fit()
print(model1.summary())

sub1 = data[['incomeperperson', 'life']].dropna()

print("means for income per person by lifeexpectancy short vs. long")
m1 = sub1.groupby('life').mean()
print(m1)

print("standard deviation for mean income per person by life short vs. long")
st1 = sub1.groupby('life').std()
print(st1)

# bivariate bar graph (catplot with errorbar=None needs seaborn >= 0.12)
seaborn.catplot(x="life", y="incomeperperson", data=data, kind="bar", errorbar=None)
plt.xlabel('Life Expectancy')
plt.ylabel('Mean Income Per Person')

# I want to moderate for CO2 emissions and see whether they affect the relationship
# between life expectancy and income per person.
# Convert co2emissions to low (132000 - 5000000000) and high (5000000000 - 340000000000)
data["co2"] = pd.cut(data.co2emissions, [132000, 5000000000, 340000000000], labels=["low", "high"])

sub2 = data[['incomeperperson', 'life', 'co2']].dropna()

sub3 = data[data['co2'] == 'low']
sub4 = data[data['co2'] == 'high']

print('association between life expectancy and income per person for those in low carbon dioxide emission countries')
model2 = smf.ols(formula='incomeperperson ~ C(life)', data=sub3).fit()
print(model2.summary())

print('association between life expectancy and income per person for those in high carbon dioxide emission countries')
model3 = smf.ols(formula='incomeperperson ~ C(life)', data=sub4).fit()
print(model3.summary())

# numeric_only=True skips non-numeric columns such as country (required in pandas >= 2.0)
print("means for incomeperperson by life short vs. long for low")
m3 = sub3.groupby('life').mean(numeric_only=True)
print(m3)
print("means for incomeperperson by life short vs. long for high")
m4 = sub4.groupby('life').mean(numeric_only=True)
print(m4)
Output
213
16
Counts (Frequencies) for life
short     24
long     167
Name: life, dtype: int64
                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.049
Model:                            OLS   Adj. R-squared:                  0.044
Method:                 Least Squares   F-statistic:                     8.989
Date:                Sun, 29 Apr 2018   Prob (F-statistic):            0.00311
Time:                        18:21:20   Log-Likelihood:                -1875.5
No. Observations:                 176   AIC:                             3755.
Df Residuals:                     174   BIC:                             3761.
Df Model:                           1
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        1148.3985   2203.227      0.521      0.603   -3200.092    5496.889
C(life)[T.long]  7061.7667   2355.349      2.998      0.003    2413.035    1.17e+04
==============================================================================
Omnibus:                       69.319   Durbin-Watson:                   1.685
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              155.486
Skew:                           1.827   Prob(JB):                     1.72e-34
Kurtosis:                       5.803   Cond. No.                         5.49
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
means for income per person by lifeexpectancy short vs. long
       incomeperperson
life
short      1148.398518
long       8210.165256
standard deviation for mean income per person by life short vs. long
       incomeperperson
life
short      2010.119040
long      10995.264363
association between life expectancy and income per person for those in low carbon dioxide emission countries
                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.036
Method:                 Least Squares   F-statistic:                     6.321
Date:                Sun, 29 Apr 2018   Prob (F-statistic):             0.0131
Time:                        18:21:20   Log-Likelihood:                -1517.1
No. Observations:                 143   AIC:                             3038.
Df Residuals:                     141   BIC:                             3044.
Df Model:                           1
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        1051.1692   2207.143      0.476      0.635   -3312.201    5414.539
C(life)[T.long]  5983.3357   2379.830      2.514      0.013    1278.575    1.07e+04
==============================================================================
Omnibus:                       79.735   Durbin-Watson:                   1.695
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              262.292
Skew:                           2.269   Prob(JB):                     1.11e-57
Kurtosis:                       7.841   Cond. No.                         5.17
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
association between life expectancy and income per person for those in high carbon dioxide emission countries
                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.028
Model:                            OLS   Adj. R-squared:                 -0.011
Method:                 Least Squares   F-statistic:                    0.7260
Date:                Sun, 29 Apr 2018   Prob (F-statistic):              0.402
Time:                        18:21:20   Log-Likelihood:                -290.77
No. Observations:                  27   AIC:                             585.5
Df Residuals:                      25   BIC:                             588.1
Df Model:                           1
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        3745.6499    1.2e+04      0.313      0.757   -2.09e+04    2.84e+04
C(life)[T.long]  1.038e+04   1.22e+04      0.852      0.402   -1.47e+04    3.55e+04
==============================================================================
Omnibus:                        2.909   Durbin-Watson:                   1.900
Prob(Omnibus):                  0.233   Jarque-Bera (JB):                2.251
Skew:                           0.566   Prob(JB):                        0.324
Kurtosis:                       2.151   Cond. No.                         10.3
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
means for incomeperperson by life short vs. long for low
       incomeperperson  alcconsumption  armedforcesrate  breastcancerper100th  \
life
short      1051.169165        4.981364         0.565333             19.681818
long       7034.504879        6.269350         1.630469             37.226271

         co2emissions  femaleemployrate  hivrate  internetuserate  \
life
short 191539166.666667         55.709091 7.135000         6.072571
long  722409047.263682         46.801626 0.925051        34.148252

      lifeexpectancy  oilperperson  polityscore  relectricperperson  \
life
short       50.879000           nan     1.545455          161.251591
long        71.651843      1.596282     3.292453         1177.558105

      suicideper100th  employrate  urbanrate
life
short        11.583359   65.122727  34.001818
long          9.264804   58.337398  56.234656
means for incomeperperson by life short vs. long for high
       incomeperperson  alcconsumption  armedforcesrate  breastcancerper100th  \
life
short      3745.649852       10.160000         0.331863             35.000000
long      14125.192877        9.603077         1.426218             51.355556

           co2emissions  femaleemployrate   hivrate  internetuserate  \
life
short 14609848000.000000         34.299999 17.800000        12.334893
long  32802046172.839485         45.340740  0.301250        53.845072

      lifeexpectancy  oilperperson  polityscore  relectricperperson  \
life
short       52.797000      0.504659     9.000000          920.137600
long        76.212889      1.340801     5.814815         1437.614364

      suicideper100th  employrate  urbanrate
life
short        15.714571   41.099998  60.740000
long         10.859063   56.462963  73.145185
Comments
Is income per person associated with life expectancy in countries with low or high CO2 emissions?
Results show that, among countries with low CO2 emissions, income per person is associated with life expectancy (F = 6.321, p = 0.0131, 143 countries).
However, there is no significant association among countries with high CO2 emissions (F = 0.726, p = 0.402), although this group contains only 27 countries.
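The ANOVA here is run as an OLS regression on C(life). For a single categorical predictor, the regression F-statistic is the same number as the one-way ANOVA F, (SS_between / (k - 1)) / (SS_within / (n - k)), because the model sum of squares is the between-group sum of squares. The NumPy sketch below checks that equality on made-up incomes for two groups; the values and helper names are illustrative assumptions. `scipy.stats.f_oneway` computes the same statistic directly from the group samples.

```python
import numpy as np

def anova_f(groups):
    """One-way ANOVA F from between- and within-group sums of squares."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    k, n = len(groups), all_vals.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def regression_f(groups):
    """F-statistic of an OLS fit on treatment (dummy) coding of the groups."""
    y = np.concatenate(groups)
    n, k = y.size, len(groups)
    # Intercept plus k-1 indicator columns; the first group is the reference level,
    # like 'short' in C(life)
    X = np.zeros((n, k))
    X[:, 0] = 1.0
    start = groups[0].size
    for j, g in enumerate(groups[1:], start=1):
        X[start:start + g.size, j] = 1.0
        start += g.size
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    centred = y - y.mean()
    r2 = 1 - (resid @ resid) / (centred @ centred)
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Made-up incomes for a 'short' and a 'long' life-expectancy group
short = np.array([600.0, 900.0, 1500.0, 1200.0, 800.0])
long_ = np.array([3000.0, 9000.0, 15000.0, 4000.0, 7000.0, 12000.0])

print(anova_f([short, long_]), regression_f([short, long_]))
```

The same relation applies to the first model in the output: with R-squared = 0.049, n = 176 and k = 2, F = (0.049 / 1) / (0.951 / 174), about 8.97, which matches the reported 8.989 up to the rounding of R-squared.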
CHI SQUARE 
I chose to work with the gapminder dataset
Python Code
import pandas as pd
import scipy.stats
import seaborn
import matplotlib.pyplot as plt

data = pd.read_csv("gapminder.csv", low_memory=False, na_values=" ")

# Bug fix for display formats to avoid run time errors
pd.set_option("display.float_format", lambda x: "%f" % x)

# setting variables I will be working with to numeric
data['alcconsumption'] = pd.to_numeric(data['alcconsumption'], errors='coerce')
data['suicideper100th'] = pd.to_numeric(data['suicideper100th'], errors='coerce')
data['employrate'] = pd.to_numeric(data['employrate'], errors='coerce')

# Data management for Alcohol Consumption: (0, 12] = low, (12, 24] = high
data["alcconsumption_GRPS"] = pd.cut(data.alcconsumption, bins=[0, 12, 24], labels=["low", "high"])

print("Counts (Frequencies) for Alcohol Consumption_GRPS")
c1 = data["alcconsumption_GRPS"].value_counts(sort=False, dropna=True)
print(c1)

# Data management for Suicide Per 100th: (0, 8] = "0", (8, 38] = "1"
data["suicideper100th_GRPS"] = pd.cut(data.suicideper100th, bins=[0, 8, 38], labels=["0", "1"])

print("Counts (Frequencies) for suicideper100th_GRPS")
c2 = data["suicideper100th_GRPS"].value_counts(sort=False, dropna=True)
print(c2)

# contingency table of observed counts
ct1 = pd.crosstab(data['suicideper100th_GRPS'], data['alcconsumption_GRPS'])
print(ct1)

sub1 = data.copy()

# Keep only rows where both grouped variables and employrate are present
sub2 = sub1[['alcconsumption_GRPS', 'suicideper100th_GRPS', 'employrate']].dropna().copy()

# contingency table of observed counts
ct1 = pd.crosstab(sub2['alcconsumption_GRPS'], sub2['suicideper100th_GRPS'])
print(ct1)

# column percentages
colsum = ct1.sum(axis=0)
colpct = ct1 / colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)

# set variable types
sub2["alcconsumption_GRPS"] = sub2["alcconsumption_GRPS"].astype('category')
sub2['suicideper100th_GRPS'] = pd.to_numeric(sub2['suicideper100th_GRPS'], errors='coerce')

# graph the proportion of high-suicide countries within each drinking level
# (catplot with errorbar=None needs seaborn >= 0.12)
seaborn.catplot(x="alcconsumption_GRPS", y="suicideper100th_GRPS", data=sub2, kind="bar", errorbar=None)
plt.xlabel('Levels of Drinking')
plt.ylabel('Proportion suicide')

# I want to moderate for employrate and see whether it has an effect on the
# relationship between suicide and alcohol consumption.
# Convert employrate to low (32 - 58) and high (58 - 83)
print("Describe Employment Rate")
desc1 = data["employrate"].describe()
print(desc1)

sub2["employrate_GRPS"] = pd.cut(sub2.employrate, bins=[32, 58, 83], labels=["0", "1"])
sub2['employrate_GRPS'] = pd.to_numeric(sub2['employrate_GRPS'], errors='coerce')

print("Counts (Frequencies) for employrate")
c3 = sub2["employrate_GRPS"].value_counts(sort=False, dropna=True)
print(c3)

sub3 = sub2[sub2['employrate_GRPS'] == 0]
sub4 = sub2[sub2['employrate_GRPS'] == 1]

print('association between level of alcohol consumption and suicide rate for those countries with low employment rate')
# contingency table of observed counts
ct2 = pd.crosstab(sub3['alcconsumption_GRPS'], sub3['suicideper100th_GRPS'])
print(ct2)

# column percentages
colsum = ct2.sum(axis=0)
colpct = ct2 / colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs2 = scipy.stats.chi2_contingency(ct2)
print(cs2)

print('association between level of alcohol consumption and suicide rate for those countries with high employment rate')
# contingency table of observed counts
ct3 = pd.crosstab(sub4['alcconsumption_GRPS'], sub4['suicideper100th_GRPS'])
print(ct3)

# column percentages
colsum = ct3.sum(axis=0)
colpct = ct3 / colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs3 = scipy.stats.chi2_contingency(ct3)
print(cs3)

# graph the proportion of high-suicide countries within each drinking level, low employment
seaborn.catplot(x="alcconsumption_GRPS", y="suicideper100th_GRPS", data=sub3, kind="bar", errorbar=None)
plt.xlabel('Levels of Drinking')
plt.ylabel('Proportion suicide')
plt.title('association between level of drinking and suicide rate for countries with low employment rate')

# graph the proportion of high-suicide countries within each drinking level, high employment
seaborn.catplot(x="alcconsumption_GRPS", y="suicideper100th_GRPS", data=sub4, kind="bar", errorbar=None)
plt.xlabel('Levels of Drinking')
plt.ylabel('Proportion suicide')
plt.title('association between level of drinking and suicide rate for countries with high employment rate')
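`pd.cut` builds right-closed intervals by default, so with bins=[32, 58, 83] a value of exactly 32 falls outside (32, 58], and anything above 83 falls outside (58, 83]. The employment-rate summary in the output reports a minimum of 32.0 and a maximum of 83.199997, so the countries at both extremes get no employment group; this may account for the grouped counts (74 + 90 = 164) being smaller than the 166 rows of sub2. A small sketch with hypothetical values shows the behaviour; `include_lowest=True` closes the first interval on the left, while the top edge would still need to be at least the maximum.

```python
import pandas as pd

# Hypothetical employment rates, including both extremes reported by describe()
rates = pd.Series([32.0, 45.0, 58.0, 70.0, 83.199997])

# Default: intervals (32, 58] and (58, 83]
default = pd.cut(rates, bins=[32, 58, 83], labels=["0", "1"])
# include_lowest=True makes the first interval include its left edge, 32
lowest = pd.cut(rates, bins=[32, 58, 83], labels=["0", "1"], include_lowest=True)

print(default.isna().tolist())  # [True, False, False, False, True]
print(lowest.isna().tolist())   # [False, False, False, False, True]
```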
Output
Counts (Frequencies) for Alcohol Consumption_GRPS
low     155
high     32
Name: alcconsumption_GRPS, dtype: int64
Counts (Frequencies) for suicideper100th_GRPS
0     88
1    103
Name: suicideper100th_GRPS, dtype: int64
alcconsumption_GRPS   low  high
suicideper100th_GRPS
0                      81     4
1                      72    28
suicideper100th_GRPS   0   1
alcconsumption_GRPS
low                   67  70
high                   4  25
suicideper100th_GRPS        0        1
alcconsumption_GRPS
low                  0.943662 0.736842
high                 0.056338 0.263158
chi-square value, p value, expected counts
(10.66289761300192, 0.0010930602193584638, 1, array([[ 58.59638554,  78.40361446],
       [ 12.40361446,  16.59638554]]))
Describe Employment Rate
count   178.000000
mean     58.635955
std      10.519454
min      32.000000
25%      51.225000
50%      58.699999
75%      64.975000
max      83.199997
Name: employrate, dtype: float64
Counts (Frequencies) for employrate
0.000000    74
1.000000    90
Name: employrate_GRPS, dtype: int64
association between level of alcohol consumption and suicide rate for those countries with low employment rate
suicideper100th_GRPS   0   1
alcconsumption_GRPS
low                   32  20
high                   2  20
suicideper100th_GRPS        0        1
alcconsumption_GRPS
low                  0.943662 0.736842
high                 0.056338 0.263158
chi-square value, p value, expected counts
(15.075911404771702, 0.00010327277630028857, 1, array([[ 23.89189189,  28.10810811],
       [ 10.10810811,  11.89189189]]))
association between level of alcohol consumption and suicide rate for those countries with high employment rate
suicideper100th_GRPS   0   1
alcconsumption_GRPS
low                   35  49
high                   2   4
suicideper100th_GRPS        0        1
alcconsumption_GRPS
low                  0.943662 0.736842
high                 0.056338 0.263158
chi-square value, p value, expected counts
(0.0008195527063451476, 0.97716141526886258, 1, array([[ 34.53333333,  49.46666667],
       [  2.46666667,   3.53333333]]))
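In the printed output, the column-percentage tables under both employment subgroups repeat the full-sample values (0.943662 and 0.736842), so they do not describe the subgroup tables. A minimal pandas sketch computing column percentages directly from the low-employment counts shown in the output; starting from raw rows, `pd.crosstab(..., normalize='columns')` gives the same proportions.

```python
import pandas as pd

# Observed counts for the low-employment subgroup, copied from the output:
# rows = alcohol group, columns = suicide group
ct2 = pd.DataFrame(
    [[32, 20], [2, 20]],
    index=pd.Index(["low", "high"], name="alcconsumption_GRPS"),
    columns=pd.Index([0, 1], name="suicideper100th_GRPS"),
)

# Divide each column by its own total so that every column sums to 1
colpct = ct2 / ct2.sum(axis=0)
print(colpct)
```

Read column-wise: in this subgroup, about 94% of the low-suicide countries but only 50% of the high-suicide countries are low-alcohol countries.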
Comments
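Does employment rate affect the relationship between alcohol consumption and suicide rate?
The relationship between alcohol consumption and suicide rate is significant for countries with a low employment rate (chi-square = 15.08, p < 0.001).
However, it is not significant for countries with a high employment rate (chi-square = 0.0008, p = 0.977). Only 6 countries in that group have high alcohol consumption, and two of the expected counts (2.47 and 3.53) are below 5, so the chi-square approximation is less reliable there.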
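Because two expected counts in the high-employment table (2.47 and 3.53) are below 5, Fisher's exact test is a common alternative to the chi-square test there. `scipy.stats.fisher_exact` implements it; the standard-library sketch below shows the idea: condition on the row and column totals, treat the top-left cell as hypergeometric, and sum the probabilities of every table no more likely than the observed one. The counts come from the output; `fisher_exact_two_sided` is a hypothetical name.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x):
        # P(top-left cell == x) given the fixed margins (hypergeometric)
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    # A small relative tolerance so tables exactly as likely as the observed one count
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# High-employment subgroup from the output:
# rows = alcohol group (low, high), columns = suicide group (0, 1)
high_employ = [[35, 49], [2, 4]]
print(fisher_exact_two_sided(high_employ))
```

For this table the observed split is the most likely one given the margins, so the exact p-value is essentially 1, in line with the chi-square result of 0.977.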
PEARSON CORRELATION
I chose to work with the gapminder dataset
Python Code
import pandas as pd
import numpy
import seaborn
import scipy.stats
import matplotlib.pyplot as plt

data = pd.read_csv("gapminder.csv", low_memory=False, na_values=" ")

# Bug fix for display formats to avoid run time errors
pd.set_option("display.float_format", lambda x: "%f" % x)

# setting variables I will be working with to numeric
data['alcconsumption'] = pd.to_numeric(data['alcconsumption'], errors='coerce')
data['suicideper100th'] = pd.to_numeric(data['suicideper100th'], errors='coerce')
data['employrate'] = pd.to_numeric(data['employrate'], errors='coerce')

# note: this drops rows with a missing value in ANY column, not only the three above;
# .copy() lets a new column be added without a SettingWithCopyWarning
data_clean = data.dropna().copy()

# Pearson correlation for the association between alcohol consumption and suicide rate
print('association between alcconsumption and suicideper100th')
print(scipy.stats.pearsonr(data_clean['alcconsumption'], data_clean['suicideper100th']))

print("Describe Employment Rate")
desc1 = data["employrate"].describe()
print(desc1)

def employgrp(row):
    # cut points are the quartiles reported by describe() above
    if row['employrate'] <= 51.225:
        return 1   # lowest quartile
    elif row['employrate'] <= 58.699:
        return 2   # second quartile
    elif row['employrate'] > 64.975:
        return 3   # highest quartile
    return None    # third quartile (58.699, 64.975] is left ungrouped (NaN)

data_clean['employgrp'] = data_clean.apply(employgrp, axis=1)

chk1 = data_clean['employgrp'].value_counts(sort=False, dropna=False)
print(chk1)

sub1 = data_clean[data_clean['employgrp'] == 1]
sub2 = data_clean[data_clean['employgrp'] == 2]
sub3 = data_clean[data_clean['employgrp'] == 3]

print('association between alcconsumption and suicideper100th for LOW employrate countries')
print(scipy.stats.pearsonr(sub1['alcconsumption'], sub1['suicideper100th']))
print('       ')
print('association between alcconsumption and suicideper100th for MIDDLE employrate countries')
print(scipy.stats.pearsonr(sub2['alcconsumption'], sub2['suicideper100th']))
print('       ')
print('association between alcconsumption and suicideper100th for HIGH employrate countries')
print(scipy.stats.pearsonr(sub3['alcconsumption'], sub3['suicideper100th']))

scat1 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=sub1)
plt.xlabel('alcconsumption')
plt.ylabel('suicideper100th')
plt.title('Scatterplot for the Association Between alcohol consumption and suicide per 100th for LOW employrate countries')
print(scat1)
plt.show()

scat2 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=sub2)
plt.xlabel('alcconsumption')
plt.ylabel('suicideper100th')
plt.title('Scatterplot for the Association Between alcohol consumption and suicide per 100th for MEDIUM employrate countries')
print(scat2)
plt.show()

scat3 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=sub3)
plt.xlabel('alcconsumption')
plt.ylabel('suicideper100th')
plt.title('Scatterplot for the Association Between alcohol consumption and suicide per 100th for HIGH employrate countries')
print(scat3)
plt.show()
Output
association between alcconsumption and suicideper100th
(0.45834250546091254, 0.00038178766966525383)
Describe Employment Rate
count   178.000000
mean     58.635955
std      10.519454
min      32.000000
25%      51.225000
50%      58.699999
75%      64.975000
max      83.199997
Name: employrate, dtype: float64
1.000000    14
2.000000    15
nan         20
3.000000     7
Name: employgrp, dtype: int64
association between alcconsumption and suicideper100th for LOW employrate countries
(0.56573500213060901, 0.034974841840160878)

association between alcconsumption and suicideper100th for MIDDLE employrate countries
(0.43336812208699116, 0.10658773388707161)

association between alcconsumption and suicideper100th for HIGH employrate countries
(0.089398986518181067, 0.84883731927600881)
AxesSubplot(0.125,0.125;0.775x0.755)
Comments
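Does employment rate moderate the relationship between alcohol consumption and suicide rate?
For low-employment countries, the relationship is significant (r = 0.57, p = 0.035, n = 14).
For medium- and high-employment countries, it is not significant (r = 0.43, p = 0.107, n = 15; r = 0.09, p = 0.849, n = 7). These subgroups are small, so the non-significant results may reflect limited power rather than the absence of an association.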
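`scipy.stats.pearsonr` returns r together with a two-sided p-value for the null hypothesis of zero correlation. That p-value corresponds to a t-test with n - 2 degrees of freedom, t = r · sqrt((n - 2) / (1 - r²)), which is why a similar r can be significant in one subgroup and not in a smaller one. The sketch reproduces two of the reported p-values from r and n; n = 14 for the LOW group and n = 56 for all of data_clean are taken from the employgrp counts in the output, and `pearson_p_value` is a hypothetical helper.

```python
import math
from scipy import stats

def pearson_p_value(r, n):
    """t-statistic and two-sided p-value for H0: rho = 0, with n - 2 degrees of freedom."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, 2 * stats.t.sf(abs(t), df=n - 2)

# r values from the output; n from the employgrp counts (14 LOW; 14 + 15 + 20 + 7 = 56 overall)
t_low, p_low = pearson_p_value(0.56573500213060901, 14)
t_all, p_all = pearson_p_value(0.45834250546091254, 56)
print(t_low, p_low)
print(t_all, p_all)
```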
pedro-couto-blr-blog · 8 years ago
Regression Modelling in Practice - Week 3 Assignment E
Final Output
                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.800
Model:                            OLS   Adj. R-squared:                  0.795
Method:                 Least Squares   F-statistic:                     148.4
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           9.70e-51
Time:                        22:54:36   Log-Likelihood:                -1508.8
No. Observations:                 153   AIC:                             3028.
Df Residuals:                     148   BIC:                             3043.
Df Model:                           4
Covariance Type:            nonrobust
=============================================================================================
                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                  2257.0700    619.225      3.645      0.000      1033.405  3480.735
internetuserate_c           105.5975     29.360      3.597      0.000        47.579   163.617
I(internetuserate_c ** 2)     6.4900      0.626     10.367      0.000         5.253     7.727
urbanrate_c                  88.3966     25.135      3.517      0.001        38.726   138.067
lifeexpectancy_c            183.6493     69.398      2.646      0.009        46.509   320.789
==============================================================================
Omnibus:                       34.477   Durbin-Watson:                   2.198
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              110.667
Skew:                           0.812   Prob(JB):                     9.31e-25
Kurtosis:                       6.837   Cond. No.                     1.81e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.81e+03. This might indicate that there are strong multicollinearity or other numerical problems.
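Because the final model contains both a linear and a quadratic term for the centered internet use rate, the association between internet use and income per person is not a single number. Holding urban rate and life expectancy fixed, the fitted slope at a centered internet use value x is b1 + 2·b2·x. The sketch below plugs in the coefficients printed above; `predicted_income` and `internet_slope` are hypothetical helper names, and the numbers are the rounded estimates from the summary rather than refitted values.

```python
# Coefficients copied from the final OLS summary above (rounded as printed).
B_INTERCEPT = 2257.0700
B_INTERNET = 105.5975      # internetuserate_c
B_INTERNET_SQ = 6.4900     # I(internetuserate_c ** 2)
B_URBAN = 88.3966          # urbanrate_c
B_LIFE = 183.6493          # lifeexpectancy_c


def predicted_income(internet_c: float, urban_c: float, life_c: float) -> float:
    """Fitted income per person for centered predictor values (hypothetical helper)."""
    return (B_INTERCEPT
            + B_INTERNET * internet_c
            + B_INTERNET_SQ * internet_c ** 2
            + B_URBAN * urban_c
            + B_LIFE * life_c)


def internet_slope(internet_c: float) -> float:
    """Derivative of fitted income with respect to internet use rate at internet_c."""
    return B_INTERNET + 2 * B_INTERNET_SQ * internet_c


# Vertex of the fitted parabola (in centered units): the point where the slope is zero.
vertex_c = -B_INTERNET / (2 * B_INTERNET_SQ)

if __name__ == "__main__":
    # A country at the sample mean of all three predictors is predicted at the intercept.
    print(predicted_income(0.0, 0.0, 0.0))
    # Slope 20 points above the mean internet use rate.
    print(internet_slope(20.0))
    print(vertex_c)
```

Since the quadratic coefficient is positive, the fitted curve gets steeper as internet use rises; the vertex sits roughly 8 points below the sample mean, so over most of the observed range the slope is positive and increasing.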
Complete Code
import numpy
import pandas
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.api as sm
# Print floats in fixed-point rather than scientific notation
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Show every row and column when printing DataFrames
pandas.set_option('display.max_columns', None)
pandas.set_option('display.max_rows', None)

# Load the Gapminder CSV
data = pandas.read_csv('/Users/carol_novo/Desktop/Data_Analysis/Data_Management/Python/gapminder.csv', low_memory=False)

# Lower-case the column names
data.columns = data.columns.str.lower()
# Force conversion to numeric; entries that cannot be parsed become NaN
numeric_cols = ['internetuserate', 'incomeperperson', 'armedforcesrate', 'co2emissions',
                'femaleemployrate', 'urbanrate', 'lifeexpectancy']
for col in numeric_cols:
    data[col] = pandas.to_numeric(data[col], errors='coerce')
# Remove outliers
#data = data[(data['incomeperperson'] <= 60000)]
data = data[(data['co2emissions'] <= 3.0e11)]

# Drop every row that has a missing value in any column
data = data.dropna()

# Center the explanatory variables on their means
data['internetuserate_c'] = data['internetuserate'] - data['internetuserate'].mean()
data['armedforcesrate_c'] = data['armedforcesrate'] - data['armedforcesrate'].mean()
data['co2emissions_c'] = data['co2emissions'] - data['co2emissions'].mean()
data['femaleemployrate_c'] = data['femaleemployrate'] - data['femaleemployrate'].mean()
data['urbanrate_c'] = data['urbanrate'] - data['urbanrate'].mean()
data['lifeexpectancy_c'] = data['lifeexpectancy'] - data['lifeexpectancy'].mean()
# Multiple linear regression with all six centered explanatory variables
reg = smf.ols('incomeperperson ~ internetuserate_c + armedforcesrate_c + co2emissions_c + femaleemployrate_c + urbanrate_c + lifeexpectancy_c', data=data).fit()
print(reg.summary())

# Simple and smaller multiple linear regressions
reg = smf.ols('incomeperperson ~ internetuserate_c', data=data).fit()
print(reg.summary())
reg = smf.ols('incomeperperson ~ femaleemployrate_c', data=data).fit()
print(reg.summary())
reg = smf.ols('incomeperperson ~ urbanrate_c', data=data).fit()
print(reg.summary())
reg = smf.ols('incomeperperson ~ internetuserate_c + urbanrate_c', data=data).fit()
print(reg.summary())
reg = smf.ols('incomeperperson ~ internetuserate_c + femaleemployrate_c + urbanrate_c', data=data).fit()
print(reg.summary())

# Polynomial regressions: I(...) evaluates the squared term inside the formula
reg = smf.ols('incomeperperson ~ internetuserate_c + I(internetuserate_c**2)', data=data).fit()
print(reg.summary())
reg = smf.ols('incomeperperson ~ urbanrate_c + I(urbanrate_c**2)', data=data).fit()
print(reg.summary())

# All six variables plus quadratic terms for internet use rate and urban rate
reg = smf.ols('incomeperperson ~ internetuserate_c + I(internetuserate_c**2) + armedforcesrate_c + co2emissions_c + femaleemployrate_c + urbanrate_c + I(urbanrate_c**2) + lifeexpectancy_c', data=data).fit()
print(reg.summary())

# All six variables plus the quadratic term for internet use rate
reg = smf.ols('incomeperperson ~ internetuserate_c + I(internetuserate_c**2) + armedforcesrate_c + co2emissions_c + femaleemployrate_c + urbanrate_c + lifeexpectancy_c', data=data).fit()
print(reg.summary())

# Simple regression on life expectancy
reg = smf.ols('incomeperperson ~ lifeexpectancy_c', data=data).fit()
print(reg.summary())

# Internet use rate (linear + quadratic) and urban rate
reg = smf.ols('incomeperperson ~ internetuserate_c + I(internetuserate_c**2) + urbanrate_c', data=data).fit()
print(reg.summary())

# Final model: internet use rate (linear + quadratic), urban rate and life expectancy
reg2 = smf.ols('incomeperperson ~ internetuserate_c + I(internetuserate_c**2) + urbanrate_c + lifeexpectancy_c', data=data).fit()
print(reg2.summary())
# Q-Q plot of the final model's residuals
#fig1 = sm.qqplot(reg2.resid, line='r')

# Standardized (Pearson) residuals of the final model
#stdres = pandas.DataFrame(reg2.resid_pearson)
#fig2 = plt.plot(stdres, 'o', ls='None')
#l = plt.axhline(y=0, color='r')

# Regression diagnostic plots for each predictor in the final model
#fig3 = sm.graphics.plot_regress_exog(reg2, "internetuserate_c", fig=plt.figure())
#fig3 = sm.graphics.plot_regress_exog(reg2, "urbanrate_c", fig=plt.figure())
#fig3 = sm.graphics.plot_regress_exog(reg2, "lifeexpectancy_c", fig=plt.figure())

# Influence plot (studentized residuals vs. leverage) for the final model
fig4 = sm.graphics.influence_plot(reg2, size=8)
plt.show()
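The script centers each explanatory variable before squaring it. One common reason for doing this, and the one sketched here, is that an all-positive variable and its square are almost perfectly correlated, which inflates standard errors and the condition number; subtracting the mean first removes much of that correlation. The sketch uses an evenly spaced synthetic grid standing in for a 0 to 100 rate variable (an assumption, not the Gapminder data) and only NumPy.

```python
import numpy as np

# Synthetic stand-in for a rate measured on a 0-100 scale (NOT the Gapminder data).
x = np.linspace(0.0, 100.0, 101)

# Centering: subtract the sample mean, as the script does for each *_c column.
x_c = x - x.mean()


def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


raw_corr = corr(x, x ** 2)            # linear vs. quadratic term, uncentered
centered_corr = corr(x_c, x_c ** 2)   # linear vs. quadratic term, centered

# Condition numbers of the design matrices [1, x, x^2]; larger means closer to collinear.
raw_design = np.column_stack([np.ones_like(x), x, x ** 2])
centered_design = np.column_stack([np.ones_like(x_c), x_c, x_c ** 2])
raw_cond = float(np.linalg.cond(raw_design))
centered_cond = float(np.linalg.cond(centered_design))

print(f"corr(x, x^2)     raw: {raw_corr:.3f}  centered: {centered_corr:.3f}")
print(f"condition number raw: {raw_cond:.1f}  centered: {centered_cond:.1f}")
```

Centering does not change the fitted curve, only how it is parameterized: the linear coefficient becomes the slope at the sample mean instead of at zero, and the intercept becomes the predicted value for an observation at the mean of every centered predictor.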
Complete Output
                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.674
Model:                            OLS   Adj. R-squared:                  0.660
Method:                 Least Squares   F-statistic:                     50.28
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           3.82e-33
Time:                        22:54:36   Log-Likelihood:                -1546.3
No. Observations:                 153   AIC:                             3107.
Df Residuals:                     146   BIC:                             3128.
Df Model:                           6
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Intercept           7314.7640    490.815     14.903      0.000      6344.745  8284.783
internetuserate_c    267.2068     31.679      8.435      0.000       204.598   329.816
armedforcesrate_c    153.8284    352.855      0.436      0.664      -543.535   851.192
co2emissions_c      4.374e-08   4.27e-08      1.026      0.307     -4.06e-08  1.28e-07
femaleemployrate_c   100.3340     38.488      2.607      0.010        24.268   176.400
urbanrate_c           73.7339     33.636      2.192      0.030         7.258   140.209
lifeexpectancy_c     -24.4754     87.947     -0.278      0.781      -198.290   149.339
==============================================================================
Omnibus:                       35.385   Durbin-Watson:                   2.386
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               94.933
Skew:                           0.908   Prob(JB):                     2.43e-21
Kurtosis:                       6.405   Cond. No.                     1.21e+10
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.21e+10. This might indicate that there are strong multicollinearity or other numerical problems.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.650
Model:                            OLS   Adj. R-squared:                  0.648
Method:                 Least Squares   F-statistic:                     280.5
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           3.01e-36
Time:                        22:54:36   Log-Likelihood:                -1551.7
No. Observations:                 153   AIC:                             3107.
Df Residuals:                     151   BIC:                             3114.
Df Model:                           1
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------
Intercept          7314.7640    499.968     14.630      0.000      6326.928  8302.600
internetuserate_c   299.9326     17.910     16.747      0.000       264.547   335.319
==============================================================================
Omnibus:                       26.923   Durbin-Watson:                   2.515
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               60.502
Skew:                           0.740   Prob(JB):                     7.28e-14
Kurtosis:                       5.702   Cond. No.                         27.9
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.007
Method:                 Least Squares   F-statistic:                   0.01802
Date:                Sun, 05 Mar 2017   Prob (F-statistic):              0.893
Time:                        22:54:36   Log-Likelihood:                -1632.1
No. Observations:                 153   AIC:                             3268.
Df Residuals:                     151   BIC:                             3274.
Df Model:                           1
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Intercept           7314.7640    845.079      8.656      0.000      5645.057  8984.471
femaleemployrate_c     7.6083     56.672      0.134      0.893      -104.365   119.582
==============================================================================
Omnibus:                       64.884   Durbin-Watson:                   1.838
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              150.022
Skew:                           1.893   Prob(JB):                     2.65e-33
Kurtosis:                       6.034   Cond. No.                         14.9
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.356
Model:                            OLS   Adj. R-squared:                  0.352
Method:                 Least Squares   F-statistic:                     83.40
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           4.09e-16
Time:                        22:54:36   Log-Likelihood:                -1598.4
No. Observations:                 153   AIC:                             3201.
Df Residuals:                     151   BIC:                             3207.
Df Model:                           1
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept    7314.7640    678.313     10.784      0.000      5974.554  8654.974
urbanrate_c   281.6570     30.841      9.133      0.000       220.721   342.593
==============================================================================
Omnibus:                       57.158   Durbin-Watson:                   2.190
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              138.162
Skew:                           1.591   Prob(JB):                     9.97e-31
Kurtosis:                       6.399   Cond. No.                         22.0
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.654
Model:                            OLS   Adj. R-squared:                  0.649
Method:                 Least Squares   F-statistic:                     141.6
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           2.84e-35
Time:                        22:54:36   Log-Likelihood:                -1550.9
No. Observations:                 153   AIC:                             3108.
Df Residuals:                     150   BIC:                             3117.
Df Model:                           2
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------
Intercept          7314.7640    498.953     14.660      0.000      6328.880  8300.648
internetuserate_c   278.5970     24.522     11.361      0.000       230.143   327.050
urbanrate_c          39.5535     31.125      1.271      0.206       -21.947   101.054
==============================================================================
Omnibus:                       29.993   Durbin-Watson:                   2.524
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               69.057
Skew:                           0.819   Prob(JB):                     1.01e-15
Kurtosis:                       5.855   Cond. No.                         32.8
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.671
Model:                            OLS   Adj. R-squared:                  0.665
Method:                 Least Squares   F-statistic:                     101.4
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           8.12e-36
Time:                        22:54:36   Log-Likelihood:                -1547.0
No. Observations:                 153   AIC:                             3102.
Df Residuals:                     149   BIC:                             3114.
Df Model:                           3
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Intercept           7314.7640    487.794     14.996      0.000      6350.877  8278.651
internetuserate_c    266.0643     24.383     10.912      0.000       217.884   314.245
femaleemployrate_c    99.8275     35.424      2.818      0.005        29.829   169.826
urbanrate_c           73.6839     32.751      2.250      0.026         8.968   138.400
==============================================================================
Omnibus:                       33.811   Durbin-Watson:                   2.406
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               86.434
Skew:                           0.884   Prob(JB):                     1.70e-19
Kurtosis:                       6.230   Cond. No.                         33.0
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.768
Model:                            OLS   Adj. R-squared:                  0.765
Method:                 Least Squares   F-statistic:                     247.8
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           2.86e-48
Time:                        22:54:36   Log-Likelihood:                -1520.4
No. Observations:                 153   AIC:                             3047.
Df Residuals:                     150   BIC:                             3056.
Df Model:                           2
Covariance Type:            nonrobust
=============================================================================================
                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                  3197.6448    624.638      5.119      0.000      1963.419  4431.871
internetuserate_c           220.4029     17.251     12.776      0.000       186.316   254.490
I(internetuserate_c ** 2)     5.2831      0.606      8.716      0.000         4.085     6.481
==============================================================================
Omnibus:                       25.313   Durbin-Watson:                   2.216
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               78.932
Skew:                           0.563   Prob(JB):                     7.25e-18
Kurtosis:                       6.334   Cond. No.                     1.70e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.7e+03. This might indicate that there are strong multicollinearity or other numerical problems.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.411
Model:                            OLS   Adj. R-squared:                  0.403
Method:                 Least Squares   F-statistic:                     52.28
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           5.94e-18
Time:                        22:54:36   Log-Likelihood:                -1591.6
No. Observations:                 153   AIC:                             3189.
Df Residuals:                     150   BIC:                             3198.
Df Model:                           2
Covariance Type:            nonrobust
=======================================================================================
                          coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------
Intercept            5015.2143    895.524      5.600      0.000      3245.745  6784.684
urbanrate_c           302.5706     30.119     10.046      0.000       243.058   362.083
I(urbanrate_c ** 2)     4.7538      1.271      3.739      0.000         2.242     7.266
==============================================================================
Omnibus:                       66.361   Durbin-Watson:                   2.128
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              213.352
Skew:                           1.707   Prob(JB):                     4.69e-47
Kurtosis:                       7.671   Cond. No.                         978.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.810
Model:                            OLS   Adj. R-squared:                  0.799
Method:                 Least Squares   F-statistic:                     76.74
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           4.26e-48
Time:                        22:54:36   Log-Likelihood:                -1505.0
No. Observations:                 153   AIC:                             3028.
Df Residuals:                     144   BIC:                             3055.
Df Model:                           8
Covariance Type:            nonrobust
=============================================================================================
                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                  1781.1438    686.353      2.595      0.010       424.516  3137.772
internetuserate_c           105.9222     29.498      3.591      0.000        47.617   164.228
I(internetuserate_c ** 2)     6.0576      0.653      9.282      0.000         4.768     7.348
armedforcesrate_c           101.0620    279.771      0.361      0.718      -451.927   654.051
co2emissions_c              3.96e-08   3.28e-08      1.208      0.229     -2.52e-08  1.04e-07
femaleemployrate_c           15.3682     31.671      0.485      0.628       -47.232    77.968
urbanrate_c                  99.2216     26.057      3.808      0.000        47.718   150.725
I(urbanrate_c ** 2)           1.6805      0.813      2.066      0.041         0.073     3.288
lifeexpectancy_c            176.8532     70.725      2.501      0.014        37.061   316.646
==============================================================================
Omnibus:                       43.451   Durbin-Watson:                   2.128
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              149.642
Skew:                           1.027   Prob(JB):                     3.20e-33
Kurtosis:                       7.388   Cond. No.                     2.20e+10
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.2e+10. This might indicate that there are strong multicollinearity or other numerical problems.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.795
Method:                 Least Squares   F-statistic:                     85.17
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           3.56e-48
Time:                        22:54:36   Log-Likelihood:                -1507.2
No. Observations:                 153   AIC:                             3030.
Df Residuals:                     145   BIC:                             3055.
Df Model:                           7
Covariance Type:            nonrobust
=============================================================================================
                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                  2371.6534    631.003      3.759      0.000      1124.501  3618.805
internetuserate_c           102.4478     29.780      3.440      0.001        43.588   161.307
I(internetuserate_c ** 2)     6.3430      0.645      9.834      0.000         5.068     7.618
armedforcesrate_c           241.8809    274.383      0.882      0.379      -300.425   784.187
co2emissions_c             3.872e-08   3.32e-08      1.168      0.245     -2.68e-08  1.04e-07
femaleemployrate_c           34.2900     30.657      1.118      0.265       -26.303    94.883
urbanrate_c                  93.9412     26.222      3.583      0.000        42.114   145.768
lifeexpectancy_c            181.3226     71.484      2.537      0.012        40.038   322.607
==============================================================================
Omnibus:                       37.123   Durbin-Watson:                   2.185
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              119.950
Skew:                           0.882   Prob(JB):                     8.98e-27
Kurtosis:                       6.963   Cond. No.                     2.00e+10
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2e+10. This might indicate that there are strong multicollinearity or other numerical problems.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.369
Model:                            OLS   Adj. R-squared:                  0.365
Method:                 Least Squares   F-statistic:                     88.27
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           8.50e-17
Time:                        22:54:36   Log-Likelihood:                -1596.8
No. Observations:                 153   AIC:                             3198.
Df Residuals:                     151   BIC:                             3204.
Df Model:                           1
Covariance Type:            nonrobust
====================================================================================
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
Intercept         7314.7640    671.373     10.895      0.000      5988.265  8641.263
lifeexpectancy_c   653.7658     69.583      9.395      0.000       516.283   791.249
==============================================================================
Omnibus:                       51.449   Durbin-Watson:                   2.142
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              108.214
Skew:                           1.501   Prob(JB):                     3.17e-24
Kurtosis:                       5.822   Cond. No.                         9.65
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.791
Model:                            OLS   Adj. R-squared:                  0.787
Method:                 Least Squares   F-statistic:                     188.0
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           1.97e-50
Time:                        22:54:36   Log-Likelihood:                -1512.3
No. Observations:                 153   AIC:                             3033.
Df Residuals:                     149   BIC:                             3045.
Df Model:                           3
Covariance Type:            nonrobust
=============================================================================================
                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                  2719.1890    605.941      4.488      0.000      1521.841  3916.537
internetuserate_c           156.0236     22.782      6.849      0.000       111.007   201.041
I(internetuserate_c ** 2)     5.8970      0.596      9.891      0.000         4.719     7.075
urbanrate_c                 102.2168     25.077      4.076      0.000        52.664   151.770
==============================================================================
Omnibus:                       32.888   Durbin-Watson:                   2.255
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              110.159
Skew:                           0.752   Prob(JB):                     1.20e-24
Kurtosis:                       6.876   Cond. No.                     1.73e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.73e+03. This might indicate that there are strong multicollinearity or other numerical problems.

                            OLS Regression Results
==============================================================================
Dep. Variable:        incomeperperson   R-squared:                       0.800
Model:                            OLS   Adj. R-squared:                  0.795
Method:                 Least Squares   F-statistic:                     148.4
Date:                Sun, 05 Mar 2017   Prob (F-statistic):           9.70e-51
Time:                        22:54:36   Log-Likelihood:                -1508.8
No. Observations:                 153   AIC:                             3028.
Df Residuals:                     148   BIC:                             3043.
Df Model:                           4
Covariance Type:            nonrobust
=============================================================================================
                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                  2257.0700    619.225      3.645      0.000      1033.405  3480.735
internetuserate_c           105.5975     29.360      3.597      0.000        47.579   163.617
I(internetuserate_c ** 2)     6.4900      0.626     10.367      0.000         5.253     7.727
urbanrate_c                  88.3966     25.135      3.517      0.001        38.726   138.067
lifeexpectancy_c            183.6493     69.398      2.646      0.009        46.509   320.789
==============================================================================
Omnibus:                       34.477   Durbin-Watson:                   2.198
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              110.667
Skew:                           0.812   Prob(JB):                     9.31e-25
Kurtosis:                       6.837   Cond. No.                     1.81e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.81e+03. This might indicate that there are strong multicollinearity or other numerical problems.
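The complete output covers thirteen candidate models. One way to line them up, offered as a suggestion rather than the assignment's stated method, is to tabulate adjusted R-squared, AIC and BIC from the printed summaries and look for the lowest information criterion. The values below are transcribed from the summaries above, so they carry statsmodels' display rounding (AIC and BIC to whole numbers); the short model labels and the `ModelFit`/`best_by` names are ad hoc.

```python
from typing import Callable, List, NamedTuple


class ModelFit(NamedTuple):
    label: str      # ad hoc description of the right-hand side
    adj_r2: float   # Adj. R-squared as printed
    aic: int        # AIC as printed (rounded)
    bic: int        # BIC as printed (rounded)


# Values transcribed from the printed OLS summaries, in the order they were fitted.
FITS: List[ModelFit] = [
    ModelFit("all six linear terms", 0.660, 3107, 3128),
    ModelFit("internet", 0.648, 3107, 3114),
    ModelFit("female employment", -0.007, 3268, 3274),
    ModelFit("urban", 0.352, 3201, 3207),
    ModelFit("internet + urban", 0.649, 3108, 3117),
    ModelFit("internet + female employment + urban", 0.665, 3102, 3114),
    ModelFit("internet + internet^2", 0.765, 3047, 3056),
    ModelFit("urban + urban^2", 0.403, 3189, 3198),
    ModelFit("all six + internet^2 + urban^2", 0.799, 3028, 3055),
    ModelFit("all six + internet^2", 0.795, 3030, 3055),
    ModelFit("life expectancy", 0.365, 3198, 3204),
    ModelFit("internet + internet^2 + urban", 0.787, 3033, 3045),
    ModelFit("internet + internet^2 + urban + life expectancy", 0.795, 3028, 3043),
]


def best_by(fits: List[ModelFit], key: Callable[[ModelFit], float]) -> ModelFit:
    """Return the fit with the smallest value of `key` (lower AIC/BIC is better)."""
    return min(fits, key=key)


if __name__ == "__main__":
    for fit in sorted(FITS, key=lambda f: f.bic):
        print(f"{fit.bic:6d} {fit.aic:6d} {fit.adj_r2:7.3f}  {fit.label}")
    print("Lowest BIC:", best_by(FITS, key=lambda f: f.bic).label)
```

With these rounded values, AIC cannot separate the eight-term model from the four-term final model (both print as 3028), while BIC, which penalizes extra parameters more heavily, is lowest for the final model. Because the inputs are rounded, gaps of a point or two should not be over-read.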