Course 3 Week 4 Homework – Logistic Regression
The Null hypothesis for this assignment is:
There is NO relationship between whether or not a person votes (binary categorical explanatory variable) and whether that person participates in professional or community groups (binary categorical response variable). In addition, being ethnically black (a second binary categorical explanatory variable) is not associated with participation in these types of groups.
Results of just relating Voting status to whether a person participates in these groups:
· p value of .0046 – statistically significant relationship
· Odds Ratio is 1.96, indicating that the odds of Participating in groups are 1.96 times higher among those who Voted than among those who did Not Vote. The 95% confidence interval for this odds ratio runs from 1.23 to 3.13. Because the interval does not include 1, it agrees with the significant p value.
Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -1.2217           0.2150           32.2765       <.0001
VOTED        1     0.6744           0.2381            8.0230       0.0046

Odds Ratio Estimates

Effect   Point Estimate   95% Wald Confidence Limits
VOTED             1.963        1.231        3.130
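As a check on the table above, the odds ratio and its Wald limits can be recomputed from the VOTED estimate and standard error. This is a minimal sketch in Python rather than the SAS used for the analysis. The variable names are mine, and the 1.96 multiplier assumes the usual normal-approximation 95% interval.

```python
import math

# Values copied from the Analysis of Maximum Likelihood Estimates table
estimate = 0.6744    # VOTED coefficient, on the log-odds scale
std_error = 0.2381   # standard error of that coefficient
z_95 = 1.96          # assumed normal critical value for a 95% interval

# Exponentiating moves from the log-odds scale to the odds-ratio scale
odds_ratio = math.exp(estimate)
ci_lower = math.exp(estimate - z_95 * std_error)
ci_upper = math.exp(estimate + z_95 * std_error)

print(f"odds ratio      = {odds_ratio:.3f}")
print(f"95% Wald limits = {ci_lower:.3f} to {ci_upper:.3f}")
```

The results agree with the 1.963, 1.231 and 3.130 reported in the Odds Ratio Estimates table.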

Parameter   Estimate       Standard Error   t Value   Pr > |t|
Intercept   0.3414634146       0.08548164      3.99     <.0001
VOTED       0.2944589155       0.09740946      3.02     0.0026

After Ethnicity-black was added to the analysis:
Voting status p value remained < .05 (result = .0091). This indicates that Voting is still significantly associated with Participating in groups and Ethnicity-black is NOT a confounding variable.
As also shown in the results below, p value of Ethnicity-black is .032, indicating significant association with Participation in groups.
Odds Ratios for Voting status and Ethnicity-black are both > 1, so the odds of Participating in groups are higher with each of these factors. Specifically:
· The odds of Participating in groups are 1.87 times higher among those who Voted than among those who did Not Vote, after controlling for Ethnicity-Black status. The 95% confidence interval runs from 1.17 to 2.99.
· The odds of Participating in groups are 1.50 times higher among those who are Ethnically-Black than among those who are not, after controlling for Voting status. The 95% confidence interval runs from 1.04 to 2.17.
THEREFORE: we can REJECT the null hypothesis and conclude that Voting and being Ethnically black are each associated with higher odds of participating in professional or community groups.
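To make the adjusted odds ratios more concrete, the fitted model can be turned into predicted probabilities for each combination of the two predictors. The sketch below is in Python rather than SAS and uses the estimates reported in the LOGISTIC output that follows. The function name is hypothetical.

```python
import math

# Estimates from the two-predictor logistic model (LOGISTIC output)
intercept = -1.4065
b_voted = 0.6255
b_ethnicblk = 0.4036

def predicted_probability(voted, ethnicblk):
    """Inverse logit of the linear predictor for one combination of 0/1 inputs."""
    log_odds = intercept + b_voted * voted + b_ethnicblk * ethnicblk
    return 1.0 / (1.0 + math.exp(-log_odds))

# Odds ratios are the exponentiated coefficients
or_voted = math.exp(b_voted)
or_ethnicblk = math.exp(b_ethnicblk)
print(f"odds ratio VOTED = {or_voted:.3f}, ETHNICBLK = {or_ethnicblk:.3f}")

for voted in (0, 1):
    for ethnicblk in (0, 1):
        p = predicted_probability(voted, ethnicblk)
        print(f"VOTED={voted} ETHNICBLK={ethnicblk}: P(PARTICIPATED=1) = {p:.3f}")
```

An odds ratio compares odds, not probabilities. The predicted probability of participating is higher for voters than for non-voters, but it is not 1.87 times higher.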
The LOGISTIC Procedure

Model Information
Data Set                     WORK.NEW
Response Variable            PARTICIPATED
Number of Response Levels    2
Model                        binary logit
Optimization Technique       Fisher's scoring

Number of Observations Read    553
Number of Observations Used    535

Response Profile
Ordered Value   PARTICIPATED   Total Frequency
1               1                          179
2               0                          356

Probability modeled is PARTICIPATED=1.
Note: 18 observations were deleted due to missing values for the response or explanatory variables.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC                683.991                    674.740
SC                 688.273                    687.587
-2 Log L           681.991                    668.740

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      13.2505    2       0.0013
Score                 12.7559    2       0.0017
Wald                  12.4411    2       0.0020

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -1.4065           0.2343           36.0362       <.0001
VOTED        1     0.6255           0.2399            6.7986       0.0091
ETHNICBLK    1     0.4036           0.1882            4.6005       0.0320

Odds Ratio Estimates
Effect      Point Estimate   95% Wald Confidence Limits
VOTED                1.869        1.168        2.991
ETHNICBLK            1.497        1.035        2.165

Association of Predicted Probabilities and Observed Responses
Percent Concordant    41.4     Somers' D   0.163
Percent Discordant    25.1     Gamma       0.244
Percent Tied          33.4     Tau-a       0.073
Pairs                63724     c           0.581
Course 3 Week 3 Homework – Multiple Regression: Summary
The Null hypothesis for this assignment is:
There is NO relationship between whether or not a person votes (binary categorical explanatory variable) and the number of professional or community groups that person participates in (quantitative response variable). The analysis also examines whether being ethnically black (a binary categorical explanatory variable) confounds that relationship.
Original results of just relating Voting status to number of groups a person is in: p value of .0026 & linear equation (as shown by Estimates) of NBR_GROUPS_IN = .29*VOTED + .34

Parameter   Estimate       Standard Error   t Value   Pr > |t|
Intercept   0.3414634146       0.08548164      3.99     <.0001
VOTED       0.2944589155       0.09740946      3.02     0.0026

After Ethnicity-black was added to the analysis:
Voting status p value remained < .05 (result = .0084). This indicates that Voting is still significantly associated with Number of groups and Ethnicity-black is NOT a confounding variable. The coefficient for VOTED is .26.
As also shown in the results below, Ethnicity-black has a p value of .0007, indicating significant association with Number of groups a person is in, and its coefficient is .28. THEREFORE: we can REJECT the null hypothesis and conclude that Voting and being Ethnically black are each associated with the Number of groups one participates in.
The linear equation for this is: NBR_GROUPS_IN = .26*VOTED + .28*ETHNICBLK + .22.
It is noted that the R Squared value is only .04, meaning these variables explain only about 4% of the variation in Number of groups, so there are likely variables missing from this model.
· I tried substituting Gender and also Have Children for ETHNICBLK. These both showed p value > .05 and did not cause VOTED p value to increase beyond .05, so these variables are neither significant to this model nor confounders.
· Comment on other complexities with this data will be discussed in blog entry on regression diagnostic plots.
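The fitted equation and the R-Square value can both be reproduced from the GLM output below. This is a minimal Python sketch, not the SAS used for the analysis, and the helper name is hypothetical. It predicts the number of groups for each combination of predictors and recomputes R-Square as the Model sum of squares divided by the Corrected Total sum of squares.

```python
# Estimates from the Solution table of the GLM output
intercept = 0.2210017570
b_voted = 0.2568187849
b_ethnicblk = 0.2795619602

def predicted_groups(voted, ethnicblk):
    """Predicted NBR_GROUPS_IN for one combination of 0/1 inputs."""
    return intercept + b_voted * voted + b_ethnicblk * ethnicblk

for voted in (0, 1):
    for ethnicblk in (0, 1):
        value = predicted_groups(voted, ethnicblk)
        print(f"VOTED={voted} ETHNICBLK={ethnicblk}: {value:.2f} groups")

# R-Square = Model SS / Corrected Total SS (values from the ANOVA table)
model_ss = 18.4819468
total_ss = 487.2598131
r_square = model_ss / total_ss
print(f"R-Square = {r_square:.6f}")
```

The recomputed R-Square matches the reported 0.037930, the value the summary rounds to .04.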
The GLM Procedure

Number of Observations Read    553
Number of Observations Used    535

Dependent Variable: NBR_GROUPS_IN

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2       18.4819468     9.2409734     10.49   <.0001
Error             532      468.7778663     0.8811614
Corrected Total   534      487.2598131

R-Square   Coeff Var   Root MSE   NBR_GROUPS_IN Mean
0.037930    165.1992   0.938702             0.568224

Source      DF     Type I SS   Mean Square   F Value   Pr > F
VOTED        1    8.21292698    8.21292698      9.32   0.0024
ETHNICBLK    1   10.26901980   10.26901980     11.65   0.0007

Source      DF   Type III SS   Mean Square   F Value   Pr > F
VOTED        1    6.16684962    6.16684962      7.00   0.0084
ETHNICBLK    1   10.26901980   10.26901980     11.65   0.0007

Parameter   Estimate       Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept   0.2210017570       0.09170099      2.41     0.0163   0.0408613010   0.4011422131
VOTED       0.2568187849       0.09707845      2.65     0.0084   0.0661146584   0.4475229115
ETHNICBLK   0.2795619602       0.08189197      3.41     0.0007   0.1186906628   0.4404332575
COURSE3 WEEK3 MULTIPLE REGRESSION PLOTS:
QQ Plot: My understanding of this chart is that it plots the quantiles of the residuals against the quantiles expected under a normal distribution. With 67% of the data values = 0, it makes sense that so many points sit at the left of this plot, since the data is heavily skewed right. The other data values (1-5) have a more normal distribution.
Standardized residuals plot: Evidence of a challenged model. There are 5 outliers above 3; 3% of residuals have an absolute value above 2.5; 6% have an absolute value above 2.
Residual plot for explanatory variable VOTED: doesn't appear to be meaningful for this binary categorical variable.
Leverage plot: There are several outliers. However, they do not have high leverage, nor are there any non-outliers with high leverage.
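For comparison with the percentages above, the share of standardized residuals expected beyond each cutoff can be computed directly. This is a minimal Python sketch. It assumes normally distributed residuals, and it uses the 535 observations reported as used in the multiple regression output.

```python
import math

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))

n_used = 535  # Number of Observations Used in the GLM output

for cutoff in (2.0, 2.5, 3.0):
    share = two_sided_tail(cutoff)
    print(f"|z| > {cutoff}: expected {share:.1%} of residuals, "
          f"about {share * n_used:.1f} of {n_used}")
```

Under normality roughly 4.6% of residuals would exceed 2 in absolute value and about 1.2% would exceed 2.5, so the observed 6% and 3% are both higher than expected.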
Course 3 Week 2 Homework – Basics of Linear Regression - Code and SAS results
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new;
  SET mydata.oll_pds (KEEP = CASEID W2_CASEID2 W1_L1_A W1_L1_C W1_L2_1 W1_L2_2 W1_L2_3 W1_L2_5 W1_L3
                             W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM W1_D1 W1_N1A W1_P17 W1_P17A PPHHSIZE PPGENDER);
  LABEL
    W1_L1_A  = "Participation in Professional Association in last 12 months"
    W1_L1_C  = "Participation in Cultural Organization in last 12 months"
    W1_L2_1  = "Participation in NAACP in last 12 months"
    W1_L2_2  = "Participation in National Urban League in last 12 months"
    W1_L2_3  = "Participation in Southern Christian Leadership Conference in last 12 months"
    W1_L2_5  = "Participation in Occupy Wall Street Movement in last 12 months"
    W1_L3    = "Participation in community group in last 12 months"
    W2_QB1A  = "Participation in election on Nov 6 2012"
    W2_QB3   = "Regularity of Voting"
    PPAGE    = "Age"
    PPAGECAT = "Age Category"
    PPETHM   = "Ethnic Category"
    W1_D1    = "Rating of Barack Obama"
    W1_P17   = "Have Children"
    W1_P17A  = "Number of children"
    PPGENDER = "Gender"
  ;

  /* used to view original data and validate accuracy of later program steps*/
  /*proc print;*/

  /*Data management - coding out missing data for those variables where need no other form of reassignment*/
  If W2_QB3 = -1 then W2_QB3 = .;                   /* -1 = missing*/
  If W1_D1 = -1 OR W1_D1 = 998 then W1_D1 = .;      /*-1 = missing and 998 = refused*/
  If W1_P17A = -1 then W1_P17A = .;                 /*-1 = missing*/
  If W1_B2 = -1 then W1_B2 = .;                     /*-1 = missing*/

  /*Data management - Create secondary variable to aggregate voting participation to show only
    whether did vote or not or missing. Set missing and not sure code values to missing. */
  If W2_QB1A >= 2 AND W2_QB1A <= 5 then             /*2 through 5 describe different methods of voting*/
    VOTED = 1;                                      /* transformed to 1 = yes I voted*/
  ELSE IF W2_QB1A = -1 OR W2_QB1A = 6 then          /*-1 = missing, 6 = not sure*/
    W2_QB1A = .;                                    /*transformed to one value for Missing*/
  ELSE IF W2_QB1A = 1 then                          /* 1 = Did not vote*/
    VOTED = 0;                                      /*transformed to zero, the code that usually means "no"*/

  If W1_P17 = -1 then W1_P17 = .;                   /*-1 = missing*/
  ELSE IF W1_P17 = 2 then                           /* 2 = No*/
    W1_P17 = 0;                                     /*transformed to zero, the code that usually means "no"*/

  /*Data management - creating a secondary variable and reassigning code values to populate new
    variable NBR_GROUPS_IN. This will be used understand context of group participation and whether
    is a large enough sample, by understanding how many groups a person participates in. This field
    needs each survey result to be a simple yes/no (1/0). Summing the number of "participates" values
    for each question is not the same value as the number of distinct people who participate, as some
    participate in multiple organizations. */
  /*Also these steps reassigning missing data. Setting to '.' for original variable - and setting to
    '0' for use in NBR_GROUPS_IN variable - because setting to Missing prevents NBR_GROUPS_IN value
    from being calculated*/
  IF W1_L1_A = 1 OR W1_L1_A = 2 then do;   /* 1 = participated more than twice, 2=participated once or twice*/
    W1_L1_A = 1;                           /* transformed to 1 (yes I participated)*/
    W1_L1_A_MissEqNo = 1;                  /* transformed to 1 (yes I participated)*/
  END;
  ELSE IF W1_L1_A = -1 then do;            /* -1 = refused*/
    W1_L1_A = .;
    W1_L1_A_MissEqNo = 0;                  /* transformed to 0 = no I didn't participate*/
  END;
  ELSE DO;
    W1_L1_A = 0;                           /* transformed to 0 = no I didn't participate*/
    W1_L1_A_MissEqNo = 0;                  /* transformed to 0 = no I didn't participate*/
  END;

  IF W1_L1_C = 1 OR W1_L1_C = 2 THEN DO;   /* 1 = participated more than twice, 2=participated once or twice*/
    W1_L1_C = 1;                           /* transformed to 1 (yes I participated)*/
    W1_L1_C_MissEqNo = 1;                  /* transformed to 1 (yes I participated)*/
  END;
  ELSE IF W1_L1_C = -1 then do;            /*-1 = refused*/
    W1_L1_C = .;
    W1_L1_C_MissEqNo = 0;                  /* transformed to 0 = no I didn't participate*/
  END;
  ELSE DO;
    W1_L1_C = 0;                           /* transformed to 0 = no I didn't participate*/
    W1_L1_C_MissEqNo = 0;                  /* transformed to 0 = no I didn't participate*/
  END;

  IF W1_L3 = 1 THEN DO;
    W1_L3 = 1;                             /* included for consistency with other variables' management*/
    W1_L3_MissEqNo = 1;                    /* transformed to 1 (yes I participated)*/
  END;
  IF W1_L3 = -1 THEN DO;                   /*-1 = refused*/
    W1_L3 = .;
    W1_L3_MissEqNo = 0;                    /* transformed to 0 = no I didn't participate*/
  END;
  ELSE IF W1_L3 = 2 then do;               /* 2= no*/
    W1_L3 = 0;                             /*transformed 2 to 0 to have consistent meaning with other No values*/
    W1_L3_MissEqNo = 0;                    /* transformed to 0 = no I didn't participate*/
  END;

  NBR_GROUPS_IN = W1_L1_A_MissEqNo + W1_L1_C_MissEqNo + W1_L2_1 + W1_L2_2 + W1_L2_3 + W1_L2_5 + W1_L3_MissEqNo;

  /*Bin Participation in the various activities into one variable*/
  IF W1_L1_A = 1 OR W1_L1_A = 2 THEN PARTICIPATED = 1;        /*Evaluate participation in Professional Association */
  ELSE IF W1_L1_C = 1 OR W1_L1_C = 2 THEN PARTICIPATED = 1;   /*Evaluate participation in Cultural Organization*/
  ELSE IF W1_L2_1 = 1 THEN PARTICIPATED = 1;                  /*Evaluate participation in NAACP*/
  ELSE IF W1_L2_2 = 1 THEN PARTICIPATED = 1;                  /*Evaluate participation in National Urban League*/
  ELSE IF W1_L2_3 = 1 THEN PARTICIPATED = 1;                  /*Evaluate participation in Southern Christian Leadership Conference*/
  ELSE IF W1_L2_5 = 1 THEN PARTICIPATED = 1;                  /*Evaluate participation in Occupy Wall Street Movement*/
  ELSE IF W1_L3 = 1 THEN PARTICIPATED = 1;                    /*Evaluate participation in community group*/
  ELSE PARTICIPATED = 0;                                      /*Set remainder to 0 - meaning No participation*/

  /*Transform Gender into only 0 & 1 values for regression. Male is originally = 1 in data, so no
    transform needed. Converted Female (original value of 2) to 0 */
  IF PPGENDER = 2 then PPGENDER = 0;

  /*Since Voting metrics are essential to testing hypothesis, have selected for observations where
    Voting metrics are populated (this is indicated by Wave 2 participation number - column
    W2_CASEID2 - because those questions occurred in Wave 2 testing)*/
  IF W2_CASEID2 > 0;

  /*Narrowing hypothesis to age range of 18 to 44 because older ages typically do vote and are
    community engaged so wish to select for those who may not be voting or may not be community
    engaged - or both */
  IF PPAGE GE 18 AND PPAGE LE 44;

PROC SORT; by CASEID;
PROC FREQ; TABLES VOTED;
PROC FREQ; TABLES NBR_GROUPS_IN;
PROC GLM; model NBR_GROUPS_IN = VOTED;
RESULTS:

The FREQ Procedure

VOTED   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0             123     22.99                    123                22.99
1             412     77.01                    535               100.00
Frequency Missing = 18

The FREQ Procedure

NBR_GROUPS_IN   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0                     369     66.73                    369                66.73
1                      98     17.72                    467                84.45
2                      50      9.04                    517                93.49
3                      30      5.42                    547                98.92
4                       4      0.72                    551                99.64
5                       2      0.36                    553               100.00

The GLM Procedure

Number of Observations Read    553
Number of Observations Used    535

Dependent Variable: NBR_GROUPS_IN

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        8.2129270     8.2129270      9.14   0.0026
Error             533      479.0468861     0.8987746
Corrected Total   534      487.2598131

R-Square   Coeff Var   Root MSE   NBR_GROUPS_IN Mean
0.016855    166.8421   0.948037             0.568224

Source   DF    Type I SS   Mean Square   F Value   Pr > F
VOTED     1   8.21292698    8.21292698      9.14   0.0026

Source   DF   Type III SS   Mean Square   F Value   Pr > F
VOTED     1    8.21292698    8.21292698      9.14   0.0026

Parameter   Estimate       Standard Error   t Value   Pr > |t|
Intercept   0.3414634146       0.08548164      3.99     <.0001
VOTED       0.2944589155       0.09740946      3.02     0.0026
Course 3 Week 2 Homework – Basics of Linear Regression - Summary
Adapting my original hypothesis to this assignment's requirements, the Null hypothesis this model is testing is: There is NO relationship between whether or not a person votes (binary categorical explanatory variable) and the number of professional or community groups that person participates in (quantitative response variable).
Variable values:
· VOTED (whether person voted in November 2012 US federal election): 0 = No, 1 = Yes
· NBR_GROUPS_IN: 0-5
Summary of results:
P value: .0026. This value is below the alpha threshold of .05, indicating we can REJECT the null hypothesis and conclude that there is a relationship between voting and the number of groups participated in.
Beta1 value: The model provides the beta value (VOTED estimate in results table) of .29, showing a Positive association of Voting with the number of groups participated in
· Beta1 value is also known as the regression co-efficient “m” in the linear equation y=mx + b
· Beta0 value is the intercept “b” in the linear equation y=mx + b. For this model that value is .34
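Because VOTED takes only the values 0 and 1, the equation yields just two predicted values, one per group. The short Python sketch below is not SAS, and its names are hypothetical. It uses the estimates and frequencies from the results tables. It also shows that the t value is the estimate divided by its standard error and that, with a single predictor, the F value is the square of t.

```python
# Estimates from the Parameter table of the GLM output
beta0 = 0.3414634146     # intercept "b"
beta1 = 0.2944589155     # VOTED coefficient "m"
se_beta1 = 0.09740946    # standard error of the VOTED coefficient

# y = m*x + b evaluated at x = 0 and x = 1
pred_not_voted = beta0
pred_voted = beta0 + beta1

# Test statistics implied by the estimate and its standard error
t_value = beta1 / se_beta1
f_value = t_value ** 2

# Weighting the two predictions by the VOTED frequencies gives the overall mean
n_not_voted, n_voted = 123, 412
overall_mean = (n_not_voted * pred_not_voted + n_voted * pred_voted) / (n_not_voted + n_voted)

print(f"predicted groups, did not vote: {pred_not_voted:.3f}")
print(f"predicted groups, voted:        {pred_voted:.3f}")
print(f"t = {t_value:.2f}, F = {f_value:.2f}, overall mean = {overall_mean:.6f}")
```

The weighted mean matches the NBR_GROUPS_IN Mean of 0.568224 in the output, which confirms that the two predicted values are the group means.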
Course 3 Regression, Week 1 Intro to Regression Homework
BACKGROUND ON DATA USED FOR STUDY
METHODS:
1 & 2. Sample & Data Collection Procedures:
Data is from the Outlook on Life Surveys, conducted by GfK Knowledge Networks on behalf of the University of California Irvine. Data is made available by the Inter-university Consortium for Political and Social Research (ICPSR). http://www.icpsr.umich.edu/icpsrweb/content/membership/index.html.
The two instances of this survey were fielded between August and December 2012, using a sample drawn from GfK’s web panel, which is designed to be representative of the United States population. Panel members are randomly recruited through probability-based sampling and households are provided with access to the Internet and hardware if needed. Random-digit dialing and address-based sampling methodologies are used. The target population was non-institutionalized adults 18 years of age and older.
A total of 2294 respondents participated during Wave 1 survey of the Outlook on Life and 1601 were interviewed during Wave 2.
The focus of my research question is whether younger adults’ participation in community and professional groups is related to their voting habits. Because the voting participation question was asked only in the Wave 2 survey, only those who participated in Wave 2 were included in this research (therefore, up to 1601 participants).
Participants were further restricted to those aged 44 or younger. This resulted in a sample of 553 participants in the research.
The ethnic composition was White (n= 192, 35%), African American (n=297, 54%), Other-Non-Hispanic & 2 or more races (n =24, 4%), Hispanic (n=40, 7%). The gender composition was male (n=260, 47%) and female (n=293, 53%).
3. Measures – including variables used & data management performed:
Variables used:
· Participation in 7 non-partisan professional or community groups within the last 12 months
o Explanatory variables
o Categorical data
o Values for 2 groups (choices: frequency of participation and refusal to answer): -1, 1, 2, 3, 4
o Values for 4 groups (choices: yes/no): 0, 1
o Value for 1 group (choices: yes/no/refusal to answer): -1, 1, 2
· Participation in election on 11/6/2012
o Response variable
o Categorical data
o Values -1, 1-6 (choices: various methods of voting, didn’t vote, not sure, refusal to answer)
· Age
o Selection criteria for study
o Quantitative data
o Values 18-81
Data Management:
· Missing – set all variables with a data value meaning No Answer to Missing
· Participation in community or professional group values:
o 2 groups’ variables that listed frequency in addition to participation (variables W1_L1_A & W1_L1_C): transformed frequency values (values 1 and 2) to 1 and values meaning no participation or belonging (3 and 4) to 0
o 1 group’s variable that captured participation with different values from other questions (variable W1_L3): transformed Yes & No values from 1 and 2, respectively, to align with values used for the other variables (to 1 and 0)
· PARTICIPATED variable: derived a variable which consolidated the participation values of all 7 groups/types of groups into 1 variable. Rule: if a person participated 1 or more times in any group, set value to 1 (Yes); else set value to 0 (No)
· MissEqNo variables: To be able to sum the number of groups the person participated in, wanted the value to reflect only 0 (No Participation) or 1 (Participation at any frequency). Therefore, for the 3 groups whose variables coded Missing as -1, set the derived variables to 0 in those cases
· Selection criteria:
o Respondents with Age less than or equal to 44
o Data records with a Wave2 CaseID
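The recoding rules above can be sketched as small functions. This is an illustrative Python version, not the SAS data step used for the analysis. The function names are hypothetical, and `None` stands in for a SAS missing value.

```python
def recode_frequency_item(code):
    """Item coded like W1_L1_A / W1_L1_C: 1 or 2 = participated, -1 = no answer,
    anything else (3 or 4) = did not participate.
    Returns (recoded value, value used when counting groups)."""
    if code in (1, 2):
        return 1, 1
    if code == -1:
        return None, 0   # missing for analysis, counted as 0 in the sum
    return 0, 0

def recode_yes_no_item(code):
    """Item coded like W1_L3: 1 = yes, 2 = no, -1 = no answer."""
    if code == 1:
        return 1, 1
    if code == 2:
        return 0, 0
    if code == -1:
        return None, 0
    raise ValueError(f"unexpected code {code}")

def summarize(frequency_items, yes_no_items, binary_items):
    """Return (NBR_GROUPS_IN, PARTICIPATED) for one respondent."""
    counted = [recode_frequency_item(c)[1] for c in frequency_items]
    counted += [recode_yes_no_item(c)[1] for c in yes_no_items]
    counted += list(binary_items)   # the 4 groups already coded 0/1
    nbr_groups_in = sum(counted)
    participated = 1 if nbr_groups_in > 0 else 0
    return nbr_groups_in, participated

# Hypothetical respondent: one frequency item answered, one refused,
# "no" on the yes/no item, and "yes" on one of the four 0/1 items.
print(summarize([1, -1], [2], [0, 1, 0, 0]))
```

Counting a refusal as 0 keeps the sum defined for every respondent, which is the purpose of the MissEqNo variables.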
Course 2 Data Analysis Tools: Week 4 Exploring Statistical Interactions
Null Hypothesis: That for 18 – 44 year olds, whether or not they have children DOES NOT moderate the relationship between whether they participate in community or professional groups and their likelihood to vote.
Alternate Hypothesis: That for 18 – 44 year olds, whether or not they have children DOES moderate the relationship between whether they participate in community or professional groups and their likelihood to vote.
SAS Program:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new;
SET mydata.oll_pds (KEEP = CASEID W2_CASEID2 W1_L1_A W1_L1_C W1_L2_1 W1_L2_2 W1_L2_3 W1_L2_5 W1_L3 W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM W1_D1 W1_N1A W1_P17 W1_P17A PPHHSIZE);
LABEL
W1_P17 = "Have Children";
(showing only needed portions of code – excludes all data management and selection criteria)
/* Week 4's assignment: PARTICIPATED to VOTED, influenced by Having Children */
PROC FREQ; TABLES VOTED*PARTICIPATED/CHISQ;
PROC GCHART; VBAR PARTICIPATED/DISCRETE TYPE=MEAN SUMVAR=VOTED;
PROC SORT; BY W1_P17;
PROC FREQ; TABLES VOTED*PARTICIPATED/CHISQ; BY W1_P17;
Results Summary and Analysis:
ChiSquared result on only the 2 variables (PARTICIPATED & VOTED)
· ChiSquared value is substantial at 8.2 (the critical value for significance at .05 with 1 degree of freedom is 3.84). This indicates there is a notable difference between expected and observed results and the association of the variables is significant.
· P value is .0042, which is far less than the significance threshold of .05. A difference this large between observed and expected counts would be unlikely if the two variables were unrelated, so the relationship is statistically significant.
· Conclusion on the 2 variables: With a high Chi Squared and a low p value, there is a statistically significant association between Participation in community and professional groups and Voting.
ChiSquared result regarding moderation of Having Children on Participating in groups and Voting
· For those without children (Have Children = 0): ChiSquared value is not large (3.01) and p value does not cross significance threshold of .05 (p is .083)
· For those with children (Have Children =1): ChiSquared value is large (5.31) and p value does cross significance threshold of .05 (p is .021)
Conclusion:
· Null hypothesis should be rejected and Alternate hypothesis accepted: whether a person has children DOES moderate the relationship between Participation in community groups and Voting
o This is shown by the fact that the association is statistically significant among those with children (ChiSquared 5.31, p = .021) but not among those without children (ChiSquared 3.01, p = .083). The direction of the association is the same in both groups (voters participate more), so the evidence for moderation rests on the difference in significance between the two groups rather than on a direct test of that difference.
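To see how the two Have Children groups compare in size of effect as well as in significance, the participation rates and odds ratios can be computed from the cell counts in the two moderator tables. This is an illustrative Python sketch of my own, not something the SAS program produced; the helper names are hypothetical.

```python
# Cell counts from the two moderator tables.
# Rows: VOTED = 0, 1. Columns: PARTICIPATED = 0, 1.
strata = {
    "Have Children = 0": [[48, 19], [130, 87]],
    "Have Children = 1": [[44, 9], [128, 64]],
}


def participation_rates(table):
    """Share of non-voters and of voters who participated in a group."""
    return [row[1] / sum(row) for row in table]


def odds_ratio(table):
    """Odds of participating among voters relative to non-voters."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


results = {}
for label, table in strata.items():
    non_voters, voters = participation_rates(table)
    ratio = odds_ratio(table)
    results[label] = (non_voters, voters, ratio)
    print(
        f"{label}: participated {non_voters:.1%} of non-voters, "
        f"{voters:.1%} of voters, odds ratio {ratio:.2f}"
    )
```

In both groups voters participate more than non-voters; the gap is wider among those with children, which is the pattern the moderation conclusion relies on.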
OUTPUT from FREQ procedures:
Table VOTED * PARTICIPATED (WITHOUT MODERATOR)
Cross-Tabular Freq Table (each cell: Frequency / Percent / Row Pct / Col Pct)
Table of VOTED by PARTICIPATED
VOTED   PARTICIPATED = 0               PARTICIPATED = 1               Total
0       95 / 17.76 / 77.24 / 26.69     28 / 5.23 / 22.76 / 15.64      123 / 22.99
1       261 / 48.79 / 63.35 / 73.31    151 / 28.22 / 36.65 / 84.36    412 / 77.01
Total   356 / 66.54                    179 / 33.46                    535 / 100.00
Frequency Missing = 18
Chi-Square Tests
Statistic                      DF   Value    Prob
Chi-Square                     1    8.2040   0.0042
Likelihood Ratio Chi-Square    1    8.6083   0.0033
Continuity Adj. Chi-Square     1    7.5921   0.0059
Mantel-Haenszel Chi-Square     1    8.1886   0.0042
Phi Coefficient                     0.1238
Contingency Coefficient             0.1229
Cramer's V                          0.1238
Fisher's Exact Test
Cell (1,1) Frequency (F)       95
Left-sided Pr <= F             0.9988
Right-sided Pr >= F            0.0025
Table Probability (P)          0.0013
Two-sided Pr <= P              0.0045
Effective Sample Size = 535 Frequency Missing = 18
RESULTS WITH MODERATOR “Have Children”
Have Children=0
Table VOTED * PARTICIPATED (WITH MODERATOR)
Cross-Tabular Freq Table (each cell: Frequency / Percent / Row Pct / Col Pct)
Table of VOTED by PARTICIPATED
VOTED   PARTICIPATED = 0               PARTICIPATED = 1               Total
0       48 / 16.90 / 71.64 / 26.97     19 / 6.69 / 28.36 / 17.92      67 / 23.59
1       130 / 45.77 / 59.91 / 73.03    87 / 30.63 / 40.09 / 82.08     217 / 76.41
Total   178 / 62.68                    106 / 37.32                    284 / 100.00
Frequency Missing = 9
Chi-Square Tests
Statistic                      DF   Value    Prob
Chi-Square                     1    3.0131   0.0826
Likelihood Ratio Chi-Square    1    3.1000   0.0783
Continuity Adj. Chi-Square     1    2.5324   0.1115
Mantel-Haenszel Chi-Square     1    3.0025   0.0831
Phi Coefficient                     0.1030
Contingency Coefficient             0.1025
Cramer's V                          0.1030
Fisher's Exact Test
Cell (1,1) Frequency (F)       48
Left-sided Pr <= F             0.9714
Right-sided Pr >= F            0.0544
Table Probability (P)          0.0258
Two-sided Pr <= P              0.0855
Effective Sample Size = 284 Frequency Missing = 9
Have Children=1
Table VOTED * PARTICIPATED
Cross-Tabular Freq Table (each cell: Frequency / Percent / Row Pct / Col Pct)
Table of VOTED by PARTICIPATED
VOTED   PARTICIPATED = 0               PARTICIPATED = 1               Total
0       44 / 17.96 / 83.02 / 25.58     9 / 3.67 / 16.98 / 12.33       53 / 21.63
1       128 / 52.24 / 66.67 / 74.42    64 / 26.12 / 33.33 / 87.67     192 / 78.37
Total   172 / 70.20                    73 / 29.80                     245 / 100.00
Frequency Missing = 4
Chi-Square Tests
Statistic                      DF   Value    Prob
Chi-Square                     1    5.3094   0.0212
Likelihood Ratio Chi-Square    1    5.7577   0.0164
Continuity Adj. Chi-Square     1    4.5564   0.0328
Mantel-Haenszel Chi-Square     1    5.2877   0.0215
Phi Coefficient                     0.1472
Contingency Coefficient             0.1456
Cramer's V                          0.1472
Fisher's Exact Test
Cell (1,1) Frequency (F)       44
Left-sided Pr <= F             0.9949
Right-sided Pr >= F            0.0140
Table Probability (P)          0.0089
Two-sided Pr <= P              0.0267
Effective Sample Size = 245 Frequency Missing = 4
Course 2 Data Analysis Tools: Week 3 Pearson Correlation Coefficient
Null Hypothesis: That for 18 – 44 year olds, how many children they have DOES NOT have a relationship to their age.
Alternate Hypothesis: That for 18 – 44 year olds, how many children they have DOES have a relationship to their age.
NOTE about homework and variables selected:
The Outlook on Life survey didn’t have many quantitative variables, so I simply chose 2 that could demonstrate the Pearson test.
SAS Program:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new;
SET mydata.oll_pds (KEEP = CASEID W2_CASEID2 W1_L1_A W1_L1_C W1_L2_1 W1_L2_2
W1_L2_3 W1_L2_5 W1_L3 W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM
W1_D1 W1_N1A W1_P17A PPHHSIZE);
LABEL
PPAGE="Age"
W1_P17A = "Number of children";
(showing only needed portions of code – excludes all data management and selection criteria)
PROC CORR; VAR PPAGE W1_P17A;
OUTPUT from CORR procedure:
2 Variables: PPAGE W1_P17A
Simple Statistics
Variable   N     Mean       Std Dev   Sum         Minimum    Maximum    Label
PPAGE      553   31.64195   8.26652   17498       18.00000   44.00000   Age
W1_P17A    249   2.25301    1.21659   561.00000   1.00000    9.00000    Number of children
Pearson Correlation Coefficients (each cell: r / Prob > |r| under H0: Rho=0 / Number of Observations)
                                PPAGE                      W1_P17A
PPAGE (Age)                     1.00000 / 553              0.30393 / <.0001 / 249
W1_P17A (Number of children)    0.30393 / <.0001 / 249     1.00000 / 249
Results Summary and Analysis:
PEARSON (r) CORRELATION COEFFICIENT
· P value is <.0001, which is far less than the significance threshold of .05. A value this low means a correlation this large would be very unlikely to appear by chance if age and number of children were actually unrelated.
· Coefficient (r) is .304. This shows a positive but weak linear relationship (as it is closer to 0 than 1)
· r2 indicates what proportion of the variability in one variable is described by variation in the second variable (a.k.a. Coefficient of Determination).
o rsquared => .304 * .304 = .09 => only about 9% of the variability in number of children is accounted for by age, which gives a low ability to form predictions
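The r2 arithmetic above, and the reason such a modest r still yields a very small p value, can be worked through from the two numbers PROC CORR reports (r and the number of paired observations). This is a short Python sketch of my own; the t statistic formula for a Pearson correlation is standard, but it was not part of the original homework.

```python
import math

# Values reported by PROC CORR for PPAGE and W1_P17A.
r = 0.30393
n = 249  # respondents with both age and number of children recorded

# Coefficient of determination: share of variability explained.
r_squared = r ** 2

# Test statistic for H0: Rho = 0, with n - 2 degrees of freedom.
# A large sample makes even a modest r produce a large t, hence a small p.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

print(f"r squared = {r_squared:.4f}")
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

A t value of about 5 on 247 degrees of freedom is far beyond the usual cutoff of roughly 2, which is consistent with the reported p value of <.0001 even though only about 9% of the variability is explained.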
Conclusion:
· With a p value of <.0001, the Null hypothesis is strictly rejected: there is a statistically significant positive relationship between age and number of children. In practical terms, however, the relationship is weak.
o The r and r2 values indicate a weak relationship
o Although the p value shows statistical significance, the magnitude of the effect is small (r2 of about 9%) and the relationship is weak (r). Significance and strength are separate questions.
o Because the p and r values seemed to point to conflicting conclusions, I looked for guidance on this situation in the Course Forum
Course 2 Data Analysis Tools: Week 2 Chi Square test of Independence
Null Hypothesis: That for 18 – 44 year olds, whether or not they voted DOES NOT vary by number of community or professional groups they participated in.
Alternate Hypothesis: That for 18 – 44 year olds, whether or not they voted varies by number of community or professional groups they participated in.
NOTE about homework and variables selected:
It appears that the assignment was to be applied on a situation with:
1. Categorical variable with more than 2 levels. My dataset and subject area has two: Number of community groups participated in & age category/group.
2. Chi Squared test was statistically significant (p < .05).
However, neither of these variables’ relationships to the Voted variable had p < .05. Because this is the question I am working with, I completed the assignment using Number of groups participated in, as it is the best I have.
Program code and all charts and statistics are below the Conclusion. Results are split across this and a 2nd tumblr post, as they were too long to fit in one post.
Conclusion:
CHI SQUARED test (initial test on all categories/values)
· P value is .0587, close to the threshold of .05 but not below it.
· From this result, this relationship is not significant and the Null hypothesis that there is no relationship between number of groups involved in and voting cannot be rejected.
Posthoc CHI SQUARED TESTS
· Bonferroni adjusted p value threshold for rejecting the null hypothesis is .003
· P values across the 15 pairs range from .0167 to .9677; none falls below the threshold
· The lowest p values are for 0 groups versus 2 groups (.0167) and 0 groups versus 3 groups (.0521).
· From this result, no pairwise difference is significant and the Null hypothesis that there is no relationship between number of groups involved in and voting cannot be rejected.
SAS Program:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new;
SET mydata.oll_pds (KEEP = CASEID W2_CASEID2 W1_L1_A W1_L1_C W1_L2_1 W1_L2_2
W1_L2_3 W1_L2_5 W1_L3 W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM);
LABEL W1_L1_A="Participation in Professional Association in last 12 months"
W1_L1_C="Participation in Cultural Organization in last 12 months"
W1_L2_1="Participation in NAACP in last 12 months"
W1_L2_2="Participation in National Urban League in last 12 months"
W1_L2_3="Participation in Southern Christian Leadership Conference in last 12 months"
W1_L2_5="Participation in Occupy Wall Street Movement in last 12 months"
W1_L3="Participation in community group in last 12 months"
W2_QB1A="Participation in election on Nov 6 2012"
W2_QB3="Regularity of Voting"
PPAGE="Age"
PPAGECAT="Age Category"
PPETHM="Ethnic Category";
(omitting from this listing the data management and selection code)
/*FREQ statement is Response Var*Explanatory Var/CHISQ */
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
Data COMPARISON1; SET NEW;
IF NBR_GROUPS_IN = 0 OR NBR_GROUPS_IN = 1;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON2; SET NEW;
IF NBR_GROUPS_IN = 0 OR NBR_GROUPS_IN = 2;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON3; SET NEW;
IF NBR_GROUPS_IN = 0 OR NBR_GROUPS_IN = 3;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON4; SET NEW;
IF NBR_GROUPS_IN = 0 OR NBR_GROUPS_IN = 4;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON5; SET NEW;
IF NBR_GROUPS_IN = 0 OR NBR_GROUPS_IN = 5;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON6; SET NEW;
IF NBR_GROUPS_IN = 1 OR NBR_GROUPS_IN = 2;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON7; SET NEW;
IF NBR_GROUPS_IN = 1 OR NBR_GROUPS_IN = 3;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON8; SET NEW;
IF NBR_GROUPS_IN = 1 OR NBR_GROUPS_IN = 4;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON9; SET NEW;
IF NBR_GROUPS_IN = 1 OR NBR_GROUPS_IN = 5;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON10; SET NEW;
IF NBR_GROUPS_IN = 2 OR NBR_GROUPS_IN = 3;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON11; SET NEW;
IF NBR_GROUPS_IN = 2 OR NBR_GROUPS_IN = 4;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON12; SET NEW;
IF NBR_GROUPS_IN = 2 OR NBR_GROUPS_IN = 5;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON13; SET NEW;
IF NBR_GROUPS_IN = 3 OR NBR_GROUPS_IN = 4;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON14; SET NEW;
IF NBR_GROUPS_IN = 3 OR NBR_GROUPS_IN = 5;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
Data COMPARISON15; SET NEW;
IF NBR_GROUPS_IN = 4 OR NBR_GROUPS_IN = 5;
PROC FREQ; TABLES VOTED*NBR_GROUPS_IN/CHISQ;
RUN;
POST HOC TEST: BONFERRONI ADJUSTMENT
P divided by the number of pairwise comparisons (6 category levels give 15 pairs) => .05/15 = .003 <= New adjusted p value that is the threshold for rejecting the Null hypothesis.
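The adjustment above can be sketched in a few lines: enumerate every pair of NBR_GROUPS_IN levels, divide .05 by the number of pairs, and compare each post hoc p value against the result. This Python sketch is my own illustration rather than part of the SAS program; the p values are copied from the post hoc summary chart in the follow-up post.

```python
from itertools import combinations

levels = [0, 1, 2, 3, 4, 5]  # values of NBR_GROUPS_IN
pairs = list(combinations(levels, 2))  # every pairwise comparison

alpha = 0.05
adjusted_alpha = alpha / len(pairs)  # Bonferroni: divide by number of tests

# Pearson chi-square p values for each pair, from the post hoc summary chart.
p_values = {
    (0, 1): 0.1533, (0, 2): 0.0167, (0, 3): 0.0521, (0, 4): 0.9396, (0, 5): 0.3940,
    (1, 2): 0.1766, (1, 3): 0.2500, (1, 4): 0.7901, (1, 5): 0.4863,
    (2, 3): 0.9677, (2, 4): 0.3921, (2, 5): 0.6264,
    (3, 4): 0.3999, (3, 5): 0.6322,
    (4, 5): 0.4386,
}

significant = [pair for pair in pairs if p_values[pair] < adjusted_alpha]

print(f"{len(pairs)} comparisons, adjusted threshold = {adjusted_alpha:.4f}")
print("pairs below threshold:", significant)
```

Listing the pairs with combinations also makes it easy to confirm that all 15 comparison data steps are present and none is duplicated.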
Table of VOTED by NBR_GROUPS_IN
(shown with NBR_GROUPS_IN down the side; each cell: Frequency / Percent / Row Pct / Col Pct, where Row Pct is the share within the VOTED level and Col Pct is the share within the NBR_GROUPS_IN level)
NBR_GROUPS_IN   VOTED = 0                      VOTED = 1                       Total
0               95 / 17.76 / 77.24 / 26.69     261 / 48.79 / 63.35 / 73.31     356 / 66.54
1               19 / 3.55 / 15.45 / 19.59      78 / 14.58 / 18.93 / 80.41      97 / 18.13
2               5 / 0.93 / 4.07 / 10.64        42 / 7.85 / 10.19 / 89.36       47 / 8.79
3               3 / 0.56 / 2.44 / 10.34        26 / 4.86 / 6.31 / 89.66        29 / 5.42
4               1 / 0.19 / 0.81 / 25.00        3 / 0.56 / 0.73 / 75.00         4 / 0.75
5               0 / 0.00 / 0.00 / 0.00         2 / 0.37 / 0.49 / 100.00        2 / 0.37
Total           123 / 22.99                    412 / 77.01                     535 / 100.00
Frequency Missing = 18
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value     Prob
Chi-Square                     5    10.6554   0.0587
Likelihood Ratio Chi-Square    5    12.2673   0.0313
Mantel-Haenszel Chi-Square     1    9.0007    0.0027
Phi Coefficient                     0.1411
Contingency Coefficient             0.1397
Cramer's V                          0.1411
WARNING: 33% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Effective Sample Size = 535 Frequency Missing = 18
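The warning about expected counts can be reproduced from the table margins: each expected count is row total times column total divided by the grand total. The Python sketch below is my own check, not SAS output; it counts the cells with expected counts under 5 and recomputes the overall ChiSquared value.

```python
# Observed counts from the VOTED by NBR_GROUPS_IN table.
# Rows: VOTED = 0, 1. Columns: NBR_GROUPS_IN = 0 through 5.
observed = [
    [95, 19, 5, 3, 1, 0],
    [261, 78, 42, 26, 3, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

low_cells = sum(1 for row in expected for e in row if e < 5)
total_cells = len(observed) * len(observed[0])

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(f"{low_cells} of {total_cells} cells have expected counts below 5")
print(f"chi-square = {chi2:.4f} on {len(col_totals) - 1} degrees of freedom")
```

The sparse cells all belong to the 4 and 5 group categories, which is why the pairwise tests involving those categories carry the same warning.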
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 0              NBR_GROUPS_IN = 1              Total
0       95 / 20.97 / 83.33 / 26.69     19 / 4.19 / 16.67 / 19.59      114 / 25.17
1       261 / 57.62 / 76.99 / 73.31    78 / 17.22 / 23.01 / 80.41     339 / 74.83
Total   356 / 78.59                    97 / 21.41                     453 / 100.00
Frequency Missing = 14
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    2.0392   0.1533
Likelihood Ratio Chi-Square    1    2.1239   0.1450
Continuity Adj. Chi-Square     1    1.6797   0.1950
Mantel-Haenszel Chi-Square     1    2.0347   0.1537
Phi Coefficient                     0.0671
Contingency Coefficient             0.0669
Cramer's V                          0.0671
Fisher's Exact Test
Cell (1,1) Frequency (F)       95
Left-sided Pr <= F             0.9431
Right-sided Pr >= F            0.0958
Table Probability (P)          0.0389
Two-sided Pr <= P              0.1868
Effective Sample Size = 453 Frequency Missing = 14
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 0              NBR_GROUPS_IN = 2              Total
0       95 / 23.57 / 95.00 / 26.69     5 / 1.24 / 5.00 / 10.64        100 / 24.81
1       261 / 64.76 / 86.14 / 73.31    42 / 10.42 / 13.86 / 89.36     303 / 75.19
Total   356 / 88.34                    47 / 11.66                     403 / 100.00
Frequency Missing = 16
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    5.7306   0.0167
Likelihood Ratio Chi-Square    1    6.6971   0.0097
Continuity Adj. Chi-Square     1    4.9028   0.0268
Mantel-Haenszel Chi-Square     1    5.7164   0.0168
Phi Coefficient                     0.1192
Contingency Coefficient             0.1184
Cramer's V                          0.1192
Fisher's Exact Test
Cell (1,1) Frequency (F)       95
Left-sided Pr <= F             0.9972
Right-sided Pr >= F            0.0095
Table Probability (P)          0.0068
Two-sided Pr <= P              0.0184
Effective Sample Size = 403 Frequency Missing = 16
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 0              NBR_GROUPS_IN = 3              Total
0       95 / 24.68 / 96.94 / 26.69     3 / 0.78 / 3.06 / 10.34        98 / 25.45
1       261 / 67.79 / 90.94 / 73.31    26 / 6.75 / 9.06 / 89.66       287 / 74.55
Total   356 / 92.47                    29 / 7.53                      385 / 100.00
Frequency Missing = 14
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    3.7734   0.0521
Likelihood Ratio Chi-Square    1    4.4761   0.0344
Continuity Adj. Chi-Square     1    2.9614   0.0853
Mantel-Haenszel Chi-Square     1    3.7636   0.0524
Phi Coefficient                     0.0990
Contingency Coefficient             0.0985
Cramer's V                          0.0990
Fisher's Exact Test
Cell (1,1) Frequency (F)       95
Left-sided Pr <= F             0.9906
Right-sided Pr >= F            0.0353
Table Probability (P)          0.0258
Two-sided Pr <= P              0.0734
Effective Sample Size = 385 Frequency Missing = 14
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 0              NBR_GROUPS_IN = 4              Total
0       95 / 26.39 / 98.96 / 26.69     1 / 0.28 / 1.04 / 25.00        96 / 26.67
1       261 / 72.50 / 98.86 / 73.31    3 / 0.83 / 1.14 / 75.00        264 / 73.33
Total   356 / 98.89                    4 / 1.11                       360 / 100.00
Frequency Missing = 13
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.0057   0.9396
Likelihood Ratio Chi-Square    1    0.0058   0.9392
Continuity Adj. Chi-Square     1    0.0000   1.0000
Mantel-Haenszel Chi-Square     1    0.0057   0.9397
Phi Coefficient                     0.0040
Contingency Coefficient             0.0040
Cramer's V                          0.0040
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       95
Left-sided Pr <= F             0.7126
Right-sided Pr >= F            0.7104
Table Probability (P)          0.4229
Two-sided Pr <= P              1.0000
Effective Sample Size = 360 Frequency Missing = 13
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 0              NBR_GROUPS_IN = 5              Total
0       95 / 26.54 / 100.00 / 26.69    0 / 0.00 / 0.00 / 0.00         95 / 26.54
1       261 / 72.91 / 99.24 / 73.31    2 / 0.56 / 0.76 / 100.00       263 / 73.46
Total   356 / 99.44                    2 / 0.56                       358 / 100.00
Frequency Missing = 13
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.7265   0.3940
Likelihood Ratio Chi-Square    1    1.2376   0.2659
Continuity Adj. Chi-Square     1    0.0024   0.9606
Mantel-Haenszel Chi-Square     1    0.7245   0.3947
Phi Coefficient                     0.0450
Contingency Coefficient             0.0450
Cramer's V                          0.0450
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       95
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            0.5391
Table Probability (P)          0.5391
Two-sided Pr <= P              1.0000
Effective Sample Size = 358 Frequency Missing = 13
Remainder of value pair tests are in the following link - full set of results was too much for 1 tumblr post.
https://greentine.tumblr.com/post/159660542333/course2week2-chi-square-test-post-hoc-tests
Course2Week2 - Chi Square Test - Post Hoc tests for NBR_GROUPS_IN not including value 1 & Post Hoc summary chart
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 2              NBR_GROUPS_IN = 3              Total
0       5 / 6.58 / 62.50 / 10.64       3 / 3.95 / 37.50 / 10.34       8 / 10.53
1       42 / 55.26 / 61.76 / 89.36     26 / 34.21 / 38.24 / 89.66     68 / 89.47
Total   47 / 61.84                     29 / 38.16                     76 / 100.00
Frequency Missing = 4
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.0016   0.9677
Likelihood Ratio Chi-Square    1    0.0016   0.9677
Continuity Adj. Chi-Square     1    0.0000   1.0000
Mantel-Haenszel Chi-Square     1    0.0016   0.9679
Phi Coefficient                     0.0046
Contingency Coefficient             0.0046
Cramer's V                          0.0046
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       5
Left-sided Pr <= F             0.6554
Right-sided Pr >= F            0.6419
Table Probability (P)          0.2973
Two-sided Pr <= P              1.0000
Effective Sample Size = 76 Frequency Missing = 4
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 2              NBR_GROUPS_IN = 4              Total
0       5 / 9.80 / 83.33 / 10.64       1 / 1.96 / 16.67 / 25.00       6 / 11.76
1       42 / 82.35 / 93.33 / 89.36     3 / 5.88 / 6.67 / 75.00        45 / 88.24
Total   47 / 92.16                     4 / 7.84                       51 / 100.00
Frequency Missing = 3
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.7324   0.3921
Likelihood Ratio Chi-Square    1    0.5915   0.4418
Continuity Adj. Chi-Square     1    0.0023   0.9621
Mantel-Haenszel Chi-Square     1    0.7181   0.3968
Phi Coefficient                     -0.1198
Contingency Coefficient             0.1190
Cramer's V                          -0.1198
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       5
Left-sided Pr <= F             0.4038
Right-sided Pr >= F            0.9369
Table Probability (P)          0.3407
Two-sided Pr <= P              0.4038
Effective Sample Size = 51 Frequency Missing = 3
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 2              NBR_GROUPS_IN = 5              Total
0       5 / 10.20 / 100.00 / 10.64     0 / 0.00 / 0.00 / 0.00         5 / 10.20
1       42 / 85.71 / 95.45 / 89.36     2 / 4.08 / 4.55 / 100.00       44 / 89.80
Total   47 / 95.92                     2 / 4.08                       49 / 100.00
Frequency Missing = 3
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.2369   0.6264
Likelihood Ratio Chi-Square    1    0.4401   0.5071
Continuity Adj. Chi-Square     1    0.0000   1.0000
Mantel-Haenszel Chi-Square     1    0.2321   0.6300
Phi Coefficient                     0.0695
Contingency Coefficient             0.0694
Cramer's V                          0.0695
WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       5
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            0.8044
Table Probability (P)          0.8044
Two-sided Pr <= P              1.0000
Effective Sample Size = 49 Frequency Missing = 3
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 3              NBR_GROUPS_IN = 4              Total
0       3 / 9.09 / 75.00 / 10.34       1 / 3.03 / 25.00 / 25.00       4 / 12.12
1       26 / 78.79 / 89.66 / 89.66     3 / 9.09 / 10.34 / 75.00       29 / 87.88
Total   29 / 87.88                     4 / 12.12                      33 / 100.00
Frequency Missing = 1
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.7088   0.3999
Likelihood Ratio Chi-Square    1    0.5868   0.4436
Continuity Adj. Chi-Square     1    0.0006   0.9802
Mantel-Haenszel Chi-Square     1    0.6873   0.4071
Phi Coefficient                     -0.1466
Contingency Coefficient             0.1450
Cramer's V                          -0.1466
WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       3
Left-sided Pr <= F             0.4196
Right-sided Pr >= F            0.9376
Table Probability (P)          0.3572
Two-sided Pr <= P              0.4196
Effective Sample Size = 33 Frequency Missing = 1
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 3              NBR_GROUPS_IN = 5              Total
0       3 / 9.68 / 100.00 / 10.34      0 / 0.00 / 0.00 / 0.00         3 / 9.68
1       26 / 83.87 / 92.86 / 89.66     2 / 6.45 / 7.14 / 100.00       28 / 90.32
Total   29 / 93.55                     2 / 6.45                       31 / 100.00
Frequency Missing = 1
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.2291   0.6322
Likelihood Ratio Chi-Square    1    0.4216   0.5161
Continuity Adj. Chi-Square     1    0.0000   1.0000
Mantel-Haenszel Chi-Square     1    0.2217   0.6378
Phi Coefficient                     0.0860
Contingency Coefficient             0.0856
Cramer's V                          0.0860
WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       3
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            0.8129
Table Probability (P)          0.8129
Two-sided Pr <= P              1.0000
Effective Sample Size = 31 Frequency Missing = 1
Table of VOTED by NBR_GROUPS_IN (each cell: Frequency / Percent / Row Pct / Col Pct)
VOTED   NBR_GROUPS_IN = 4              NBR_GROUPS_IN = 5              Total
0       1 / 16.67 / 100.00 / 25.00     0 / 0.00 / 0.00 / 0.00         1 / 16.67
1       3 / 50.00 / 60.00 / 75.00      2 / 33.33 / 40.00 / 100.00     5 / 83.33
Total   4 / 66.67                      2 / 33.33                      6 / 100.00
Statistics for Table of VOTED by NBR_GROUPS_IN
Statistic                      DF   Value    Prob
Chi-Square                     1    0.6000   0.4386
Likelihood Ratio Chi-Square    1    0.9081   0.3406
Continuity Adj. Chi-Square     1    0.0000   1.0000
Mantel-Haenszel Chi-Square     1    0.5000   0.4795
Phi Coefficient                     0.3162
Contingency Coefficient             0.3015
Cramer's V                          0.3162
WARNING: 100% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
Cell (1,1) Frequency (F)       1
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            0.6667
Table Probability (P)          0.6667
Two-sided Pr <= P              1.0000
Sample Size = 6
TABLE of POST HOC CHI SQUARED p values
(rows and columns are NBR_GROUPS_IN values; * marks a value compared with itself)
pvalue   0       1       2       3       4       5
0        *
1        .1533   *
2        .0167   .1766   *
3        .0521   .2500   .9677   *
4        .9396   .7901   .3921   .3999   *
5        .3940   .4863   .6264   .6322   .4386   *
Distribution chart for association of NBR_GROUPS_IN to Ethnic Category
Course 1 Data Mgmt & Visualization: Week 4 Visualization
Hypothesis: That 18 – 44 year olds who are involved in community or professional groups are more likely to vote. And that the more groups they are involved in, the greater the likelihood.
SAS Program:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new;
SET mydata.oll_pds (KEEP = CASEID W2_CASEID2 W1_L1_A W1_L1_C W1_L2_1 W1_L2_2
W1_L2_3 W1_L2_5 W1_L3 W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM);
LABEL W1_L1_A="Participation in Professional Association in last 12 months"
W1_L1_C="Participation in Cultural Organization in last 12 months"
W1_L2_1="Participation in NAACP in last 12 months"
W1_L2_2="Participation in National Urban League in last 12 months"
W1_L2_3="Participation in Southern Christian Leadership Conference in last 12 months"
W1_L2_5="Participation in Occupy Wall Street Movement in last 12 months"
W1_L3="Participation in community group in last 12 months"
W2_QB1A="Participation in election on Nov 6 2012"
W2_QB3="Regularity of Voting"
PPAGE="Age"
PPAGECAT="Age Category"
PPETHM="Ethnic Category";
/* used to view original data and validate accuracy of later program steps*/
/*proc print;*/
/*Data management - coding out missing data for those variables where need no other form of reassignment*/
If W2_QB3 = -1 then W2_QB3 = .; /* -1 = missing*/
/*Data management - Create secondary variable to aggregate voting participation to show only whether did vote or not or missing.
Set missing and not sure code values to missing. */
If W2_QB1A >= 2 AND W2_QB1A <= 5 then /*2 through 5 describe different methods of voting*/
  VOTED = 1; /* transformed to 1 = yes I voted*/
ELSE IF W2_QB1A = -1 OR W2_QB1A = 6 then /*-1 = missing, 6 = not sure*/
  W2_QB1A = .; /*transformed to one value for Missing*/
ELSE IF W2_QB1A = 1 then /* 1 = Did not vote*/
  VOTED = 0; /*transformed to zero, the code that usually means "no"*/
/*Data management - creating a secondary variable and reassigning code values to populate new variable NBR_GROUPS_IN.
This will be used to understand context of group participation and whether is a large enough sample, by understanding how many groups a person participates in.
This field needs each survey result to be a simple yes/no (1/0).
Summing the number of "participates" values for each question is not the same value as the number of distinct people who participate, as some participate in multiple organizations. */
/*Also these steps reassigning missing data. Setting to '.' for original variable - and setting to '0' for use in NBR_GROUPS_IN variable - because setting to Missing prevents NBR_GROUPS_IN value from being calculated*/
IF W1_L1_A = 1 OR W1_L1_A = 2 then do; /* 1 = participated more than twice, 2=participated once or twice*/
  W1_L1_A = 1; /* transformed to 1 (yes I participated)*/
  W1_L1_A_MissEqNo = 1; /* transformed to 1 (yes I participated)*/
END;
ELSE IF W1_L1_A = -1 then do; /* -1 = refused*/
  W1_L1_A = .;
  W1_L1_A_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
END;
ELSE DO;
  W1_L1_A = 0; /* transformed to 0 = no I didn’t participate*/
  W1_L1_A_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
END;
IF W1_L1_C = 1 OR W1_L1_C = 2 THEN DO; /* 1 = participated more than twice, 2=participated once or twice*/
  W1_L1_C = 1; /* transformed to 1 (yes I participated)*/
  W1_L1_C_MissEqNo = 1; /* transformed to 1 (yes I participated)*/
END;
ELSE IF W1_L1_C = -1 then do; /*-1 = refused*/
  W1_L1_C = .;
  W1_L1_C_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
END;
ELSE DO;
  W1_L1_C = 0; /* transformed to 0 = no I didn’t participate*/
  W1_L1_C_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
END;
IF W1_L3 = 1 THEN DO;
  W1_L3 = 1; /* included for consistency with other variables’ management*/
  W1_L3_MissEqNo = 1; /* transformed to 1 (yes I participated)*/
END;
ELSE IF W1_L3 = -1 THEN DO; /*-1 = refused*/
  W1_L3 = .;
  W1_L3_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
END;
ELSE IF W1_L3 = 2 then do; /* 2= no*/
  W1_L3 = 0; /*transformed 2 to 0 to have consistent meaning with other No values*/
  W1_L3_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
END;
NBR_GROUPS_IN = W1_L1_A_MissEqNo + W1_L1_C_MissEqNo + W1_L2_1 + W1_L2_2 + W1_L2_3 + W1_L2_5 + W1_L3_MissEqNo;
/*Bin Participation in the various activities into one variable*/
IF W1_L1_A = 1 OR W1_L1_A = 2 THEN PARTICIPATED = 1; /*Evaluate participation in Professional Association */
ELSE IF W1_L1_C = 1 OR W1_L1_C = 2 THEN PARTICIPATED = 1; /*Evaluate participation in Cultural Organization*/
ELSE IF W1_L2_1 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in NAACP*/
ELSE IF W1_L2_2 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in National Urban League*/
ELSE IF W1_L2_3 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in Southern Christian Leadership Conference*/
ELSE IF W1_L2_5 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in Occupy Wall Street Movement*/
ELSE IF W1_L3 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in community group*/
ELSE PARTICIPATED = 0; /*Set remainder to 0 - meaning No participation*/
/*Since Voting metrics are essential to testing hypothesis, have selected for observations where Voting metrics are populated (this is indicated by Wave 2 participation number - column W2_CASEID2 - because those questions occurred in Wave 2 testing)*/
IF W2_CASEID2 > 0;
/*Narrowing hypothesis to age range of 18 to 44 because older ages typically do vote and are community engaged so wish to select for those who may not be voting or may not be community engaged - or both */
IF PPAGE GE 18 AND PPAGE LE 44;
proc sort; by CASEID;
/*used to validate accuracy of derived and transformed values proc print; */
proc freq; tables W1_L1_A W1_L1_A_MissEqNo W1_L1_C W1_L1_C_MissEqNo W1_L2_1 W1_L2_2 W1_L2_3 W1_L2_5 W1_L3 W1_L3_MissEqNo W2_QB1A VOTED W2_QB3 PPAGE PPAGECAT PPETHM NBR_GROUPS_IN PARTICIPATED;
/*Univariate graphs*/
proc univariate; var NBR_GROUPS_IN;
proc gchart; vbar participated/discrete type = pct width=20; /*univariate graph for particpated variable*/
proc gchart; vbar voted/discrete type = pct width = 20; /*univariate graph for voted variable*/
proc gchart; vbar nbr_groups_in/discrete type = pct width = 15; /*univariate graph for nbr_groups_in variable*/
/*Bivariate graphs*/
proc gchart; vbar participated/discrete type =mean sumvar =voted; /* bivariate graph for explanatory variable Participated and response variable voted*/
proc gchart; vbar NBR_GROUPS_IN/discrete type =mean sumvar =voted; /* bivariate graph for explanatory variable nbr_groups_in and response variable voted*/
RUN;
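For readers less familiar with SAS, the VOTED recode in the program above can be restated as a small function. This is a Python sketch of the same rule as I read it from the program's comments, not a replacement for the SAS step; the function name recode_voted is hypothetical and None stands in for a SAS missing value.

```python
def recode_voted(w2_qb1a):
    """Collapse the W2_QB1A survey codes into a yes/no VOTED value.

    Codes 2 through 5 are different methods of voting, 1 is did not vote,
    and -1 (missing) or 6 (not sure) are treated as missing.
    """
    if w2_qb1a is None:
        return None
    if 2 <= w2_qb1a <= 5:
        return 1  # yes, voted
    if w2_qb1a == 1:
        return 0  # did not vote
    return None  # missing, not sure, or any other code


# Apply the recode to one example of each survey code.
for code in (-1, 1, 2, 3, 4, 5, 6):
    print(code, "->", recode_voted(code))
```

Keeping missing and not sure as missing, rather than counting them as non-voters, is why the VOTED tables report a Frequency Missing count.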
UNIVARIATE GRAPHS (3 variables are discussed: NBR_GROUPS_IN, PARTICIPATED and VOTED)
1. NBR_GROUPS_IN – this variable aggregates into a count the results of several questions on whether respondents participated in 7 groups described in the Outlook on Life survey
A. Used Proc gchart for a Histogram (bar chart) for this quantitative variable –
· 66% of people are not involved in any group; 18% in 1 group; 9% in 2 groups; 5% in 3 groups; 2% in 4 groups; 1% in 5 groups; no respondents were in 6 or 7 of the groups covered by this survey data
· Assessment: Results are skewed right. This could be because each additional group one is involved in requires an additional amount of time and energy, so fewer people would be engaged in several groups.
(To see the graph go to this tumblr post - I was not able to figure out how to paste those images into a text tumblr, unfortunately. All 5 charts for this assignment are in this same post, in the correct order to match this text)
Graph link:
https://greentine.tumblr.com/post/159175222318/graphs-and-charts-for-data-management
B. NBR_GROUPS_IN - Used Proc univariate for this quantitative variable –
· Evaluation of Center:
i. Mean = .57 (in a range from 0 to 7 possible, the low number given the large proportion of 0s makes sense)
ii. Mode = 0 (as also described in Histogram above)
iii. Median = 0 (not surprising given the overwhelming count of 0 in this distribution)
· Evaluation of Spread:
i. Standard Deviation = .95
ii. Typical spread = Mean +/- Std Deviation => .57 +/- .95, which runs from 0 (a count cannot be negative) to about 1.5 groups.
iii. Assessment – “normal” values of participation in groups are 0 or 1. This means that participating in more than 1 group is “unusual”, and this matches the information in the histogram, where only about 35% participate in any group at all and fewer still in 2 or more.
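The Mean and Standard Deviation quoted above can be rebuilt from three figures in the Moments output: N, Sum Observations, and Uncorrected SS (the sum of squared values). The Python sketch below is my own arithmetic check, assuming the usual sample variance with N - 1 in the denominator.

```python
import math

# Figures reported in the PROC UNIVARIATE Moments output for NBR_GROUPS_IN.
n = 553
sum_observations = 314
uncorrected_ss = 682  # sum of squared values

mean = sum_observations / n

# Corrected SS is the sum of squared deviations from the mean.
corrected_ss = uncorrected_ss - sum_observations ** 2 / n
variance = corrected_ss / (n - 1)
std_dev = math.sqrt(variance)

# Mean +/- 1 standard deviation, with the lower end floored at 0
# because a count of groups cannot be negative.
low = max(0.0, mean - std_dev)
high = mean + std_dev

print(f"mean = {mean:.6f}, std dev = {std_dev:.6f}")
print(f"typical range: {low:.2f} to {high:.2f} groups")
```

Because the mean is smaller than the standard deviation, the lower end of the range has to be floored at 0, which is another sign of the right skew noted in the histogram.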
Moments
N                 553           Sum Weights        553
Mean              0.56781193    Sum Observations   314
Std Deviation     0.95525535    Variance           0.91251278
Skewness          1.78281069    Kurtosis           2.76542196
Uncorrected SS    682           Corrected SS       503.707052
Coeff Variation   168.234461    Std Error Mean     0.04062159
Basic Statistical Measures
Location             Variability
Mean     0.567812    Std Deviation         0.95526
Median   0.000000    Variance              0.91251
Mode     0.000000    Range                 5.00000
                     Interquartile Range   1.00000
Quantiles (Definition 5)
Level        Quantile
100% Max     5
99%          4
95%          3
90%          2
75% Q3       1
50% Median   0
25% Q1       0
10%          0
5%           0
1%           0
0% Min       0
Extreme Observations
Lowest           Highest
Value   Obs      Value   Obs
0       551      4       332
0       550      4       402
0       548      4       539
0       547      5       298
0       546      5       420
2. PARTICIPATED – an aggregation of results of participation in any of the 7 groups in the survey, into a simple Yes/No type of response
A. Used Proc gchart for a Histogram (bar chart) for this Categorical variable –
· 66% of people are not involved in any group; 35% participated in 1 or more of the 7 groups which had questions in this survey data.
· Assessment: about two-thirds of people do not participate in any of the community or professional groups evaluated.
B. (NO PROC UNIVARIATE DONE ON PARTICIPATED, as it is not a Quantitative variable)
Graph link:
https://greentine.tumblr.com/post/159175222318/graphs-and-charts-for-data-management
3. VOTED – an aggregation of survey responses on How and Whether the respondent voted on November 6 2012 into a simple Yes/No type of response
A. Used Proc gchart for a Histogram (bar chart) for this Categorical variable
a. 24% of people did not Vote on the day in question; 76% of people did vote on the day.
b. Assessment: Vast majority of people did vote on this day. This seems reasonable because it was a US Presidential election, which usually brings high participation.
B. (NO PROC UNIVARIATE DONE ON VOTED, as it is not a Quantitative variable)
Graph link:
https://greentine.tumblr.com/post/159175222318/graphs-and-charts-for-data-management
BIVARIATE GRAPHS (Discussed: NBR_GROUPS_IN with VOTED and PARTICIPATED with VOTED)
PARTICIPATED with VOTED –
· PARTICIPATED is the Explanatory/independent variable and VOTED is the Response/dependent variable
· ASSESSMENT:
o 73% of those who didn’t participate in groups voted, in contrast with 84% of those participating in 1 or more groups
§ An 11 percentage point difference is more than just a couple of points and suggests there is a relationship between these two concepts
§ Given how few votes at times can make a difference in an election (especially a city or state election where the influence of 1 vote is greater than nationally), this difference could be impactful
o Given the usual low tendency of people to vote in non-Presidential years, a possible follow up survey could examine this result in a non-Presidential year, to see if community/professional involvement creates an even-greater differential.
Graph link:
https://greentine.tumblr.com/post/159175222318/graphs-and-charts-for-data-management
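A bivariate chart of this kind, with a binary response, can be sketched as below. The data set name new and the 0/1 coding of both variables are assumptions; the PROC FREQ step is an optional addition that prints the same proportions as a table and is not taken from the original program.

```sas
/* Sketch only: assumes NEW holds PARTICIPATED and VOTED, both coded 0/1 */
PROC GCHART DATA=new;
  /* With a 0/1 response, the mean of VOTED within each bar
     is the proportion of that group who voted */
  VBAR PARTICIPATED / DISCRETE TYPE=MEAN SUMVAR=VOTED;
RUN;
QUIT;

/* Row percents give the share who voted within each PARTICIPATED level */
PROC FREQ DATA=new;
  TABLES PARTICIPATED*VOTED / NOCOL NOPERCENT;
RUN;
```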
NBR_GROUPS_IN with VOTED –
· NBR_GROUPS_IN is the Explanatory/independent variable and VOTED is the Response/dependent variable
· ASSESSMENT:
o 73% of those who didn’t participate in groups voted; 80-100% of those in 1-5 groups did, with the exception of those in 4 groups, who voted about 75% of the time
§ This does suggest a fairly linear relationship between the number of groups a person participates in and the likelihood of voting
§ Confidence in assessment: there were only 4 respondents in 4 groups and only 2 in the 5-group category, so confidence in this result and relationship is limited.
o Alternate assessment: The number of respondents in 2 and 3 groups is 50 and 30 respectively (good-sized numbers) and the Voting likelihood is the same at 90%. A greater sample size might find that Voting likelihood flattens at 90% at 2 groups and does not increase with involvement in more groups.
CONCLUSION: Being involved in as few as 1 or 2 groups may greatly increase Voting.
Graph link:
https://greentine.tumblr.com/post/159175222318/graphs-and-charts-for-data-management
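Because only 4 respondents are in 4 groups and 2 are in 5, one way to steady the comparison is to pool the sparse categories before looking at voting rates. The sketch below is illustrative only: the data set new, the 0/1 coding of VOTED, and the names grouped and GROUPS_BIN are all assumptions, not part of the original program.

```sas
/* Sketch only: assumes NEW holds NBR_GROUPS_IN (0-5) and VOTED (0/1) */
DATA grouped;
  SET new;
  /* Pool respondents in 3, 4 or 5 groups (30, 4 and 2 people) into one bin */
  IF NBR_GROUPS_IN GE 3 THEN GROUPS_BIN = 3;
  ELSE GROUPS_BIN = NBR_GROUPS_IN;
  LABEL GROUPS_BIN = "Number of groups (3 = 3 or more)";
RUN;

PROC FREQ DATA=grouped;
  /* Cell counts plus row percents: share who voted within each bin */
  TABLES GROUPS_BIN*VOTED / NOCOL NOPERCENT;
RUN;
```

Keeping the cell counts in the output makes it easy to see how many respondents stand behind each percentage.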
Graphs and Charts for Data Management & Visualization Week 4 homework
Course 1 Data Mgmt & Visualization: Week 3 Data Management (updated)
Hypothesis: That 18 – 44 year olds who are engaged in community or professional organizations are more likely to vote
SAS Program:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new;
  SET mydata.oll_pds (KEEP = CASEID W2_CASEID2 W1_L1_A W1_L1_C W1_L2_1 W1_L2_2 W1_L2_3 W1_L2_5 W1_L3 W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM);

  LABEL W1_L1_A="Participation in Professional Association in last 12 months"
        W1_L1_C="Participation in Cultural Organization in last 12 months"
        W1_L2_1="Participation in NAACP in last 12 months"
        W1_L2_2="Participation in National Urban League in last 12 months"
        W1_L2_3="Participation in Southern Christian Leadership Conference in last 12 months"
        W1_L2_5="Participation in Occupy Wall Street Movement in last 12 months"
        W1_L3="Participation in community group in last 12 months"
        W2_QB1A="Participation in election on Nov 6 2012"
        W2_QB3="Regularity of Voting"
        PPAGE="Age"
        PPAGECAT="Age Category"
        PPETHM="Ethnic Category";

  /* used to view original data and validate accuracy of later program steps*/
  /*proc print;*/

  /*Data management - coding out missing data for those variables where need no other form of reassignment*/
  If W2_QB1A = -1 OR W2_QB1A = 6 then W2_QB1A = .; /*-1 = missing, 6 = not sure*/
  If W2_QB3 = -1 then W2_QB3 = .; /* -1 = missing*/

  /*Data management - creating a secondary variable and reassigning code values to populate
    new variable NBR_GROUPS_IN. This will be used understand context of group participation
    and whether is a large enough sample, by understanding how many groups a person
    participates in. This field needs each survey question answer to be a simple yes/no (1/0).
    Summing the number of "participates" values for each question is not the same value as the
    number of distinct people who participate, as some participate in multiple organizations. */
  /* Also, these steps reassign missing data. Setting to '.' for original variable - and
     setting to '0' for use in NBR_GROUPS_IN variable - because setting to Missing prevents
     NBR_GROUPS_IN value from being calculated*/
  IF W1_L1_A = 1 OR W1_L1_A = 2 then do; /* 1 = participated more than twice, 2=participated once or twice*/
    W1_L1_A = 1 ; /* transformed to 1 (yes I participated)*/
    W1_L1_A_MissEqNo = 1; /* transformed to 1 (yes I participated)*/
  END;
  ELSE IF W1_L1_A = -1 then do; /* -1 = refused*/
    W1_L1_A = .;
    W1_L1_A_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
  END;
  ELSE DO;
    W1_L1_A = 0; /* transformed to 0 = no I didn’t participate*/
    W1_L1_A_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
  END;

  IF W1_L1_C = 1 OR W1_L1_C = 2 THEN DO; /* 1 = participated more than twice, 2=participated once or twice*/
    W1_L1_C = 1; /* transformed to 1 (yes I participated)*/
    W1_L1_C_MissEqNo = 1; /* transformed to 1 (yes I participated)*/
  END;
  ELSE IF W1_L1_C = -1 then do; /*-1 = refused*/
    W1_L1_C = .;
    W1_L1_C_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
  END;
  ELSE DO;
    W1_L1_C = 0; /* transformed to 0 = no I didn’t participate*/
    W1_L1_C_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
  END;

  IF W1_L3 = 1 THEN DO;
    W1_L3 = 1; /* included for consistency with other variables’ management*/
    W1_L3_MissEqNo = 1; /* transformed to 1 (yes I participated)*/
  END;
  IF W1_L3 = -1 THEN DO; /*-1 = refused*/
    W1_L3 = .;
    W1_L3_MissEqNo = 0; /* transformed to 0 = no I didn’t participate*/
  END;
  ELSE IF W1_L3 = 2 then do; /* 2= no*/
    W1_L3 = 0; /*transformed 2 to 0 to have consistent meaning with other No values*/
    W1_L3_MissEqNo = 0; /*transformed to 0 = no I didn’t participate*/
  END;

  NBR_GROUPS_IN = W1_L1_A_MissEqNo + W1_L1_C_MissEqNo + W1_L2_1 + W1_L2_2 + W1_L2_3 + W1_L2_5 + W1_L3_MissEqNo;

  /*Bin Participation in the various activities into one variable*/
  IF W1_L1_A = 1 OR W1_L1_A = 2 THEN PARTICIPATED = 1; /*Evaluate participation in Professional Association */
  ELSE IF W1_L1_C = 1 OR W1_L1_C = 2 THEN PARTICIPATED = 1; /*Evaluate participation in Cultural Organization*/
  ELSE IF W1_L2_1 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in NAACP*/
  ELSE IF W1_L2_2 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in National Urban League*/
  ELSE IF W1_L2_3 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in Southern Christian Leadership Conference*/
  ELSE IF W1_L2_5 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in Occupy Wall Street Movement*/
  ELSE IF W1_L3 = 1 THEN PARTICIPATED = 1; /*Evaluate participation in community group*/
  ELSE PARTICIPATED = 0; /*Set remainder to 0 - meaning No participation*/

  /*Since Voting metrics are essential to testing hypothesis, have selected for observations
    where Voting metrics are populated (this is indicated by Wave 2 participation number -
    column W2_CASEID2 - because those questions occurred in Wave 2 testing)*/
  IF W2_CASEID2 > 0;

  /*Narrowing hypothesis to age range of 18 to 44 because older ages typically do vote and are
    community engaged so wish to select for those who may not be voting or may not be community
    engaged - or both */
  IF PPAGE GE 18 AND PPAGE LE 44;

proc sort ; by CASEID;

proc freq ;
  tables W1_L1_A W1_L1_A_MissEqNo W1_L1_C W1_L1_C_MissEqNo W1_L2_1 W1_L2_2 W1_L2_3 W1_L2_5 W1_L3 W1_L3_MissEqNo W2_QB1A W2_QB3 PPAGE PPAGECAT PPETHM NBR_GROUPS_IN PARTICIPATED;
RUN;
Four variables’ frequency tables
(included 4 variables to represent the different types)
· Of the 2276 observations, 553 remain after applying the selection criteria above (Wave 2 respondents aged 18 to 44). So far this seems sufficient to draw some conclusions.
Participation in Professional Association in last 12 months
Data Management applied:
· Transformed data values 1 & 2, which show participation at different frequencies, to 1/Yes. This is because I am only interested in whether the person participated, and it provides a similar format to the data for all 4 of the other groups.
· Transformed data values 3 & 4, which indicate no participation in this group, to 0/No
· Transformed Refused value to Missing
Values: 0 (Not participated), 1 (Yes participated at any frequency in 12 months), Missing (refused or missing)
Frequencies: 16% of people with a recorded answer reported Yes, they participated in this activity – a large enough percentage to be useful. This is the most common group studied for people to participate in.
Missing data: 3% of records have missing data – not so large as to impact usefulness
Participation in Professional Association in last 12 months
W1_L1_A    Frequency    Percent    Cumulative Frequency    Cumulative Percent
0          448          83.58      448                     83.58
1          88           16.42      536                     100.00
Frequency Missing = 17
Participation in Professional Association in last 12 months – Where Missing is translated to 0/No
Data Management applied:
· Transformed data values 1 & 2, which show participation at different frequencies, to 1/Yes. This is because I am only interested in whether the person participated, and it provides similar information to the data for all 4 of the other groups.
· Transformed data values 3 & 4, which indicate no participation in this group, to 0/No
· Transformed Refused value to No. This was done because, when summing how many groups a person was involved in, a Missing value turns the Sum to missing rather than a numeric value. For summing, a zero in this variable is appropriate.
Values: 0 (Not participated – or Refused), 1 (Yes participated at any frequency in 12 months)
Frequencies: 16% of people reported Yes, they participated in this activity – a large enough percentage to be useful. This is the most common group studied for people to participate in.
Missing data: None due to transformation to zero
W1_L1_A_MissEqNo    Frequency    Percent    Cumulative Frequency    Cumulative Percent
0                   465          84.09      465                     84.09
1                   88           15.91      553                     100.00
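The reason for the separate MissEqNo variable is that the + operator returns missing as soon as any operand is missing. The small demonstration below uses made-up rows and the hypothetical names demo, nbr_plus and nbr_sum; it also shows the SAS SUM function, which skips missing arguments, as one possible alternative to recoding. It is a sketch and not part of the original program.

```sas
/* Sketch only: three invented rows, values coded 1 (yes), 0 (no), . (missing) */
DATA demo;
  INPUT W1_L1_A W1_L1_C W1_L3;
  /* The + operator propagates missing: any . makes the result . */
  nbr_plus = W1_L1_A + W1_L1_C + W1_L3;
  /* SUM() ignores missing arguments; the trailing 0 guarantees a
     numeric 0 even when every survey item is missing */
  nbr_sum = SUM(W1_L1_A, W1_L1_C, W1_L3, 0);
  DATALINES;
1 . 0
. . .
1 1 1
;
RUN;

PROC PRINT DATA=demo;
RUN;
```

For the first row nbr_plus is missing while nbr_sum is 1; for the all-missing row nbr_sum is 0. Either approach treats a refusal as "did not participate" for counting purposes, so the interpretation of NBR_GROUPS_IN is the same.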
NBR_GROUPS_IN
Data Management applied:
· Created this field to understand the context of group participation and whether the sample is large enough, by understanding how many groups a person participates in.
· Summing the number of "participates" values for each question does not give the number of distinct people who participate, as some participate in multiple organizations. Where it would have looked like 314 or 57% participate in activities if just summing “participates” values for each group, it turns out only about 33% participate
Values: 0 through 5
Frequencies: 33% of people are involved in one or more groups, which is a good-sized sample. The most common number of groups among participants is 1: 98 people, or 53% of those participating in any group. 1% of people in the sample of 553 are in 4 or 5 groups.
Missing data: None, as this is a derived field where the rules ensure it is populated
NBR_GROUPS_IN    Frequency    Percent    Cumulative Frequency    Cumulative Percent
0                369          66.73      369                     66.73
1                98           17.72      467                     84.45
2                50           9.04       517                     93.49
3                30           5.42       547                     98.92
4                4            0.72       551                     99.64
5                2            0.36       553                     100.00
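The gap between 314 "participates" answers and the roughly 33% of people who participate can be checked directly from the frequency table above. The sketch below re-enters those counts by hand; the names nbr_freq, nbr_groups, n_people and memberships are hypothetical and chosen only for this illustration.

```sas
/* Sketch only: counts copied from the NBR_GROUPS_IN frequency table */
DATA nbr_freq;
  INPUT nbr_groups n_people;
  /* Each person in k groups contributes k "participates" answers */
  memberships = nbr_groups * n_people;
  DATALINES;
0 369
1 98
2 50
3 30
4 4
5 2
;
RUN;

PROC SQL;
  SELECT SUM(memberships) AS total_memberships,
         SUM(CASE WHEN nbr_groups > 0 THEN n_people ELSE 0 END) AS distinct_participants,
         SUM(n_people) AS respondents
  FROM nbr_freq;
QUIT;
```

This gives 314 memberships (98 + 100 + 90 + 16 + 10) but only 184 distinct participants out of 553 respondents, which is 57% versus 33%.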
PARTICIPATED
Data Management applied:
· Created this field to aggregate participation in any of the 7 groups into a single Yes/No value, to understand the context of group participation and whether the sample is large enough.
· Summing the number of "participates" values for each question does not give the number of distinct people who participate, as some participate in multiple organizations. Where it would have looked like 314 or 57% participate in activities if just summing “participates” values for each group, it turns out only about 33% participate
Values: 0 and 1
Frequencies: 33% of people are involved in one or more groups, which is a good-sized sample
Missing data: None, as this is a derived field where the rules ensure it is populated
PARTICIPATED    Frequency    Percent    Cumulative Frequency    Cumulative Percent
0               369          66.73      369                     66.73
1               184          33.27      553                     100.00