srushtisuresh
Data Management and Visualization
srushtisuresh · 5 years ago
WEEK 4 : DATA MANAGEMENT AND VISUALIZATION
STEP 1: Create graphs of your variables one at a time (univariate graphs).
For reference in the graphs: 0 denotes: not at all; 1 denotes: 1-2 days in a week; 2 denotes: 3-4 days in a week; 3 denotes: 5 or more days in a week.
Variable 1: The code for the following:
import seaborn
import matplotlib.pyplot as plt
maindata["H1DA1"] = maindata["H1DA1"].astype('category')
seaborn.countplot(x="H1DA1", data=maindata)
plt.xlabel('House chores performed by the adolescents')
plt.title('Univariate graph')
d1=maindata["H1DA1"].describe()
print(d1)
[Image: univariate graph, count of adolescents by H1DA1 (house chores)]
The graph clearly shows that house chores are actively performed by the adolescents.
Variable 2: The code for the following:
maindata["H1DA4"] = maindata["H1DA4"].astype('category')
seaborn.countplot(x="H1DA4", data=maindata)
plt.xlabel('Cycling performed by the adolescents')
plt.title('Univariate graph')
d4=maindata["H1DA4"].describe()
print(d4)
[Image: univariate graph, count of adolescents by H1DA4 (cycling)]
Clearly, very few adolescents are involved in activities like cycling.
Variable 3: The code for the following:
maindata["H1DA5"] = maindata["H1DA5"].astype('category')
seaborn.countplot(x="H1DA5", data=maindata)
plt.xlabel('Active sport being performed by the adolescents')
plt.title('Univariate graph')
d5=maindata["H1DA5"].describe()
print(d5)
[Image: univariate graph, count of adolescents by H1DA5 (active sport)]
The largest group of adolescents does not play an active sport at all. However, a sizeable number of adolescents do play active sports.
Variable 4: The code for the following:
maindata["H1DA6"] = maindata["H1DA6"].astype('category')
seaborn.countplot(x="H1DA6", data=maindata)
plt.xlabel('Walking/Jogging performed by the adolescents')
plt.title('Univariate graph')
d6=maindata["H1DA6"].describe()
print(d6)
[Image: univariate graph, count of adolescents by H1DA6 (walking/jogging)]
The largest group of adolescents is moderately involved in activities like walking/jogging.
Variable 5: Because the number of drinks consumed per year is quantitative, a histogram was created instead of a count plot.
[Image: histogram of the number of drinks consumed per year (ALCOHOL_FREQ)]
The graph does not give a clear idea of what counts as low, moderate or high alcohol consumption. However, it does show that the largest group of adolescents consumes 0-400 drinks a year.
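The post does not include the code for this histogram. Below is a minimal sketch. It assumes the drinks-per-year values are in the `ALCOHOL_FREQ` column built in Week 3, and the sample values are made up. `numpy.histogram` returns the same bin counts that `plt.hist` draws, so the size of the 0-400 bin can be read off as a number instead of estimated from the picture.

```python
import numpy as np
import pandas as pd

# Made-up drinks-per-year values standing in for maindata["ALCOHOL_FREQ"].
alcohol_freq = pd.Series([2, 12, 24, 36, 104, 208, 365, 730, 1040, 3285])

# Bins 400 drinks wide, from 0 to the first multiple of 400 above the maximum.
upper = 400 * (int(alcohol_freq.max()) // 400 + 1)
edges = np.arange(0, upper + 1, 400)

# counts[i] = number of values in [edges[i], edges[i+1]); the last bin also includes its right edge.
counts, edges = np.histogram(alcohol_freq, bins=edges)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{int(low):>5}-{int(high):<5} drinks/year: {int(n)}")

# With matplotlib installed, the same bins can be drawn with:
# import matplotlib.pyplot as plt
# plt.hist(alcohol_freq, bins=edges)
# plt.xlabel("Number of drinks consumed per year")
```

The bin width of 400 is an arbitrary choice for illustration. Narrower bins would split the crowded low end into finer ranges.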
 STEP 2: Create a graph showing the association between your explanatory and response variables (bivariate graph).
The code of the following:
#Preparing alcohol consumption on the X axis for final analysis with daily activities
#pandas.cut works on the numeric drink counts and returns a categorical column,
#so no astype('category') is needed first; the bins are (0,104], (104,260] and (260,20000]
maindata["ALCOHOL_FREQ"]=pandas.cut(maindata.ALCOHOL_FREQ, [0,104,260,20000])
print('0-104 drinks/year is considered low drinking')
print('104-260 drinks/year is considered moderate drinking')
print('260-20000 drinks/year is considered high drinking')
alcohol_categories=maindata["ALCOHOL_FREQ"].value_counts(sort=False, dropna=True)
print(alcohol_categories)
#Preparing the y axis for each daily activity into 2 categories for analysis
#(codes 0-1 become 0 = less than 3 times a week, codes 2-3 become 1 = 3 or more times a week)
maindata["H1DA1"]=maindata["H1DA1"].astype('category')
def HOUSECHORES(row1):
    if row1["H1DA1"] == 0 :
        return 0
    elif row1["H1DA1"] == 1 :
        return 0
    elif row1["H1DA1"] == 2 :
        return 1
    elif row1["H1DA1"] == 3 :
        return 1
maindata["HOUSECHORES"] = maindata.apply(lambda row1: HOUSECHORES(row1), axis=1)
seaborn.catplot(x="ALCOHOL_FREQ", y="HOUSECHORES", data=maindata, kind="bar");
plt.xlabel("Alcohol consumption by the adolescents in the form of no. of drinks/year")
plt.ylabel("Daily house chores performed")

maindata["H1DA4"]=maindata["H1DA4"].astype('category')
def CYCLING(row2):
    if row2["H1DA4"] == 0 :
        return 0
    elif row2["H1DA4"] == 1 :
        return 0
    elif row2["H1DA4"] == 2 :
        return 1
    elif row2["H1DA4"] == 3 :
        return 1
maindata["CYCLING"] = maindata.apply(lambda row2: CYCLING(row2), axis=1)
seaborn.catplot(x="ALCOHOL_FREQ", y="CYCLING", data=maindata, kind="bar");
plt.xlabel("Alcohol consumption by the adolescents in the form of no. of drinks/year")
plt.ylabel("Daily cycling activity performed")

maindata["H1DA5"]=maindata["H1DA5"].astype('category')
def ACTIVE(row3):
    if row3["H1DA5"] == 0 :
        return 0
    elif row3["H1DA5"] == 1 :
        return 0
    elif row3["H1DA5"] == 2 :
        return 1
    elif row3["H1DA5"] == 3 :
        return 1
maindata["ACTIVE"] = maindata.apply(lambda row3: ACTIVE(row3), axis=1)
seaborn.catplot(x="ALCOHOL_FREQ", y="ACTIVE", data=maindata, kind="bar");
plt.xlabel("Alcohol consumption by the adolescents in the form of no. of drinks/year")
plt.ylabel("Active sport being played")

maindata["H1DA6"]=maindata["H1DA6"].astype('category')
def WALKING(row4):
    if row4["H1DA6"] == 0 :
        return 0
    elif row4["H1DA6"] == 1 :
        return 0
    elif row4["H1DA6"] == 2 :
        return 1
    elif row4["H1DA6"] == 3 :
        return 1
maindata["WALKING"] = maindata.apply(lambda row4: WALKING(row4), axis=1)
seaborn.catplot(x="ALCOHOL_FREQ", y="WALKING", data=maindata, kind="bar");
plt.xlabel("Alcohol consumption by the adolescents in the form of no. of drinks/year")
plt.ylabel("Walking/Jogging performed")
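The four functions above differ only in the column they read. A shorter alternative applies the same 0/1 rule to all four columns with one dictionary and `Series.map`, which also avoids a row-by-row `apply`. It is sketched here on a made-up DataFrame, not the post's data.

```python
import pandas as pd

# Made-up rows using the H1DA coding: 0 = not at all, 1 = 1-2 times,
# 2 = 3-4 times, 3 = 5 or more times in the past week.
sample = pd.DataFrame({
    "H1DA1": [0, 1, 2, 3],
    "H1DA4": [3, 2, 1, 0],
    "H1DA5": [0, 0, 2, 2],
    "H1DA6": [1, 3, 1, 3],
}).astype("category")  # the post casts these columns to 'category' as well

# Same rule as HOUSECHORES/CYCLING/ACTIVE/WALKING: codes 0-1 -> 0, codes 2-3 -> 1.
to_binary = {0: 0, 1: 0, 2: 1, 3: 1}
new_names = {"H1DA1": "HOUSECHORES", "H1DA4": "CYCLING", "H1DA5": "ACTIVE", "H1DA6": "WALKING"}

for source, target in new_names.items():
    # astype(int) turns the categorical codes back into plain integers before the lookup
    sample[target] = sample[source].astype(int).map(to_binary)

print(sample[list(new_names.values())])
```

One behavioural difference: a code outside 0-3 becomes NaN with `map()`, whereas the original functions return `None`.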
Outputs of the bivariate graphs :
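The values lie between 0 and 1 (0 denotes NO involvement in the physical activity, 1 denotes involvement), so each bar shows the share of adolescents in that alcohol category who did the activity 3 or more times a week.
The graphs are described in Step 3.
[Images: four bar charts of HOUSECHORES, CYCLING, ACTIVE and WALKING by alcohol consumption category]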
 STEP 3 :  Write a few sentences describing what your graphs reveal in terms of your individual variables and the relationship between them.
In the above graphs I have split the number of drinks into three categories: 0-104 drinks/year is considered low alcohol consumption, 104-260 drinks/year moderate alcohol consumption, and more than 260 drinks/year high alcohol consumption.
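With the edges `[0,104,260,20000]`, `pandas.cut` builds right-closed intervals. A boundary value such as 104 therefore falls in the low band and 260 in the moderate band. The small sketch below checks this on made-up values. The `labels` names are illustrative and not part of the original code.

```python
import pandas as pd

# Made-up drinks-per-year values, including the boundary values 104 and 260.
drinks = pd.Series([2, 104, 105, 260, 261, 3285])

# Same edges as the post; the intervals are (0, 104], (104, 260] and (260, 20000].
bins = [0, 104, 260, 20000]
labels = ["low", "moderate", "high"]  # illustrative names only
categories = pd.cut(drinks, bins, labels=labels)

print(pd.DataFrame({"drinks": drinks, "category": categories}))
print(categories.value_counts(sort=False))

# A value of exactly 0 lies outside (0, 104] and becomes NaN unless include_lowest=True is passed.
zero_default = pd.cut(pd.Series([0]), bins)
zero_included = pd.cut(pd.Series([0]), bins, include_lowest=True)
print(zero_default.isna().iloc[0], zero_included.isna().iloc[0])
```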
I have generated bivariate graphs for every daily activity against the alcohol consumption of the adolescents. The 1st graph, “House Chores”, shows that the adolescents who drink the least are slightly more involved in house chores; however, there is not much difference between the 3 categories.
The 2nd graph, “Cycling”, shows that moderate drinkers cycled the most during the week. Looking at the values closely, however, all three bars show below-average involvement in cycling, so none of the 3 categories is very physically active in terms of cycling/skating.
The 3rd graph, “Active Sport”, shows that high drinkers are the most likely to play an active sport compared with the other 2 categories. However, the difference is not large.
The 4th graph, “Walking/Jogging”, shows that the lowest and highest drinkers have about average involvement in walking/jogging.
Taking all four graphs and their averages together, my hypothesis was not supported. Initial hypothesis: “High alcohol consumption is positively related to high physical activity.” In this observational data, however, no such association can be found; the two variables look almost independent of each other. Averaged over the four activities, the share of adolescents involved in physical activity is 0.435 for LOW alcohol consumption, 0.424 for MODERATE alcohol consumption and 0.445 for HIGH alcohol consumption.
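Each bar in the four graphs is the mean of a 0/1 variable, i.e. the share of adolescents in that alcohol category who did the activity. Assuming the three averages above are the mean of the four bar heights for each category, they can be computed directly with `groupby`. The sketch below uses made-up data, and `ALCOHOL_CAT` is a hypothetical column holding the low/moderate/high bands.

```python
import pandas as pd

# Made-up data: one alcohol band and four 0/1 activity flags per adolescent.
df = pd.DataFrame({
    "ALCOHOL_CAT": ["low", "low", "moderate", "moderate", "high", "high"],
    "HOUSECHORES": [1, 1, 1, 0, 1, 0],
    "CYCLING":     [0, 0, 1, 0, 0, 0],
    "ACTIVE":      [0, 1, 0, 0, 1, 1],
    "WALKING":     [1, 0, 0, 1, 1, 0],
})
activities = ["HOUSECHORES", "CYCLING", "ACTIVE", "WALKING"]

# The mean of a 0/1 flag within a group is that group's bar height.
shares = df.groupby("ALCOHOL_CAT", sort=False)[activities].mean()
# Average of the four bar heights for each alcohol band.
shares["AVERAGE"] = shares[activities].mean(axis=1)
print(shares)
```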
WEEK 3 : DATA MANAGEMENT AND VISUALIZATION
Step 1 : Post your program
(The code I have posted is specifically for week 3)
# maindata is created by removing missing data and data where
# the adolescents refused to answer (only valid response codes are kept)
subdata=data[(data['H1DA1']>=0) & (data['H1DA1']<=3) & (data['H1DA3']>=0) & (data['H1DA3']<=3) & (data['H1DA4']>=0) & (data['H1DA4']<=3) & (data['H1DA5']>=0) & (data['H1DA5']<=3) & (data['H1DA6']>=0) & (data['H1DA6']<=3) & (data['H1TO15']>=1) & (data['H1TO15']<=7) & (data['H1TO16']>=1) & (data['H1TO16']<=90) & (data['H1TO17']>=1) & (data['H1TO17']<=7) & (data['H1TO18']>=1) & (data['H1TO18']<=7) & (data['H1FS6']>=0) & (data['H1FS6']<=3) & (data['H1FS7']>=0) & (data['H1FS7']<=3) & (data['H1FS11']>=0) & (data['H1FS11']<=3) & (data['H1FS15']>=0) & (data['H1FS15']<=3) & (data['H1FS16']>=0) & (data['H1FS16']<=3) & (data['H1FS19']>=0) & (data['H1FS19']<=3)]                        
maindata=subdata.copy()

print("----------------------------------------------------")
print("The chosen subset consists of : ")
print("----------------------------------------------------")

print('Var1DA- Number of times house chores were performed in the last week')
DAC10 = maindata['H1DA1'].value_counts().sort_index(ascending=True)
print(DAC10)

print('Var1DA- % of the times house chores were performed in the last week')
DAP10 = maindata["H1DA1"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP10)

print('Var2DA- Number of times bicycling was done in the last week')
DAC20 = maindata["H1DA4"].value_counts().sort_index(ascending=True)
print(DAC20)

print('Var2DA- % of the times bicycling was done in the last week')
DAP20 = maindata["H1DA4"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP20)

print('Var3DA- Number of times an active sport was played in the last week')
DAC30 = maindata["H1DA5"].value_counts().sort_index(ascending=True)
print(DAC30)

print('Var3DA- % of the times an active sport was played in the last week')
DAP30 = maindata["H1DA5"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP30)

print('Var4DA- Number of times walking/jogging was done in the last week')
DAC40 = maindata["H1DA6"].value_counts().sort_index(ascending=True)
print(DAC40)

print('Var4DA- % of the times walking/jogging was done in the last week')
DAP40 = maindata["H1DA6"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP40)

print('Var1AC- Number of times alcohol was consumed in the past 12 months')
ACC10 = maindata["H1TO15"].value_counts().sort_index(ascending=True)
print(ACC10)

print('Var1AC- % of the times alcohol was consumed in the past 12 months')
ACP10 = maindata["H1TO15"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP10)

print('Var2AC- Number of drinks usually had each time')
ACC20 = maindata["H1TO16"].value_counts().sort_index(ascending=True)
print(ACC20)

print('Var2AC- % of the number of drinks usually had each time')
ACP20 = maindata["H1TO16"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP20)

print('Var3AC- Number of days, 5 or more drinks were consumed')
ACC30 = maindata["H1TO17"].value_counts().sort_index(ascending=True)
print(ACC30)

print('Var3AC- % of the number of days, 5 or more drinks were consumed')
ACP30 = maindata["H1TO17"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP30)

print('Var4AC- Number of highly drunk days in the past 12 months')
ACC40 = maindata["H1TO18"].value_counts().sort_index(ascending=True)
print(ACC40)

print('Var4AC- % of the number of highly drunk days in the past 12 months')
ACP40 = maindata["H1TO18"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP40)

print('Var1FS- Count of the adolescents who felt depressed')
FSC10 = maindata["H1FS6"].value_counts().sort_index(ascending=True)
print(FSC10)

print('Var1FS- % of the adolescents who felt depressed')
FSP10 = maindata["H1FS6"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP10)

print('Var2FS- Count of the adolescents who felt too tired to do things')
FSC20 = maindata["H1FS7"].value_counts().sort_index(ascending=True)
print(FSC20)

print('Var2FS- % of the adolescents who felt too tired to do things')
FSP20 = maindata["H1FS7"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP20)

print('Var3FS- Count of the adolescents who are happy')
FSC30 = maindata["H1FS11"].value_counts().sort_index(ascending=True)
print(FSC30)

print('Var3FS- % of the adolescents who are happy')
FSP30 = maindata["H1FS11"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP30)

print('Var4FS- Count of the adolescents who enjoyed life')
FSC40 = maindata["H1FS15"].value_counts().sort_index(ascending=True)
print(FSC40)

print('Var4FS- % of the adolescents who enjoyed life')
FSP40 = maindata["H1FS15"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP40)

print('Var5FS- Count of the adolescents who felt sad')
FSC50 = maindata["H1FS16"].value_counts().sort_index(ascending=True)
print(FSC50)

print('Var5FS- % of the adolescents who felt sad')
FSP50 = maindata["H1FS16"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP50)

print('Var6FS- Count of the adolescents who felt that life is not worth living')
FSC60 = maindata["H1FS19"].value_counts().sort_index(ascending=True)
print(FSC60)

print('Var6FS- % of the adolescents who felt that life is not worth living')
FSP60 = maindata["H1FS19"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP60)

print('Number of rows present in the dataset \n', len(maindata))
print('Number of columns present in the dataset \n', len(maindata.columns))

print("To calculate the number of drinks in a year:")
# Recode H1TO15 (days alcohol was drunk in the past 12 months) into an approximate number of days per year:
# 1 = every day or almost every day -> 365 days
# 2 = 3 to 5 days a week -> around 209 days
# 3 = 1 or 2 days a week -> around 104 days
# 4 = 2 or 3 days a month -> around 36 days
# 5 = 3-12 times in the past 12 months (roughly once a month) -> around 12 days
# 6 = 1 or 2 days in the past 12 months -> 2 days
maindata['H1TO15'] = maindata['H1TO15'].replace(1, 365)
maindata['H1TO15'] = maindata['H1TO15'].replace(2, 209)
maindata['H1TO15'] = maindata['H1TO15'].replace(3, 104)
maindata['H1TO15'] = maindata['H1TO15'].replace(4, 36)
maindata['H1TO15'] = maindata['H1TO15'].replace(5, 12)
maindata['H1TO15'] = maindata['H1TO15'].replace(6, 2)
maindata['ALCOHOL_FREQ'] = maindata['H1TO15'] * maindata['H1TO16']
secondary_variable = maindata[['AID', 'H1TO15', 'H1TO16', 'ALCOHOL_FREQ']]
display(secondary_variable.head(25))
 Step 2 :  display at least 3 of your data managed variables as frequency distributions. 
The first step in managing the data is to create a subset of the main data that excludes all missing and unanswered responses. Every answer in the subset is therefore a valid response under the variable's labels.
Here are the screenshots of my 3 outputs after removing missing and unanswered data:
0 denotes: never or rarely; 1 denotes: sometimes; 2 denotes: a lot of the time; 3 denotes: most of the time or all of the time
[Image: frequency table of a Section 10 feelings-scale variable]
0 denotes: no house chores performed in the week; 1 denotes: 1-2 times in the week; 2 denotes: 3-4 times in the week; 3 denotes: 5 or more times in the week
[Image: frequency table of H1DA1 (house chores)]
1 denotes: every day or almost every day; 2 denotes: 3 to 5 days a week; 3 denotes: 1 or 2 days a week; 4 denotes: 2 or 3 days a month; 5 denotes: 3-12 times in the past 12 months; 6 denotes: 1 or 2 days in the past 12 months
[Image: frequency table of H1TO15 (days alcohol was consumed in the past 12 months)]
STEP 3:  Write a few sentences describing these frequency distributions in terms of the values the variables take, how often they take them, the presence of missing data, etc.
As I mentioned earlier, the first step I took was to create a subset of the main data that excludes unanswered and missing data. This left a clean subset, ready for the analysis.
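The long boolean expression in the Week 3 program can be written more compactly by keeping the valid ranges in a dictionary. Below is a minimal sketch. `VALID_RANGES` and `valid_subset` are hypothetical names, and the ranges are copied from the post's filter.

```python
import pandas as pd

# Valid-answer ranges copied from the Week 3 filter (both ends inclusive).
VALID_RANGES = {
    "H1DA1": (0, 3), "H1DA3": (0, 3), "H1DA4": (0, 3), "H1DA5": (0, 3), "H1DA6": (0, 3),
    "H1TO15": (1, 7), "H1TO16": (1, 90), "H1TO17": (1, 7), "H1TO18": (1, 7),
    "H1FS6": (0, 3), "H1FS7": (0, 3), "H1FS11": (0, 3),
    "H1FS15": (0, 3), "H1FS16": (0, 3), "H1FS19": (0, 3),
}

def valid_subset(data: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Keep only the rows where every listed column lies inside its inclusive range."""
    mask = pd.Series(True, index=data.index)
    for column, (low, high) in ranges.items():
        # Series.between is inclusive on both ends by default, like >= and <= in the post
        mask &= data[column].between(low, high)
    return data[mask].copy()

# On the real data this would be: maindata = valid_subset(data, VALID_RANGES)
# Tiny made-up example with two columns; 6 and 96 are outside the valid ranges.
toy = pd.DataFrame({"H1DA1": [0, 3, 6, 2], "H1TO16": [1, 90, 5, 96]})
cleaned = valid_subset(toy, {"H1DA1": (0, 3), "H1TO16": (1, 90)})
print(cleaned)
```

Keeping the ranges in one place makes it easier to add or drop a variable without editing a long condition.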
Another method taught was coding valid data and recoding variables. I did not use this step because it does not really make a difference to reading my data: I have already made a subset and described each variable, and the variables in my data set are not complex to read.
The next step was to create a secondary variable. To know each adolescent's alcohol consumption over the entire year, I multiplied two variables: “how many drinks the adolescent usually had each time (H1TO16)” and “the number of days the adolescent drank alcohol in the year (H1TO15, recoded to days)”. The product of these two variables gives the “number of drinks consumed by the adolescent in the year” (ALCOHOL_FREQ).
Here is the output:
This is the output of the first 20 adolescents:
[Image: first rows of AID, H1TO15, H1TO16 and ALCOHOL_FREQ]
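A small worked example makes the product concrete. An adolescent who drinks every day or almost every day (code 1, taken as 365 days) and usually has 2 drinks gets 365 × 2 = 730 drinks a year. The sketch below runs on hypothetical rows. Instead of six separate `replace()` calls, it recodes with `Series.map` and a dictionary of the day counts from the Week 3 comments.

```python
import pandas as pd

# Approximate drinking days per year for each H1TO15 code, as in the Week 3 comments.
DAYS_PER_YEAR = {1: 365, 2: 209, 3: 104, 4: 36, 5: 12, 6: 2}

# Hypothetical answers: H1TO15 = how often they drank, H1TO16 = drinks usually had each time.
sample = pd.DataFrame({"AID": ["a1", "a2", "a3"], "H1TO15": [1, 3, 6], "H1TO16": [2, 5, 4]})

# map() converts every code in one step; codes missing from the dictionary become NaN.
sample["DAYS"] = sample["H1TO15"].map(DAYS_PER_YEAR)
sample["ALCOHOL_FREQ"] = sample["DAYS"] * sample["H1TO16"]
print(sample)
```

The chain of `replace()` calls leaves any unlisted code (for example 7, which the subset filter allows) unchanged. `map()` turns it into NaN instead, which makes unconverted codes easy to spot.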
WEEK 2 : Data management and Visualization
Step 1 : Share your program
To understand the variables, please take a look at my week 1 assignment where I have mentioned the variables that I have chosen from the selected sections.
import pandas
import numpy
# loading the dataset
data = pandas.read_csv('addhealthdata.csv', low_memory=False)
# Uppercase all dataframe column names
data.columns = map(str.upper, data.columns)
#Bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f'%x)
print('Number of rows present in the dataset \n', len(data))
print('Number of columns present in the dataset \n', len(data.columns))

print(' ----------------------------------------------- ')

#Frequency distribution of the variables
print('Var1DA- Number of times house chores were performed in the last week')
DAC1 = data['H1DA1'].value_counts().sort_index(ascending=True)
print(DAC1)

print('Var1DA- % of the times house chores were performed in the last week')
DAP1 = data["H1DA1"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP1)

print('Var2DA- Number of times bicycling was done in the last week')
DAC2 = data["H1DA4"].value_counts().sort_index(ascending=True)
print(DAC2)

print('Var2DA- % of the times bicycling was done in the last week')
DAP2 = data["H1DA4"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP2)

print('Var3DA- Number of times an active sport was played in the last week')
DAC3 = data["H1DA5"].value_counts().sort_index(ascending=True)
print(DAC3)

print('Var3DA- % of the times an active sport was played in the last week')
DAP3 = data["H1DA5"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP3)

print('Var4DA- Number of times walking/jogging was done in the last week')
DAC4 = data["H1DA6"].value_counts().sort_index(ascending=True)
print(DAC4)

print('Var4DA- % of the times walking/jogging was done in the last week')
DAP4 = data["H1DA6"].value_counts(normalize=True).sort_index(ascending=True)
print(DAP4)

print('Var1AC- Number of times alcohol was consumed in the past 12 months')
ACC1 = data["H1TO15"].value_counts().sort_index(ascending=True)
print(ACC1)

print('Var1AC- % of the times alcohol was consumed in the past 12 months')
ACP1 = data["H1TO15"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP1)

print('Var2AC- Number of drinks usually had each time')
ACC2 = data["H1TO16"].value_counts().sort_index(ascending=True)
print(ACC2)

print('Var2AC- % of the number of drinks usually had each time')
ACP2 = data["H1TO16"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP2)

print('Var3AC- Number of days, 5 or more drinks were consumed')
ACC3 = data["H1TO17"].value_counts().sort_index(ascending=True)
print(ACC3)

print('Var3AC- % of the number of days, 5 or more drinks were consumed')
ACP3 = data["H1TO17"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP3)

print('Var4AC- Number of highly drunk days in the past 12 months')
ACC4 = data["H1TO18"].value_counts().sort_index(ascending=True)
print(ACC4)

print('Var4AC- % of the number of highly drunk days in the past 12 months')
ACP4 = data["H1TO18"].value_counts(normalize=True).sort_index(ascending=True)
print(ACP4)

print('Var1FS- Count of the adolescents who felt depressed')
FSC1 = data["H1FS6"].value_counts().sort_index(ascending=True)
print(FSC1)

print('Var1FS- % of the adolescents who felt depressed')
FSP1 = data["H1FS6"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP1)

print('Var2FS- Count of the adolescents who felt too tired to do things')
FSC2 = data["H1FS7"].value_counts().sort_index(ascending=True)
print(FSC2)

print('Var2FS- % of the adolescents who felt too tired to do things')
FSP2 = data["H1FS7"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP2)

print('Var3FS- Count of the adolescents who are happy')
FSC3 = data["H1FS11"].value_counts().sort_index(ascending=True)
print(FSC3)

print('Var3FS- % of the adolescents who are happy')
FSP3 = data["H1FS11"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP3)

print('Var4FS- Count of the adolescents who enjoyed life')
FSC4 = data["H1FS15"].value_counts().sort_index(ascending=True)
print(FSC4)

print('Var4FS- % of the adolescents who enjoyed life')
FSP4 = data["H1FS15"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP4)

print('Var5FS- Count of the adolescents who felt sad')
FSC5 = data["H1FS16"].value_counts().sort_index(ascending=True)
print(FSC5)

print('Var5FS- % of the adolescents who felt sad')
FSP5 = data["H1FS16"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP5)

print('Var6FS- Count of the adolescents who felt that life is not worth living')
FSC6 = data["H1FS19"].value_counts().sort_index(ascending=True)
print(FSC6)

print('Var6FS- % of the adolescents who felt that life is not worth living')
FSP6 = data["H1FS19"].value_counts(normalize=True).sort_index(ascending=True)
print(FSP6)
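Each variable above is handled by the same three steps: count, percentage and print. The sketch below produces the same kind of output in a loop. It uses a hypothetical helper `frequency_table` and made-up sample data. `value_counts(normalize=True)` gives proportions, which are multiplied by 100 here to show percentages.

```python
import pandas as pd

# Labels paraphrasing the post's print statements (only three variables shown).
LABELS = {
    "H1DA1": "house chores performed in the last week",
    "H1DA4": "bicycling done in the last week",
    "H1DA5": "active sport played in the last week",
}

def frequency_table(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count and percentage of each response code, sorted by code."""
    counts = data[column].value_counts().sort_index()
    percents = data[column].value_counts(normalize=True).sort_index() * 100
    return pd.DataFrame({"count": counts, "percent": percents.round(1)})

# Made-up sample standing in for the Add Health data frame.
sample_data = pd.DataFrame({"H1DA1": [0, 1, 1, 3], "H1DA4": [0, 0, 0, 2], "H1DA5": [1, 2, 2, 3]})

for column, label in LABELS.items():
    print(f"{column} - {label}")
    print(frequency_table(sample_data, column), end="\n\n")
```

Adding a variable then only needs one new entry in `LABELS` instead of six new lines of code.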
  Step 2) the output that displays three of your variables as frequency tables
First: Here 0 denotes “No active sports played at all”; 1 denotes “1 – 2 times played in the week”; 2 denotes “3 – 4 times played in the week”; 3 denotes “5 or more times played in the week”
[Image: frequency table of H1DA5 (active sport)]
Second: Here 1 denotes “every day or almost every day”; 2 denotes “3 to 5 days a week”; 3 denotes “1 or 2 days a week”; 4 denotes “2 or 3 days a month”; 5 denotes “3-12 times in the past 12 months”; 6 denotes “1 or 2 days in the past 12 months”
[Image: frequency table of H1TO15 (days alcohol was consumed in the past 12 months)]
Third: Here 0 denotes “No jogging/walking in the week”; 1 denotes “1 – 2 times in the week”; 2 denotes “3 – 4 times in the week”; 3 denotes “5 or more times in the week”
[Image: frequency table of H1DA6 (jogging/walking)]
  Step 3 : A few sentences describing your frequency distributions in terms of the values the variables take, how often they take them, the presence of missing data, etc.
I have created a subset of my main data that excludes all missing and unanswered responses. I have not filtered by school grade, because the whole dataset already consists of adolescents. Every answer in the subset is therefore a valid response under the variable's labels.
The initial data consists of 6504 rows and 2829 columns but now the dataset that I will be working on consists of 2905 rows and 2829 columns.
I have also created a new subset containing only those adolescents who are physically active and consume a considerable amount of alcohol, to check how they think about themselves. This was my selected 2nd topic of research under the main topic. With this subset, I can study the variation in the emotions of adolescents who, paradoxically, do two things at once: one that improves their health and one that harms it.
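The post does not show the code for this subset. Below is a minimal sketch of one way to build it. The cut-offs are hypothetical and not taken from the post: “physically active” is taken as an active sport 3 or more times a week (H1DA5 codes 2-3), and “considerable drinking” as drinking on at least 3 days a week (H1TO15 codes 1-2).

```python
import pandas as pd

# Hypothetical cut-offs, not taken from the post:
#   physically active     = active sport (H1DA5) 3 or more times a week (codes 2 and 3)
#   considerable drinking = drinking on at least 3 days a week (H1TO15 codes 1 and 2)
ACTIVE_CODES = [2, 3]
FREQUENT_DRINKING_CODES = [1, 2]

def active_drinkers(data: pd.DataFrame) -> pd.DataFrame:
    """Rows for adolescents who meet both hypothetical cut-offs."""
    is_active = data["H1DA5"].isin(ACTIVE_CODES)
    drinks_often = data["H1TO15"].isin(FREQUENT_DRINKING_CODES)
    return data[is_active & drinks_often].copy()

# Made-up rows; H1FS6 ("felt depressed") is one of the feelings-scale variables to study.
sample = pd.DataFrame({
    "H1DA5": [3, 3, 0, 2],
    "H1TO15": [1, 5, 1, 2],
    "H1FS6": [1, 0, 2, 0],
})
subset = active_drinkers(sample)
print(subset)
print(subset["H1FS6"].value_counts(normalize=True).sort_index())
```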
From the frequency distribution tables we can see that nearly 62% of the adolescents are physically active and around 43% of the adolescents consume alcohol frequently. The challenge in the coming weeks is to find out whether these two groups overlap, i.e. whether the same adolescents are both physically active and frequent drinkers.
WEEK 1 : DATA MANAGEMENT AND VISUALIZATION by Coursera
STEP 1: Choose a data set that you would like to work with.
AddHealth  : National Longitudinal Study of Adolescent Health
 STEP 2. Identify a specific topic of interest
My topic of interest would be “How is alcohol consumption associated with physically active adolescents?”. It is the paradox in this question that draws my attention to the topic. The main aim would be to identify whether the two are positively or negatively correlated with each other. Being physically active here includes playing an active sport, walking, bicycling and doing household chores.
 STEP 3. Prepare a codebook of your own (i.e., print individual pages or copy screen and paste into a new document) from the larger codebook that includes the questions/items/variables that measure your selected topics.)
I have chosen to pick Section 2 (Daily Activities) and Section 28 (Tobacco, Alcohol, Drugs). To identify the adolescents for further research, I will go with Section 1 (General Introductory) and use it as a key.
From Section 1 (General Introductory) : AID
Section 2: H1DA1 - Household Chores
Section 2: H1DA3 - Television / video games
Section 2: H1DA4 - Roller-blading / Roller-skating / Skate-boarding / Bicycling
Section 2: H1DA5 - Active Sport
Section 2: H1DA6 - Jogging / Walking / Gymnastics / Dancing
Section 28: H1TO15 - Alcohol consumption in the past 12 months
Section 28: H1TO16 - Number of drinks every time
Section 28: H1TO17 - 5 or more drinks in a row
Section 28: H1TO18 - Number of days the adolescent got very drunk
Section 28: H1TO21 - Problems with school work because of drinking
 STEP 4. Identify a second topic that you would like to explore in terms of its association with your original topic.
Physical activity is known to improve physical and mental health, while alcohol consumption does the opposite. Adolescents who consume alcohol and are also physically active interest me, and I want to dig into their emotional scale: “How they feel about themselves and others”, basically “what goes on in the mind of such adolescents”, to finally conclude whether they lean towards fulfillment or dissatisfaction with life.
 STEP 5. Add questions/items/variables documenting this second topic to your personal codebook.
I have chosen Section 10 : Feelings scale for my second topic in terms of its association with my original topic.
Section 10: H1FS6 – Felt depressed
Section 10: H1FS7 – Felt too tired to do things
Section 10: H1FS9 – Thought life to be a failure
Section 10: H1FS11 – Felt happy
Section 10: H1FS15 – Enjoy life
Section 10: H1FS16 – Felt sad
Section 10: H1FS19 – Felt life was not worth living
 STEP 6. Perform a literature review to see what research has been previously done on this topic. Use sites such as Google Scholar (http://scholar.google.com) to search for published academic work in the area(s) of interest. Try to find multiple sources, and take note of basic bibliographic information.
STEP 7. Based on your literature review, develop a hypothesis about what you believe the association might be between these topics. Be sure to integrate the specific variables you selected into the hypothesis.
Combining Step 6 and 7:   Keywords : Physical activity, Exercise, Alcohol Consumption, Drinking, Sports, School, Health Behavior
My topic of interest concerns the correlation between being physically active and consuming alcohol. According to the published academic work on this topic, alcohol consumption and physical activity are positively correlated across all ages.
Other findings related to this topic: there seems to be a curvilinear association between sports participation and frequency of drinking; the results provide strong support for the existence of an incongruous positive association between alcohol consumption and physical activity in college students; and this positive association persists at heavy drinking levels.
Thus, my hypothesis would be that there exists a positive correlation between physical activity and alcohol consumption.
References:
1. Anna K. Piazza-Gardner and Adam E. Barry. “Examining Physical Activity Levels and Alcohol Consumption: Are People Who Drink More Active?” First published January 1, 2012. https://journals.sagepub.com/doi/abs/10.4278/ajhp.100929-LIT-328
2. Michael T. French, Ioana Popovici and Johanna Catherine Maclean. “Do Alcohol Consumers Exercise More? Findings from a National Survey.” First published September 1, 2009. https://journals.sagepub.com/doi/abs/10.4278/ajhp.0801104
3. Jean Lock Kunz. “Drink and be Active? The Associations Between Drinking and Participation in Sports.” Published online July 11, 2009. https://www.tandfonline.com/doi/abs/10.3109/16066359709004359
4. Jessica R. B. Musselman and Patricia C. Rutledge. “The incongruous alcohol-activity association: Physical activity and alcohol consumption in college students.” Received July 16, 2009; accepted July 7, 2010; available online July 15, 2010. https://www.sciencedirect.com/science/article/pii/S1469029210000944