#LifeExpectancy
Text
Tumblr media Tumblr media Tumblr media
Welcome, japonistasarqueologicos, to a new current-affairs post about the Land of the Rising Sun. This time the topic is the birth rate and how it will affect the country. With that said, make yourselves comfortable and let's begin.
Life expectancy in Japan is the highest in the world, reaching 100 years, which makes it the longest-lived country in the world. According to new studies conducted by National Geographic, 37.5% of Japan's population will be elderly by 2046.
According to studies by Statista Research Department dated September 29, 2022, the birth rate is 1.69. The country will probably reach a point of balance between the birth rate and the death rate.
What will Japan's fate be? What do you think about it?
I hope you liked it. See you in future posts, and have a good week.
32 notes · View notes
socialismforall · 1 year ago
Text
Weekly COVID-19 Update for 2023-12-24
COVID is still airborne, and COVID still very much isn't over.
Northeastern and Midwestern USA SARS2 virus levels in wastewater are *soaring*: the Northeast is currently at 1500 copies/mL (~750 copies/mL indicates a strong surge), and the Midwest is at 1300 copies/mL. The Southeastern and Western USA are maintaining relatively lower levels, between 600 and 700 copies/mL overall, but both are still climbing. See https://biobot.io/data for county-specific data, as results can vary widely between locales.
Tumblr media
How to reduce your risk of infection? The SARS2 virus is airborne and can spread like smoke, so #MaskUp with an #N95 or better, avoid superspreader events and locations, and stay up-to-date on your boosters. Do it for yourself, so you don't catch SARS-CoV-2, and for others, so you don't spread SARS-CoV-2. Even if you're fully vaccinated, your risk of developing #LongCOVID following an infection is lower but not zero, and multiple reinfections increase your odds of negative health outcomes. Plan A always should be to prevent an infection from developing by wearing a respirator with a good seal around your mouth and nose (FFP2, FFP3, KN95, N95, N99, P100, etc.).
Holiday tips:
-If someone tells you that COVID is over, you might ask them: if we didn't consider COVID to be over in 2020 or 2021, when COVID wastewater levels were lower, why should we consider it over now, when the virus is circulating in even higher amounts?
-"Fewer cases" doesn't mean much when most of the at-home rapid tests don't get counted in official records, and the most accurate PCR tests are neither freely available nor given to everyone getting on a plane or attending classes.
-"Fewer deaths" also means less when you remember that about 1,200,000 of the most vulnerable people already have died from it, COVID-19 remains the #3 cause of death in 2023 (behind heart disease and cancer, the risk of both of which may be increased by COVID), and the risk of a Long COVID/post-acute COVID syndrome (PACS) disability or other potentially life-shortening organ damage (brain, kidney, lung, immune, etc.) isn't measured just by the death count. Also, the USA's life expectancy still hasn't recovered from the drop it experienced following the start of the pandemic.
source: https://biobot.io/data
source: https://www.webmd.com/a-to-z-guides/news/20231006/these-are-the-top-10-causes-of-death-in-the-us
source: https://publichealth.jhu.edu/2022/covid-and-the-heart-it-spares-no-one
source: https://pubmed.ncbi.nlm.nih.gov/33914346/
source: https://www.usatoday.com/story/news/nation/2023/11/29/average-us-life-expectancy-increased-not-pre-covid/71738611007/
47 notes · View notes
data-diaries · 7 days ago
Text
Tumblr media
Title: What is the relationship between income per person and life expectancy across different countries?
Explanation: The scatter plot visually represents the relationship between income per person and life expectancy. It shows that as income increases, life expectancy generally tends to rise, indicating a positive correlation.
Observations:
Countries with higher income per person generally have a higher life expectancy.
There are a few outliers where life expectancy is low despite moderate income levels.
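The positive correlation described above can be quantified with a Pearson correlation coefficient. The following is a minimal sketch using only the Python standard library; the five data points are hypothetical values made up for illustration and are not taken from the post's dataset, and the helper name `pearson_r` is likewise an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustrative values, NOT taken from the post's dataset.
income_per_person = [500, 1500, 4000, 12000, 30000]
life_expectancy = [55.0, 62.0, 70.0, 76.0, 81.0]

r_raw = pearson_r(income_per_person, life_expectancy)
# Income is heavily skewed, so a log scale is often tried as well.
r_log = pearson_r([math.log10(v) for v in income_per_person], life_expectancy)
print(f"r (raw income) = {r_raw:.2f}, r (log10 income) = {r_log:.2f}")
```

A coefficient close to +1 would be consistent with the upward trend in the scatter plot; outliers such as the ones noted in the observations pull the coefficient down.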
1 note · View note
andromedacancerhospital · 6 months ago
Text
Men's average life expectancy is 3 to 7 years less than women's. Discover the reasons behind this gap, from lifestyle choices to job-related risks, and learn how we can address men's health issues.
youtube
0 notes
marketagent · 10 months ago
Text
Tumblr media
In the second edition of #𝐦𝐨𝐧𝐝𝐚𝐲𝐟𝐚𝐜𝐭𝐬 we take a look at the countries with the highest average life expectancy in 2023 according to Worldometer:
🇭🇰 Hong Kong: 85.83
🇲🇴 Macao: 85.51
🇯🇵 Japan: 84.95
🇨🇭 Switzerland: 84.38
🇸🇬 Singapore: 84.27
🇮🇹 Italy: 84.20
🇰🇷 South Korea: 84.14
🇪🇸 Spain: 84.05
🇲🇹 Malta: 83.85
🇦🇺 Australia: 83.73
0 notes
sofitechfashion · 1 year ago
Text
Losing weight increases life expectancy.
It depends on how much weight you need to lose and on other lifestyle factors. For example, someone who is obese (BMI of 30 or over) and loses 5–10% of their body weight will see significant improvements in health and life expectancy. If you're already at a healthy weight (BMI below 25), then losing weight probably won't have much of an impact on your longevity, but that doesn't mean you should give up your healthy lifestyle. There are a few different ways that losing weight can help you live longer:
1. It reduces the risk of developing obesity-related diseases such as type 2 diabetes, heart disease, and certain types of cancer.
2. It helps you maintain healthy blood pressure, cholesterol, and blood sugar levels.
3. It gives you more energy and improves your overall mood and quality of life.
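The BMI thresholds and the 5–10% figure mentioned above are simple arithmetic. Here is a small sketch, assuming the standard definition of BMI (weight in kilograms divided by height in metres squared); the function names and the example person are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by height in metres squared."""
    return weight_kg / height_m ** 2

def weight_loss_range(weight_kg, low=0.05, high=0.10):
    """Kilograms corresponding to a 5-10% loss of current body weight."""
    return weight_kg * low, weight_kg * high

# Hypothetical person, for illustration only.
weight, height = 95.0, 1.75
current_bmi = bmi(weight, height)
min_loss, max_loss = weight_loss_range(weight)
bmi_after = bmi(weight - max_loss, height)

print(f"Current BMI: {current_bmi:.1f}")
print(f"A 5-10% loss is {min_loss:.2f} to {max_loss:.2f} kg")
print(f"BMI after a 10% loss: {bmi_after:.1f}")
```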
Tumblr media
Read more about life expectancy in this article
1 note · View note
preponias · 1 year ago
Text
Tumblr media
India Ageing Report 2023
0 notes
usnewsper-business · 1 year ago
Text
You Are Likely To Live Longer Than You Think, According To Minister Mette Kierkgaard #10thArddMeeting #ArddMeeting #DayFourOfThe10thArddMeetingFeaturingMinisterMetteKierkgaard #lifeexpectancy #MinisterMetteKierkgaard #YouAreLikelyToLiveLongerThanYouThink
0 notes
usnewsper-politics · 1 year ago
Text
You Are Likely To Live Longer Than You Think, According To Minister Mette Kierkgaard #10thArddMeeting #ArddMeeting #DayFourOfThe10thArddMeetingFeaturingMinisterMetteKierkgaard #lifeexpectancy #MinisterMetteKierkgaard #YouAreLikelyToLiveLongerThanYouThink
0 notes
city-of-longevity · 1 year ago
Text
How long are people in your city expected to live?
Tumblr media
0 notes
goddessgardener · 2 years ago
Text
Walking RX, Hobbies, Life-Expectancy
Scientific evidence shows that walking is brain medicine. It grows brain cells, boosts creativity, and enhances your mood. Are you ready to get up and get going? Do you have a hobby? If you could do anything just for fun, what would it be? Discover what hobbies may be waiting for you with simple strategies from experts. You’ll be healthier and happier. Living a long life in America is at its 100…
Tumblr media
https://www.voiceamerica.com/episode/145165/walking-rx-hobbies-life-expectancy
0 notes
laloluna921 · 8 months ago
Text
The Interplay of Socioeconomic Status and Alcohol Consumption: Implications for Life Expectancy
I’ve chosen the Gapminder dataset to study life expectancy in relation to alcohol consumption. This dataset is rich and provides a lot of interesting variables to explore.
This is a topic that has always intrigued me and I believe this dataset provides a great opportunity to explore it further.
CodeBook
Variable Name
Description
alcconsumption
2008 alcohol consumption per adult (age 15+), litres
lifeexpectancy
2011 life expectancy at birth (years)
Questions:
Is there a correlation between per capita income (income_per_person) and life expectancy (life_expectancy)?
How does alcohol consumption (alcohol_consumption) vary with per capita income (income_per_person)?
Is there a correlation between the level of education (education_level) and alcohol consumption (alcohol_consumption)?
How does alcohol consumption (alcohol_consumption) affect life expectancy (life_expectancy)?
Is there a difference in alcohol consumption (alcohol_consumption) and life expectancy (life_expectancy) between genders (gender)?
Variables:
Per capita income (income_per_person)
Life expectancy (life_expectancy)
Alcohol consumption (alcohol_consumption)
Level of education (education_level)
Gender (gender)
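One way to start on the question of how alcohol consumption relates to life expectancy is a simple least-squares line. The sketch below uses only the standard library; the data points are hypothetical values made up for illustration, `fit_line` is an assumed helper name, and a fitted slope describes an association, not a causal effect.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical country-level values, made up for illustration only.
alcohol_consumption = [0.5, 2.0, 6.0, 9.5, 12.0]  # litres of pure alcohol per adult
life_expectancy = [62.0, 66.0, 74.0, 75.0, 73.0]  # years

slope, intercept = fit_line(alcohol_consumption, life_expectancy)
print(f"life_expectancy ~ {intercept:.1f} + {slope:.2f} * alcohol_consumption")
```

Because income is related to both variables, a fuller analysis would probably include income_per_person as a control rather than rely on a single-predictor fit.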
incomeperperson   
This is the Gross Domestic Product per capita in constant 2000 US$
New CodeBook
income_per_person
This variable represents the per capita income for each country. It’s a numerical variable measured in international dollars, fixed 2011 prices.
life_expectancy
This variable indicates the average number of years a newborn child would live if current mortality patterns were to stay the same throughout its life. It’s a numerical variable measured in years.
alcohol_consumption
This variable represents the recorded and estimated average alcohol consumption, adult (15+) per capita consumption in liters pure alcohol. It’s a numerical variable measured in liters.
education_level
This variable indicates the average years of schooling for adults aged 25 and older. It’s a numerical variable measured in years.
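The definition of life_expectancy above (the years a newborn would live if current mortality patterns stayed the same) can be illustrated with a very simplified period life table. This is a sketch under stated assumptions: one-year age steps, deaths assumed to occur halfway through the year, and made-up death probabilities. It is not how Gapminder computes its figures.

```python
def life_expectancy_at_birth(death_probs):
    """Expected years lived by a newborn.

    death_probs[x] is the probability of dying between age x and x + 1,
    given survival to age x. The last entry should be 1.0 so that the
    cohort is fully accounted for.
    """
    alive = 1.0            # share of the cohort still alive at age x
    expected_years = 0.0
    for q in death_probs:
        deaths = alive * q
        # Survivors live the full year; those who die are assumed
        # to live half of it.
        expected_years += (alive - deaths) + 0.5 * deaths
        alive -= deaths
    return expected_years

# Hypothetical, heavily compressed mortality schedule (illustration only).
toy_schedule = [0.1, 0.2, 0.5, 1.0]
print(f"Toy life expectancy: {life_expectancy_at_birth(toy_schedule):.2f} years")
```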
References
Hawkins, B.R., & McCambridge, J. (2023). Association Between Daily Alcohol Intake and Risk of All-Cause Mortality: A Systematic Review and Meta-analyses. JAMA Network Open.
This study found that daily low or moderate alcohol intake was not significantly associated with all-cause mortality risk, while increased risk was evident at higher consumption levels, starting at lower levels for women than men.
Murakami, K., & Hashimoto, H. (2019). Associations of education and income with heavy drinking and problem drinking among men: evidence from a population-based study in Japan. BMC Public Health.
The study revealed that lower educational attainment was significantly associated with increased risks of both non-problematic heavy drinking and problem drinking. Lower income was significantly associated with a lower risk of non-problematic heavy drinking, but not of problem drinking.
Nooyens, A.C.J., Bueno-de-Mesquita, H.B., van Boxtel, M.P.J., van Gelder, B.M., Verhagen, H., & Verschuren, W.M.M. (2020). Alcohol consumption in later life and reaching longevity: the Netherlands Cohort Study. Age and Ageing.
The study found that in women, the total consumption of alcoholic beverages was inversely associated with the decline in global cognitive function over a 5-year period. Red wine consumption was inversely associated with the decline in global cognitive function as well as memory and flexibility.
Rigelsky, M., & Zelenka, V. (2021). Does Alcohol Consumption Affect Life Expectancy in OECD Countries. ResearchGate.
The research concluded that higher income was associated with greater longevity throughout the income distribution. The gap in life expectancy between the richest 1% and poorest 1% of individuals was 14.6 years for men and 10.1 years for women.
Chetty, R., Stepner, M., Abraham, S., Lin, S., Scuderi, B., Turner, N., Bergeron, A., & Cutler, D. (2016). The Association Between Income and Life Expectancy in the United States, 2001-2014. JAMA.
The study found that higher income was associated with greater longevity, and differences in life expectancy across income groups increased over time. Life expectancy for low-income individuals varied substantially across local areas.
The hypothesis below is based on the variables selected from the Gapminder dataset: life expectancy, alcohol consumption, and income per person.
Hypothesis
Socioeconomic status, characterized by factors such as income and education, along with lifestyle choices like alcohol consumption, significantly impacts an individual’s life expectancy and overall health. Specifically, higher income and education levels may be associated with lower risks of heavy and problematic drinking, which in turn could lead to increased longevity. However, the relationship between alcohol consumption and health outcomes might be complex and influenced by factors such as the type and amount of alcohol consumed, and the individual’s overall lifestyle and genetic predisposition.
2 notes · View notes
dataanalysisinfo · 9 months ago
Text
Exploring Global Longevity: Analyzing Life Expectancy and Urbanization Trends Across Nations
I would like to know more about the relation between climate change and urbanization and how this affects people’s lives all around the globe. For this reason, I selected the Gapminder dataset and its codebook {Gapminder codebook (.pdf)}.
Specifically, my Research Question is: Is life expectancy associated with urban rate per country?
So, I decided that I am most interested in exploring environmental factors related to urban rate, in this case CO2 emissions and residential electricity consumption, that affect life expectancy.
Sub-research Question: Do environmental factors like CO2 emissions and residential electricity consumption impact life expectancy in urban areas?
The variables for the research questions come from the Gapminder codebook: co2emissions, lifeexpectancy, relectricperperson, urbanrate. (The image at the end shows the Excel sheet I created with only these variables.)
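Keeping only these four variables can also be done in code instead of a spreadsheet. The following is a minimal sketch with the standard library `csv` module; the inline rows are hypothetical stand-ins for the Gapminder file (its real layout and values are not shown in this post), and the helper name `subset` is an assumption.

```python
import csv
import io

# Hypothetical rows standing in for the Gapminder CSV; values are made up.
raw = """country,co2emissions,lifeexpectancy,relectricperperson,urbanrate,polityscore
A,1000000,70.1,500.5,60.2,8
B,,65.3,,30.0,-2
"""

KEEP = ["co2emissions", "lifeexpectancy", "relectricperperson", "urbanrate"]

def subset(rows, columns):
    """Keep only the chosen columns; blank cells become None, the rest floats."""
    result = []
    for row in rows:
        result.append(
            {c: (float(row[c]) if row[c].strip() else None) for c in columns}
        )
    return result

rows = subset(csv.DictReader(io.StringIO(raw)), KEEP)
for r in rows:
    print(r)
```

Treating blank cells as missing values up front makes it easier to decide later whether to drop or keep incomplete countries.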
I have two hypotheses based on the results I found:
1. The more people are gathered in urban centers, the higher the technological development and the higher the industrialization rates, and this ultimately increases pollution that affects life expectancy in urban areas.
2. There is a positive relationship between CO2 emissions and life expectancy in West Africa. CO2 emissions may indirectly contribute to improved life expectancy through mechanisms such as enhanced healthcare infrastructure and increased access to medical services facilitated by economic activities associated with CO2 emissions, notably industrialization.
My hypotheses are based on the following literature review:
Elevated CO2 emissions in urban areas are expected to negatively impact life expectancy due to increased pollution. Prolonged exposure to high CO2 levels can result in respiratory and cardiovascular health problems, thereby reducing life expectancy. Additionally, electricity rates may indirectly affect urban CO2 emissions by shaping energy consumption behaviors. https://www.sciencedirect.com/science/article/pii/S2352550921001950
Reducing exposure to ambient fine-particulate air pollution led to notable and measurable enhancements in life expectancy in the United States. https://www.nejm.org/doi/full/10.1056/NEJMsa0805646
Despite the detrimental impact of CO2 emissions on agricultural output, they might indirectly contribute positively to life expectancy in West Africa. Possible explanations for this unexpected relationship include enhancements in healthcare infrastructure and accessibility to medical services driven by economic activities linked to CO2 emissions. https://ojs.jssr.org.pk/index.php/jssr/article/view/115
The study highlights that CO2 emissions negatively impact life expectancy in both Asian and African countries, potentially due to increased urban pollution and deteriorating air quality. Economic progression has a mixed impact on life expectancy, with a negative overall effect but a positive influence observed in the highest economic quantile. This suggests that while economic growth may enhance life expectancy under certain conditions, it can also lead to negative health outcomes in urban areas due to pollution and lifestyle changes. https://ojs.jssr.org.pk/index.php/jssr/article/view/115
Tumblr media
2 notes · View notes
monuonrise · 2 years ago
Text
Running a k-means Cluster Analysis:
Machine Learning for Data Analysis
Week 4: Running a k-means Cluster Analysis
A k-means cluster analysis was conducted to identify underlying subgroups of countries based on their similarity of responses on 7 variables that represent characteristics that could have an impact on internet use rates. Clustering variables included quantitative variables measuring income per person, employment rate, female employment rate, polity score, alcohol consumption, life expectancy, and urban rate. All clustering variables were standardized to have a mean of 0 and a standard deviation of 1.
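Standardizing to a mean of 0 and a standard deviation of 1 means subtracting each variable's mean and dividing by its standard deviation. A minimal standard-library sketch of that step follows; it uses the population standard deviation (dividing by n), which is also what `sklearn.preprocessing.scale` does, and the input values are made up for illustration.

```python
import math

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Hypothetical values for one clustering variable (illustration only).
urbanrate = [10.4, 36.4, 57.2, 71.6, 100.0]
z = standardize(urbanrate)

z_mean = sum(z) / len(z)
z_std = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(f"mean = {z_mean:.6f}, std = {z_std:.6f}")
```

Without this step, a variable on a large scale such as income per person would dominate the Euclidean distances used by k-means.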
Although the GapMinder dataset which I am using is relatively small (N < 250), the code below randomly splits the data into a training set (70% of the observations) and a test set (30%). A series of k-means cluster analyses were conducted on the training data specifying k=1-9 clusters, using Euclidean distance. The average distance of the observations from their cluster centroids was plotted for each of the nine cluster solutions in an elbow curve to provide guidance for choosing the number of clusters to interpret.
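The quantity plotted on the elbow curve in the notebook's code is the average Euclidean distance from each observation to its nearest cluster centroid. The sketch below computes that quantity by hand for a toy example; the points and centroids are hypothetical, and the centroids are supplied directly instead of being fitted by k-means.

```python
import math

def mean_distance_to_nearest(points, centroids):
    """Average Euclidean distance from each point to its closest centroid."""
    total = sum(min(math.dist(p, c) for c in centroids) for p in points)
    return total / len(points)

# Two well-separated toy groups (illustration only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_centroid = [(5.0, 1.0)]
two_centroids = [(0.0, 1.0), (10.0, 1.0)]

d1 = mean_distance_to_nearest(points, one_centroid)
d2 = mean_distance_to_nearest(points, two_centroids)
print(f"k=1: {d1:.3f}, k=2: {d2:.3f}")
```

The sharp drop from k=1 to k=2 is the kind of bend the elbow method looks for; adding more centroids beyond the true number of groups reduces the distance much more slowly.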
Load the data, set the variables to numeric, and clean the data of NA values
In [1]:
'''
Code for Peer-graded Assignments: Running a k-means Cluster Analysis
Course: Data Management and Visualization
Specialization: Data Analysis and Interpretation
'''
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.cluster import KMeans

data = pd.read_csv('c:/users/greg/desktop/gapminder.csv', low_memory=False)

# Convert the variables to numeric; entries that cannot be parsed become NaN
data['internetuserate'] = pd.to_numeric(data['internetuserate'], errors='coerce')
data['incomeperperson'] = pd.to_numeric(data['incomeperperson'], errors='coerce')
data['employrate'] = pd.to_numeric(data['employrate'], errors='coerce')
data['femaleemployrate'] = pd.to_numeric(data['femaleemployrate'], errors='coerce')
data['polityscore'] = pd.to_numeric(data['polityscore'], errors='coerce')
data['alcconsumption'] = pd.to_numeric(data['alcconsumption'], errors='coerce')
data['lifeexpectancy'] = pd.to_numeric(data['lifeexpectancy'], errors='coerce')
data['urbanrate'] = pd.to_numeric(data['urbanrate'], errors='coerce')

# Drop every row that has a missing value in any column
sub1 = data.copy()
data_clean = sub1.dropna()
Subset the clustering variables
In [2]:
cluster = data_clean[['incomeperperson', 'employrate', 'femaleemployrate', 'polityscore', 'alcconsumption', 'lifeexpectancy', 'urbanrate']]
cluster.describe()
Out[2]:
       incomeperperson  employrate  femaleemployrate  polityscore  alcconsumption  lifeexpectancy   urbanrate
count       150.000000  150.000000        150.000000   150.000000      150.000000      150.000000  150.000000
mean       6790.695858   59.261333         48.100667     3.893333        6.821733       68.981987   55.073200
std        9861.868327   10.380465         14.780999     6.248916        5.121911        9.908796   22.558074
min         103.775857   34.900002         12.400000   -10.000000        0.050000       48.132000   10.400000
25%         592.269592   52.199999         39.599998    -1.750000        2.562500       62.467500   36.415000
50%        2231.334855   58.900002         48.549999     7.000000        6.000000       72.558500   57.230000
75%        7222.637721   65.000000         55.725000     9.000000       10.057500       76.069750   71.565000
max       39972.352768   83.199997         83.300003    10.000000       23.010000       83.394000  100.000000
Standardize the clustering variables to have mean = 0 and standard deviation = 1
In [3]:
clustervar = cluster.copy()
clustervar['incomeperperson'] = preprocessing.scale(clustervar['incomeperperson'].astype('float64'))
clustervar['employrate'] = preprocessing.scale(clustervar['employrate'].astype('float64'))
clustervar['femaleemployrate'] = preprocessing.scale(clustervar['femaleemployrate'].astype('float64'))
clustervar['polityscore'] = preprocessing.scale(clustervar['polityscore'].astype('float64'))
clustervar['alcconsumption'] = preprocessing.scale(clustervar['alcconsumption'].astype('float64'))
clustervar['lifeexpectancy'] = preprocessing.scale(clustervar['lifeexpectancy'].astype('float64'))
clustervar['urbanrate'] = preprocessing.scale(clustervar['urbanrate'].astype('float64'))
Split the data into train and test sets
In [4]:clus_train, clus_test = train_test_split(clustervar, test_size=.3, random_state=123)
Perform k-means cluster analysis for 1-9 clusters
In [5]:
from scipy.spatial.distance import cdist

clusters = range(1, 10)
meandist = []

for k in clusters:
    model = KMeans(n_clusters=k)
    model.fit(clus_train)
    clusassign = model.predict(clus_train)
    # Average Euclidean distance from each observation to its nearest centroid
    meandist.append(sum(np.min(cdist(clus_train, model.cluster_centers_, 'euclidean'), axis=1)) / clus_train.shape[0])
Plot the average distance of observations from the cluster centroid to use the Elbow Method to identify the number of clusters to choose
In [6]:
plt.plot(clusters, meandist)
plt.xlabel('Number of clusters')
plt.ylabel('Average distance')
plt.title('Selecting k with the Elbow Method')
plt.show()
Tumblr media
Interpret 4 cluster solution
In [7]:
# The variable is named model3, but it fits the 4-cluster solution
model3 = KMeans(n_clusters=4)
model3.fit(clus_train)
clusassign = model3.predict(clus_train)
Plot the clusters
In [8]:
from sklearn.decomposition import PCA

# Project the clustering variables onto two components for plotting
pca_2 = PCA(2)
plt.figure()
plot_columns = pca_2.fit_transform(clus_train)
plt.scatter(x=plot_columns[:, 0], y=plot_columns[:, 1], c=model3.labels_)
plt.xlabel('Canonical variable 1')
plt.ylabel('Canonical variable 2')
plt.title('Scatterplot of Canonical Variables for 4 Clusters')
plt.show()
Tumblr media
Begin multiple steps to merge cluster assignment with clustering variables to examine cluster variable means by cluster.
Create a unique identifier variable from the index for the cluster training data to merge with the cluster assignment variable.
In [9]:clus_train.reset_index(level=0, inplace=True)
Create a list that has the new index variable
In [10]:cluslist = list(clus_train['index'])
Create a list of cluster assignments
In [11]:labels = list(model3.labels_)
Combine index variable list with cluster assignment list into a dictionary
In [12]:
newlist = dict(zip(cluslist, labels))
print(newlist)
{2: 1, 4: 2, 6: 0, 10: 0, 11: 3, 14: 2, 16: 3, 17: 0, 19: 2, 22: 2, 24: 3, 27: 3, 28: 2, 29: 2, 31: 2, 32: 0, 35: 2, 37: 3, 38: 2, 39: 3, 42: 2, 45: 2, 47: 1, 53: 3, 54: 3, 55: 1, 56: 3, 58: 2, 59: 3, 63: 0, 64: 0, 66: 3, 67: 2, 68: 3, 69: 0, 70: 2, 72: 3, 77: 3, 78: 2, 79: 2, 80: 3, 84: 3, 88: 1, 89: 1, 90: 0, 91: 0, 92: 0, 93: 3, 94: 0, 95: 1, 97: 2, 100: 0, 102: 2, 103: 2, 104: 3, 105: 1, 106: 2, 107: 2, 108: 1, 113: 3, 114: 2, 115: 2, 116: 3, 123: 3, 126: 3, 128: 3, 131: 2, 133: 3, 135: 2, 136: 0, 139: 0, 140: 3, 141: 2, 142: 3, 144: 0, 145: 1, 148: 3, 149: 2, 150: 3, 151: 3, 152: 3, 153: 3, 154: 3, 158: 3, 159: 3, 160: 2, 173: 0, 175: 3, 178: 3, 179: 0, 180: 3, 183: 2, 184: 0, 186: 1, 188: 2, 194: 3, 196: 1, 197: 2, 200: 3, 201: 1, 205: 2, 208: 2, 210: 1, 211: 2, 212: 2}
Convert newlist dictionary to a dataframe
In [13]:
newclus = pd.DataFrame.from_dict(newlist, orient='index')
newclus
Out[13]:
     0
2    1
4    2
6    0
10   0
11   3
...  ...
205  2
208  2
210  1
211  2
212  2
105 rows × 1 columns
Rename the cluster assignment column
In [14]:newclus.columns = ['cluster']
Repeat previous steps for the cluster assignment variable
Create a unique identifier variable from the index for the cluster assignment dataframe to merge with cluster training data
In [15]:newclus.reset_index(level=0, inplace=True)
Merge the cluster assignment dataframe with the cluster training variable dataframe by the index variable
In [16]:
merged_train = pd.merge(clus_train, newclus, on='index')
merged_train.head(n=100)
Out[16]:
    index  incomeperperson  employrate  femaleemployrate  polityscore  alcconsumption  lifeexpectancy  urbanrate  cluster
0     159        -0.393486   -0.044591          0.386877     0.017127        1.843020       -0.016099   0.790241        3
1     196        -0.146720   -1.591112         -1.778529     0.498818       -0.744736        0.505990   0.605211        1
2      70        -0.654365    0.564351          1.086052     0.659382       -0.727105       -0.481382  -0.224759        2
3      29        -0.679157    2.313852          2.389369     0.338255        0.554040       -1.880471  -1.986999        2
4      53        -0.278924   -0.634202         -0.515941     0.659382       -0.106122        0.446957   0.620333        3
...   ...              ...         ...               ...          ...             ...             ...        ...      ...
95    135        -0.663595    0.245381          0.441182     0.338255       -0.862272       -0.018934  -1.682765        2
96     79        -0.674475    0.641677          0.122141     0.338255       -0.572349       -2.111239  -1.122336        2
97    179         0.882197   -0.653534         -0.434484     0.980510        0.981088        1.257835   0.980609        0
98    149        -0.615169    1.076636          1.411881     0.017127       -0.623282       -0.626890  -1.891814        2
99    113        -0.464904   -2.354706         -1.445912     0.819946        0.414955        0.593883   0.526039        3
100 rows × 9 columns
Cluster frequencies
In [17]:merged_train.cluster.value_counts()
Out[17]:
3    39
2    35
0    18
1    13
Name: cluster, dtype: int64
Calculate clustering variable means by cluster
In [18]:
clustergrp = merged_train.groupby('cluster').mean()
print("Clustering variable means by cluster")
clustergrp
Clustering variable means by cluster
Out[18]:
cluster       index  incomeperperson  employrate  femaleemployrate  polityscore  alcconsumption  lifeexpectancy  urbanrate
0         93.500000         1.846611   -0.196021          0.101022     0.811026        0.678541        1.195696   1.078462
1        117.461538        -0.154556   -1.117490         -1.645378    -1.069767       -1.082728        0.439557   0.508658
2        100.657143        -0.628227    0.855152          0.873487    -0.583841       -0.506473       -1.034933  -0.896385
3        107.512821        -0.284648   -0.424778         -0.200033     0.531755        0.614616        0.230201   0.164805
Validate clusters in training data by examining cluster differences in internetuserate using ANOVA. First, merge internetuserate with clustering variables and cluster assignment data
In [19]:internetuserate_data = data_clean['internetuserate']
Split internetuserate data into train and test sets
In [20]:
# Same test_size and random_state as before, so the rows match clus_train
internetuserate_train, internetuserate_test = train_test_split(internetuserate_data, test_size=.3, random_state=123)
internetuserate_train1 = pd.DataFrame(internetuserate_train)
internetuserate_train1.reset_index(level=0, inplace=True)
merged_train_all = pd.merge(internetuserate_train1, merged_train, on='index')
sub5 = merged_train_all[['internetuserate', 'cluster']].dropna()
In [21]:
internetuserate_mod = smf.ols(formula='internetuserate ~ C(cluster)', data=sub5).fit()
internetuserate_mod.summary()
Out[21]:
OLS Regression Results
Dep. Variable:      internetuserate     R-squared:            0.679
Model:              OLS                 Adj. R-squared:       0.669
Method:             Least Squares       F-statistic:          71.17
Date:               Thu, 12 Jan 2017    Prob (F-statistic):   8.18e-25
Time:               20:59:17            Log-Likelihood:       -436.84
No. Observations:   105                 AIC:                  881.7
Df Residuals:       101                 BIC:                  892.3
Df Model:           3
Covariance Type:    nonrobust

                       coef   std err         t   P>|t|   [95.0% Conf. Int.]
Intercept           75.2068     3.727    20.177   0.000    67.813    82.601
C(cluster)[T.1]    -46.9517     5.756    -8.157   0.000   -58.370   -35.534
C(cluster)[T.2]    -66.5668     4.587   -14.513   0.000   -75.666   -57.468
C(cluster)[T.3]    -39.4860     4.506    -8.763   0.000   -48.425   -30.547

Omnibus:          5.290     Durbin-Watson:       1.727
Prob(Omnibus):    0.071     Jarque-Bera (JB):    4.908
Skew:             0.387     Prob(JB):            0.0859
Kurtosis:         3.722     Cond. No.            5.90
Means for internetuserate by cluster
In [22]:
m1 = sub5.groupby('cluster').mean()
m1
Out[22]:
cluster  internetuserate
0              75.206753
1              28.255018
2               8.639961
3              35.720760
Standard deviations for internetuserate by cluster
In [23]:
m2 = sub5.groupby('cluster').std()
m2
Out[23]:
cluster  internetuserate
0              14.093018
1              21.757752
2               8.399554
3              19.057835
In [24]:
mc1 = multi.MultiComparison(sub5['internetuserate'], sub5['cluster'])
res1 = mc1.tukeyhsd()
res1.summary()
Out[24]:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1  group2  meandiff     lower     upper  reject
0       1       -46.9517  -61.9887  -31.9148  True
0       2       -66.5668  -78.5495  -54.5841  True
0       3       -39.486   -51.2581  -27.7139  True
1       2       -19.6151  -33.0335   -6.1966  True
1       3         7.4657   -5.765    20.6965  False
2       3        27.0808   17.4617   36.6999  True
The elbow curve was inconclusive, suggesting that the 2, 4, 6, and 8-cluster solutions might be interpreted. The results above are for an interpretation of the 4-cluster solution.
In order to externally validate the clusters, an Analysis of Variance (ANOVA) was conducted to test for significant differences between the clusters on internet use rate. A Tukey test was used for post hoc comparisons between the clusters. Results indicated significant differences between the clusters on internet use rate (F=71.17, p<.0001). The Tukey post hoc comparisons showed significant differences between clusters on internet use rate, with the exception that clusters 1 and 3 were not significantly different from each other. Countries in cluster 0 had the highest internet use rate (mean=75.2, sd=14.1), and cluster 2 had the lowest internet use rate (mean=8.64, sd=8.40).
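The F statistic reported for the ANOVA is the ratio of between-cluster variance to within-cluster variance. The following standard-library sketch shows how that ratio is formed; the groups are hypothetical numbers made up for illustration, not the internet use rates from the notebook, and `one_way_anova_f` is an assumed helper name.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical groups (illustration only).
toy_groups = [[70.0, 75.0, 80.0], [25.0, 30.0, 35.0], [5.0, 10.0, 15.0]]
print(f"F = {one_way_anova_f(toy_groups):.2f}")
```

With 4 clusters and 105 observations, the degrees of freedom are 3 and 101, which matches the Df Model and Df Residuals values in the regression output.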
arijit123 · 2 years ago
Data Management and Visualization Assignment 1
My name is Arijit Banerjee, and this blog is part of the Data Management and Visualization course on Coursera. The assignments are submitted in the form of blog posts, so I chose Medium as my medium for the assignments.
Assignment 1
In assignment 1, we have to choose the dataset that we will work on for the whole course. Five codebooks are provided, from which we have to select a subcategory and two topics/variables that we want to work on.
STEP 1. Choose a data set that you would like to work with.
After reviewing the five codebooks, I decided to go with the "portion of the GapMinder" data. This data includes one year of numerous country-level indicators of health, wealth and development. The main reason I chose this dataset is that I want to work on a health issue. In today's world, average life expectancy has increased, but some intoxicants cause early death.
STEP 2. Identify a specific topic of interest
As I want to explore average life expectancy in relation to an intoxicant, I would like to go with alcohol consumption and life expectancy. As alcohol consumption is so harmful to health, I would like to examine how dangerous it is. So I am considering how alcohol consumption relates to health and other parameters, or which parameters are related to alcohol consumption.
STEP 3. Prepare a codebook of your own
GapMinder includes variables apart from health (wealth, development), so here I am considering only incomeperperson, alcconsumption and lifeexpectancy as variables.
Image
STEP 4. Identify a second topic that you would like to explore in terms of its association with your original topic.
The second topic which I would like to explore is urbanrate. While looking at the codebook, I thought there might be a possibility that urban rate is connected to alcohol consumption. We can see that in urban areas, people are less aware of their health and consume more toxic substances.
STEP 5. Add questions/items/variables documenting this second topic to your personal codebook.
Is there any relation between urban rate and alcohol consumption?
Is there any relation between life expectancy and alcohol consumption?
Is there any relation between income and alcohol consumption?
The final Codebook
Image
STEP 6. Perform a literature review to see what research has been previously done on this topic
For the relation between urbanrate and alcohol consumption, I searched extensively on Google Scholar but did not find a single paper on it. When I searched on Google itself, I learned that there is no direct relation between the two. The relation is in terms of stress and culture, and those can be found in cities and urban areas as well. So I don't have to consider this term (urbanrate). For the relation between life expectancy and alcohol consumption, I found lots of papers, and likewise for income and alcohol consumption. Here are a few papers and an overview of their content.
1. Lifetime income patterns and alcohol consumption: Investigating the association between long- and short-term income trajectories and drinking (1)
Overview: In this paper, the authors estimated the relationship between long-term and short-term measures of income and drinking. They used data from the US Panel Study of Income Dynamics. Among their conclusions was that low income is associated with heavy consumption. "Lifetime income patterns may have an indirect association with alcohol use, mediated through current socioeconomic position." (1)
2. A Review of Expectancy Theory and Alcohol Consumption (2)
According to their study, "expectancy manipulations and alcohol consumption: three studies in the laboratory have shown that increasing positive expectancies through word priming increases subsequent consumption and two studies have shown that increasing negative expectancies decreases it" (2)
3. Alcohol-related mortality by age and sex and its impact on life expectancy: Estimates based on the Finnish death register (3)
In this article, the author presented estimates of the connection between alcohol-related mortality and age and sex. Here are a few statistics: "According to the results, 6% of all deaths were alcohol related. These deaths were responsible for a 2 year loss in life expectancy at age 15 years among men and 0.4 years among women, which explains at least one-fifth of the difference in life expectancies between the sexes. In the age group of 15–49 years, over 40% of all deaths among men and 15% among women were alcohol related. In this age group, over 50% of the mortality difference between the sexes results from alcohol-related deaths" (3)
STEP 7. Based on your literature review, develop a hypothesis about what you believe the association might be between these topics. Be sure to integrate the specific variables you selected into the hypothesis.
After exploring these papers, I find them sufficient to establish the correlation between alcohol consumption and life expectancy, and also with income (apart from these three). After some observation, the following are my hypotheses:
• Alcohol consumption is highly correlated with life expectancy.
• Social culture, the group of people associated with a person, stress and income have a direct correlation with alcohol consumption.
• Urban rate is not directly connected to alcohol consumption.
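One way these hypotheses could later be examined is with a Pearson correlation coefficient between two variables such as alcconsumption and lifeexpectancy. The sketch below is only an illustration under stated assumptions: the function name `pearson_r` and the toy numbers are invented here and are not GapMinder values, so the result says nothing about the real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values invented for illustration only (not taken from any dataset).
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(toy_x, toy_y)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear association and a value near 0 indicates a weak one; the sign gives the direction.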
Final Codebook
References
1) Cerdá, Magdalena, et al. "Lifetime Income Patterns and Alcohol Consumption: Investigating the Association between Long- and Short-Term Income Trajectories and Drinking." Social Science & Medicine, vol. 73, no. 8, 2011, pp. 1178–1185, doi:10.1016/j.socscimed.2011.07.025.
2) Jones, Barry T., et al. "A Review of Expectancy Theory and Alcohol Consumption." Addiction, vol. 96, no. 1, 2001, pp. 57–72, doi:10.1046/j.1360-0443.2001.961575.x.
3) Mäkelä, P. "Alcohol-Related Mortality by Age and Sex and Its Impact on Life Expectancy." The European Journal of Public Health, vol. 8, no. 1, 1998, pp. 43–51, doi:10.1093/eurpub/8.1.43.
eduardotleite · 2 years ago
ANOVA Analysis
This is the Week 1 task of the Data Analysis Tools course on the Coursera platform. The challenge is to run an Analysis of Variance using the ANOVA statistical test. This type of analysis assesses whether the means of two or more groups are statistically different from each other. It is used whenever you want to compare the means of a quantitative variable across groups defined by a categorical variable. The null hypothesis is that there is no difference in the mean of the quantitative variable across groups, while the alternative is that there is a difference.
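To make the idea concrete, the F statistic that ANOVA reports is the ratio of between-group variability to within-group variability. The following is a minimal sketch in plain Python, not the code used for this task; the function name `one_way_anova_f` and the toy numbers are invented for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy values invented for illustration (not Gapminder data).
toy_groups = [[80.0, 82.0, 78.0], [70.0, 72.0, 68.0], [60.0, 62.0, 58.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by much more than the scatter inside each group would explain, which is evidence against the null hypothesis.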
Dataset Used: Gapminder
Gapminder identifies systematic misconceptions about important global trends and proportions and uses reliable data to develop easy-to-understand teaching materials to rid people of their misconceptions. Gapminder is an independent Swedish foundation with no political, religious, or economic affiliations. You should visit it: https://www.gapminder.org/.
The dataset used has 16 variables and 213 rows. I chose to analyze income per person (incomeperperson) and life expectancy (lifeexpectancy).
And what is the question?
Is life expectancy different among the five categories of income per person (A, B, C, D, E)?
Since income per person is a quantitative variable, I transformed it into a categorical variable, using parameters suggested by IBGE to classify social class according to income. For the parameters, I analyzed the boxplot posted below.
Tumblr media
The data in the image is in Portuguese because IBGE is a Brazilian institute.
The Code
I used Anaconda to code in Python for this task. The code is posted below.
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
import matplotlib.pyplot as plt
import researchpy as rp

# Load the data and keep only the two variables of interest
df = pd.read_csv('gapminder.csv')
df = df[['lifeexpectancy', 'incomeperperson']].copy()

# Convert to numeric; non-numeric entries become NaN
df['lifeexpectancy'] = pd.to_numeric(df['lifeexpectancy'], errors='coerce')
df['incomeperperson'] = pd.to_numeric(df['incomeperperson'], errors='coerce')

def income_categories(row):
    # Map income per person to a class label from A (highest) to E (lowest)
    if row["incomeperperson"] > 15000:
        return "A"
    elif row["incomeperperson"] > 5000:
        return "B"
    elif row["incomeperperson"] > 3000:
        return "C"
    elif row["incomeperperson"] > 1000:
        return "D"
    else:
        return "E"

# Keep plausible values only; rows with NaN fail these comparisons and are dropped too
df = df[(df['lifeexpectancy'] >= 1) & (df['lifeexpectancy'] <= 120) & (df['incomeperperson'] > 0)].copy()

# Assign the income class once (the original assigned it twice)
df["Income_category"] = df.apply(income_categories, axis=1)
df = df[["Income_category", "incomeperperson", "lifeexpectancy"]].dropna()

print(rp.summary_cont(df['lifeexpectancy']))

# Boxplot of life expectancy for each income class
categories = ['A', 'B', 'C', 'D', 'E']
fig1, ax1 = plt.subplots()
df_new = [df[df['Income_category'] == c]['lifeexpectancy'] for c in categories]
ax1.set_title('life expectancy')
ax1.boxplot(df_new)
ax1.set_xticklabels(categories)
plt.show()

# One-way ANOVA via an OLS model with a categorical explanatory variable
results = smf.ols('lifeexpectancy ~ C(Income_category)', data=df).fit()
print(results.summary())

# Tukey HSD post hoc comparisons between income classes
print("Tukey")
mc1 = multi.MultiComparison(df['lifeexpectancy'], df['Income_category'])
res1 = mc1.tukeyhsd()
print(res1.summary())

print('means for life expectancy by Income')
m1 = df.groupby('Income_category').mean()
print(m1)

print('Results')
print('standard deviations for life expectancy by Income')
sd1 = df.groupby('Income_category').std()
print(sd1)
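As a side note, the threshold logic in income_categories can also be written as a lookup over a sorted list of cut points. This is just an alternative sketch, not the code used for the results; the names `THRESHOLDS`, `LABELS` and `income_category` are hypothetical, while the cut points themselves are copied from the function above.

```python
from bisect import bisect_left

# Cut points and labels copied from the income_categories function above.
THRESHOLDS = [1000, 3000, 5000, 15000]
LABELS = ["E", "D", "C", "B", "A"]

def income_category(income):
    """Map an income value to a class label using strict '>' thresholds."""
    # bisect_left counts the thresholds strictly below `income`,
    # which matches the original `income > threshold` comparisons.
    return LABELS[bisect_left(THRESHOLDS, income)]

sample = [500, 1000, 2500, 4000, 15000, 20000]
print([income_category(v) for v in sample])
```

Keeping the cut points in one list makes it easier to change the class boundaries without editing a chain of if/elif branches.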
Results – ANOVA Analysis
To answer the question of the task, I ran an ANOVA test. As shown below, of the 176 rows, 171 were used for the test. I used a filter to remove invalid values (non-numeric, negative, etc.), which reduced the number of rows from the original dataset.
Tumblr media Tumblr media Tumblr media
The ANOVA analysis shows a graph for each category (above) and, as we can see, class A has a life expectancy of 80.39 years, while class E has a life expectancy of 59.15 years.
Tumblr media Tumblr media Tumblr media