Who is the happiest?

I decided to explore data on a happiness indicator measured across the world, and found a dataset on the subject for the year 2016. The first task is to check and clean up the data.



In [185]:

    
import pandas as pd

file_2016=pd.read_csv('2016.csv')
file_2016.head(n=5)









    Out[185]:







  
    
      
	Country	Region	Happiness Rank	Happiness Score	Lower Confidence Interval	Upper Confidence Interval	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual
0	Denmark	Western Europe	1	7.526	7.460	7.592	1.44178	1.16374	0.79504	0.57941	0.44453	0.36171	2.73939
1	Switzerland	Western Europe	2	7.509	7.428	7.590	1.52733	1.14524	0.86303	0.58557	0.41203	0.28083	2.69463
2	Iceland	Western Europe	3	7.501	7.333	7.669	1.42666	1.18326	0.86733	0.56624	0.14975	0.47678	2.83137
3	Norway	Western Europe	4	7.498	7.421	7.575	1.57744	1.12690	0.79579	0.59609	0.35776	0.37895	2.66465
4	Finland	Western Europe	5	7.413	7.351	7.475	1.40598	1.13464	0.81091	0.57104	0.41004	0.25492	2.82596
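
Before renaming anything, a few quick checks cover the "check" part of the task. The sketch below is only an illustration: `sample` is a hypothetical two-row frame copied from the output above and stands in for `file_2016`, and the choice of checks (missing values, duplicated rows, inferred types) is my assumption about what is worth looking at here.

```python
import pandas as pd

# Hypothetical stand-in for file_2016: two rows copied from the head() output above.
sample = pd.DataFrame({
    "Country": ["Denmark", "Switzerland"],
    "Region": ["Western Europe", "Western Europe"],
    "Happiness Rank": [1, 2],
    "Happiness Score": [7.526, 7.509],
})

missing_counts = sample.isna().sum()              # missing values per column
duplicate_rows = int(sample.duplicated().sum())   # number of fully repeated rows
column_types = sample.dtypes                      # dtype pandas inferred for each column

print(missing_counts)
print("duplicate rows:", duplicate_rows)
print(column_types)
```

Running the same three calls on the full frame would show whether any column needs filling, deduplicating, or converting before the analysis.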

Since the column names contain spaces, I want to rename them to shorter names that are easier to reference later.



In [186]:

    
new_names=['country', 'region','happiness_rank', 'happiness_score', 'lower', 'upper','gdp_capita', 'family', 'health_life_exp', 'freedom', 'gov_trust', 'generosity', 'dystopia_res']
file_2016.columns=new_names



In [187]:

    
file_2016.head(n=5)









    Out[187]:







  
    
      
	country	region	happiness_rank	happiness_score	lower	upper	gdp_capita	family	health_life_exp	freedom	gov_trust	generosity	dystopia_res
0	Denmark	Western Europe	1	7.526	7.460	7.592	1.44178	1.16374	0.79504	0.57941	0.44453	0.36171	2.73939
1	Switzerland	Western Europe	2	7.509	7.428	7.590	1.52733	1.14524	0.86303	0.58557	0.41203	0.28083	2.69463
2	Iceland	Western Europe	3	7.501	7.333	7.669	1.42666	1.18326	0.86733	0.56624	0.14975	0.47678	2.83137
3	Norway	Western Europe	4	7.498	7.421	7.575	1.57744	1.12690	0.79579	0.59609	0.35776	0.37895	2.66465
4	Finland	Western Europe	5	7.413	7.351	7.475	1.40598	1.13464	0.81091	0.57104	0.41004	0.25492	2.82596
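
Assigning a list to `columns` depends on the names being in exactly the same order as the file's columns. An alternative, sketched here as an assumption rather than what the notebook does, is `DataFrame.rename` with an explicit old-to-new mapping, which keeps working if the column order changes. `sample` and `name_map` are hypothetical names, and only three of the thirteen columns are shown.

```python
import pandas as pd

# Hypothetical one-row frame using three of the original column names.
sample = pd.DataFrame({
    "Country": ["Denmark"],
    "Happiness Score": [7.526],
    "Economy (GDP per Capita)": [1.44178],
})

# Explicit mapping from original name to short name.
name_map = {
    "Country": "country",
    "Happiness Score": "happiness_score",
    "Economy (GDP per Capita)": "gdp_capita",
}

renamed = sample.rename(columns=name_map)  # returns a new frame; sample is unchanged
print(list(renamed.columns))
```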

Let's make the first visualization: the average happiness_score by region.



In [188]:

    
region_avg=file_2016.groupby(['region'],as_index=False)['happiness_score'].mean()
region_avg.head(n=5)









    Out[188]:







  
    
      
	region	happiness_score
0	Australia and New Zealand	7.323500
1	Central and Eastern Europe	5.370690
2	Eastern Asia	5.624167
3	Latin America and Caribbean	6.101750
4	Middle East and Northern Africa	5.386053



In [189]:

    
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import numpy as np



In [190]:

    
regions=region_avg['region']
scores=region_avg['happiness_score'].round(2)
y_pos = np.arange(len(regions))

# Horizontal bars: regions sit on the y-axis, scores on the x-axis
plot1=plt.figure(figsize=(6,6))
plt.barh(y_pos, scores, align='center', alpha=0.5, color='y')
plt.yticks(y_pos, regions, fontweight='bold')
plt.title("Average Happiness Score by Region", color='black')
plt.xlabel("Happiness Score", fontweight='bold')
plt.ylabel("Region", fontweight='bold')

# Print each region's score at the end of its bar
for i, v in enumerate(scores):
    plt.text(v, i , str(v), color='blue', fontweight='bold')



In [191]:

    
plt.show()

As you can see, the developed countries do feel happier on average than the developing countries.
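
The comparison is easier to read off when the regions are ordered by score. A minimal sketch, assuming only the five region averages printed in the output above (`region_avg_sample` and `ranked` are hypothetical names; the full `region_avg` frame has more regions):

```python
import pandas as pd

# The five region averages shown in the groupby output above.
region_avg_sample = pd.DataFrame({
    "region": [
        "Australia and New Zealand",
        "Central and Eastern Europe",
        "Eastern Asia",
        "Latin America and Caribbean",
        "Middle East and Northern Africa",
    ],
    "happiness_score": [7.323500, 5.370690, 5.624167, 6.101750, 5.386053],
})

# Ascending order puts the happiest region last, which barh draws at the top.
ranked = region_avg_sample.sort_values("happiness_score").reset_index(drop=True)
print(ranked)
```

Passing `ranked['region']` and `ranked['happiness_score']` to the same `barh` call would then draw the bars from least to most happy.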

Next, it would be interesting to see how much each aspect of life contributes to the score on average.



In [192]:

    
# Mean contribution of each factor across all countries
values=[file_2016['gdp_capita'].mean(), file_2016['family'].mean(), file_2016['health_life_exp'].mean(), file_2016['freedom'].mean(), file_2016['gov_trust'].mean(),
 file_2016['generosity'].mean()]
labels=['GDP per Capita', 'Family', 'Health/life Exp.', 'Freedom', 'Gov. Trust', 'Generosity']

plot2=plt.figure(figsize=(6,6))
plt.pie(values, labels=labels,
        autopct='%1.1f%%', shadow=True, startangle=90)



In [193]:

    
plt.show()

As the chart shows, people's happiness on average is driven by wealth, health, love and freedom. It is worth noting that wealth is the most important contributor, while government trust is the least important.
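
The percentages on a pie chart are each value divided by the total, times 100. The sketch below reproduces that arithmetic for a single country, using Denmark's factor values from the table at the top of the notebook. It is an illustration only: the chart itself uses the means over all countries, so the shares for one country will differ, and `denmark` and `shares` are hypothetical names.

```python
import pandas as pd

# Denmark's six factor values, copied from the first row of the dataset.
denmark = pd.Series({
    "GDP per Capita": 1.44178,
    "Family": 1.16374,
    "Health/life Exp.": 0.79504,
    "Freedom": 0.57941,
    "Gov. Trust": 0.44453,
    "Generosity": 0.36171,
})

# Each slice of a pie chart is value / total * 100.
shares = denmark / denmark.sum() * 100
print(shares.round(1))
```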

Next, I show the countries whose happiness scores are the closest to their dystopia residual. Dystopia is a hypothetical country, the least happy place on earth, that has the worst score for each contributing factor.



In [194]:

    
file_2016['living_hell']=file_2016['happiness_score']-file_2016['dystopia_res']
least_happy=file_2016.loc[file_2016['living_hell']<1.5]
least_happy_tb=(least_happy[['country', 'region']])
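
To see what the `living_hell` column measures, it helps to compare it with the sum of the six factor columns. The sketch below assumes that the happiness score is approximately the six factors plus the dystopia residual. That assumption is mine, not something the notebook states, but the five rows printed earlier are consistent with it to within rounding. `top5`, `factor_sum` and `max_gap` are hypothetical names.

```python
import pandas as pd

# First five rows of the renamed dataset, copied from the head() output.
top5 = pd.DataFrame({
    "country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
    "happiness_score": [7.526, 7.509, 7.501, 7.498, 7.413],
    "gdp_capita": [1.44178, 1.52733, 1.42666, 1.57744, 1.40598],
    "family": [1.16374, 1.14524, 1.18326, 1.12690, 1.13464],
    "health_life_exp": [0.79504, 0.86303, 0.86733, 0.79579, 0.81091],
    "freedom": [0.57941, 0.58557, 0.56624, 0.59609, 0.57104],
    "gov_trust": [0.44453, 0.41203, 0.14975, 0.35776, 0.41004],
    "generosity": [0.36171, 0.28083, 0.47678, 0.37895, 0.25492],
    "dystopia_res": [2.73939, 2.69463, 2.83137, 2.66465, 2.82596],
})

factors = ["gdp_capita", "family", "health_life_exp",
           "freedom", "gov_trust", "generosity"]

# Same calculation as in the cell above, plus the sum of the six factors.
top5["living_hell"] = top5["happiness_score"] - top5["dystopia_res"]
top5["factor_sum"] = top5[factors].sum(axis=1)

# Largest absolute difference between the two columns.
max_gap = (top5["living_hell"] - top5["factor_sum"]).abs().max()
print(top5[["country", "living_hell", "factor_sum"]])
print("largest gap:", max_gap)
```

Under that assumption, a `living_hell` value below 1.5 means the six factors together add less than 1.5 points on top of the residual.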



In [195]:

    
from plotly.offline import init_notebook_mode, plot
import plotly.figure_factory as ff
init_notebook_mode(connected=True)

# Render the country/region pairs as a Plotly table; plot() writes it to an HTML file
table = ff.create_table(least_happy_tb)
plot(table)









    











    Out[195]:





'file:///Users/AnaS/Desktop/world-happiness-report/temp-plot.html'

As shown in the table, the people who are living their worst dystopian nightmare are from Sub-Saharan Africa, which sadly makes sense, because these countries are the most deprived of the factors that contribute to happiness according to this particular study.

In the next graph, I explore the distribution of the happiness score to check whether it is normal. As you can see immediately below, it is not exactly normal. For more evidence, I made a probability plot, which agrees with the density plot.



In [196]:

    
import seaborn as sns
plt.figure(figsize=(6,6))
# kdeplot draws the density curve only (distplot with hist=False is deprecated)
sns.kdeplot(file_2016['happiness_score'])
plt.title("Happiness Score Distribution", color="black")
plt.xlabel("Happiness Score", fontweight="bold")
plt.show()



In [198]:

    
from scipy import stats
plt.figure(figsize=(6,6))
# Points close to the straight line indicate an approximately normal distribution
stats.probplot(file_2016['happiness_score'],  dist='norm', fit=True, plot=plt)
plt.title("Happiness Score Fitted to Normal Curve")
plt.show()
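
A formal test can complement the two plots. The sketch below uses the Shapiro-Wilk test from `scipy.stats`, which is my suggestion and not part of the original analysis. It runs on synthetic data: `synthetic_scores` is a made-up sample, and the mean, spread and size passed to the generator are arbitrary placeholders, not values taken from the dataset.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder data; loc, scale and size are arbitrary, not dataset values.
rng = np.random.default_rng(0)
synthetic_scores = rng.normal(loc=5.0, scale=1.0, size=150)

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution.
statistic, p_value = stats.shapiro(synthetic_scores)
print("W statistic:", statistic)
print("p-value:", p_value)
```

In the notebook, passing `file_2016['happiness_score']` instead of the synthetic sample would test the real scores. A small p-value would count as evidence against normality.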
