IS-360 Project 3


In [29]:
import pandas as pd
import csv
import matplotlib.pyplot as plt

Reading CSV File into Pandas DataFrame

READING CSV: I read the CSV file with the pandas '.read_csv' function, which returns a DataFrame object.
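A minimal sketch of what '.read_csv' produces, for readers who don't have the file. CSV_TEXT is a hypothetical stand-in for LifeExpectancyIncome.csv. It is reconstructed from the DataFrame printed below, and it assumes the dollar amounts are quoted in the real file so that their thousands separators survive. 'read_csv' accepts any file-like object, so 'io.StringIO' stands in for the path.

```python
import io

import pandas as pd

# Hypothetical stand-in for LifeExpectancyIncome.csv (layout and quoting assumed).
CSV_TEXT = '''Country,United States,Ukraine,Ethiopia,Average
Income / Person,"$41,678","$16,390",$954,"$19,674"
Life Expectancy,79,68,63,70
'''

# StringIO behaves like an open file, so read_csv parses it the same way.
income_df = pd.read_csv(io.StringIO(CSV_TEXT))

print(income_df.shape)           # (rows, columns)
print(list(income_df.columns))   # the header row becomes the column labels
```

Each country column mixes a dollar string with a plain number. pandas therefore stores every value in it as a string, which is why cleaning is needed later on.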


In [42]:
income_df = pd.read_csv('LifeExpectancyIncome.csv')

In [43]:
income_df


Out[43]:
   Country          United States  Ukraine  Ethiopia  Average
0  Income / Person  $41,678        $16,390  $954      $19,674
1  Life Expectancy  79             68       63        70

There are a few things to fix. First, I want to swap the columns with the row index, so that each country becomes a row and each measure becomes a column.

To do this, I set 'Country' as the index and then use the transpose function, which swaps the index and the columns.
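The sketch below runs the same two steps on a small hypothetical frame built from the values in Out[43]. It also shows that '.T' is shorthand for '.transpose()'.

```python
import pandas as pd

# Hypothetical frame with the same shape as the raw CSV (values from Out[43]).
raw = pd.DataFrame({
    'Country': ['Income / Person', 'Life Expectancy'],
    'United States': ['$41,678', '79'],
    'Ukraine': ['$16,390', '68'],
    'Ethiopia': ['$954', '63'],
    'Average': ['$19,674', '70'],
})

# Use 'Country' as the row labels, then swap rows and columns.
flipped = raw.set_index('Country').transpose()

# .T is an alias for .transpose(), so both give the same frame.
assert flipped.equals(raw.set_index('Country').T)

print(flipped.index.tolist())    # countries are now the row labels
print(flipped.columns.tolist())  # the two measures are now the columns
print(flipped.columns.name)      # 'Country' carries over as the column-axis name
```

The leftover 'Country' label in the header of the next output is not a column. It is the name of the column axis, which was the index name before the transpose.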


In [44]:
income_df = income_df.set_index('Country').transpose()

In [45]:
income_df


Out[45]:
Country        Income / Person  Life Expectancy
United States  $41,678          79
Ukraine        $16,390          68
Ethiopia       $954             63
Average        $19,674          70
  • There is still an issue with the way the data is stored in the DataFrame. The income values include commas and currency signs, so pandas stores them as strings. The life expectancy values were read from the same columns, so they are stored as strings too.

I would like to remove any characters that do not represent numeric values and convert the results to integers.
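A minimal sketch of the cleaning step on a standalone Series holding the income strings from Out[45]. The regular expression keeps only digits, signs and decimal points. Passing 'regex=True' matters: since pandas 2.0, 'Series.str.replace' treats the pattern as a literal string by default, so without it nothing would be removed and 'astype(int)' would fail.

```python
import pandas as pd

# Income strings as they appear after the transpose (values from Out[45]).
income = pd.Series(['$41,678', '$16,390', '$954', '$19,674'],
                   index=['United States', 'Ukraine', 'Ethiopia', 'Average'])

# Delete every character that is not a digit, +, - or '.', then cast to int.
cleaned = income.str.replace(r'[^-+\d.]', '', regex=True).astype(int)

print(cleaned.tolist())  # [41678, 16390, 954, 19674]
```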


In [46]:
income_df['Income / Person'] = income_df['Income / Person'].str.replace(r'[^-+\d.]', '', regex=True).astype(int)
  • The line above gives the desired result. It removes the non-numeric characters from the cells and converts the values to integers that can be computed and analyzed further.

In [47]:
income_df['Life Expectancy'] = income_df['Life Expectancy'].astype(int)
  • The line above repeats the same process for the 'Life Expectancy' column and converts its values to integers. The only difference is that no characters need to be replaced, since the cells contain only numeric characters.
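A quick way to confirm that both conversions worked is to check the column dtypes before plotting. This sketch uses a hypothetical frame in which income is already cleaned and life expectancy is still stored as strings. It also swaps in 'pd.to_numeric' as an alternative to '.astype(int)'; the advantage is that 'to_numeric' raises a ValueError with the offending value if any cell is not a number.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical frame: income already cleaned, life expectancy still strings.
df = pd.DataFrame(
    {'Income / Person': [41678, 16390, 954, 19674],
     'Life Expectancy': ['79', '68', '63', '70']},
    index=['United States', 'Ukraine', 'Ethiopia', 'Average'],
)

# Alternative to .astype(int): parses digit strings, raises on anything else.
df['Life Expectancy'] = pd.to_numeric(df['Life Expectancy'])

# Both columns should now be integer dtypes, ready for arithmetic and plots.
print(df.dtypes)
print(all(is_integer_dtype(dtype) for dtype in df.dtypes))
```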

In [53]:
income_df


Out[53]:
Country        Income / Person  Life Expectancy
United States  41678            79
Ukraine        16390            68
Ethiopia       954              63
Average        19674            70
  • After the cells are cleaned and converted to numeric data, the result is a clean DataFrame object.

In [54]:
income_df.plot(kind='area')


Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x10a15df50>

In [55]:
plt.show()

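  • This area graph doesn't do a great job of displaying the differences or similarities between income and life expectancy. The main cause is the choice of graph. An area plot stacks the two columns, and income (tens of thousands of dollars) dwarfs life expectancy (tens of years), so the life expectancy band is barely visible.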

In [18]:
income_df.plot(kind='bar', title='Income vs Life Expectancy')


Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x10b3583d0>

In [19]:
plt.show() #bar graph

  • This bar chart does a good job of showing the differences in income between the measured countries. It does not do well at showing how income relates to life expectancy, though: the life expectancy bars are tiny next to the income bars, and many of them are not even visible.
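One way around the scale mismatch is to give each column its own chart and y-axis. The sketch below uses 'subplots=True' on 'DataFrame.plot', rebuilding the cleaned values from Out[53]. The figure size is an arbitrary choice. 'plt.show()' is left out so the sketch also runs headless; in the notebook you would follow it with 'plt.show()' as in the cells above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Cleaned values from Out[53].
income_df = pd.DataFrame(
    {'Income / Person': [41678, 16390, 954, 19674],
     'Life Expectancy': [79, 68, 63, 70]},
    index=['United States', 'Ukraine', 'Ethiopia', 'Average'],
)

# One bar chart per column, each with its own y-axis, so life expectancy
# (tens of years) is no longer flattened by income (tens of thousands of dollars).
axes = income_df.plot(kind='bar', subplots=True, figsize=(9, 6))
plt.tight_layout()
```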

In [ ]:
income_df.plot(x = 'Income / Person',y='Life Expectancy', kind='scatter')

In [ ]:
plt.show() #scatter chart

  • The scatter plot above does a much better job of representing the relationship between income and life expectancy. Notably, the gain in life expectancy from about $20k to over $40k (70 to 79 years) is similar to the gain from under $1k to about $20k (63 to 70 years).
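The scatter plot's impression can be backed up with a correlation coefficient. There is one caveat. The 'Average' row equals the mean of the other three rows: (41678 + 16390 + 954) / 3 = 19674 and (79 + 68 + 63) / 3 = 70. It is therefore not an independent data point, and this sketch drops it before measuring. 'Series.corr' computes the Pearson coefficient by default. With only three countries, treat the number as a rough indicator rather than evidence.

```python
import pandas as pd

# Cleaned values from Out[53].
income_df = pd.DataFrame(
    {'Income / Person': [41678, 16390, 954, 19674],
     'Life Expectancy': [79, 68, 63, 70]},
    index=['United States', 'Ukraine', 'Ethiopia', 'Average'],
)

# 'Average' is derived from the other rows, so keep only the real countries.
countries = income_df.drop('Average')
print(countries.mean())  # matches the 'Average' row

# Pearson correlation between income and life expectancy (Series.corr default).
r = countries['Income / Person'].corr(countries['Life Expectancy'])
print(round(r, 3))
```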

In [61]:
plt.figure(figsize=(9, 6))


Out[61]:
<matplotlib.figure.Figure at 0x11a447590>

In [62]:
income_df = income_df.drop('Ethiopia')

In [68]:
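# DataFrame.plot opens its own figure, so the size must be passed here; the earlier plt.figure() call does not affect this chart.
income_df.plot(x='Income / Person', y='Life Expectancy', kind='bar', stacked=True, figsize=(9, 6))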


Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x112539510>

In [ ]:
plt.show()

  • I think removing Ethiopia from the data was crucial to understanding why income per person has little effect, given the differences in life expectancy between the 40k, 19k, and 16k earners.
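A small practical note on the 'drop' cell. 'DataFrame.drop' returns a new frame, and the notebook reassigns it to 'income_df'. Re-running that cell would therefore raise a KeyError, because 'Ethiopia' is already gone. Passing 'errors='ignore'' makes the step safe to repeat, as in this sketch on the cleaned values from Out[53].

```python
import pandas as pd

# Cleaned values from Out[53].
income_df = pd.DataFrame(
    {'Income / Person': [41678, 16390, 954, 19674],
     'Life Expectancy': [79, 68, 63, 70]},
    index=['United States', 'Ukraine', 'Ethiopia', 'Average'],
)

# drop() returns a new DataFrame; errors='ignore' skips labels that are missing.
income_df = income_df.drop('Ethiopia', errors='ignore')
income_df = income_df.drop('Ethiopia', errors='ignore')  # re-run: no KeyError

print(income_df.index.tolist())  # ['United States', 'Ukraine', 'Average']
```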

In [24]:
protein_df = pd.read_csv('HighProteinFoods.csv')

In [25]:
protein_df.stack()


Out[25]:
0   Food                     Almond Nuts
    Protein                        21.1g
    Carbs                           6.9g
    Fat                            55.8g
    Calories                    614 kcal
1   Food                       Anchovies
    Protein                        14.5g
    Carbs                           0.1g
    Fat                             2.8g
    Calories                     85 kcal
2   Food                       Asparagus
    Protein                         2.9g
    Carbs                           2.0g
    Fat                             0.6g
    Calories                     25 kcal
3   Food                         Avocado
    Protein                         1.9g
    Carbs                           1.9g
    Fat                            19.5g
    Calories                    195 kcal
4   Food                           Bacon
    Protein                        15.9g
    Fat                            19.8g
    Calories                    245 kcal
5   Food                     Baked Beans
    Protein                         9.5g
    Carbs                          22.1g
    Fat                             0.4g
    Calories                    130 kcal
6   Food                         Bananas
                          ...           
47  Calories                    105 kcal
48  Food                            Tofu
    Protein                        12.1g
    Carbs                           0.6g
    Fat                             6.0g
    Calories                    105 kcal
49  Food               Tuna Fish (Steak)
    Protein                        25.6g
    Carbs                             0g
    Fat                             0.5g
    Calories                    110 kcal
50  Food              Tuna Fish (Tinned)
    Protein                        26.3g
    Carbs                           0.0g
    Fat                            10.7g
    Calories                    202 kcal
51  Food        Turkey Breast (Skinless)
    Protein                        22.3g
    Carbs                             0g
    Fat                             1.2g
    Calories                    100 kcal
52  Food             Venison (Deer meat)
    Protein                        30.21
    Fat                             3.19
    Calories                    158 kcal
53  Food                          Yogurt
    Protein                         4.5g
    Carbs                           6.6g
    Fat                            11.0g
    Calories                    145 kcal
dtype: object
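The stacked output shows the same problem as the income data: every nutrient value is a string with a unit attached ('21.1g', '614 kcal'). Some cells also lack the unit entirely (Venison shows 30.21 and 3.19), and some values are missing (Bacon has no Carbs entry, so 'stack()' skipped it). The sketch below pulls the leading number out of each cell with 'str.extract', whatever unit follows, and leaves empty cells as NaN. CSV_TEXT is a hypothetical excerpt built from rows visible above; the real file's column layout is assumed, not confirmed.

```python
import io

import pandas as pd

# Hypothetical excerpt of HighProteinFoods.csv, using rows shown in the stacked output.
CSV_TEXT = '''Food,Protein,Carbs,Fat,Calories
Almond Nuts,21.1g,6.9g,55.8g,614 kcal
Bacon,15.9g,,19.8g,245 kcal
Venison (Deer meat),30.21,,3.19,158 kcal
'''

protein_df = pd.read_csv(io.StringIO(CSV_TEXT))

# Extract the leading number ("21.1g" -> 21.1, "614 kcal" -> 614) and cast to float.
# Cells with no digits (the empty Carbs entries) come back as NaN.
for col in ['Protein', 'Carbs', 'Fat', 'Calories']:
    protein_df[col] = (protein_df[col].astype(str)
                       .str.extract(r'(\d+(?:\.\d+)?)', expand=False)
                       .astype(float))

print(protein_df)
```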
