Exercises


In [1]:
import pandas as pd
from matplotlib import pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
diamonds = pd.read_csv('diamonds.csv', index_col=0)

Task 1: Basics

  • have a look at the diamonds dataset: how many rows does it have, and what are the columns?
  • create a DataFrame consisting only of the x, y and z columns
  • access rows 5 to 15 in diamonds
  • create a DataFrame consisting only of rows 5 to 15 and name the rows "A" to "K" (hint: each DataFrame has an .index attribute which can be modified)
  • access row "C" in the DataFrame you just created
  • use the mixed access operator (.ix) to get the price of the 500th diamond (note: .ix was removed in pandas 1.0; in current versions use .iloc or .loc instead)
  • group the diamonds by color and compute the mean price
  • find all the diamonds with more than 2 carats and plot their price distribution as a histogram
  • compute and plot the standard deviation of the x dimension for the different cuts
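
The selection and grouping steps in this list can be sketched on a tiny stand-in table. This is only an illustration: the six rows below are made up (the real diamonds.csv is not read here), and the variable names are hypothetical. Plotting is left out; on the real data, calling .plot(kind='hist') or .plot(kind='bar') on the resulting Series is one possible way to draw the figures.

```python
import pandas as pd

# Made-up stand-in for diamonds.csv: same column names, invented values.
diamonds = pd.DataFrame({
    "carat": [0.3, 0.7, 2.1, 1.0, 2.5, 0.5],
    "cut": ["Ideal", "Good", "Ideal", "Premium", "Good", "Ideal"],
    "color": ["E", "E", "J", "G", "J", "G"],
    "price": [500, 2500, 15000, 5000, 17000, 1500],
    "x": [4.3, 5.7, 8.2, 6.4, 8.8, 5.1],
    "y": [4.3, 5.7, 8.2, 6.4, 8.7, 5.1],
    "z": [2.7, 3.5, 5.0, 3.9, 5.4, 3.1],
})

# Column subset: pass a list of column names.
dims = diamonds[["x", "y", "z"]]

# Positional row slice (end is exclusive), copied so the original stays untouched.
subset = diamonds.iloc[1:4].copy()

# Rename the rows by assigning a new index of the same length.
subset.index = list("ABC")

# Label-based access to a single row.
row_b = subset.loc["B"]

# Price of the row at position 2 (positions start at 0).
price_at_2 = diamonds["price"].iloc[2]

# Mean price per color.
mean_price = diamonds.groupby("color")["price"].mean()

# Boolean mask: diamonds heavier than 2 carats.
big = diamonds[diamonds["carat"] > 2]

# Standard deviation of x per cut (NaN for a cut with a single row).
std_x = diamonds.groupby("cut")["x"].std()
```

Note that .iloc counts from 0 and excludes the end of a slice, so the exact slice bounds for "rows 5 to 15" depend on whether you count rows from 0 or from 1.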

In [ ]:

Task 2: Import the "Kernmerkmale Bevölkerung (Geschlecht, Deutsche/Ausländer, 5 Altersgruppen)" data set (core population characteristics: sex, Germans/foreigners, 5 age groups)

https://www.destatis.de/DE/PresseService/Presse/Pressekonferenzen/2013/Zensus2011/zensus_pk.html

Download the Excel file and use the read_excel function in pandas (hint: useful arguments are sheet_name, header and index_col; older pandas versions call the first one sheetname).


In [ ]:

Task 4: Create a new DataFrame containing only the "Bundesländer" (hint: use the corresponding SATZART key)
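
Filtering on the SATZART key comes down to a boolean mask. The sketch below uses a made-up three-row table, and the value 10 is only a placeholder assumption for "this row is a Bundesland"; check the documentation shipped with the census file for the actual key. The column names other than SATZART and AGS are hypothetical as well.

```python
import pandas as pd

# Made-up rows; only the column name SATZART is taken from the task text.
census = pd.DataFrame({
    "AGS": ["11", "14", "14612"],
    "Name": ["Berlin", "Sachsen", "Dresden"],
    "SATZART": [10, 10, 40],
})

# Placeholder assumption: look up the real code for Bundesländer in the data set.
STATE_KEY = 10

# Keep only the rows whose SATZART matches the key; copy to get an independent frame.
states = census[census["SATZART"] == STATE_KEY].copy()
```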


In [ ]:

Task 5: Plot the age composition of the different Bundesländer


In [ ]:

Task 6: ...and now relative to the population in each Bundesland (hint: the .div method of a DataFrame can be used to divide two DataFrames)
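
As a minimal sketch of the .div hint: dividing a table of age-group counts by the row totals with axis=0 turns each row into shares that sum to 1. The states, column names and numbers below are invented for illustration.

```python
import pandas as pd

# Invented counts per age group; rows are states, columns are age groups.
ages = pd.DataFrame(
    {"under_18": [10, 40], "18_to_64": [60, 100], "65_plus": [30, 60]},
    index=["State A", "State B"],
)

# Total population per state (sum across the columns of each row).
total = ages.sum(axis=1)

# axis=0 aligns the divisor with the row index, so each row is divided by its own total.
shares = ages.div(total, axis=0)
```

Without axis=0, pandas would try to align the divisor with the column labels instead, which gives NaN everywhere in this layout.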


In [ ]:

Task 7: Plot pie charts of the relative age distribution in Berlin and Sachsen


In [ ]:


In [ ]:

Task 8: Import a health care data set from Zeit online

The file is in the git repository: multiresistente_keime.xlsx. Import it using the read_excel function.


In [ ]:

Task 9: Merge the two data sets (hint: "Kreisschlüssel" is the same as "AGS")
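
When the key column has a different name in each table, merge accepts left_on and right_on. The following is a sketch on invented data; only the column names AGS and Kreisschlüssel come from the hint, everything else is hypothetical. Casting both keys to strings first is a precaution, since keys of different types (for example integers in one file and text in the other) will not match.

```python
import pandas as pd

# Invented tables; "population" and "patients" are hypothetical column names.
census = pd.DataFrame({"AGS": [1001, 1002, 1003], "population": [1000, 2000, 3000]})
health = pd.DataFrame({"Kreisschlüssel": ["1001", "1003", "1004"], "patients": [5, 9, 7]})

# Bring both key columns to the same type before merging.
census["AGS"] = census["AGS"].astype(str)
health["Kreisschlüssel"] = health["Kreisschlüssel"].astype(str)

# Inner join: keep only the districts present in both tables.
merged = census.merge(health, left_on="AGS", right_on="Kreisschlüssel", how="inner")
```

With how="inner" the districts that appear in only one of the two files are dropped; how="outer" would keep them and fill the missing values with NaN.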


In [ ]:

Task 10: Plot number of hospitalized patients per inhabitant against relative number of old people (65+)


In [ ]:

Task 11: Compute the correlation
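
One possible way to get the correlation is Series.corr, which computes the Pearson coefficient by default. The sketch below starts from invented counts, derives the two ratios from Task 10, and correlates them; all column names and numbers are assumptions for illustration only.

```python
import pandas as pd

# Invented counts per district.
df = pd.DataFrame({
    "population": [1000, 2000, 4000, 5000],
    "age_65_plus": [150, 420, 960, 1500],
    "patients": [1, 4, 12, 20],
})

# Ratios per inhabitant, so that large and small districts are comparable.
df["share_65_plus"] = df["age_65_plus"] / df["population"]
df["patients_per_inhabitant"] = df["patients"] / df["population"]

# Pearson correlation coefficient between the two ratios.
r = df["patients_per_inhabitant"].corr(df["share_65_plus"])
```

Calling df.corr() on the whole DataFrame would instead return the full matrix of pairwise correlations.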


In [ ]:


In [ ]: