Alex Brie, 29/06/2017
For this first self-taught lesson I just wanted to understand how to use the Jupyter notebook software, how to open a dataset (CSV) and how to apply basic operations to it (filtering, basic numpy methods).
For the dataset I didn't want just any dataset but a government one. So I'm using a CSV from data.gov.ro that contains the number of vaccines administered to children in Romanian cities in the first trimester of 2017.
The good part: I'm able to use Jupyter with autocomplete (after installing readline), look at documentation in a separate terminal using pydoc, create new cells, run them, open a CSV, convert it into a numpy array and then do basic operations on it such as filtering.
The bad part: I didn't do anything that you can't do in 10 seconds by simply opening the aforementioned csv in Excel. Plus, my filtering probably sucks. But it's a start.
In [39]:
import pandas as pd
import numpy as np
In [40]:
datas = pd.read_csv('copii.csv', sep=';', names=["J","L","V", "N", "A"])
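A minimal sketch of what `sep` and `names` do in the call above, run on a small in-memory string instead of the real file. The two rows are invented for illustration and are not taken from copii.csv. Note that passing `names` makes pandas treat the first line as data; whether copii.csv has a header row isn't shown here, but if it does, adding `header=0` would replace it with the given names instead of keeping it as a data row.

```python
import io
import pandas as pd

# Toy semicolon-separated text, invented for illustration only (no header row).
raw = (
    "Prahova;Busteni;Hexacima;10;2017\n"
    "Bucuresti;Sector 2;Hexacima;30;2017\n"
)

# sep=';' splits on semicolons; names= supplies the column labels
# (J = county, L = town, V = vaccine, N = quantity, A = year).
datas = pd.read_csv(io.StringIO(raw), sep=";", names=["J", "L", "V", "N", "A"])

print(datas.shape)   # number of rows and columns
print(datas.dtypes)  # N and A are parsed as integers, the rest as objects
```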
In [41]:
print(datas[0:10])
# print(datas.columns)
In [42]:
np_datas = np.array(datas)
In [43]:
judete = np_datas[:,0]
orase = np_datas[:, 1]
vaccinuri = np_datas[:, 2]
cantitati = np_datas[:, 3]
ani = np_datas[:, 4]
Test printing a filter
In [44]:
print(orase[judete=="Prahova"])
print(judete[orase=="Busteni"])
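The same kind of filter can probably be done directly on the DataFrame, without converting to numpy first. This is a sketch on a toy DataFrame: the rows and quantities are invented for illustration, and only the column names match the ones used above.

```python
import pandas as pd

# Toy rows, invented for illustration only (not taken from copii.csv).
datas = pd.DataFrame({
    "J": ["Prahova", "Prahova", "Bucuresti"],
    "L": ["Busteni", "Sinaia", "Sector 2"],
    "V": ["Hexacima", "Hexacima", "Hexacima"],
    "N": [10, 20, 30],
    "A": [2017, 2017, 2017],
})

# Towns in Prahova county: boolean mask on column J, then pick column L.
towns_in_prahova = datas.loc[datas["J"] == "Prahova", "L"]
print(towns_in_prahova.tolist())

# County that contains Busteni: mask on column L, then pick column J.
county_of_busteni = datas.loc[datas["L"] == "Busteni", "J"]
print(county_of_busteni.tolist())
```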
Prepare filter columns that allow us to select any county/town combo, or county/town/vaccine name combo
In [45]:
jud_or = judete + "_" + orase
jud_or_vac = judete + "_" + orase + "_" + vaccinuri
In [46]:
print(np.sum(cantitati[jud_or=="Prahova_Busteni"]))
print(np.sum(cantitati[jud_or_vac=="Bucuresti_Sector 2_Hexacima"]))
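An alternative to building concatenated key columns is to combine boolean masks with `&`. The sketch below uses toy arrays with invented values ("VaccineB" is a made-up name); only the variable names mirror the ones above.

```python
import numpy as np

# Toy arrays, invented for illustration only.
judete = np.array(["Prahova", "Prahova", "Bucuresti", "Bucuresti"])
orase = np.array(["Busteni", "Busteni", "Sector 2", "Sector 2"])
vaccinuri = np.array(["Hexacima", "VaccineB", "Hexacima", "VaccineB"])
cantitati = np.array([10, 5, 30, 7])

# County/town filter: both conditions must hold for a row to be selected.
mask_town = (judete == "Prahova") & (orase == "Busteni")
total_town = np.sum(cantitati[mask_town])
print(total_town)

# County/town/vaccine filter: add a third condition to the mask.
mask_vac = (
    (judete == "Bucuresti")
    & (orase == "Sector 2")
    & (vaccinuri == "Hexacima")
)
total_vac = np.sum(cantitati[mask_vac])
print(total_vac)
```

The parentheses around each comparison matter, because `&` binds more tightly than `==` in Python.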
Demo for extracting the vaccine names and quantities for a given county/town combo
In [47]:
na = np.array([vaccinuri[jud_or=="Prahova_Busteni"], cantitati[jud_or=="Prahova_Busteni"]]).T
In [48]:
print(na)
In [49]:
print(np.sum(na[:,-1]))
Identify the most abundant vaccine (the row with the largest quantity) for that county/town combo
In [50]:
max_vac = np.max(na[:,-1])
vaccine_name = na[na[:,-1]==max_vac][0][0]
print(vaccine_name)
Now for the least abundant vaccine (the row with the smallest quantity)
In [51]:
min_vac = np.min(na[:,-1])
vaccine_name = na[na[:,-1]==min_vac][0][0]
print(vaccine_name)
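A possibly simpler way to get the same two answers is `np.argmax` and `np.argmin`, which return the position of the largest and smallest value. This sketch keeps names and quantities in two separate arrays so the quantities stay numeric. The arrays `names` and `quantities` and their values are invented for illustration.

```python
import numpy as np

# Toy values, invented for illustration only.
names = np.array(["Hexacima", "VaccineB", "VaccineC"])
quantities = np.array([10, 5, 30])

# argmax/argmin give the index of the first largest/smallest quantity;
# that index is then used to look up the matching vaccine name.
most = names[np.argmax(quantities)]
least = names[np.argmin(quantities)]

print(most)
print(least)
```

Like the `[0][0]` lookup above, this returns only the first match when several rows tie for the maximum or minimum.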