Alex Brie, 29/06/2017
This is my first Jupyter notebook, an in-progress experiment in learning the ropes of Python and how to use it for data science and deep learning.
Ultimate goal: create new deep network architectures using Keras and use them in revolutionary mobile apps that will change the way we perceive reality and man's purpose on earth (I'm obviously joking here, capisce?)
For the first self-taught lesson I just want to learn how to use Jupyter notebooks, how to open a dataset (CSV) and how to apply some basic operations to it (filtering, basic numpy methods).
For the dataset I didn't want just any dataset but a government one. Therefore I'm using a CSV from data.gov.ro that contains the number of vaccines administered to children in Romanian cities in the first trimester of 2017.
The good part: I'm able to use Jupyter with autocomplete (after installing readline), look at documentation in a separate terminal using pydoc, create new cells, run them, open a CSV, convert it into a numpy array and then do basic operations on it such as filtering.
The bad part: I didn't do anything that you can't do in 10 seconds by simply opening the aforementioned csv in Excel. Plus, my filtering probably sucks. But it's a start.
In [39]:
import pandas as pd
import numpy as np
In [40]:
datas = pd.read_csv('copii.csv', sep=';', names=["J","L","V", "N", "A"])
In [41]:
print(datas[0:10])
# print(datas.columns)
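A quick way to check what `sep=';'` and `names=[...]` do is to run the same call on a few rows held in memory. This is only a sketch: the rows below are made up for illustration and are not taken from copii.csv, and the column order (county, town, vaccine, quantity, year) is assumed from how the columns are used later in this notebook.

```python
import io
import pandas as pd

# Made-up sample rows in the same semicolon-separated layout (NOT real data).
sample_csv = """Prahova;Busteni;VaccinA;10;2017
Prahova;Sinaia;VaccinB;5;2017
Bucuresti;Sector 2;VaccinA;7;2017
"""

# names= supplies the column labels, so every line in the text is read as data.
sample = pd.read_csv(io.StringIO(sample_csv), sep=";", names=["J", "L", "V", "N", "A"])

print(sample.head())   # first rows, like datas[0:10]
print(sample.dtypes)   # which columns were parsed as numbers
print(sample.shape)    # (rows, columns)
```

One thing to watch for: because `names=` is given, pandas treats every line as data, so if the real file starts with a header line, that line would show up as the first row.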
In [42]:
np_datas = np.array(datas)
In [43]:
# One 1-D array per column: counties, towns, vaccines, quantities, years
judete = np_datas[:, 0]
orase = np_datas[:, 1]
vaccinuri = np_datas[:, 2]
cantitati = np_datas[:, 3]
ani = np_datas[:, 4]
Test printing a filter
In [44]:
print(orase[judete=="Prahova"])
print(judete[orase=="Busteni"])
Prepare filter columns that allow us to select any county/town combo, or county/town/vaccine name combo
In [45]:
# Composite string keys such as "Prahova_Busteni"
jud_or = judete + "_" + orase
jud_or_vac = judete + "_" + orase + "_" + vaccinuri
In [46]:
print(np.sum(cantitati[jud_or=="Prahova_Busteni"]))
print(np.sum(cantitati[jud_or_vac=="Bucuresti_Sector 2_Hexacima"]))
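The concatenated keys work, but the same selection can probably be done without building new string columns, by combining boolean masks with `&`. A minimal sketch, using made-up toy arrays (not values from the real file) that mirror `judete`, `orase` and `cantitati`:

```python
import numpy as np

# Toy stand-ins for the notebook's columns (made-up values, NOT real data).
judete = np.array(["Prahova", "Prahova", "Bucuresti", "Prahova"])
orase = np.array(["Busteni", "Sinaia", "Sector 2", "Busteni"])
cantitati = np.array([10, 5, 7, 3])

# Each comparison gives a boolean array; & keeps rows where both are True.
# The parentheses are needed because & binds tighter than ==.
mask = (judete == "Prahova") & (orase == "Busteni")

total = np.sum(cantitati[mask])
print(total)
```

This also avoids a subtle risk of the key approach: if a name ever contained an underscore, two different county/town pairs could produce the same key.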
Demo for extracting the vaccine names and quantities for a given county_city combo
In [47]:
busteni = jud_or == "Prahova_Busteni"  # compute the mask once and reuse it
na = np.array([vaccinuri[busteni], cantitati[busteni]]).T
In [48]:
print(na)
In [49]:
print(np.sum(na[:, -1]))
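For comparison, the same "vaccine and quantity for one town" table can be pulled straight out of a DataFrame with `.loc`, without going through numpy. This is a sketch on a made-up toy frame (`datas_demo` and its values are hypothetical, only the column labels J, L, V, N match the notebook):

```python
import pandas as pd

# Toy frame with the notebook's column labels (made-up values, NOT real data).
datas_demo = pd.DataFrame({
    "J": ["Prahova", "Prahova", "Bucuresti", "Prahova"],
    "L": ["Busteni", "Busteni", "Sector 2", "Sinaia"],
    "V": ["VaccinA", "VaccinB", "VaccinA", "VaccinA"],
    "N": [10, 3, 7, 5],
})

# Row filter (boolean mask) and column selection in a single .loc call.
rows = (datas_demo["J"] == "Prahova") & (datas_demo["L"] == "Busteni")
selection = datas_demo.loc[rows, ["V", "N"]]

print(selection)
total = selection["N"].sum()
print(total)
```

A side benefit is that the quantity column keeps its numeric dtype, whereas stacking names and numbers into one numpy array forces a shared dtype for both.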
Identify the most abundant vaccine for this town (the row with the largest quantity)
In [50]:
max_vac = np.max(na[:,-1])
vaccine_name = na[na[:,-1]==max_vac][0][0]
print(vaccine_name)
Now for the least abundant vaccine (the row with the smallest quantity)
In [51]:
min_vac = np.min(na[:,-1])
vaccine_name = na[na[:,-1]==min_vac][0][0]
print(vaccine_name)
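The max/min lookups above compare single rows. If the same vaccine can appear in more than one row (I have not checked whether it does in this file), it may be safer to sum per vaccine first and then ask for the largest and smallest totals. A rough sketch with pandas `groupby`, `idxmax` and `idxmin`, on made-up toy data:

```python
import pandas as pd

# Toy data (made-up values, NOT real data): VaccinA appears in two rows.
demo = pd.DataFrame({
    "V": ["VaccinA", "VaccinB", "VaccinC", "VaccinA"],
    "N": [10, 3, 7, 5],
})

# Total quantity per vaccine name; the result is a Series indexed by name.
totals = demo.groupby("V")["N"].sum()

# idxmax / idxmin return the index label (the vaccine name), not the value.
most = totals.idxmax()
least = totals.idxmin()

print(totals)
print(most, least)
```

With these toy values the per-vaccine totals are 15, 3 and 7, so the names returned are VaccinA and VaccinB.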