Learning Python, NumPy and Deep Learning

Alex Brie, 29/06/2017

Chapter 1: my first Jupyter Notebook

First, baby steps:

For this first self-taught lesson, I just wanted to understand how to use Jupyter Notebook, how to open a dataset (CSV), and how to apply basic operations to it (filtering, basic NumPy methods).

For the dataset, I didn't want just any data but something published by the government. So I'm using a CSV from data.gov.ro that contains the number of vaccines administered to children in Romanian cities in the first quarter of 2017.

Conclusion (later edit)

The good part: I can now use Jupyter with autocomplete (after installing readline), look up documentation in a separate terminal using pydoc, create and run cells, open a CSV, convert it into a NumPy array, and do basic operations on it such as filtering.

The bad part: I didn't do anything that you can't do in 10 seconds by simply opening the aforementioned csv in Excel. Plus, my filtering probably sucks. But it's a start.


In [39]:
import pandas as pd
import numpy as np

In [40]:
datas = pd.read_csv('copii.csv', sep=';', names=["J","L","V", "N", "A"])

In [41]:
print(datas[0:10])
# print(datas.columns)


      J        L                   V   N     A
0  Alba    Abrud  BCG ( Alt produs )  10  2017
1  Alba    Abrud            Diftavax  23  2017
2  Alba    Abrud           Engerix B   5  2017
3  Alba    Abrud               Euvax   4  2017
4  Alba    Abrud            Hexacima  25  2017
5  Alba    Abrud       Infanrix hexa   3  2017
6  Alba    Abrud         M-M-RVAXPRO  26  2017
7  Alba    Abrud            Tetraxim  16  2017
8  Alba  Acmariu            Diftavax   2  2017
9  Alba  Acmariu         M-M-RVAXPRO   1  2017
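
To make the read_csv arguments concrete, here is a minimal sketch that parses two semicolon-separated rows from an in-memory string instead of copii.csv. The values are copied from the printout above; the raw text layout (no header line) and the longer English column names are assumptions made for illustration.

```python
import io

import pandas as pd

# Two rows in the layout the notebook's read_csv call implies:
# semicolon-separated, no header line (an assumption about copii.csv).
sample = "Alba;Abrud;Diftavax;23;2017\nAlba;Abrud;Euvax;4;2017\n"

# Hypothetical English column names standing in for J, L, V, N, A.
columns = ["county", "town", "vaccine", "quantity", "year"]

# names= supplies the column labels, so the first line is read as data.
df = pd.read_csv(io.StringIO(sample), sep=";", names=columns)

print(df)
print(df["quantity"].sum())  # 23 + 4 = 27
```

Longer names such as "quantity" are only a suggestion; the single-letter names used in the notebook work the same way.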

In [42]:
np_datas = np.array(datas)

In [43]:
judete = np_datas[:, 0]     # counties
orase = np_datas[:, 1]      # towns
vaccinuri = np_datas[:, 2]  # vaccine names
cantitati = np_datas[:, 3]  # quantities
ani = np_datas[:, 4]        # years
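
One thing worth knowing about this conversion: a DataFrame that mixes text and numbers becomes a single NumPy array of generic Python objects, so the quantity column loses its numeric type. The sketch below, built on three rows copied from the printout above, compares that with pulling one column out on its own. Treat it as an illustration, not as the only way to do it.

```python
import numpy as np
import pandas as pd

# Three rows copied from the printout above.
datas = pd.DataFrame({
    "J": ["Alba", "Alba", "Alba"],
    "L": ["Abrud", "Abrud", "Abrud"],
    "V": ["BCG ( Alt produs )", "Diftavax", "Engerix B"],
    "N": [10, 23, 5],
    "A": [2017, 2017, 2017],
})

# Converting the whole frame: text and numbers share one object array.
np_datas = np.array(datas)
print(np_datas.dtype)   # object

# Converting a single numeric column keeps an integer dtype.
cantitati = datas["N"].to_numpy()
print(cantitati.dtype)
print(cantitati.sum())  # 10 + 23 + 5 = 38
```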

Test the filters by printing their results


In [44]:
print(orase[judete=="Prahova"])
print(judete[orase=="Busteni"])


['Adunati' 'Adunati' 'Adunati' ..., 'Zamfira' 'Zamfira' 'Zanoaga']
['Dolj' 'Dolj' 'Prahova' 'Prahova' 'Prahova' 'Prahova' 'Prahova' 'Prahova'
 'Prahova' 'Prahova']
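
What happens in the cell above is boolean-mask indexing: the comparison produces one True/False value per row, and indexing with that array keeps only the True rows. A small sketch with made-up rows (the four-row arrays are hypothetical, only the names are borrowed from the notebook):

```python
import numpy as np

# Hypothetical sample rows, loosely modelled on the notebook's columns.
judete = np.array(["Prahova", "Prahova", "Dolj", "Alba"])
orase = np.array(["Busteni", "Zamfira", "Busteni", "Abrud"])

# The comparison yields one boolean per row.
mask = judete == "Prahova"
print(mask)

# Indexing with the mask keeps only the rows where it is True.
print(orase[mask])

# np.unique collapses repeated names, which helps when a town
# appears once per vaccine as in the output above.
print(np.unique(judete[orase == "Busteni"]))
```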

Prepare filter columns that allow us to select any county/town combo, or county/town/vaccine name combo


In [45]:
jud_or = judete + "_" + orase
jud_or_vac = judete + "_" + orase + "_" + vaccinuri

In [46]:
print(np.sum(cantitati[jud_or=="Prahova_Busteni"]))
print(np.sum(cantitati[jud_or_vac=="Bucuresti_Sector 2_Hexacima"]))


125
1199
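
The concatenated keys work because the arrays hold plain Python strings. An alternative that avoids building extra columns is to combine two boolean masks with the & operator. The sketch below uses made-up quantities (all values are hypothetical) just to show the mechanics:

```python
import numpy as np

# Hypothetical rows: county, town and quantity for four records.
judete = np.array(["Prahova", "Prahova", "Dolj", "Prahova"])
orase = np.array(["Busteni", "Busteni", "Busteni", "Zamfira"])
cantitati = np.array([18, 2, 7, 5])

# Each comparison needs its own parentheses, because & binds
# more tightly than ==.
mask = (judete == "Prahova") & (orase == "Busteni")

total = cantitati[mask].sum()
print(total)  # 18 + 2 = 20
```

A third condition, for example on the vaccine name, can be chained with another & in the same way.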

Demo for extracting the quantities and vaccine names for a given county_city combo


In [47]:
mask = jud_or == "Prahova_Busteni"
na = np.array([vaccinuri[mask], cantitati[mask]]).T

In [48]:
print(na)


[['BCG ( Alt produs )' 18]
 ['Diftavax' 2]
 ['Euvax' 10]
 ['Hexacima' 33]
 ['M-M-RVAXPRO' 39]
 ['Priorix' 1]
 ['Tetraxim' 21]
 ['VVR' 1]]

In [49]:
print(np.sum(na[:, -1]))


125
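
For comparison, the same selection can probably be done without leaving pandas. This is a sketch under assumptions: the eight Busteni rows are copied from the output above, and one extra row from another town is invented so the filter has something to exclude.

```python
import pandas as pd

# Eight Prahova/Busteni rows from the output above, plus one
# hypothetical row from another town.
rows = [
    ("Prahova", "Busteni", "BCG ( Alt produs )", 18),
    ("Prahova", "Busteni", "Diftavax", 2),
    ("Prahova", "Busteni", "Euvax", 10),
    ("Prahova", "Busteni", "Hexacima", 33),
    ("Prahova", "Busteni", "M-M-RVAXPRO", 39),
    ("Prahova", "Busteni", "Priorix", 1),
    ("Prahova", "Busteni", "Tetraxim", 21),
    ("Prahova", "Busteni", "VVR", 1),
    ("Prahova", "Zamfira", "Hexacima", 7),  # hypothetical row
]
df = pd.DataFrame(rows, columns=["J", "L", "V", "N"])

# Filter rows with a combined boolean mask, then pick two columns.
busteni = df[(df["J"] == "Prahova") & (df["L"] == "Busteni")]
print(busteni[["V", "N"]])
print(busteni["N"].sum())  # 125, matching the NumPy result
```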

Identify the most administered vaccine in this town


In [50]:
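# Largest quantity in the last column of na
max_vac = np.max(na[:, -1])
# Keep the rows with that quantity, then take the name from the first one
vaccine_name = na[na[:, -1] == max_vac][0][0]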
print(vaccine_name)


M-M-RVAXPRO
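
A shorter route to the same answer is np.argmax, which returns the position of the largest value directly. The sketch below rebuilds the names and quantities from the table printed earlier as two separate arrays (the variable names are my own):

```python
import numpy as np

# Vaccine names and quantities for Prahova_Busteni, from the output above.
names = np.array([
    "BCG ( Alt produs )", "Diftavax", "Euvax", "Hexacima",
    "M-M-RVAXPRO", "Priorix", "Tetraxim", "VVR",
])
quantities = np.array([18, 2, 10, 33, 39, 1, 21, 1])

# argmax gives the index of the first occurrence of the maximum.
top = np.argmax(quantities)
print(names[top], quantities[top])  # M-M-RVAXPRO 39
```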

Now for the least administered vaccine


In [51]:
min_vac = np.min(na[:,-1])
vaccine_name = na[na[:,-1]==min_vac][0][0]
print(vaccine_name)


Priorix
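
Note that the table printed earlier has two vaccines with a quantity of 1 (Priorix and VVR), and taking the first match reports only one of them. A minimal sketch that lists every vaccine tied for the minimum, using the same values rebuilt as separate arrays (variable names are my own):

```python
import numpy as np

# Vaccine names and quantities for Prahova_Busteni, from the output above.
names = np.array([
    "BCG ( Alt produs )", "Diftavax", "Euvax", "Hexacima",
    "M-M-RVAXPRO", "Priorix", "Tetraxim", "VVR",
])
quantities = np.array([18, 2, 10, 33, 39, 1, 21, 1])

# Compare every quantity against the minimum to catch all ties.
lowest = quantities.min()
tied = names[quantities == lowest]
print(tied.tolist())  # ['Priorix', 'VVR']
```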
