Learning Python, NumPy and Deep Learning

Chapter 1: my first Jupyter Notebook

Alex Brie, 29/06/2017

This is my first Jupyter notebook, an in-progress experiment in learning the ropes of Python and how to use it for data science and deep learning.

Ultimate goal: create new deep network architectures using Keras and use them in revolutionary mobile apps that will change the way we perceive reality and man's purpose on earth (I'm obviously joking here, capisce?)

But first, baby steps:

For the first self-taught lesson I just want to learn how to use Jupyter notebooks, how to open a dataset (CSV) and how to apply some basic operations to it (filtering, basic NumPy methods).

For the dataset I didn't want just any dataset but a government one. Therefore I'm using a CSV from data.gov.ro that contains the number of vaccines administered to children in Romanian cities in the first trimester of 2017.

Conclusion (later edit)

The good part: I'm able to use Jupyter with autocomplete (after installing readline), look at documentation in a separate terminal using pydoc, create new cells, run them, open a CSV, convert it into a NumPy array and then do basic operations on it, such as filtering.

The bad part: I didn't do anything that you can't do in 10 seconds by simply opening the aforementioned csv in Excel. Plus, my filtering probably sucks. But it's a start.


In [39]:
import pandas as pd
import numpy as np

In [40]:
datas = pd.read_csv('copii.csv', sep=';', names=["J","L","V", "N", "A"])

In [41]:
print(datas[0:10])
# print(datas.columns)


      J        L                   V   N     A
0  Alba    Abrud  BCG ( Alt produs )  10  2017
1  Alba    Abrud            Diftavax  23  2017
2  Alba    Abrud           Engerix B   5  2017
3  Alba    Abrud               Euvax   4  2017
4  Alba    Abrud            Hexacima  25  2017
5  Alba    Abrud       Infanrix hexa   3  2017
6  Alba    Abrud         M-M-RVAXPRO  26  2017
7  Alba    Abrud            Tetraxim  16  2017
8  Alba  Acmariu            Diftavax   2  2017
9  Alba  Acmariu         M-M-RVAXPRO   1  2017

In [42]:
np_datas = np.array(datas)

In [43]:
judete = np_datas[:,0]
orase = np_datas[:, 1]
vaccinuri = np_datas[:, 2]
cantitati = np_datas[:, 3]
ani = np_datas[:, 4]

Test printing a filter


In [44]:
print(orase[judete=="Prahova"])
print(judete[orase=="Busteni"])


['Adunati' 'Adunati' 'Adunati' ..., 'Zamfira' 'Zamfira' 'Zanoaga']
['Dolj' 'Dolj' 'Prahova' 'Prahova' 'Prahova' 'Prahova' 'Prahova' 'Prahova'
 'Prahova' 'Prahova']
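The filter above works because comparing an array with a string produces one boolean per row, and indexing with that boolean array keeps only the rows marked True. Here is a minimal sketch of the idea; the sample rows are made up for illustration and are not taken from copii.csv:

```python
import numpy as np

# Made-up sample rows (not from copii.csv), one entry per row of the table.
judete = np.array(["Alba", "Dolj", "Prahova", "Prahova"])
orase = np.array(["Abrud", "Busteni", "Busteni", "Sinaia"])

# Comparing an array with a scalar yields one boolean per row.
mask = judete == "Prahova"
print(mask)

# Indexing with the mask keeps only the rows where it is True.
print(orase[mask])

# np.unique collapses repeats, giving one entry per town instead of one per row.
towns = np.unique(orase[mask])
print(towns)
```

In the real data each town appears once per vaccine, which is why the printed result repeats names; `np.unique` is one way to get each town only once.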

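Town names are not unique across counties (the output above lists Busteni under both Dolj and Prahova), so prepare filter columns that allow us to select any county/town combo, or county/town/vaccine name combo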


In [45]:
jud_or = judete + "_" + orase
jud_or_vac = judete + "_" + orase + "_" + vaccinuri

In [46]:
print(np.sum(cantitati[jud_or=="Prahova_Busteni"]))
print(np.sum(cantitati[jud_or_vac=="Bucuresti_Sector 2_Hexacima"]))


125
1199
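An alternative to gluing the names into a single string key is to combine two boolean masks with `&`. This is only a sketch of that idea, using made-up sample rows rather than the real file:

```python
import numpy as np

# Made-up sample rows; in the notebook these arrays come from copii.csv.
judete = np.array(["Dolj", "Prahova", "Prahova", "Prahova"])
orase = np.array(["Busteni", "Busteni", "Busteni", "Sinaia"])
cantitati = np.array([7, 33, 39, 12])

# Each comparison gives a boolean array; & combines them row by row.
# The parentheses are required because & binds tighter than ==.
mask = (judete == "Prahova") & (orase == "Busteni")

# Sum the quantities of the selected rows only.
total = cantitati[mask].sum()
print(total)
```

This avoids building extra string columns, and it cannot be confused by a name that happens to contain the separator character.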

Demo for extracting the vaccine names and quantities for a given county_city combo


In [47]:
na = np.array([vaccinuri[jud_or=="Prahova_Busteni"], cantitati[jud_or=="Prahova_Busteni"]]).T

In [48]:
print(na)


[['BCG ( Alt produs )' 18]
 ['Diftavax' 2]
 ['Euvax' 10]
 ['Hexacima' 33]
 ['M-M-RVAXPRO' 39]
 ['Priorix' 1]
 ['Tetraxim' 21]
 ['VVR' 1]]

In [49]:
print(np.sum(na[:,-1]))


125
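The same selection and total can probably be done without leaving pandas. The sketch below assumes the same semicolon-separated layout and the same column names as above, but reads a few made-up rows from a string instead of the real copii.csv:

```python
import io

import pandas as pd

# Made-up rows in the same (assumed) layout as copii.csv.
raw = """Prahova;Busteni;Hexacima;33;2017
Prahova;Busteni;Tetraxim;21;2017
Prahova;Sinaia;Hexacima;12;2017
Dolj;Busteni;Hexacima;7;2017
"""
datas = pd.read_csv(io.StringIO(raw), sep=";", names=["J", "L", "V", "N", "A"])

# Boolean row selection works directly on the DataFrame.
busteni = datas[(datas["J"] == "Prahova") & (datas["L"] == "Busteni")]
print(busteni[["V", "N"]])

# Total quantity for every (county, town) pair in one pass.
totals = datas.groupby(["J", "L"])["N"].sum()
print(totals)
```

`groupby` computes the totals for all county/town pairs at once, so there is no need to filter one combination at a time.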

Identify the most common/popular/abundant vaccine in Prahova_Busteni, i.e. the one with the largest quantity


In [50]:
max_vac = np.max(na[:,-1])
vaccine_name = na[na[:,-1]==max_vac][0][0]
print(vaccine_name)


M-M-RVAXPRO

Now for the least common/popular/abundant vaccine (when several vaccines share the minimum, only the first match is printed)


In [51]:
min_vac = np.min(na[:,-1])
vaccine_name = na[na[:,-1]==min_vac][0][0]
print(vaccine_name)


Priorix
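A slightly more direct way to do both lookups, sketched with the vaccine names and quantities printed above for Prahova_Busteni: `argmax` gives the position of the largest value, and comparing against the minimum keeps every tied entry. The variable names here are new and only for illustration.

```python
import numpy as np

# The (vaccine, quantity) pairs printed above for Prahova_Busteni.
names = np.array(["BCG ( Alt produs )", "Diftavax", "Euvax", "Hexacima",
                  "M-M-RVAXPRO", "Priorix", "Tetraxim", "VVR"])
counts = np.array([18, 2, 10, 33, 39, 1, 21, 1])

# argmax returns the position of the largest value.
most_common = names[counts.argmax()]

# Comparing against the minimum keeps all tied entries, not just the first.
least_common = names[counts == counts.min()]

print(most_common)
print(least_common.tolist())
```

Keeping names and quantities in two separate arrays also lets the quantities stay numeric instead of being mixed with strings in one object array.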
