Sveučilište u Zagrebu
Fakultet elektrotehnike i računarstva

Machine Learning

http://www.fer.unizg.hr/predmet/su

Academic year 2015/2016

Notebook 0: Introduction to SciPy

Version: 0.5 (2015-10-15)

INCOMPLETE

1. SciPy stack

Core packages:

Python
NumPy
SciPy library
IPython
Matplotlib
SymPy
pandas
nose

Additional packages:

Cython
SciKits packages: scikit-learn, scikit-multilearn, scikit-image, ...

2. IPython notebook

Cells are evaluated with SHIFT+ENTER

Markdown text with special formatting and code in $\LaTeX$: $f(\mathbf{x}) = \sum_{i=1}^n \ln \frac{P(x)P(y)}{P(x, y)}$



In [1]:

    
10









    Out[1]:





10



In [2]:

    
_









    Out[2]:





10



In [3]:









    Out[3]:





55



In [4]:









    Out[4]:





10



In [5]:

    
?



In [6]:

    
%quickref

More: https://ipython.org/ipython-doc/3/interactive/tutorial.html

3. Python

3.1. Variables and values



In [7]:

    
x = 5



In [8]:

    
x









    Out[8]:





5



In [9]:

    
print(x)



In [10]:

    
print x



In [11]:

    
type(x)









    Out[11]:





int



In [12]:

    
(x + 1) ** 2









    Out[12]:





36



In [13]:

    
x += 1; x









    Out[13]:





6



In [14]:

    
?x



In [15]:

    
del x



In [16]:

    
x









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-16-401b30e3b8b5> in <module>()
----> 1 x

NameError: name 'x' is not defined



In [ ]:

    
X=7; varijabla_s_vrlo_dugackim_imenom = 747



In [ ]:

    
x=1; y=-2



In [ ]:

    
x==y



In [ ]:

    
(x==y)==False



In [ ]:

    
x!=y



In [ ]:

    
x==y or (x>0 and not y>0)



In [ ]:

    
z = 42 if x==y else 66



In [ ]:

    
z



In [ ]:

    
moj_string = 'ringe ringe'



In [ ]:

    
'hopa' + ' ' + "cupa"



In [ ]:

    
moj_string += ' raja'; moj_string



In [ ]:

    
len(moj_string)



In [ ]:

    
print "X=%0.2f y=%d, s='%s'" % (x, y, moj_string)



In [ ]:

    
1/2



In [ ]:

    
1/2.0



In [ ]:

    
1/float(2)



In [ ]:

    
round(0.5)
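
The division cells above depend on the Python version: under Python 2, which this notebook uses, `1/2` is integer division and gives `0`, whereas Python 3 gives `0.5`. A minimal sketch using only operations that behave the same in both versions (the variable names are illustrative):

```python
# Floor division and explicit float division behave the same in Python 2 and 3.
floor_result = 7 // 2         # floor division: discards the fractional part
float_result = 7 / 2.0        # at least one float operand: true division
cast_result = 7 / float(2)    # same effect via an explicit conversion

print(floor_result, float_result, cast_result)
```

Writing `//` whenever integer division is intended keeps the code unambiguous regardless of the interpreter version.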

3.2. Mathematical functions



In [ ]:

    
import math



In [ ]:

    
math.sqrt(68)



In [ ]:

    
math.exp(1)



In [ ]:

    
math.log(_)



In [ ]:

    
math.log(100, 2)

More: https://docs.python.org/2/library/math.html

3.3. Lists



In [ ]:

    
xs = [5, 6, 2, 3]   # Creates a list



In [ ]:

    
xs



In [ ]:

    
xs[0]  # Zero-based indexing



In [ ]:

    
xs[-1]  # Negative indices count from the end of the list



In [ ]:

    
xs[1] = 100  # Updating a list element
xs



In [ ]:

    
xs[1] = 'foo'  # Lists can be heterogeneous
xs



In [ ]:

    
xs[3] = [1,2]
xs



In [ ]:

    
xs.append(x)  # Appends to the end
xs



In [ ]:

    
xs + [77, 88]



In [ ]:

    
xs.extend([77, 88]); xs



In [ ]:

    
xs.pop()  # Removes and returns the last element of the list



In [ ]:

    
xs



In [ ]:

    
xs[0:2]



In [ ]:

    
xs[1:]



In [ ]:

    
xs[:3]



In [ ]:

    
xs[:]



In [ ]:

    
xs[:-2]  # Everything except the last two



In [ ]:

    
xs[0:2] = [1,2]
xs



In [ ]:

    
range(10)



In [ ]:

    
range(1, 10)



In [ ]:

    
range(0, 51, 5)



In [ ]:

    
for x in range(5):
    print x



In [ ]:

    
for x in xs: print x



In [ ]:

    
for ix, x in enumerate(range(0, 51, 5)):
  print ix, x



In [ ]:

    
xs = []
for x in range(10):
    xs.append(x ** 2)
xs



In [ ]:

    
[x ** 2 for x in range(10)]



In [ ]:

    
[x ** 2 for x in range(10) if x % 2 == 0]



In [ ]:

    
[(x, x ** 2) for x in range(10)]



In [ ]:

    
zip([1, 2, 3], [4, 5, 6])



In [ ]:

    
zip(*[(1, 4), (2, 5), (3, 6)])



In [ ]:

    
xs, ys = zip(*[(1, 4), (2, 5), (3, 6)])



In [ ]:

    
xs



In [ ]:

    
map(lambda x : x + 1, xs)



In [ ]:

    
[ x + 1 for x in xs ]



In [ ]:

    
ys = []
for x in xs :
    ys.append(x + 1)
ys



In [ ]:

    
sum(ys)

3.4. Dictionary (map)



In [17]:

    
d = {'zagreb' : 790017, 'split' : 178102, 'rijeka' : 128624}



In [18]:

    
d['split']









    Out[18]:





178102



In [19]:

    
d['osijek']









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-19-b6288821d7a0> in <module>()
----> 1 d['osijek']

KeyError: 'osijek'



In [20]:

    
d.get('osijek', 0)









    Out[20]:





0



In [21]:

    
d['osijek'] = 108048; d









    Out[21]:





{'osijek': 108048, 'rijeka': 128624, 'split': 178102, 'zagreb': 790017}



In [22]:

    
'rijeka' in d









    Out[22]:





True



In [23]:

    
d['zagreb'] = 790200; d









    Out[23]:





{'osijek': 108048, 'rijeka': 128624, 'split': 178102, 'zagreb': 790200}



In [24]:

    
del d['rijeka']; d









    Out[24]:





{'osijek': 108048, 'split': 178102, 'zagreb': 790200}

Iterating over a dictionary:



In [25]:

    
for grad in d:
    print 'Grad %s ima %d stanovnika' % (grad, d[grad])









    



Grad osijek ima 108048 stanovnika
Grad split ima 178102 stanovnika
Grad zagreb ima 790200 stanovnika

Iterating over keys and values:



In [26]:

    
for grad, stanovnici in d.iteritems():
    print 'Grad %s ima %d stanovnika' % (grad, stanovnici)









    



Grad osijek ima 108048 stanovnika
Grad split ima 178102 stanovnika
Grad zagreb ima 790200 stanovnika
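
`iteritems()` exists only in Python 2. As a hedged sketch, the same loop can be written with `items()`, which works in both Python 2 and Python 3. The dictionary below repeats the values shown above; the `lines` list and the use of `sorted` are illustrative additions:

```python
# Population figures copied from the dictionary used above.
d = {'osijek': 108048, 'split': 178102, 'zagreb': 790200}

# items() yields (key, value) pairs in Python 2 and Python 3 alike.
lines = []
for grad, stanovnici in sorted(d.items()):
    lines.append('Grad %s ima %d stanovnika' % (grad, stanovnici))

for line in lines:
    print(line)
```

Sorting the pairs makes the output order deterministic, which plain dictionary iteration does not guarantee in Python 2.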

Nested dictionaries:



In [27]:

    
d2 = {'zagreb' : {'trešnjevka' : 120240, 'centar' : 145302}}
d2 ['zagreb']['trešnjevka']









    Out[27]:





120240

3.5. Functions



In [28]:

    
def inc(x): return x + 1



In [29]:

    
def sign(x):
    if x > 0:
        return 'pozitivno'
    elif x < 0:
        return 'negativno'
    else:
        return 'nula'

for x in [-1, 0, 1]:
    print sign(x)









    



negativno
nula
pozitivno

Default arguments:



In [30]:

    
def broj_stanovnika(grad, godina=2015):
    if grad in d:
        return d[grad] + round((godina - 2015) * 10000 * (-1.2))
    else: 
        raise ValueError('Nepoznat neki grad')



In [31]:

    
broj_stanovnika('zagreb')









    Out[31]:





790200.0



In [32]:

    
broj_stanovnika('zagreb', godina=2020)









    Out[32]:





730200.0



In [33]:

    
broj_stanovnika('zadar')









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-ed4c8aabd867> in <module>()
----> 1 broj_stanovnika('zadar')

<ipython-input-30-4ea6caac6059> in broj_stanovnika(grad, godina)
      3         return d[grad] + round((godina - 2015) * 10000 * (-1.2))
      4     else:
----> 5         raise ValueError('Nepoznat neki grad')

ValueError: Nepoznat neki grad

3.6. Classes



In [34]:

    
class RegistarStanovnika:

    # Constructor
    def __init__(self, drzava, d):
        self.drzava = drzava  # Instance variable (different for each instance)
        self.d = d  
        
    prirast = -1.2  # Class variable (shared by all instances)

    # Method
    def broj_stanovnika(self, grad, godina=2015):
        if grad in self.d:
            return self.d[grad] + round((godina - 2015) * 10000 * self.prirast)
        else: 
            raise ValueError('Nepoznat neki grad')
    
    def ukupno_stanovnika(self):
        return sum(self.d.values())



In [35]:

    
reg = RegistarStanovnika('Hrvatska', {'zagreb' : 790017, 'split' : 178102, 'rijeka' : 128624})



In [36]:

    
reg.broj_stanovnika('split')









    Out[36]:





178102.0



In [37]:

    
reg.ukupno_stanovnika()









    Out[37]:





1096743
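
The distinction between class and instance variables noted in the class comments above can be sketched on a small stand-in class. The `Counter` class and its attribute names are hypothetical and not part of the notebook:

```python
class Counter(object):
    step = 1  # class variable: shared by all instances

    def __init__(self, name):
        self.name = name  # instance variable: separate for each instance

a = Counter('a')
b = Counter('b')

# Changing the class variable is visible through every instance...
Counter.step = 5
shared = (a.step, b.step)

# ...while assigning through an instance creates an instance attribute
# that shadows the class variable for that instance only.
a.step = 10
after_shadowing = (a.step, b.step, Counter.step)

print(shared, after_shadowing)
```

The same rule applies to `prirast` in `RegistarStanovnika`: assigning `reg.prirast` would affect only `reg`, whereas assigning `RegistarStanovnika.prirast` would affect every instance that has not shadowed it.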

4. NumPy



In [1]:

    
import numpy as np



In [39]:

    
?np



In [40]:

    
np.__version__









    Out[40]:





'1.10.0.post2'

4.1. Arrays

One-dimensional array (array of rank 1):



In [2]:

    
a = np.array([1, 2, 3])



In [42]:

    
a









    Out[42]:





array([1, 2, 3])



In [43]:

    
print a



In [44]:

    
type(a)









    Out[44]:





numpy.ndarray



In [45]:

    
a = np.array([1, 2, 3], dtype=np.float64)



In [46]:

    
a









    Out[46]:





array([ 1.,  2.,  3.])



In [47]:

    
a[0]









    Out[47]:





1.0



In [48]:

    
a[0] = 100; a









    Out[48]:





array([ 100.,    2.,    3.])



In [49]:

    
a.shape









    Out[49]:





(3,)



In [50]:

    
len(a)









    Out[50]:





3



In [51]:

    
np.array([1,'a',2])









    Out[51]:





array(['1', 'a', '2'], 
      dtype='|S21')

Matrix (two-dimensional array, array of rank 2):



In [52]:

    
m = np.array([[1,2,3],[4,5,6]])



In [53]:

    
print m









    



[[1 2 3]
 [4 5 6]]



In [54]:

    
m[1]









    Out[54]:





array([4, 5, 6])



In [55]:

    
m[1,1]









    Out[55]:





5



In [56]:

    
m[1][1]









    Out[56]:





5



In [57]:

    
m.shape









    Out[57]:





(2, 3)



In [58]:

    
m2 = np.array([[1,2,3],[4,5]])



In [59]:

    
print m2









    



[[1, 2, 3] [4, 5]]

Slicing:



In [60]:

    
print m









    



[[1 2 3]
 [4 5 6]]



In [61]:

    
m[:,1]









    Out[61]:





array([2, 5])



In [62]:

    
m[0,1:3]









    Out[62]:





array([2, 3])



In [63]:

    
m[1,:2] = [77, 78]



In [64]:

    
m









    Out[64]:





array([[ 1,  2,  3],
       [77, 78,  6]])

Note the difference:



In [65]:

    
m[:,0]  # gives an array of rank 1









    Out[65]:





array([ 1, 77])



In [66]:

    
m[:,0:1]  # gives an array of rank 2









    Out[66]:





array([[ 1],
       [77]])
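
The difference between the two results is easiest to see from their shapes. A short sketch, assuming the matrix `m` as it stands after the assignment above (the names `col_rank1` and `col_rank2` are illustrative):

```python
import numpy as np

m = np.array([[1, 2, 3], [77, 78, 6]])

col_rank1 = m[:, 0]     # integer index drops the axis -> shape (2,)
col_rank2 = m[:, 0:1]   # slice keeps the axis         -> shape (2, 1)

print(col_rank1.shape, col_rank2.shape)
```

Keeping the axis matters whenever the result is later combined with other rank-2 arrays, for example in stacking or matrix products.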

Three-dimensional array (tensor of rank 3):



In [67]:

    
t = np.array([[[1,2],[3,4]],[[4,5],[6,7]]])



In [68]:

    
t.shape









    Out[68]:





(2, 2, 2)



In [69]:

    
t[0,1,1]









    Out[69]:





4



In [70]:

    
t[0]









    Out[70]:





array([[1, 2],
       [3, 4]])



In [71]:

    
t[0,:,1]









    Out[71]:





array([2, 4])

4.2. Creating arrays



In [6]:

    
np.zeros((5,5))









    Out[6]:





array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])



In [7]:

    
np.ones((3,1))









    Out[7]:





array([[ 1.],
       [ 1.],
       [ 1.]])



In [10]:

    
np.full((5,5), 55)









    Out[10]:





array([[ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.]])



In [11]:

    
np.eye(6)









    Out[11]:





array([[ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.]])



In [14]:

    
np.random.random((4,4))









    Out[14]:





array([[ 0.34294388,  0.2407478 ,  0.54200906,  0.85047566],
       [ 0.84029626,  0.60056098,  0.84275663,  0.40206247],
       [ 0.59492988,  0.69943282,  0.67430892,  0.7806004 ],
       [ 0.07980246,  0.92217663,  0.35797981,  0.67351464]])



In [18]:

    
np.arange(1, 10)









    Out[18]:





array([1, 2, 3, 4, 5, 6, 7, 8, 9])



In [19]:

    
np.arange(1, 10, 2)









    Out[19]:





array([1, 3, 5, 7, 9])



In [20]:

    
np.linspace(1, 10, 5)









    Out[20]:





array([  1.  ,   3.25,   5.5 ,   7.75,  10.  ])



In [23]:

    
np.linspace(1, 10)









    Out[23]:





array([  1.        ,   1.18367347,   1.36734694,   1.55102041,
         1.73469388,   1.91836735,   2.10204082,   2.28571429,
         2.46938776,   2.65306122,   2.83673469,   3.02040816,
         3.20408163,   3.3877551 ,   3.57142857,   3.75510204,
         3.93877551,   4.12244898,   4.30612245,   4.48979592,
         4.67346939,   4.85714286,   5.04081633,   5.2244898 ,
         5.40816327,   5.59183673,   5.7755102 ,   5.95918367,
         6.14285714,   6.32653061,   6.51020408,   6.69387755,
         6.87755102,   7.06122449,   7.24489796,   7.42857143,
         7.6122449 ,   7.79591837,   7.97959184,   8.16326531,
         8.34693878,   8.53061224,   8.71428571,   8.89795918,
         9.08163265,   9.26530612,   9.44897959,   9.63265306,
         9.81632653,  10.        ])
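
`np.arange` excludes the end point and is parameterised by a step, while `np.linspace` includes the end point by default and is parameterised by the number of samples (50 when omitted, as in the output above). A brief sketch of the difference, with illustrative variable names:

```python
import numpy as np

by_step = np.arange(1, 10, 2)        # step of 2, end point 10 excluded
by_count = np.linspace(1, 10, 5)     # 5 samples, end point 10 included
default_count = np.linspace(1, 10)   # num defaults to 50

print(by_step, by_count, len(default_count))
```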

More: http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html

4.3. Advanced indexing

Indexing with an array of integers:



In [26]:

    
a = np.array([[1,2], [3, 4], [5, 6]]); a









    Out[26]:





array([[1, 2],
       [3, 4],
       [5, 6]])



In [28]:

    
a[0,1]









    Out[28]:





2



In [29]:

    
a[[0,2]]   # Not the same as a[0,2] !









    Out[29]:





array([[1, 2],
       [5, 6]])



In [30]:

    
a[[0,1,2], [0,1,0]]   # Same as: np.array([a[0,0], a[1,1], a[2,0]])









    Out[30]:





array([1, 4, 5])

Indexing with a Boolean array:



In [31]:

    
a









    Out[31]:





array([[1, 2],
       [3, 4],
       [5, 6]])



In [32]:

    
bool_ix = a > 2
bool_ix









    Out[32]:





array([[False, False],
       [ True,  True],
       [ True,  True]], dtype=bool)



In [33]:

    
a[bool_ix]









    Out[33]:





array([3, 4, 5, 6])



In [34]:

    
a[a > 2]









    Out[34]:





array([3, 4, 5, 6])
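
A Boolean mask can also appear on the left-hand side of an assignment, in which case only the selected elements are modified. A small sketch (the copy `b` is introduced here so that `a` stays unchanged):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

b = a.copy()
b[b > 2] = 0                   # set every element greater than 2 to zero
count = int(np.sum(a > 2))     # True counts as 1, so this counts the matches

print(b)
print(count)
```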

More: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

4.4. Broadcasting and stacking

Broadcasting:



In [35]:

    
x = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])



In [37]:

    
print x









    



[[1 2]
 [3 4]]



In [89]:

    
x + v









    Out[89]:





array([[2, 4],
       [4, 6]])



In [90]:

    
np.ones((2,2,3)) * 5









    Out[90]:





array([[[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]],

       [[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]]])
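
In the addition `x + v` above, the rank-1 array `v` is treated as a row and added to every row of `x`. To add a vector along the other axis, it can be given shape `(2, 1)`. A sketch, where the name `col` is illustrative:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])

row_wise = x + v            # v has shape (2,): the same row is added to each row of x
col = v.reshape(2, 1)       # shape (2, 1): a column
col_wise = x + col          # the same column is added to each column of x

print(row_wise)
print(col_wise)
```

Broadcasting aligns shapes from the trailing axis, which is why a plain rank-1 array always behaves like a row here.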

Stacking:



In [38]:

    
v









    Out[38]:





array([1, 2])



In [40]:

    
np.vstack([v, v])









    Out[40]:





array([[1, 2],
       [1, 2]])



In [41]:

    
np.vstack([x, x])









    Out[41]:





array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])



In [42]:

    
np.vstack((v, x))









    Out[42]:





array([[1, 2],
       [1, 2],
       [3, 4]])



In [43]:

    
np.hstack((v, v))









    Out[43]:





array([1, 2, 1, 2])



In [44]:

    
np.hstack((x, x))









    Out[44]:





array([[1, 2, 1, 2],
       [3, 4, 3, 4]])



In [45]:

    
np.hstack((v, x))









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-6d17658301dc> in <module>()
----> 1 np.hstack((v, x))

/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in hstack(tup)
    276     # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    277     if arrs[0].ndim == 1:
--> 278         return _nx.concatenate(arrs, 0)
    279     else:
    280         return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions



In [46]:

    
np.column_stack((v, x))









    Out[46]:





array([[1, 1, 2],
       [2, 3, 4]])



In [47]:

    
x









    Out[47]:





array([[1, 2],
       [3, 4]])



In [98]:

    
np.dstack((x, x))









    Out[98]:





array([[[1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4]]])



In [99]:

    
np.shape(_)









    Out[99]:





(2, 2, 2)

Reshaping arrays:



In [50]:

    
m = np.array([[ 1,  2,  3], [77, 78,  6]])
m.reshape(3, 2)









    Out[50]:





array([[ 1,  2],
       [ 3, 77],
       [78,  6]])

4.4. Array operations (vector and matrix operations)



In [51]:

    
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])



In [102]:

    
print x; print y









    



[[1 2]
 [3 4]]
[[5 6]
 [7 8]]

Element-wise operations:



In [53]:

    
x + y









    Out[53]:





array([[ 6,  8],
       [10, 12]])



In [54]:

    
x - y









    Out[54]:





array([[-4, -4],
       [-4, -4]])



In [55]:

    
x / 2.0









    Out[55]:





array([[ 0.5,  1. ],
       [ 1.5,  2. ]])



In [56]:

    
x.dtype









    Out[56]:





dtype('int64')



In [57]:

    
(x/2.0).dtype









    Out[57]:





dtype('float64')



In [58]:

    
x * y









    Out[58]:





array([[ 5, 12],
       [21, 32]])



In [59]:

    
x = x.astype('float64')  # astype converts the values; assigning to .dtype only reinterprets the raw bytes
y = y.astype('float64')



In [60]:

    
x / y









    Out[60]:





array([[ 0.2       ,  0.33333333],
       [ 0.42857143,  0.5       ]])



In [111]:

    
np.sqrt(x)









    Out[111]:





array([[ 1.        ,  1.41421356],
       [ 1.73205081,  2.        ]])

Vector/matrix operations:



In [61]:

    
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([1,2])
w = np.array([5,3])

Scalar (inner, dot) product of vectors: $ \begin{pmatrix} 1 & 2 \\ \end{pmatrix} \cdot \begin{pmatrix} 5\\ 3\\ \end{pmatrix} = 11 $



In [113]:

    
print v.dot(w)
print w.dot(v)
print np.dot(v, w)

Matrix-vector product: $ \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ \end{pmatrix} \cdot \begin{pmatrix} 1\\ 2\\ \end{pmatrix} = \begin{pmatrix} 5\\ 11\\ \end{pmatrix} $



In [62]:

    
x.dot(v)









    Out[62]:





array([ 5, 11])



In [63]:

    
np.dot(x, v)









    Out[63]:





array([ 5, 11])

Vector-matrix product: $ \begin{pmatrix} 1 & 2\\ \end{pmatrix} \cdot \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ \end{pmatrix} = \begin{pmatrix} 7 & 10\\ \end{pmatrix} $



In [120]:

    
v.dot(x)









    Out[120]:





array([ 7, 10])



In [119]:

    
np.dot(v,x)









    Out[119]:





array([ 7, 10])

Note that, for rank-1 arrays, there is no difference between a column vector and a row vector.
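
Because `v` is a rank-1 array, transposing it changes nothing; an explicit row or column has to be created as a rank-2 array. A minimal sketch (the names `row` and `column` are illustrative):

```python
import numpy as np

v = np.array([1, 2])
w = np.array([5, 3])

row = v.reshape(1, -1)       # shape (1, 2): an explicit row vector
column = w.reshape(-1, 1)    # shape (2, 1): an explicit column vector

inner = row.dot(column)      # (1, 2) x (2, 1) -> shape (1, 1)
outer = column.dot(row)      # (2, 1) x (1, 2) -> shape (2, 2)

print(v.T.shape, inner, outer)
```

With explicit rank-2 shapes, the order of the operands decides whether `dot` yields an inner or an outer product.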

Matrix-matrix product: $ \begin{pmatrix} 1 & 2\\ 3 & 4\\ \end{pmatrix} \cdot \begin{pmatrix} 5 & 6 \\ 7 & 8 \\ \end{pmatrix} = \begin{pmatrix} 19 & 22\\ 43 & 50\\ \end{pmatrix} $



In [122]:

    
x.dot(y)









    Out[122]:





array([[19, 22],
       [43, 50]])



In [121]:

    
np.dot(x, y)









    Out[121]:





array([[19, 22],
       [43, 50]])

Outer product of vectors: $ \begin{pmatrix} 1\\ 2\\ \end{pmatrix} \times \begin{pmatrix} 5 \\ 3 \\ \end{pmatrix} = \begin{pmatrix} 1\\ 2\\ \end{pmatrix} \cdot \begin{pmatrix} 5 & 3\\ \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 10 & 6 \\ \end{pmatrix} $



In [64]:

    
np.outer(v, w)









    Out[64]:





array([[ 5,  3],
       [10,  6]])

Other operations:



In [65]:

    
x = np.array([0, 2, 4, 1])



In [66]:

    
np.max(x)









    Out[66]:





4



In [67]:

    
np.argmax(x)









    Out[67]:





2

4.5. Statistical functions



In [76]:

    
x = np.random.random(10); x









    Out[76]:





array([ 0.88042383,  0.22280293,  0.00769093,  0.9631947 ,  0.82314693,
        0.6021121 ,  0.42227832,  0.54826309,  0.10995267,  0.27862066])



In [69]:

    
np.mean(x)









    Out[69]:





0.41554955347505657



In [70]:

    
np.median(x)









    Out[70]:





0.28811051704517238



In [71]:

    
np.var(x)









    Out[71]:





0.09738450671168114



In [72]:

    
np.std(x)









    Out[72]:





0.31206490785040403



In [73]:

    
x = np.array([1, 2, np.nan])
np.mean(x)









    Out[73]:





nan



In [74]:

    
np.nanmean(x)









    Out[74]:





1.5



In [77]:

    
np.ptp(x)









    Out[77]:





0.95550376524614955



In [85]:

    
X = np.array([[1,2],[3,4]])
print X









    



[[1 2]
 [3 4]]



In [79]:

    
np.mean(X)









    Out[79]:





2.5



In [83]:

    
np.mean(X, axis=0)









    Out[83]:





array([ 2.,  3.])



In [86]:

    
np.cov(X)









    Out[86]:





array([[ 0.5,  0.5],
       [ 0.5,  0.5]])



In [87]:

    
x = np.random.random(10000); x









    Out[87]:





array([ 0.45896982,  0.98801242,  0.97315322, ...,  0.2251306 ,
        0.28813426,  0.77343545])



In [88]:

    
np.histogram(x)









    Out[88]:





(array([1028, 1024, 1029, 1022, 1014,  953, 1029,  960,  971,  970]),
 array([  8.52564207e-05,   1.00051953e-01,   2.00018649e-01,
          2.99985345e-01,   3.99952041e-01,   4.99918737e-01,
          5.99885433e-01,   6.99852129e-01,   7.99818825e-01,
          8.99785521e-01,   9.99752217e-01]))

More: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

4.6. Other frequently used functions



In [89]:

    
x = np.array([[1,2],[3,4]]); x









    Out[89]:





array([[1, 2],
       [3, 4]])



In [90]:

    
np.sum(x)









    Out[90]:





10



In [143]:

    
np.sum(x, axis=0)









    Out[143]:





array([4, 6])



In [144]:

    
np.sum(x, axis=1)









    Out[144]:





array([3, 7])



In [91]:

    
x.T









    Out[91]:





array([[1, 3],
       [2, 4]])



In [92]:

    
v









    Out[92]:





array([1, 2])



In [93]:

    
v.T









    Out[93]:





array([1, 2])



In [94]:

    
x.diagonal()









    Out[94]:





array([1, 4])



In [95]:

    
x.trace()  # == x.diagonal().sum()









    Out[95]:





5

Applying a function to an array:



In [97]:

    
x









    Out[97]:





array([[1, 2],
       [3, 4]])



In [99]:

    
np.apply_along_axis(sum, 1, x)









    Out[99]:





array([3, 7])



In [100]:

    
np.apply_along_axis(len, 1, x)









    Out[100]:





array([2, 2])

Most of NumPy's built-in functions are vectorized, i.e. they can be applied to a whole array, in which case they operate on the individual elements of the array. E.g.:



In [101]:

    
np.sign(x)









    Out[101]:





array([[1, 1],
       [1, 1]])



In [102]:

    
np.log(x)









    Out[102]:





array([[ 0.        ,  0.69314718],
       [ 1.09861229,  1.38629436]])

The same holds for user-defined functions that are defined in terms of vectorized built-in functions:



In [103]:

    
def inc(x) : return x + 1



In [104]:

    
inc(x)









    Out[104]:





array([[2, 3],
       [4, 5]])

More complex functions must be vectorized explicitly using numpy.vectorize (or simply applied in a for loop, which is essentially what vectorize does).
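
As a sketch of explicit vectorization: a function that branches with `if`/`else` works on single numbers only, so it has to be wrapped with `np.vectorize` before it can be applied to an array. The function `clip_negative` is a hypothetical example, not part of the notebook:

```python
import numpy as np

def clip_negative(x):
    # Works on a single number only: `if` cannot branch on a whole array.
    if x > 0:
        return x
    else:
        return 0

clip_negative_vec = np.vectorize(clip_negative)

a = np.array([[-1, 2], [3, -4]])
result = clip_negative_vec(a)    # applied element by element

# The equivalent explicit loop over rows and elements.
by_loop = np.array([[clip_negative(e) for e in row] for row in a])

print(result)
```

`np.vectorize` is a convenience rather than a speed-up; where a genuinely vectorized expression exists (here, for instance, `np.maximum(a, 0)`), it is usually preferable.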

Permutations:



In [105]:

    
x = np.arange(0,10); x









    Out[105]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [108]:

    
np.random.permutation(x)









    Out[108]:





array([5, 3, 4, 0, 2, 7, 9, 8, 1, 6])



In [109]:

    
x









    Out[109]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [110]:

    
np.random.shuffle(x); x









    Out[110]:





array([1, 2, 9, 4, 7, 8, 0, 6, 5, 3])



In [111]:

    
x









    Out[111]:





array([1, 2, 9, 4, 7, 8, 0, 6, 5, 3])

More: http://docs.scipy.org/doc/numpy/reference/routines.sort.html

4.7. List <-> array conversion



In [112]:

    
l = [1, 2, 3]
a = np.array(l); a









    Out[112]:





array([1, 2, 3])



In [113]:

    
list(a)









    Out[113]:





[1, 2, 3]



In [114]:

    
a.tolist()









    Out[114]:





[1, 2, 3]



In [115]:

    
l = [[1, 2, 3], [4,5,6]]
a = np.array(l); a









    Out[115]:





array([[1, 2, 3],
       [4, 5, 6]])



In [116]:

    
list(a)









    Out[116]:





[array([1, 2, 3]), array([4, 5, 6])]



In [117]:

    
a.tolist()









    Out[117]:





[[1, 2, 3], [4, 5, 6]]
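
The two conversions differ in depth: `list(a)` converts only the outermost axis, while `a.tolist()` converts recursively down to plain Python numbers. A short sketch (the names `shallow` and `deep` are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

shallow = list(a)      # a list of rank-1 NumPy arrays
deep = a.tolist()      # nested lists of Python ints

print(type(shallow[0]), type(deep[0]), type(deep[0][0]))
```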

5. SciPy



In [118]:

    
import scipy as sp



In [119]:

    
sp.__version__









    Out[119]:





'0.16.0'

SciPy imports NumPy, so in the SciPy version used here NumPy functions are also reachable through the scipy namespace. E.g.:



In [166]:

    
x = sp.array([1,2,3])

From the SciPy library, the modules of interest to us are scipy.linalg and scipy.stats.

5.1. SciPy.linalg



In [122]:

    
from scipy import linalg

Matrix inverse:



In [123]:

    
y









    Out[123]:





array([[5, 6],
       [7, 8]])



In [124]:

    
y_inv = linalg.inv(y); y_inv









    Out[124]:





array([[-4. ,  3. ],
       [ 3.5, -2.5]])



In [125]:

    
sp.dot(y, y_inv)









    Out[125]:





array([[  1.00000000e+00,   3.55271368e-15],
       [  0.00000000e+00,   1.00000000e+00]])
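
The off-diagonal value `3.55e-15` above is floating-point rounding error, so the product is better compared with the identity matrix using a tolerance than with exact equality. A sketch, assuming `numpy.linalg.inv` as a stand-in for `scipy.linalg.inv` to keep the example NumPy-only:

```python
import numpy as np

y = np.array([[5, 6], [7, 8]])
y_inv = np.linalg.inv(y)   # assumption: NumPy's inv instead of scipy.linalg.inv

product = y.dot(y_inv)
is_identity = np.allclose(product, np.eye(2))   # equality up to a small tolerance

print(is_identity)
```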

Determinant:



In [126]:

    
linalg.det(y)









    Out[126]:





-2.0000000000000036

Euclidean norm ($l_2$-norm) of a vector: $\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$



In [127]:

    
w









    Out[127]:





array([5, 3])



In [128]:

    
linalg.norm(w)









    Out[128]:





5.8309518948453007

General $p$-norm: $\|\mathbf{x}\|_p = \big(\sum_i |x_i|^p\big)^{1/p}$



In [130]:

    
linalg.norm(w, ord=1)









    Out[130]:





8



In [131]:

    
linalg.norm(w, ord=sp.inf)









    Out[131]:





5
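
The $p$-norm formula can be checked directly against a library routine. A minimal sketch using `numpy.linalg.norm` (an assumption made to keep the example NumPy-only; the helper `p_norm` is hypothetical):

```python
import numpy as np

def p_norm(x, p):
    # Direct transcription of (sum_i |x_i|^p)^(1/p).
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

w = np.array([5, 3])

l1 = p_norm(w, 1)                 # 5 + 3
l2 = p_norm(w, 2)                 # sqrt(25 + 9)
l_inf = np.max(np.abs(w))         # limit of the p-norm as p grows

print(l1, l2, l_inf)
```

The infinity norm is computed separately because the formula is only defined for finite $p$; it is the largest absolute component.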

More: http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html

5.2. SciPy.stats



In [132]:

    
from scipy import stats



In [133]:

    
stats.norm









    Out[133]:





<scipy.stats._continuous_distns.norm_gen at 0x7f970ba55e10>



In [136]:

    
stats.norm.pdf(0)









    Out[136]:





0.3989422804014327



In [137]:

    
xs = sp.linspace(-2, 2, 10);



In [138]:

    
stats.norm.pdf(xs)









    Out[138]:





array([ 0.05399097,  0.11897819,  0.21519246,  0.31944801,  0.38921247,
        0.38921247,  0.31944801,  0.21519246,  0.11897819,  0.05399097])



In [139]:

    
stats.norm.pdf(xs, loc=1, scale=2)









    Out[139]:





array([ 0.0647588 ,  0.08817395,  0.11427077,  0.14095594,  0.16549503,
        0.18494385,  0.19671986,  0.19916355,  0.19192205,  0.17603266])

Sampling from a normal distribution:



In [140]:

    
stats.norm.rvs(loc=1, scale=2, size=10)









    Out[140]:





array([-0.58147381,  3.71386122,  1.24616214,  1.79738483,  2.69992991,
       -0.01069313,  3.30819964,  1.69645732,  1.09588046, -1.44891004])

"Freezing" a distribution:



In [141]:

    
normal = stats.norm(1, 2)



In [142]:

    
normal.pdf(xs)









    Out[142]:





array([ 0.0647588 ,  0.08817395,  0.11427077,  0.14095594,  0.16549503,
        0.18494385,  0.19671986,  0.19916355,  0.19192205,  0.17603266])



In [144]:

    
normal.rvs(size=5)









    Out[144]:





array([-2.20691566,  0.9434288 , -3.30867649, -1.24524492, -0.95381916])
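
A frozen distribution fixes `loc` and `scale` once, so later calls give the same values as passing the parameters every time. A short sketch checking this (the names `frozen_pdf`, `explicit_pdf` and `same` are illustrative):

```python
import numpy as np
from scipy import stats

xs = np.linspace(-2, 2, 10)

normal = stats.norm(1, 2)                          # frozen: loc=1, scale=2
frozen_pdf = normal.pdf(xs)
explicit_pdf = stats.norm.pdf(xs, loc=1, scale=2)  # parameters passed each time

same = np.allclose(frozen_pdf, explicit_pdf)
print(same, normal.mean(), normal.std())
```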

Multivariate Gaussian distribution:



In [145]:

    
?stats.multivariate_normal



In [146]:

    
mean    = sp.array([1.0, 3.0])
cov     = sp.array([[2.0, 0.3], [0.5, 0.7]])  # Note: not symmetric, although a covariance matrix should be
mnormal = stats.multivariate_normal(mean, cov)



In [148]:

    
mnormal.pdf([1, 0])









    Out[148]:





5.9244062489471738e-05



In [149]:

    
np.random.seed(42)   # For reproducibility of the results
mnormal.rvs(size=5)









    Out[149]:





array([[ 0.32295504,  2.7257843 ],
       [-0.19250187,  3.91415511],
       [ 1.37357915,  2.90579424],
       [-1.37198427,  3.02935659],
       [ 1.56503442,  3.5666907 ]])

Correlation coefficient:



In [150]:

    
x, y = np.random.random((2, 10))



In [152]:

    
y









    Out[152]:





array([ 0.45606998,  0.78517596,  0.19967378,  0.51423444,  0.59241457,
        0.04645041,  0.60754485,  0.17052412,  0.06505159,  0.94888554])



In [153]:

    
stats.pearsonr(x, y)









    Out[153]:





(0.30346130585985159, 0.39400530307438952)
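
The first value returned by `stats.pearsonr` is the correlation coefficient and the second is the p-value. The coefficient itself is the covariance of the two samples divided by the product of their standard deviations, which can be sketched with NumPy alone. The seed and the helper name `pearson` are illustrative assumptions:

```python
import numpy as np

def pearson(x, y):
    # Centre both samples, then normalise the summed cross-products.
    xc = x - np.mean(x)
    yc = y - np.mean(y)
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

np.random.seed(0)
x, y = np.random.random((2, 10))

r = pearson(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]   # NumPy's own estimate, for comparison

print(r, r_numpy)
```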

More: http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

6. Matplotlib

matplotlib contains several modules: pyplot, image, mplot3d (in mpl_toolkits), ...



In [160]:

    
import matplotlib.pyplot as plt
import matplotlib



In [161]:

    
matplotlib.__version__









    Out[161]:





'1.4.3'



In [162]:

    
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib






    



WARNING: pylab import has clobbered these variables: ['linalg', 'cov', 'normal', 'mean']
`%matplotlib` prevents importing * from pylab and numpy

pylab combines pyplot and numpy. The command above (an IPython magic) ensures that plots are rendered directly in the notebook instead of opening a separate window.

6.1. The `plot` function



In [165]:

    
plt.plot([1,2,3,4,5], [4,5,5,7,3])
plt.show()



In [262]:

    
plt.plot([4,5,5,7,3]);



In [263]:

    
plt.plot([4,5,5,7,3], 'ro');



In [166]:

    
def f(x) : return x**2



In [167]:

    
xs = linspace(0,100); xs









    Out[167]:





array([   0.        ,    2.04081633,    4.08163265,    6.12244898,
          8.16326531,   10.20408163,   12.24489796,   14.28571429,
         16.32653061,   18.36734694,   20.40816327,   22.44897959,
         24.48979592,   26.53061224,   28.57142857,   30.6122449 ,
         32.65306122,   34.69387755,   36.73469388,   38.7755102 ,
         40.81632653,   42.85714286,   44.89795918,   46.93877551,
         48.97959184,   51.02040816,   53.06122449,   55.10204082,
         57.14285714,   59.18367347,   61.2244898 ,   63.26530612,
         65.30612245,   67.34693878,   69.3877551 ,   71.42857143,
         73.46938776,   75.51020408,   77.55102041,   79.59183673,
         81.63265306,   83.67346939,   85.71428571,   87.75510204,
         89.79591837,   91.83673469,   93.87755102,   95.91836735,
         97.95918367,  100.        ])



In [266]:

    
f(xs)









    Out[266]:





array([  0.00000000e+00,   4.16493128e+00,   1.66597251e+01,
         3.74843815e+01,   6.66389005e+01,   1.04123282e+02,
         1.49937526e+02,   2.04081633e+02,   2.66555602e+02,
         3.37359434e+02,   4.16493128e+02,   5.03956685e+02,
         5.99750104e+02,   7.03873386e+02,   8.16326531e+02,
         9.37109538e+02,   1.06622241e+03,   1.20366514e+03,
         1.34943773e+03,   1.50354019e+03,   1.66597251e+03,
         1.83673469e+03,   2.01582674e+03,   2.20324865e+03,
         2.39900042e+03,   2.60308205e+03,   2.81549354e+03,
         3.03623490e+03,   3.26530612e+03,   3.50270721e+03,
         3.74843815e+03,   4.00249896e+03,   4.26488963e+03,
         4.53561016e+03,   4.81466056e+03,   5.10204082e+03,
         5.39775094e+03,   5.70179092e+03,   6.01416077e+03,
         6.33486047e+03,   6.66389005e+03,   7.00124948e+03,
         7.34693878e+03,   7.70095793e+03,   8.06330696e+03,
         8.43398584e+03,   8.81299459e+03,   9.20033319e+03,
         9.59600167e+03,   1.00000000e+04])



In [168]:

    
plt.plot(xs, f(xs));



In [268]:

    
plt.plot(xs, f(xs), 'bo');



In [269]:

    
plt.plot(xs, f(xs), 'r+');



In [169]:

    
plt.plot(xs, 1 - f(xs), 'b', xs, f(xs)/2 - 1000, 'r--');



In [170]:

    
plt.plot(xs, f(xs), label='f(x)')
plt.plot(xs, 1 - f(xs), label='1-f(x)')
plt.legend()
plt.show()



In [171]:

    
xs = linspace(-5,5)
plt.plot(xs, stats.norm.pdf(xs), 'g--');
plt.plot(xs, stats.norm.pdf(xs, loc=1, scale=2), 'r', linewidth=3);

6.2. The `scatter` function



In [173]:

    
plt.scatter([0, 1, 2, 0], [4, 5, 2, 1])
plt.show()



In [274]:

    
plt.scatter([0,1,2,0], [4, 5, 2, 1], s=200, marker='s');



In [174]:

    
np.random.random(10)









    Out[174]:





array([ 0.96563203,  0.80839735,  0.30461377,  0.09767211,  0.68423303,
        0.44015249,  0.12203823,  0.49517691,  0.03438852,  0.9093204 ])



In [175]:

    
for c in 'rgb':
  plt.scatter(sp.random.random(100), sp.random.random(100), s=200, alpha=0.5, marker='o', c=c)

6.3. Contour and density plots



In [178]:

    
x = np.linspace(1,5,5); x









    Out[178]:





array([ 1.,  2.,  3.,  4.,  5.])



In [179]:

    
X, Y = np.meshgrid(x, x)



In [180]:

    
X









    Out[180]:





array([[ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.]])



In [181]:

    
Y









    Out[181]:





array([[ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.]])



In [182]:

    
Z = 10 * X + Y
Z









    Out[182]:





array([[ 11.,  21.,  31.,  41.,  51.],
       [ 12.,  22.,  32.,  42.,  52.],
       [ 13.,  23.,  33.,  43.,  53.],
       [ 14.,  24.,  34.,  44.,  54.],
       [ 15.,  25.,  35.,  45.,  55.]])
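
`np.meshgrid` turns two coordinate vectors into two matrices such that `(X[i, j], Y[i, j])` enumerates every grid point: `X` varies along columns and `Y` along rows. A small sketch with deliberately different axis lengths (the vectors `xv` and `yv` are illustrative):

```python
import numpy as np

xv = np.array([1, 2, 3])     # 3 x-coordinates
yv = np.array([10, 20])      # 2 y-coordinates

X, Y = np.meshgrid(xv, yv)   # both have shape (len(yv), len(xv))
Z = 10 * X + Y               # evaluated at every grid point at once

print(X.shape)
print(Z)
```

Using unequal lengths makes the axis order visible: rows follow `yv` and columns follow `xv`, which is the layout that `pcolormesh` and `contour` expect.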



In [183]:

    
plt.pcolormesh(X, Y, Z, cmap='gray')
plt.show()

More: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.pcolormesh, http://matplotlib.org/users/colormaps.html



In [184]:

    
mnormal = stats.multivariate_normal([0, 1], [[1, 1], [0.2, 3]])



In [185]:

    
mnormal.pdf([1,1])









    Out[185]:





0.055730458106194758



In [186]:

    
x = np.linspace(-1, 1)
y = np.linspace(-2, 2)
X, Y = np.meshgrid(x, y)



In [187]:

    
shape(X)









    Out[187]:





(50, 50)



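In [188]:

    
XY = np.dstack((X, Y))  # assumed content of this cell (missing in the source): stacks X and Y into shape (50, 50, 2)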

In [189]:

    
shape(XY)









    Out[189]:





(50, 50, 2)



In [190]:

    
mnormal.pdf(XY)









    Out[190]:





array([[ 0.01492383,  0.01541297,  0.01589129, ...,  0.01095132,
         0.01044738,  0.00994981],
       [ 0.01610377,  0.01663533,  0.01715544, ...,  0.01194288,
         0.01139588,  0.01085559],
       [ 0.01733793,  0.01791426,  0.01847852, ...,  0.01299494,
         0.01240254,  0.01181718],
       ..., 
       [ 0.04679266,  0.04884039,  0.05089173, ...,  0.05646055,
         0.0544354 ,  0.05239434],
       [ 0.04542256,  0.04742101,  0.04942386, ...,  0.05539039,
         0.05341564,  0.0514244 ],
       [ 0.04399343,  0.04593934,  0.04789039, ...,  0.0542183 ,
         0.05229712,  0.05035891]])



In [191]:

    
plt.pcolormesh(X, Y, mnormal.pdf(XY))
plt.show()



In [291]:

    
plt.contourf(X, Y, mnormal.pdf(XY));



In [292]:

    
plt.contourf(X, Y, mnormal.pdf(XY), levels=[0,0.06, 0.07]);



In [193]:

    
plt.contour(X, Y, mnormal.pdf(XY));



In [194]:

    
x = linspace(-10,10)
X, Y = np.meshgrid(x, x)
Z = X*3 + Y



In [195]:

    
plt.contour(X, Y, Z);



In [296]:

    
plt.contour(X, Y, Z, levels=[0]);

Combining multiple plots:



In [297]:

    
plt.contour(X, Y, Z, levels=[0])
plt.scatter([-5,-3,2,5], [4, 5, 2, 1])
plt.show()

6.4. Histogram



In [298]:

    
np.random.seed(42)
x = stats.norm.rvs(size=1000)



In [299]:

    
plt.hist(x);

More or less equivalent to:



In [300]:

    
hist, bins = np.histogram(x)
centers = (bins[:-1] + bins[1:]) / 2
plt.bar(centers, hist);
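
The same computation without plotting: `np.histogram` returns the counts together with the bin edges, so there is always one more edge than there are bins. A sketch using the default of 10 bins; sampling with `np.random.normal` instead of `stats.norm.rvs` is an assumption made to keep the example NumPy-only:

```python
import numpy as np

np.random.seed(42)
x = np.random.normal(size=1000)   # assumption: NumPy's sampler instead of stats.norm.rvs

hist, bins = np.histogram(x)              # 10 equal-width bins by default
centers = (bins[:-1] + bins[1:]) / 2      # midpoint of each bin
widths = np.diff(bins)                    # bar widths matching the bins

print(len(hist), len(bins), hist.sum())
```

Passing `widths` as the bar width to `plt.bar` would make the bars touch, as they do in `plt.hist`.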

6.5. Subplots

TODO

7. Pandas



In [301]:

    
import pandas as pd
pd.__version__









    Out[301]:





u'0.17.0'

TODO

8. Sklearn



In [302]:

    
import sklearn
sklearn.__version__









    Out[302]:





'0.15.2'

TODO
