University of Zagreb
Faculty of Electrical Engineering and Computing

Machine Learning

http://www.fer.unizg.hr/predmet/su

Academic year 2015/2016

Notebook 0: Introduction to SciPy

(c) 2015 Jan Šnajder

Version: 0.5 (2015-10-15)

INCOMPLETE

1. SciPy stack

Core packages:

  • Python
  • NumPy
  • SciPy library
  • IPython
  • Matplotlib
  • SymPy
  • pandas
  • nose

Additional packages:

  • Cython
  • SciKits packages: scikit-learn, scikit-multilearn, scikit-image, ...

2. IPython notebook

Cells are evaluated with SHIFT+ENTER.

Markdown text with special formatting and $\LaTeX$ code: $f(\mathbf{x}) = \sum_{i=1}^n \ln \frac{P(x)P(y)}{P(x, y)}$


In [1]:
10


Out[1]:
10

In [2]:
_


Out[2]:
10

In [3]:



Out[3]:
55

In [4]:



Out[4]:
10

In [5]:
?

In [6]:
%quickref

3. Python

3.1. Variables and values


In [7]:
x = 5

In [8]:
x


Out[8]:
5

In [9]:
print(x)


5

In [10]:
print x


5

In [11]:
type(x)


Out[11]:
int

In [12]:
(x + 1) ** 2


Out[12]:
36

In [13]:
x += 1; x


Out[13]:
6

In [14]:
?x

In [15]:
del x

In [16]:
x


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-16-401b30e3b8b5> in <module>()
----> 1 x

NameError: name 'x' is not defined

In [ ]:
X=7; varijabla_s_vrlo_dugackim_imenom = 747

In [ ]:
x=1; y=-2

In [ ]:
x==y

In [ ]:
(x==y)==False

In [ ]:
x!=y

In [ ]:
x==y or (x>0 and not y>0)

In [ ]:
z = 42 if x==y else 66

In [ ]:
z

In [ ]:
moj_string = 'ringe ringe'

In [ ]:
'hopa' + ' ' + "cupa"

In [ ]:
moj_string += ' raja'; moj_string

In [ ]:
len(moj_string)

In [ ]:
print "x=%0.2f y=%d, s='%s'" % (x, y, moj_string)

In [ ]:
1/2

In [ ]:
1/2.0

In [ ]:
1/float(2)

In [ ]:
round(0.5)
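In the Python 2 interpreter used in this notebook, / applied to two integers performs floor division, so 1/2 gives 0; converting one operand to float gives true division (in Python 2, from __future__ import division switches / to true division for a whole module). In Python 3, / always performs true division, and round(0.5) returns 0 rather than 1.0, because Python 3 rounds halves to the nearest even integer. A small sketch of integer arithmetic that behaves the same in both versions:

```python
# Integer arithmetic that gives the same results in Python 2 and Python 3
a, b = 7, 2

floor_q = a // b          # floor division: rounds toward minus infinity
remainder = a % b         # remainder, with the sign of the divisor
true_q = a / float(b)     # true division in either Python version
q, r = divmod(a, b)       # quotient and remainder in one call

assert a == b * q + r     # the identity that // and % always satisfy
print("%d %d %.1f" % (floor_q, remainder, true_q))   # 3 1 3.5
print(-a // b)            # -4, not -3: floor, not truncation
```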

3.2. Mathematical functions


In [ ]:
import math

In [ ]:
math.sqrt(68)

In [ ]:
math.exp(1)

In [ ]:
math.log(_)

In [ ]:
math.log(100, 2)

3.3. Lists


In [ ]:
xs = [5, 6, 2, 3]   # Creates a list

In [ ]:
xs

In [ ]:
xs[0]  # Zero-based indexing

In [ ]:
xs[-1]  # Negative indices count from the end of the list

In [ ]:
xs[1] = 100  # Updating an element of the list
xs

In [ ]:
xs[1] = 'foo'  # Lists can be heterogeneous
xs

In [ ]:
xs[3] = [1,2]
xs

In [ ]:
xs.append(x)  # Appends to the end
xs

In [ ]:
xs + [77, 88]

In [ ]:
xs.extend([77, 88]); xs

In [ ]:
xs.pop()  # Removes and returns the last element of the list

In [ ]:
xs

In [ ]:
xs[0:2]

In [ ]:
xs[1:]

In [ ]:
xs[:3]

In [ ]:
xs[:]

In [ ]:
xs[:-2]  # All but the last two

In [ ]:
xs[0:2] = [1,2]
xs
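Slicing a Python list creates a new list (a shallow copy), whereas plain assignment only binds another name to the same list. A sketch of the difference; note that NumPy array slices behave differently, since they are views rather than copies:

```python
# Assignment binds a second name to the same list; slicing builds a new list
xs = [1, 2, 3]
alias = xs          # the same list object under another name
xs_copy = xs[:]     # a new list with the same elements (shallow copy)

xs[0] = 100
print(alias)        # [100, 2, 3], sees the change
print(xs_copy)      # [1, 2, 3], unaffected

# "Shallow" means nested objects are shared, not copied
nested = [[1, 2], [3, 4]]
shallow = nested[:]
shallow[0].append(99)
print(nested)       # [[1, 2, 99], [3, 4]]
```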

In [ ]:
range(10)

In [ ]:
range(1, 10)

In [ ]:
range(0, 51, 5)

In [ ]:
for x in range(5):
    print x

In [ ]:
for x in xs: print x

In [ ]:
for ix, x in enumerate(range(0, 51, 5)):
    print ix, x

In [ ]:
xs = []
for x in range(10):
    xs.append(x ** 2)
xs

In [ ]:
[x ** 2 for x in range(10)]

In [ ]:
[x ** 2 for x in range(10) if x % 2 == 0]

In [ ]:
[(x, x ** 2) for x in range(10)]

In [ ]:
zip([1, 2, 3], [4, 5, 6])

In [ ]:
zip(*[(1, 4), (2, 5), (3, 6)])

In [ ]:
xs, ys = zip(*[(1, 4), (2, 5), (3, 6)])

In [ ]:
xs
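zip(*pairs) unpacks the list so that each pair becomes a separate argument to zip, which "transposes" a list of pairs into tuples; zipping the result again restores the pairs. In Python 2 zip returns a list, while in Python 3 it returns an iterator, hence the list() call in this sketch:

```python
pairs = [(1, 4), (2, 5), (3, 6)]

# Each pair is passed to zip as a separate argument, so zip groups
# all first elements together and all second elements together
xs, ys = zip(*pairs)
print(xs)   # (1, 2, 3)
print(ys)   # (4, 5, 6)

# Zipping the two tuples again gives back the original pairs
restored = list(zip(xs, ys))
assert restored == pairs
```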

In [ ]:
map(lambda x: x + 1, xs)

In [ ]:
[x + 1 for x in xs]

In [ ]:
ys = []
for x in xs:
    ys.append(x + 1)
ys

In [ ]:
sum(ys)

3.4. Dictionaries (maps)


In [17]:
d = {'zagreb' : 790017, 'split' : 178102, 'rijeka' : 128624}

In [18]:
d['split']


Out[18]:
178102

In [19]:
d['osijek']


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-19-b6288821d7a0> in <module>()
----> 1 d['osijek']

KeyError: 'osijek'

In [20]:
d.get('osijek', 0)


Out[20]:
0
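The get method with a default value is convenient for counting: keys seen for the first time start from the default. A sketch that counts the words of the string used in Section 3.1:

```python
words = 'ringe ringe raja'.split()

counts = {}
for w in words:
    # get() returns 0 for a word that is not yet in the dictionary
    counts[w] = counts.get(w, 0) + 1

print(sorted(counts.items()))   # [('raja', 1), ('ringe', 2)]
```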

In [21]:
d['osijek'] = 108048; d


Out[21]:
{'osijek': 108048, 'rijeka': 128624, 'split': 178102, 'zagreb': 790017}

In [22]:
'rijeka' in d


Out[22]:
True

In [23]:
d['zagreb'] = 790200; d


Out[23]:
{'osijek': 108048, 'rijeka': 128624, 'split': 178102, 'zagreb': 790200}

In [24]:
del d['rijeka']; d


Out[24]:
{'osijek': 108048, 'split': 178102, 'zagreb': 790200}

Iterating over a dictionary:


In [25]:
for grad in d:
    print 'Grad %s ima %d stanovnika' % (grad, d[grad])


Grad osijek ima 108048 stanovnika
Grad split ima 178102 stanovnika
Grad zagreb ima 790200 stanovnika

Iterating over keys and values:


In [26]:
for grad, stanovnici in d.iteritems():
    print 'Grad %s ima %d stanovnika' % (grad, stanovnici)


Grad osijek ima 108048 stanovnika
Grad split ima 178102 stanovnika
Grad zagreb ima 790200 stanovnika
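iteritems() exists only in Python 2; items() works in both versions (in Python 2 it builds a list, in Python 3 it returns a view). Dictionaries in Python 2 have no guaranteed order (insertion order is only guaranteed from Python 3.7 on), so when order matters it is safer to sort explicitly. A sketch that lists the cities from above by population, largest first:

```python
d = {'zagreb': 790200, 'split': 178102, 'osijek': 108048}

# Sort the (city, population) pairs by population, in descending order
by_population = sorted(d.items(), key=lambda kv: kv[1], reverse=True)

for grad, stanovnici in by_population:
    print('Grad %s ima %d stanovnika' % (grad, stanovnici))
```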

Nested dictionaries:


In [27]:
d2 = {'zagreb' : {'trešnjevka' : 120240, 'centar' : 145302}}
d2['zagreb']['trešnjevka']


Out[27]:
120240

3.5. Functions


In [28]:
def inc(x): return x + 1

In [29]:
def sign(x):
    if x > 0:
        return 'pozitivno'
    elif x < 0:
        return 'negativno'
    else:
        return 'nula'

for x in [-1, 0, 1]:
    print sign(x)


negativno
nula
pozitivno

Default arguments:


In [30]:
def broj_stanovnika(grad, godina=2015):
    if grad in d:
        return d[grad] + round((godina - 2015) * 10000 * (-1.2))
    else: 
        raise ValueError('Nepoznat neki grad')

In [31]:
broj_stanovnika('zagreb')


Out[31]:
790200.0

In [32]:
broj_stanovnika('zagreb', godina=2020)


Out[32]:
730200.0

In [33]:
broj_stanovnika('zadar')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-ed4c8aabd867> in <module>()
----> 1 broj_stanovnika('zadar')

<ipython-input-30-4ea6caac6059> in broj_stanovnika(grad, godina)
      3         return d[grad] + round((godina - 2015) * 10000 * (-1.2))
      4     else:
----> 5         raise ValueError('Nepoznat neki grad')

ValueError: Nepoznat neki grad
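An exception raised with raise, such as the ValueError above, can be caught with try/except so that processing continues. A sketch that reuses broj_stanovnika with a shortened copy of the dictionary d (two cities, values taken from above); the results dictionary is only for illustration:

```python
# Shortened copy of the dictionary d from above (two cities only)
d = {'zagreb': 790200, 'split': 178102}

def broj_stanovnika(grad, godina=2015):
    if grad in d:
        return d[grad] + round((godina - 2015) * 10000 * (-1.2))
    else:
        raise ValueError('Nepoznat neki grad')

results = {}
for grad in ['split', 'zadar']:
    try:
        results[grad] = broj_stanovnika(grad, godina=2020)
    except ValueError as e:
        # Unknown city: report it and record a missing value instead of stopping
        print('%s: %s' % (grad, e))
        results[grad] = None

print(results)
```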

3.6. Classes


In [34]:
class RegistarStanovnika:

    # Constructor
    def __init__(self, drzava, d):
        self.drzava = drzava  # Instance variable (different for each instance)
        self.d = d
        
    prirast = -1.2  # Class variable (shared by all instances)

    # Method
    def broj_stanovnika(self, grad, godina=2015):
        if grad in self.d:
            return self.d[grad] + round((godina - 2015) * 10000 * self.prirast)
        else: 
            raise ValueError('Nepoznat neki grad')
    
    def ukupno_stanovnika(self):
        return sum(self.d.values())

In [35]:
reg = RegistarStanovnika('Hrvatska', {'zagreb' : 790017, 'split' : 178102, 'rijeka' : 128624})

In [36]:
reg.broj_stanovnika('split')


Out[36]:
178102.0

In [37]:
reg.ukupno_stanovnika()


Out[37]:
1096743
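The difference between the class variable prirast and instance variables such as drzava becomes visible when the class variable changes. A sketch with a hypothetical, simplified class Registar (not part of the notebook):

```python
class Registar(object):      # hypothetical, simplified version of RegistarStanovnika
    prirast = -1.2           # class variable, shared by all instances

    def __init__(self, drzava):
        self.drzava = drzava # instance variable, one per instance

a = Registar('Hrvatska')
b = Registar('Slovenija')

# Changing the class variable is visible through every instance
Registar.prirast = -0.5

# Assigning through an instance creates an instance attribute of the same
# name, which hides the class variable for that instance only
b.prirast = 2.0

print(a.prirast)             # -0.5, still the class variable
print(b.prirast)             # 2.0, b's own attribute
```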

4. Numpy


In [1]:
import numpy as np

In [39]:
?np

In [40]:
np.__version__


Out[40]:
'1.10.0.post2'

4.1. Arrays

One-dimensional array (rank-1 array):


In [2]:
a = np.array([1, 2, 3])

In [42]:
a


Out[42]:
array([1, 2, 3])

In [43]:
print a


[1 2 3]

In [44]:
type(a)


Out[44]:
numpy.ndarray

In [45]:
a = np.array([1, 2, 3], dtype=np.float64)

In [46]:
a


Out[46]:
array([ 1.,  2.,  3.])

In [47]:
a[0]


Out[47]:
1.0

In [48]:
a[0] = 100; a


Out[48]:
array([ 100.,    2.,    3.])

In [49]:
a.shape


Out[49]:
(3,)

In [50]:
len(a)


Out[50]:
3

In [51]:
np.array([1,'a',2])


Out[51]:
array(['1', 'a', '2'], 
      dtype='|S21')

Matrix (two-dimensional array, rank-2 array):


In [52]:
m = np.array([[1,2,3],[4,5,6]])

In [53]:
print m


[[1 2 3]
 [4 5 6]]

In [54]:
m[1]


Out[54]:
array([4, 5, 6])

In [55]:
m[1,1]


Out[55]:
5

In [56]:
m[1][1]


Out[56]:
5

In [57]:
m.shape


Out[57]:
(2, 3)

In [58]:
m2 = np.array([[1,2,3],[4,5]])

In [59]:
print m2


[[1, 2, 3] [4, 5]]
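Because the inner lists have different lengths, NumPy cannot build a 2x3 matrix from them; the version used here silently created a one-dimensional array whose two elements are Python lists. Recent NumPy versions raise a ValueError for such ragged input unless dtype=object is requested explicitly. A sketch that works either way:

```python
import numpy as np

# Explicitly request an array of Python objects for ragged data
m2 = np.array([[1, 2, 3], [4, 5]], dtype=object)

print(m2.shape)   # (2,): one element per inner list, not a matrix
print(m2.dtype)   # object
print(m2[1])      # [4, 5], the element is the original Python list
```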

Slicing:


In [60]:
print m


[[1 2 3]
 [4 5 6]]

In [61]:
m[:,1]


Out[61]:
array([2, 5])

In [62]:
m[0,1:3]


Out[62]:
array([2, 3])

In [63]:
m[1,:2] = [77, 78]

In [64]:
m


Out[64]:
array([[ 1,  2,  3],
       [77, 78,  6]])
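Unlike slices of Python lists, which are copies, basic slices of NumPy arrays are views on the same data, so writing into a slice stored in a variable also modifies the original array. Use .copy() when an independent array is needed. A sketch:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

row = m[1, :2]          # basic slicing returns a view on m's data
row[0] = 77             # writing into the view also changes m
print(m[1, 0])          # 77

safe = m[0, :].copy()   # .copy() returns an independent array
safe[0] = -1
print(m[0, 0])          # still 1
```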

Note the difference:


In [65]:
m[:,0]  # gives a rank-1 array


Out[65]:
array([ 1, 77])

In [66]:
m[:,0:1]  # gives a rank-2 array


Out[66]:
array([[ 1],
       [77]])

Three-dimensional array (rank-3 tensor):


In [67]:
t = np.array([[[1,2],[3,4]],[[4,5],[6,7]]])

In [68]:
t.shape


Out[68]:
(2, 2, 2)

In [69]:
t[0,1,1]


Out[69]:
4

In [70]:
t[0]


Out[70]:
array([[1, 2],
       [3, 4]])

In [71]:
t[0,:,1]


Out[71]:
array([2, 4])

4.2. Creating arrays


In [6]:
np.zeros((5,5))


Out[6]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [7]:
np.ones((3,1))


Out[7]:
array([[ 1.],
       [ 1.],
       [ 1.]])

In [10]:
np.full((5,5), 55)


Out[10]:
array([[ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.],
       [ 55.,  55.,  55.,  55.,  55.]])

In [11]:
np.eye(6)


Out[11]:
array([[ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.]])

In [14]:
np.random.random((4,4))


Out[14]:
array([[ 0.34294388,  0.2407478 ,  0.54200906,  0.85047566],
       [ 0.84029626,  0.60056098,  0.84275663,  0.40206247],
       [ 0.59492988,  0.69943282,  0.67430892,  0.7806004 ],
       [ 0.07980246,  0.92217663,  0.35797981,  0.67351464]])

In [18]:
np.arange(1, 10)


Out[18]:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
np.arange(1, 10, 2)


Out[19]:
array([1, 3, 5, 7, 9])

In [20]:
np.linspace(1, 10, 5)


Out[20]:
array([  1.  ,   3.25,   5.5 ,   7.75,  10.  ])

In [23]:
np.linspace(1, 10)


Out[23]:
array([  1.        ,   1.18367347,   1.36734694,   1.55102041,
         1.73469388,   1.91836735,   2.10204082,   2.28571429,
         2.46938776,   2.65306122,   2.83673469,   3.02040816,
         3.20408163,   3.3877551 ,   3.57142857,   3.75510204,
         3.93877551,   4.12244898,   4.30612245,   4.48979592,
         4.67346939,   4.85714286,   5.04081633,   5.2244898 ,
         5.40816327,   5.59183673,   5.7755102 ,   5.95918367,
         6.14285714,   6.32653061,   6.51020408,   6.69387755,
         6.87755102,   7.06122449,   7.24489796,   7.42857143,
         7.6122449 ,   7.79591837,   7.97959184,   8.16326531,
         8.34693878,   8.53061224,   8.71428571,   8.89795918,
         9.08163265,   9.26530612,   9.44897959,   9.63265306,
         9.81632653,  10.        ])
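np.arange uses a fixed step and excludes the end point, whereas np.linspace uses a fixed number of points (50 by default, as in Out[23]) and includes both end points. With a floating-point step, the length of an arange result depends on rounding, so linspace is usually the safer choice for floats. Also, the NumPy version used here returned floats from np.full with an integer fill value (Out[10]), while newer versions infer the dtype from the fill value; passing dtype explicitly avoids the difference. A sketch:

```python
import numpy as np

# linspace: a fixed number of points, both end points included
pts = np.linspace(1, 10, 5)
print(pts)                          # 1, 3.25, 5.5, 7.75, 10
print(len(np.linspace(1, 10)))      # 50 points by default

# arange: a fixed step, end point excluded
odd = np.arange(1, 10, 2)
print(odd)                          # 1, 3, 5, 7, 9

# With a float step the length depends on floating-point rounding
print(np.arange(0, 1, 0.1).shape)

# Request the element type explicitly instead of relying on the default
f = np.full((2, 3), 55, dtype=np.float64)
print(f.dtype)                      # float64
```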

4.3. Advanced indexing

Indexing with an array of integers:


In [26]:
a = np.array([[1,2], [3, 4], [5, 6]]); a


Out[26]:
array([[1, 2],
       [3, 4],
       [5, 6]])

In [28]:
a[0,1]


Out[28]:
2

In [29]:
a[[0,2]]   # Not the same as a[0,2]!


Out[29]:
array([[1, 2],
       [5, 6]])

In [30]:
a[[0,1,2], [0,1,0]]   # Same as: np.array([a[0,0], a[1,1], a[2,0]])


Out[30]:
array([1, 4, 5])

Indexing with a Boolean array:


In [31]:
a


Out[31]:
array([[1, 2],
       [3, 4],
       [5, 6]])

In [32]:
bool_ix = a > 2
bool_ix


Out[32]:
array([[False, False],
       [ True,  True],
       [ True,  True]], dtype=bool)

In [33]:
a[bool_ix]


Out[33]:
array([3, 4, 5, 6])

In [34]:
a[a > 2]


Out[34]:
array([3, 4, 5, 6])
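A Boolean mask can also appear on the left-hand side of an assignment, and conditions can be combined element-wise. Note that, unlike basic slicing, indexing with an integer or Boolean array always returns a copy. A sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# A boolean mask on the left-hand side assigns in place
b = a.copy()
b[b > 2] = 0
print(b)

# Indexing with an integer (or boolean) array returns a copy, not a view
picked = a[[0, 2]]
picked[0, 0] = 100
print(a[0, 0])                  # 1, the original is unchanged

# Conditions combine element-wise with & and |; the parentheses are required
print(a[(a > 1) & (a < 5)])     # [2 3 4]
```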

4.4. Broadcasting and stacking

Broadcasting:


In [35]:
x = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])

In [37]:
print x


[[1 2]
 [3 4]]

In [89]:
x + v


Out[89]:
array([[2, 4],
       [4, 6]])

In [90]:
np.ones((2,2,3)) * 5


Out[90]:
array([[[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]],

       [[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]]])
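Broadcasting compares the shapes of the two operands starting from the last dimension; two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension counts as 1. A short sketch (the extra array [10, 20, 30] is illustrative) showing how np.newaxis turns the rank-1 array v into a column that broadcasts along the other axis:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])   # shape (2, 2)
v = np.array([1, 2])             # shape (2,), broadcast as a row of shape (1, 2)

print(x + v)                     # v is added to every row

col = v[:, np.newaxis]           # shape (2, 1), a column
print(x + col)                   # col is added to every column

# A (3,) row plus a (2, 1) column broadcasts to a (2, 3) table of all sums
table = np.array([10, 20, 30]) + col
print(table.shape)               # (2, 3)
```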

Stacking:


In [38]:
v


Out[38]:
array([1, 2])

In [40]:
np.vstack([v, v])


Out[40]:
array([[1, 2],
       [1, 2]])

In [41]:
np.vstack([x, x])


Out[41]:
array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])

In [42]:
np.vstack((v, x))


Out[42]:
array([[1, 2],
       [1, 2],
       [3, 4]])

In [43]:
np.hstack((v, v))


Out[43]:
array([1, 2, 1, 2])

In [44]:
np.hstack((x, x))


Out[44]:
array([[1, 2, 1, 2],
       [3, 4, 3, 4]])

In [45]:
np.hstack((v, x))


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-6d17658301dc> in <module>()
----> 1 np.hstack((v, x))

/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in hstack(tup)
    276     # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    277     if arrs[0].ndim == 1:
--> 278         return _nx.concatenate(arrs, 0)
    279     else:
    280         return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions

In [46]:
np.column_stack((v, x))


Out[46]:
array([[1, 1, 2],
       [2, 3, 4]])

In [47]:
x


Out[47]:
array([[1, 2],
       [3, 4]])

In [98]:
np.dstack((x, x))


Out[98]:
array([[[1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4]]])

In [99]:
np.shape(_)


Out[99]:
(2, 2, 2)

Reshaping arrays:


In [50]:
m = np.array([[ 1,  2,  3], [77, 78,  6]])
m.reshape(3, 2)


Out[50]:
array([[ 1,  2],
       [ 3, 77],
       [78,  6]])
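Passing -1 as one of the dimensions lets reshape infer it from the number of elements. This also gives an alternative to np.column_stack for the failed np.hstack((v, x)) call above: reshape v into a 2x1 column first. A sketch:

```python
import numpy as np

m = np.array([[1, 2, 3], [77, 78, 6]])

# -1 lets reshape infer that dimension from the total number of elements
print(m.reshape(3, -1).shape)   # (3, 2)
flat = m.reshape(-1)            # all 6 elements as a rank-1 array, row by row
print(flat)

# Turning a rank-1 vector into a 2x1 column makes hstack work with a matrix
v = np.array([1, 2])
x = np.array([[1, 2], [3, 4]])
stacked = np.hstack((v.reshape(-1, 1), x))
print(stacked)                  # same as np.column_stack((v, x))
```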

4.4. Array operations (vector and matrix operations)


In [51]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

In [102]:
print x; print y


[[1 2]
 [3 4]]
[[5 6]
 [7 8]]

Element-wise operations:


In [53]:
x + y


Out[53]:
array([[ 6,  8],
       [10, 12]])

In [54]:
x - y


Out[54]:
array([[-4, -4],
       [-4, -4]])

In [55]:
x / 2.0


Out[55]:
array([[ 0.5,  1. ],
       [ 1.5,  2. ]])

In [56]:
x.dtype


Out[56]:
dtype('int64')

In [57]:
(x/2.0).dtype


Out[57]:
dtype('float64')

In [58]:
x * y


Out[58]:
array([[ 5, 12],
       [21, 32]])

In [59]:
x = x.astype('float64')  # converts the values; assigning to x.dtype would only reinterpret the raw bytes
y = y.astype('float64')

In [60]:
x / y


Out[60]:
array([[ 0.2       ,  0.33333333],
       [ 0.42857143,  0.5       ]])

In [111]:
np.sqrt(x)


Out[111]:
array([[ 1.        ,  1.41421356],
       [ 1.73205081,  2.        ]])
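Assigning to an array's dtype attribute does not convert its values: it reinterprets the array's raw bytes as the new type, just like .view() does, so the integer 1 read as a float64 becomes a tiny denormal number of about 5e-324. Converting the values requires .astype(). A sketch, assuming 64-bit integers:

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.int64)

converted = x.astype(np.float64)     # new array with converted values
print(converted)                     # 1.0, 2.0, 3.0, 4.0

reinterpreted = x.view(np.float64)   # the same bytes, read as float64
print(reinterpreted)                 # tiny denormal numbers, not 1..4

print(np.sqrt(converted))            # 1.0, 1.414..., 1.732..., 2.0
```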

Vector/matrix operations:


In [61]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([1,2])
w = np.array([5,3])

Scalar (inner, dot) product of vectors: $ \begin{pmatrix} 1 & 2 \end{pmatrix} \cdot \begin{pmatrix} 5 \\ 3 \end{pmatrix} = 11 $


In [113]:
print v.dot(w)
print w.dot(v)
print np.dot(v, w)


11
11
11

Matrix-vector product: $ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix} $


In [62]:
x.dot(v)


Out[62]:
array([ 5, 11])

In [63]:
np.dot(x, v)


Out[63]:
array([ 5, 11])

Vector-matrix product: $ \begin{pmatrix} 1 & 2 \end{pmatrix} \cdot \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 7 & 10 \end{pmatrix} $


In [120]:
v.dot(x)


Out[120]:
array([ 7, 10])

In [119]:
np.dot(v,x)


Out[119]:
array([ 7, 10])

Note that NumPy does not distinguish between column vectors and row vectors here: a rank-1 array is treated as whichever the product requires.

Matrix-matrix product: $ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \cdot \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} $


In [122]:
x.dot(y)


Out[122]:
array([[19, 22],
       [43, 50]])

In [121]:
np.dot(x, y)


Out[121]:
array([[19, 22],
       [43, 50]])

Outer product of vectors: $ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \otimes \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 5 & 3 \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 10 & 6 \end{pmatrix} $


In [64]:
np.outer(v, w)


Out[64]:
array([[ 5,  3],
       [10,  6]])
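Both vector products above can be expressed through element-wise operations and reshaping: the dot product is the sum of element-wise products, and the outer product equals a 2x1 column times a 1x2 row. A sketch that checks this against np.dot and np.outer:

```python
import numpy as np

v = np.array([1, 2])
w = np.array([5, 3])

# Dot product: the sum of element-wise products
dot = np.sum(v * w)
print(dot)                    # 11

# Outer product: a 2x1 column times a 1x2 row
col = v.reshape(-1, 1)
row = w.reshape(1, -1)
outer = col.dot(row)
print(outer)                  # rows [5, 3] and [10, 6]
```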

Other operations:


In [65]:
x = np.array([0, 2, 4, 1])

In [66]:
np.max(x)


Out[66]:
4

In [67]:
np.argmax(x)


Out[67]:
2
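On a multidimensional array, np.argmax without an axis returns an index into the flattened array; np.unravel_index converts it back to (row, column), and the axis argument gives one index per column or per row. A sketch with a small hypothetical matrix:

```python
import numpy as np

m = np.array([[1, 9, 3],
              [7, 2, 8]])

flat_ix = np.argmax(m)                        # index in the flattened array: 1
row, col = np.unravel_index(flat_ix, m.shape)
print(row, col)                               # position (0, 1) of the maximum 9

print(np.argmax(m, axis=0))                   # per column: [1 0 1]
print(np.argmax(m, axis=1))                   # per row: [1 2]
```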

4.5. Statistical functions


In [76]:
x = np.random.random(10); x


Out[76]:
array([ 0.88042383,  0.22280293,  0.00769093,  0.9631947 ,  0.82314693,
        0.6021121 ,  0.42227832,  0.54826309,  0.10995267,  0.27862066])

In [69]:
np.mean(x)


Out[69]:
0.41554955347505657

In [70]:
np.median(x)


Out[70]:
0.28811051704517238

In [71]:
np.var(x)


Out[71]:
0.09738450671168114

In [72]:
np.std(x)


Out[72]:
0.31206490785040403

In [73]:
x = np.array([1, 2, np.nan])
np.mean(x)


Out[73]:
nan

In [74]:
np.nanmean(x)


Out[74]:
1.5

In [77]:
np.ptp(x)  # peak to peak: max(x) - min(x); this output was computed for the random x from In [76]


Out[77]:
0.95550376524614955

In [85]:
X = np.array([[1,2],[3,4]])
print X


[[1 2]
 [3 4]]

In [79]:
np.mean(X)


Out[79]:
2.5

In [83]:
np.mean(X, axis=0)


Out[83]:
array([ 2.,  3.])

In [86]:
np.cov(X)


Out[86]:
array([[ 0.5,  0.5],
       [ 0.5,  0.5]])
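Note that np.cov treats each row as one variable by default, which is why the two rows [1, 2] and [3, 4] above give a 2x2 matrix of 0.5s. For a data matrix whose rows are observations and whose columns are features (the usual layout in machine learning), pass rowvar=False. Also, np.cov divides by N - 1 by default, whereas np.var divides by N (ddof=0). A sketch:

```python
import numpy as np

# Data matrix: rows are observations, columns are variables (features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.cov(X))                  # rows treated as variables (the default)
C = np.cov(X, rowvar=False)       # columns treated as variables
print(C)                          # every entry is 2.0

# np.cov uses N - 1 in the denominator, np.var uses N unless ddof=1
print(np.var(X[:, 0]))            # 1.0
print(np.var(X[:, 0], ddof=1))    # 2.0, equal to C[0, 0]
```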

In [87]:
x = np.random.random(10000); x


Out[87]:
array([ 0.45896982,  0.98801242,  0.97315322, ...,  0.2251306 ,
        0.28813426,  0.77343545])

In [88]:
np.histogram(x)


Out[88]:
(array([1028, 1024, 1029, 1022, 1014,  953, 1029,  960,  971,  970]),
 array([  8.52564207e-05,   1.00051953e-01,   2.00018649e-01,
          2.99985345e-01,   3.99952041e-01,   4.99918737e-01,
          5.99885433e-01,   6.99852129e-01,   7.99818825e-01,
          8.99785521e-01,   9.99752217e-01]))
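np.histogram returns two arrays: the counts per bin (10 equal-width bins by default) and the bin edges, of which there is one more than there are bins. Bins are half-open, [a, b), except the last one, which also includes its right edge. A sketch with small hypothetical data whose counts can be checked by hand, including the bin centers:

```python
import numpy as np

data = np.array([0.0, 0.1, 0.2, 0.3, 0.9, 1.0])

# 4 equal-width bins over [min, max] = [0, 1]
counts, edges = np.histogram(data, bins=4)
print(counts)      # [3 1 0 2]; 1.0 falls into the last, closed bin
print(edges)       # 0, 0.25, 0.5, 0.75, 1

# Bin centers: midpoints of consecutive edges
centers = (edges[:-1] + edges[1:]) / 2
print(centers)     # 0.125, 0.375, 0.625, 0.875
```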

4.6. Other frequently used functions


In [89]:
x = np.array([[1,2],[3,4]]); x


Out[89]:
array([[1, 2],
       [3, 4]])

In [90]:
np.sum(x)


Out[90]:
10

In [143]:
np.sum(x, axis=0)


Out[143]:
array([4, 6])

In [144]:
np.sum(x, axis=1)


Out[144]:
array([3, 7])

In [91]:
x.T


Out[91]:
array([[1, 3],
       [2, 4]])

In [92]:
v


Out[92]:
array([1, 2])

In [93]:
v.T


Out[93]:
array([1, 2])

In [94]:
x.diagonal()


Out[94]:
array([1, 4])

In [95]:
x.trace()  # == np.sum(x.diagonal())


Out[95]:
5

Applying a function to an array:


In [97]:
x


Out[97]:
array([[1, 2],
       [3, 4]])

In [99]:
np.apply_along_axis(sum, 1, x)


Out[99]:
array([3, 7])

In [100]:
np.apply_along_axis(len, 1, x)


Out[100]:
array([2, 2])

Most NumPy functions are vectorized: they can be applied to an entire array, in which case they operate on each element separately. For example:


In [101]:
np.sign(x)


Out[101]:
array([[1, 1],
       [1, 1]])

In [102]:
np.log(x)


Out[102]:
array([[ 0.        ,  0.69314718],
       [ 1.09861229,  1.38629436]])

The same holds for user-defined functions that are built from vectorized operations:


In [103]:
def inc(x) : return x + 1

In [104]:
inc(x)


Out[104]:
array([[2, 3],
       [4, 5]])

More complex functions, for example ones that apply an if statement to their argument, must be vectorized explicitly with numpy.vectorize, or simply applied in a for loop (which is essentially what vectorize does internally).
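A sketch with a hypothetical function sign_label, which uses an if statement and therefore works only on scalars, vectorized with np.vectorize. For this particular function, nested np.where calls (or simply np.sign) give the same result without a Python-level loop:

```python
import numpy as np

def sign_label(x):
    # An if statement on a whole array would be ambiguous,
    # so this function only works on single numbers
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

x = np.array([[-2, 0], [3, -1]])

vsign = np.vectorize(sign_label)     # calls sign_label on each element
print(vsign(x))                      # rows [-1, 0] and [1, -1]

# The same result with vectorized NumPy operations
print(np.where(x > 0, 1, np.where(x < 0, -1, 0)))
```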

Permutations:


In [105]:
x = np.arange(0,10); x


Out[105]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [108]:
np.random.permutation(x)


Out[108]:
array([5, 3, 4, 0, 2, 7, 9, 8, 1, 6])

In [109]:
x


Out[109]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [110]:
np.random.shuffle(x); x


Out[110]:
array([1, 2, 9, 4, 7, 8, 0, 6, 5, 3])

In [111]:
x


Out[111]:
array([1, 2, 9, 4, 7, 8, 0, 6, 5, 3])
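np.random.permutation returns a shuffled copy, whereas np.random.shuffle shuffles in place and returns None. When features and labels are stored in separate arrays, shuffling each one independently would break their correspondence; a common approach is to draw one permutation of indices and apply it to both. A sketch with hypothetical toy data:

```python
import numpy as np

X = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])   # hypothetical feature rows
y = np.array([0, 1, 2, 3])                       # label i belongs to row i

rng = np.random.RandomState(42)     # seeded generator for reproducibility
perm = rng.permutation(len(y))      # a random ordering of the indices 0..3

# Indexing both arrays with the same permutation keeps rows and labels aligned
X_shuffled, y_shuffled = X[perm], y[perm]
print(y_shuffled)
```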

4.7. Converting between lists and arrays


In [112]:
l = [1, 2, 3]
a = np.array(l); a


Out[112]:
array([1, 2, 3])

In [113]:
list(a)


Out[113]:
[1, 2, 3]

In [114]:
a.tolist()


Out[114]:
[1, 2, 3]

In [115]:
l = [[1, 2, 3], [4,5,6]]
a = np.array(l); a


Out[115]:
array([[1, 2, 3],
       [4, 5, 6]])

In [116]:
list(a)


Out[116]:
[array([1, 2, 3]), array([4, 5, 6])]

In [117]:
a.tolist()


Out[117]:
[[1, 2, 3], [4, 5, 6]]
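list(a) only splits the array along its first axis, so for a matrix it produces a list of rank-1 arrays (Out[116]), and even for a vector its elements remain NumPy scalars; tolist() converts everything to nested lists of plain Python numbers. A sketch that checks the element types:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

rows = list(a)        # list of rank-1 NumPy arrays, one per row
nested = a.tolist()   # nested lists of plain Python ints

print(type(rows[0]))        # numpy.ndarray
print(type(nested[0][0]))   # int
```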

5. SciPy


In [118]:
import scipy as sp

In [119]:
sp.__version__


Out[119]:
'0.16.0'

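SciPy imports NumPy, and in this version NumPy's functions are also available in the scipy namespace (newer SciPy versions deprecate these aliases in favour of using numpy directly). For example: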


In [166]:
x = sp.array([1,2,3])

From the SciPy library, we are mainly interested in the modules scipy.linalg and scipy.stats.

5.1. SciPy.linalg


In [122]:
from scipy import linalg

Matrix inverse:


In [123]:
y


Out[123]:
array([[5, 6],
       [7, 8]])

In [124]:
y_inv = linalg.inv(y); y_inv


Out[124]:
array([[-4. ,  3. ],
       [ 3.5, -2.5]])

In [125]:
sp.dot(y, y_inv)


Out[125]:
array([[  1.00000000e+00,   3.55271368e-15],
       [  0.00000000e+00,   1.00000000e+00]])
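The product above is not exactly the identity because of rounding errors. To solve a linear system Ax = b, it is usually preferable to call a solver (scipy.linalg.solve, or numpy.linalg.solve) instead of computing the inverse explicitly, since this is both cheaper and numerically more accurate. A sketch using numpy.linalg, which provides inv and solve as well, with the matrix y from above and an illustrative right-hand side b:

```python
import numpy as np

A = np.array([[5.0, 6.0],
              [7.0, 8.0]])
b = np.array([1.0, 2.0])

x_solve = np.linalg.solve(A, b)     # solves A x = b directly (LU factorization)
x_inv = np.linalg.inv(A).dot(b)     # the same result via the explicit inverse

print(x_solve)                      # 2.0, -1.5
assert np.allclose(A.dot(x_solve), b)
assert np.allclose(x_solve, x_inv)
```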

Determinant:


In [126]:
linalg.det(y)


Out[126]:
-2.0000000000000036

Euclidean norm ($l_2$ norm) of a vector: $\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$


In [127]:
w


Out[127]:
array([5, 3])

In [128]:
linalg.norm(w)


Out[128]:
5.8309518948453007

General $p$-norm: $\|\mathbf{x}\|_p = \big(\sum_i |x_i|^p\big)^{1/p}$


In [130]:
linalg.norm(w, ord=1)


Out[130]:
8

In [131]:
linalg.norm(w, ord=sp.inf)


Out[131]:
5
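The $p$-norm formula above can be computed directly; as $p$ grows, the result approaches the largest absolute component, which is what ord=sp.inf returns. A sketch comparing the formula with numpy.linalg.norm, which accepts the same ord values used above (p_norm is a hypothetical helper):

```python
import numpy as np

w = np.array([5.0, 3.0])

def p_norm(x, p):
    # (sum_i |x_i|^p)^(1/p), the general p-norm
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

print(p_norm(w, 1))           # 8.0
print(p_norm(w, 2))           # sqrt(34), about 5.831
print(p_norm(w, 50))          # already very close to max|w_i| = 5
print(np.max(np.abs(w)))      # the infinity norm: 5.0
```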

5.2. SciPy.stats


In [132]:
from scipy import stats

In [133]:
stats.norm


Out[133]:
<scipy.stats._continuous_distns.norm_gen at 0x7f970ba55e10>

In [136]:
stats.norm.pdf(0)


Out[136]:
0.3989422804014327

In [137]:
xs = sp.linspace(-2, 2, 10);

In [138]:
stats.norm.pdf(xs)


Out[138]:
array([ 0.05399097,  0.11897819,  0.21519246,  0.31944801,  0.38921247,
        0.38921247,  0.31944801,  0.21519246,  0.11897819,  0.05399097])

In [139]:
stats.norm.pdf(xs, loc=1, scale=2)


Out[139]:
array([ 0.0647588 ,  0.08817395,  0.11427077,  0.14095594,  0.16549503,
        0.18494385,  0.19671986,  0.19916355,  0.19192205,  0.17603266])
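In stats.norm, loc is the mean and scale is the standard deviation (not the variance). A sketch of the normal density formula in plain Python that reproduces values printed above (normal_pdf is a hypothetical helper, not a SciPy function):

```python
import math

def normal_pdf(x, loc=0.0, scale=1.0):
    # Density of N(loc, scale^2): exp(-(x - loc)^2 / (2 scale^2)) / (scale sqrt(2 pi))
    z = (x - loc) / float(scale)
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))

print(normal_pdf(0))                    # about 0.39894228, like stats.norm.pdf(0)
print(normal_pdf(-2, loc=1, scale=2))   # about 0.0647588, the first value of Out[139]
```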

Sampling from a normal distribution:


In [140]:
stats.norm.rvs(loc=1, scale=2, size=10)


Out[140]:
array([-0.58147381,  3.71386122,  1.24616214,  1.79738483,  2.69992991,
       -0.01069313,  3.30819964,  1.69645732,  1.09588046, -1.44891004])

"Freezing" a distribution:


In [141]:
normal = stats.norm(1, 2)

In [142]:
normal.pdf(xs)


Out[142]:
array([ 0.0647588 ,  0.08817395,  0.11427077,  0.14095594,  0.16549503,
        0.18494385,  0.19671986,  0.19916355,  0.19192205,  0.17603266])

In [144]:
normal.rvs(size=5)


Out[144]:
array([-2.20691566,  0.9434288 , -3.30867649, -1.24524492, -0.95381916])

Multivariate Gaussian distribution:


In [145]:
?stats.multivariate_normal

In [146]:
mean    = sp.array([1.0, 3.0])
cov     = sp.array([[2.0, 0.5], [0.5, 0.7]])  # a covariance matrix must be symmetric
mnormal = stats.multivariate_normal(mean, cov)

In [148]:
mnormal.pdf([1, 0])


Out[148]:
5.9244062489471738e-05

In [149]:
np.random.seed(42)   # For reproducible results
mnormal.rvs(size=5)


Out[149]:
array([[ 0.32295504,  2.7257843 ],
       [-0.19250187,  3.91415511],
       [ 1.37357915,  2.90579424],
       [-1.37198427,  3.02935659],
       [ 1.56503442,  3.5666907 ]])
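A covariance matrix has to be symmetric and positive semi-definite. One standard way to sample from $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$ is to draw standard normal vectors $\mathbf{z}$ and compute $\boldsymbol{\mu} + L\mathbf{z}$, where $\Sigma = LL^\top$ is the Cholesky factorization. This sketch is not necessarily how SciPy samples internally; it uses NumPy only and checks that the sample mean and covariance approach the parameters:

```python
import numpy as np

mean = np.array([1.0, 3.0])
cov = np.array([[2.0, 0.5],
                [0.5, 0.7]])             # symmetric, positive definite

L = np.linalg.cholesky(cov)              # lower-triangular L with L L^T = cov
rng = np.random.RandomState(42)
z = rng.standard_normal((100000, 2))     # rows are independent N(0, I) samples
samples = mean + z.dot(L.T)              # each row is mean + L z

print(samples.mean(axis=0))              # close to mean
print(np.cov(samples, rowvar=False))     # close to cov
```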

Correlation coefficient:


In [150]:
x, y = np.random.random((2, 10))

In [152]:
y


Out[152]:
array([ 0.45606998,  0.78517596,  0.19967378,  0.51423444,  0.59241457,
        0.04645041,  0.60754485,  0.17052412,  0.06505159,  0.94888554])

In [153]:
stats.pearsonr(x, y)


Out[153]:
(0.30346130585985159, 0.39400530307438952)
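stats.pearsonr returns the correlation coefficient r together with a two-sided p-value. The coefficient itself can be computed directly from its definition; a sketch with small hypothetical data, checked against np.corrcoef:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with deviations from the means
dx, dy = x - x.mean(), y - y.mean()
r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(r)                          # 6 / sqrt(60), about 0.7746
print(np.corrcoef(x, y)[0, 1])    # the same value from NumPy
```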

6. Matplotlib

matplotlib contains several modules: pyplot, image, mplot3d (in mpl_toolkits), ...


In [160]:
import matplotlib.pyplot as plt
import matplotlib

In [161]:
matplotlib.__version__


Out[161]:
'1.4.3'

In [162]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['linalg', 'cov', 'normal', 'mean']
`%matplotlib` prevents importing * from pylab and numpy

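pylab combines pyplot and numpy. The command above (an IPython magic) makes plots render directly in the notebook instead of opening a separate window. As the warning shows, %pylab also imports many names into the namespace and overwrites existing variables; %matplotlib inline enables inline plots without these imports.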

6.1. The plot function


In [165]:
plt.plot([1,2,3,4,5], [4,5,5,7,3])
plt.show()



In [262]:
plt.plot([4,5,5,7,3]);



In [263]:
plt.plot([4,5,5,7,3], 'ro');



In [166]:
def f(x) : return x**2

In [167]:
xs = linspace(0,100); xs


Out[167]:
array([   0.        ,    2.04081633,    4.08163265,    6.12244898,
          8.16326531,   10.20408163,   12.24489796,   14.28571429,
         16.32653061,   18.36734694,   20.40816327,   22.44897959,
         24.48979592,   26.53061224,   28.57142857,   30.6122449 ,
         32.65306122,   34.69387755,   36.73469388,   38.7755102 ,
         40.81632653,   42.85714286,   44.89795918,   46.93877551,
         48.97959184,   51.02040816,   53.06122449,   55.10204082,
         57.14285714,   59.18367347,   61.2244898 ,   63.26530612,
         65.30612245,   67.34693878,   69.3877551 ,   71.42857143,
         73.46938776,   75.51020408,   77.55102041,   79.59183673,
         81.63265306,   83.67346939,   85.71428571,   87.75510204,
         89.79591837,   91.83673469,   93.87755102,   95.91836735,
         97.95918367,  100.        ])

In [266]:
f(xs)


Out[266]:
array([  0.00000000e+00,   4.16493128e+00,   1.66597251e+01,
         3.74843815e+01,   6.66389005e+01,   1.04123282e+02,
         1.49937526e+02,   2.04081633e+02,   2.66555602e+02,
         3.37359434e+02,   4.16493128e+02,   5.03956685e+02,
         5.99750104e+02,   7.03873386e+02,   8.16326531e+02,
         9.37109538e+02,   1.06622241e+03,   1.20366514e+03,
         1.34943773e+03,   1.50354019e+03,   1.66597251e+03,
         1.83673469e+03,   2.01582674e+03,   2.20324865e+03,
         2.39900042e+03,   2.60308205e+03,   2.81549354e+03,
         3.03623490e+03,   3.26530612e+03,   3.50270721e+03,
         3.74843815e+03,   4.00249896e+03,   4.26488963e+03,
         4.53561016e+03,   4.81466056e+03,   5.10204082e+03,
         5.39775094e+03,   5.70179092e+03,   6.01416077e+03,
         6.33486047e+03,   6.66389005e+03,   7.00124948e+03,
         7.34693878e+03,   7.70095793e+03,   8.06330696e+03,
         8.43398584e+03,   8.81299459e+03,   9.20033319e+03,
         9.59600167e+03,   1.00000000e+04])

In [168]:
plt.plot(xs, f(xs));



In [268]:
plt.plot(xs, f(xs), 'bo');



In [269]:
plt.plot(xs, f(xs), 'r+');



In [169]:
plt.plot(xs, 1 - f(xs), 'b', xs, f(xs)/2 - 1000, 'r--');



In [170]:
plt.plot(xs, f(xs), label='f(x)')
plt.plot(xs, 1 - f(xs), label='1-f(x)')
plt.legend()
plt.show()



In [171]:
xs = linspace(-5,5)
plt.plot(xs, stats.norm.pdf(xs), 'g--');
plt.plot(xs, stats.norm.pdf(xs, loc=1, scale=2), 'r', linewidth=3);


6.2. The scatter function


In [173]:
plt.scatter([0, 1, 2, 0], [4, 5, 2, 1])
plt.show()



In [274]:
plt.scatter([0,1,2,0], [4, 5, 2, 1], s=200, marker='s');



In [174]:
np.random.random(10)


Out[174]:
array([ 0.96563203,  0.80839735,  0.30461377,  0.09767211,  0.68423303,
        0.44015249,  0.12203823,  0.49517691,  0.03438852,  0.9093204 ])

In [175]:
for c in 'rgb':
    plt.scatter(np.random.random(100), np.random.random(100), s=200, alpha=0.5, marker='o', c=c)


6.3. Contour and density plots


In [178]:
x = np.linspace(1,5,5); x


Out[178]:
array([ 1.,  2.,  3.,  4.,  5.])

In [179]:
X, Y = np.meshgrid(x, x)

In [180]:
X


Out[180]:
array([[ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 1.,  2.,  3.,  4.,  5.]])

In [181]:
Y


Out[181]:
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.]])

In [182]:
Z = 10 * X + Y
Z


Out[182]:
array([[ 11.,  21.,  31.,  41.,  51.],
       [ 12.,  22.,  32.,  42.,  52.],
       [ 13.,  23.,  33.,  43.,  53.],
       [ 14.,  24.,  34.,  44.,  54.],
       [ 15.,  25.,  35.,  45.,  55.]])
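np.meshgrid turns two coordinate vectors into two matrices such that (X[i, j], Y[i, j]) runs over all grid points, with one row per y value and one column per x value; any vectorized expression in X and Y is then evaluated on the whole grid, as with Z above. A sketch with vectors of different lengths, which makes the shapes easier to tell apart:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])     # 3 values along the horizontal axis
y = np.array([10.0, 20.0])        # 2 values along the vertical axis
X, Y = np.meshgrid(x, y)

print(X.shape)                    # (2, 3): one row per y value, one column per x value
# X[i, j] == x[j] and Y[i, j] == y[i] for every grid point
Z = 10 * X + Y                    # the function evaluated at every grid point
print(Z)
```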

In [183]:
plt.pcolormesh(X, Y, Z, cmap='gray')
plt.show()



In [184]:
mnormal = stats.multivariate_normal([0, 1], [[1, 0.2], [0.2, 3]])  # symmetric covariance matrix

In [185]:
mnormal.pdf([1,1])


Out[185]:
0.055730458106194758

In [186]:
x = np.linspace(-1, 1)
y = np.linspace(-2, 2)
X, Y = np.meshgrid(x, y)

In [187]:
shape(X)


Out[187]:
(50, 50)

In [188]:
XY = np.dstack((X, Y))  # shape (50, 50, 2): the (x, y) coordinates of every grid point

In [189]:
shape(XY)


Out[189]:
(50, 50, 2)

In [190]:
mnormal.pdf(XY)


Out[190]:
array([[ 0.01492383,  0.01541297,  0.01589129, ...,  0.01095132,
         0.01044738,  0.00994981],
       [ 0.01610377,  0.01663533,  0.01715544, ...,  0.01194288,
         0.01139588,  0.01085559],
       [ 0.01733793,  0.01791426,  0.01847852, ...,  0.01299494,
         0.01240254,  0.01181718],
       ..., 
       [ 0.04679266,  0.04884039,  0.05089173, ...,  0.05646055,
         0.0544354 ,  0.05239434],
       [ 0.04542256,  0.04742101,  0.04942386, ...,  0.05539039,
         0.05341564,  0.0514244 ],
       [ 0.04399343,  0.04593934,  0.04789039, ...,  0.0542183 ,
         0.05229712,  0.05035891]])

In [191]:
plt.pcolormesh(X, Y, mnormal.pdf(XY))
plt.show()



In [291]:
plt.contourf(X, Y, mnormal.pdf(XY));



In [292]:
plt.contourf(X, Y, mnormal.pdf(XY), levels=[0,0.06, 0.07]);



In [193]:
plt.contour(X, Y, mnormal.pdf(XY));



In [194]:
x = linspace(-10,10)
X, Y = np.meshgrid(x, x)
Z = X*3 + Y

In [195]:
plt.contour(X, Y, Z);



In [296]:
plt.contour(X, Y, Z, levels=[0]);


Combining several plots:


In [297]:
plt.contour(X, Y, Z, levels=[0])
plt.scatter([-5,-3,2,5], [4, 5, 2, 1])
plt.show()
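The contour at level 0 is the line $3x + y = 0$: points on one side give a positive value of $3x + y$, points on the other side a negative one, which is the basic idea behind a linear classifier. A sketch that computes the side for the four scattered points, using NumPy only:

```python
import numpy as np

px = np.array([-5, -3, 2, 5])
py = np.array([4, 5, 2, 1])

scores = px * 3 + py          # the same function as Z = X*3 + Y
sides = np.sign(scores)       # -1 or +1: which side of the line 3x + y = 0
print(scores)                 # [-11  -4   8  16]
print(sides)                  # [-1 -1  1  1]
```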


6.4. Histogram


In [298]:
np.random.seed(42)
x = stats.norm.rvs(size=1000)

In [299]:
plt.hist(x);


Roughly equivalent to:


In [300]:
hist, bins = np.histogram(x)
centers = (bins[:-1] + bins[1:]) / 2
plt.bar(centers, hist, width=bins[1] - bins[0], align='center');


6.5. Subplots

TODO

7. Pandas


In [301]:
import pandas as pd
pd.__version__


Out[301]:
u'0.17.0'

TODO

8. Sklearn


In [302]:
import sklearn
sklearn.__version__


Out[302]:
'0.15.2'

TODO