NumPy Notes



In [1]:

    
%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np

plt.style.use('ggplot')

Creating multidimensional arrays

Creating an array from a Python list



In [2]:

    
np.array([1, 2, 3, 4])









    Out[2]:





array([1, 2, 3, 4])

Creating an array from a Python tuple



In [3]:

    
np.array((1, 2, 3, 4))









    Out[3]:





array([1, 2, 3, 4])

Creating an array from a Python str



In [4]:

    
np.fromstring('1 2 3 4', dtype=int, sep=' ')









    Out[4]:





array([1, 2, 3, 4])

Creating an array from an iterable object



In [5]:

    
def count_generator():
    for i in range(4):
        yield i

print('from list: %r' % np.fromiter([1, 2, 3, 4], int))
print('from tuple: %r' % np.fromiter((1, 2, 3, 4), int))
print('from string: %r' % np.fromiter('1234', int))
print('from unicode: %r' % np.fromiter(u'白日依山尽', 'U1'))
print('from generator: %r' % np.fromiter(count_generator(), int))









    



from list: array([1, 2, 3, 4])
from tuple: array([1, 2, 3, 4])
from string: array([1, 2, 3, 4])
from unicode: array([u'\u767d', u'\u65e5', u'\u4f9d', u'\u5c71', u'\u5c3d'], 
      dtype='<U1')
from generator: array([0, 1, 2, 3])
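
When the number of items is known in advance, fromiter also accepts a count argument so the output can be allocated once. The snippet below is a minimal sketch of that usage; the squares generator is a hypothetical helper introduced only for this illustration.

```python
import numpy as np


def squares(n):
    # Hypothetical generator used only for this illustration.
    for i in range(n):
        yield i * i


# count tells fromiter how many items to read, so the result
# can be preallocated instead of being grown on demand.
arr = np.fromiter(squares(5), dtype=int, count=5)
print(repr(arr))
```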

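diag creates a diagonal matrix from the given values; the k argument selects which diagonal receives them (k=1 is the first diagonal above the main one)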



In [6]:

    
print(repr(np.diag([1, 1, 2])))
print(repr(np.diag([3, 4])))
print(repr(np.diag([3, 4], k=1)))









    



array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])
array([[3, 0],
       [0, 4]])
array([[0, 3, 0],
       [0, 0, 4],
       [0, 0, 0]])

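diagflat is similar to diag and also creates a diagonal matrix from the given values. The difference is that diagflat first flattens its argument into a 1-D array and then uses it as the diagonal, whereas diag, when given a 2-D array, extracts its diagonal instead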



In [7]:

    
print(repr(np.diag([[1, 2], [3, 4]])))
print(repr(np.diagflat([[1, 2], [3, 4]])))









    



array([1, 4])
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
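
The flattening behaviour can be checked directly. As a small sketch, the following compares diagflat with explicitly flattening the input via ravel and passing the result to diag; the variable names are chosen only for this example.

```python
import numpy as np

a = [[1, 2], [3, 4]]

# Flatten by hand, then build the diagonal matrix with diag.
flat_then_diag = np.diag(np.ravel(a))

# diagflat performs the flattening itself.
direct = np.diagflat(a)

same = np.array_equal(flat_then_diag, direct)
print(same)
```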

All of the functions above create multidimensional arrays from existing data; the functions below do not depend on existing data.

ones creates a multidimensional array filled with 1



In [8]:

    
np.ones((3, 4))









    Out[8]:





array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

zeros creates a multidimensional array filled with 0



In [9]:

    
np.zeros((3, 4))









    Out[9]:





array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

full creates a multidimensional array filled with a user-specified value



In [10]:

    
np.full((3, 4), 17)









    



/home/linusp/Projects/panic-notebook/venv/local/lib/python2.7/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((3, 4), 17) will return an array of dtype('int64')
  format(shape, fill_value, array(fill_value).dtype), FutureWarning)






    Out[10]:





array([[ 17.,  17.,  17.,  17.],
       [ 17.,  17.,  17.,  17.],
       [ 17.,  17.,  17.,  17.]])
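
The warning above concerns the dtype that full infers from the fill value. One way to make the result independent of the NumPy version is to pass dtype explicitly, as in this short sketch (the variable names are illustrative only).

```python
import numpy as np

# Request the element type explicitly instead of relying on inference.
as_float = np.full((3, 4), 17, dtype=float)
as_int = np.full((3, 4), 17, dtype=int)

print(as_float.dtype, as_int.dtype)
```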

eye creates a 2-D array with ones on the diagonal and zeros elsewhere; the array does not have to be square



In [11]:

    
print(repr(np.eye(2)))
print(repr(np.eye(2, 3)))









    



array([[ 1.,  0.],
       [ 0.,  1.]])
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.]])

identity creates an identity matrix (a square array with ones on the main diagonal)



In [12]:

    
print(repr(np.identity(2)))
print(repr(np.identity(3)))









    



array([[ 1.,  0.],
       [ 0.,  1.]])
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
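
To make the relationship between eye and identity concrete, the sketch below checks that they agree for square shapes and shows the k argument of eye, which shifts the diagonal of ones. The variable names are assumptions made for this example.

```python
import numpy as np

# For a square shape the two functions produce the same array.
square_same = np.array_equal(np.eye(3), np.identity(3))

# k > 0 moves the ones above the main diagonal, k < 0 below it.
upper = np.eye(3, k=1)
lower = np.eye(3, k=-1)

print(square_same)
print(repr(upper))
print(repr(lower))
```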

arange creates a 1-D array holding an ordered sequence of numbers; it is used in much the same way as Python's built-in range



In [13]:

    
print(repr(np.arange(10)))
print(repr(np.arange(0, 10)))
print(repr(np.arange(0, 10, 2)))









    



array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array([0, 2, 4, 6, 8])
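
One difference from the built-in range is that arange also accepts non-integer steps. A brief sketch, with illustrative variable names:

```python
import numpy as np

# A fractional step; as with range, the stop value is excluded.
quarters = np.arange(0, 1, 0.25)

# A negative step counts downwards.
countdown = np.arange(5, 0, -1)

print(repr(quarters))
print(repr(countdown))
```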

linspace creates a 1-D array of num evenly spaced points over the given interval, with both endpoints included by default



In [14]:

    
np.linspace(0, 4, num=9)









    Out[14]:





array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ])
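
Two optional arguments of linspace are worth knowing: retstep returns the spacing alongside the array, and endpoint=False leaves out the right end of the interval. The following sketch uses the same interval as above; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the distance between neighbouring points.
closed, step = np.linspace(0, 4, num=9, retstep=True)

# endpoint=False excludes the stop value, so 8 points cover [0, 4).
half_open = np.linspace(0, 4, num=8, endpoint=False)

print(step)
print(repr(half_open))
```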

This function is often used when plotting curves



In [15]:

    
x = np.linspace(0, 10, num=100)
plt.plot(x, np.sin(x))









    Out[15]:





[<matplotlib.lines.Line2D at 0x46718d0>]

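logspace is similar to linspace, but its interval endpoints are given on a log scale (as exponents of the base, which is 10 by default), while the returned values are on a linear scale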



In [16]:

    
np.logspace(1, 3, num=3)









    Out[16]:





array([   10.,   100.,  1000.])
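
Put differently, logspace raises the base to evenly spaced exponents. The sketch below checks this against linspace and also shows the base argument; the variable names are chosen only for this example.

```python
import numpy as np

decades = np.logspace(1, 3, num=3)

# The same values computed by hand: 10 raised to evenly spaced exponents.
manual = 10.0 ** np.linspace(1, 3, num=3)

# A different base: 2**0, 2**1, 2**2, 2**3.
powers_of_two = np.logspace(0, 3, num=4, base=2)

print(np.allclose(decades, manual))
print(repr(powers_of_two))
```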

meshgrid creates coordinate matrices from the given coordinate vectors



In [17]:

    
x, y = np.meshgrid([-1, 0, 1], [0, 1])
print(repr(x))
print(repr(y))









    



array([[-1,  0,  1],
       [-1,  0,  1]])
array([[0, 0, 0],
       [1, 1, 1]])

In the example above, x consists of two rows of [-1, 0, 1], while y consists of three columns of [0, 1].
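
The row and column layout described here is the default indexing='xy' convention. As a sketch, the snippet below contrasts it with indexing='ij' and evaluates a simple function on the grid; the function x**2 + y is an arbitrary choice made for this illustration.

```python
import numpy as np

xv = [-1, 0, 1]
yv = [0, 1]

# Default 'xy' indexing: shape is (len(yv), len(xv)).
x_xy, y_xy = np.meshgrid(xv, yv)

# 'ij' indexing: shape is (len(xv), len(yv)), the transpose of the above.
x_ij, y_ij = np.meshgrid(xv, yv, indexing='ij')

# Element-wise arithmetic evaluates a function at every grid point.
z = x_xy ** 2 + y_xy

print(x_xy.shape, x_ij.shape)
print(repr(z))
```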

Below is an example that uses meshgrid for plotting.



In [18]:

    
x, y = np.meshgrid(np.linspace(-3.0, 3.0, 100), np.linspace(-3.0, 3.0, 100))
z = np.sqrt(x ** 2 + y ** 2)

cp = plt.contour(x, y, z)
plt.clabel(cp, inline=True, colors='k', fontsize=12)
cl = plt.contourf(x, y, z)
plt.colorbar(cl)
plt.title('Contour Plot')
plt.xlabel('x')
plt.ylabel('y')









    Out[18]:





<matplotlib.text.Text at 0x4698ad0>

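All of the functions above fill the array with deterministic values. NumPy's random module additionally provides many functions that fill a newly created array with random values.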

random_sample generates random values drawn from the uniform distribution over the half-open interval [0.0, 1.0)



In [19]:

    
print(repr(np.random.random_sample()))
print(repr(np.random.random_sample((3, 4))))









    



0.41940727346503204
array([[ 0.5901671 ,  0.93731165,  0.02541034,  0.99196221],
       [ 0.59594147,  0.45825893,  0.43615225,  0.18177226],
       [ 0.56498427,  0.59021779,  0.24237034,  0.62344351]])

Note that random, ranf and sample are all aliases of random_sample.
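
Values from [0.0, 1.0) can be mapped to another interval [low, high) by scaling and shifting. This is a minimal sketch; the bounds and the seed are arbitrary choices made for this example.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is repeatable

low, high = -5.0, 5.0

# Stretch [0, 1) to the width of the target interval, then shift it.
scaled = (high - low) * np.random.random_sample((3, 4)) + low

print(repr(scaled))
```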

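randint generates random integers drawn from the discrete uniform distribution over the half-open interval [low, high), so the upper bound itself is never returned; the shape of the resulting array is given by the size argument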



In [20]:

    
print(repr(np.random.randint(0, 9)))
print(repr(np.random.randint(0, 9, size=(3, 4))))









    



7
array([[6, 7, 8, 8],
       [6, 1, 6, 6],
       [1, 3, 6, 5]])
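
The exclusive upper bound can be observed by drawing many samples and inspecting the extremes. The seed and sample size in this sketch are arbitrary.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is repeatable

draws = np.random.randint(0, 9, size=10000)

# Every value lies in 0..8; the upper bound 9 is never produced.
smallest, largest = draws.min(), draws.max()
print(smallest, largest)
```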

The code below draws 10000 samples. The histogram shows that the sample counts for the 10 numbers 0 to 9 differ only slightly.



In [21]:

    
sample = np.random.randint(0, 10, size=10000)
plt.hist(sample)
plt.title('Samples of numpy.random.randint')
plt.xlabel('sample')
plt.ylabel('count')









    Out[21]:





<matplotlib.text.Text at 0x4a4e090>
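
The same comparison can be made numerically, without a plot, by counting how often each value occurs. The sketch below uses np.bincount for this; the seed is an arbitrary choice.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is repeatable

sample = np.random.randint(0, 10, size=10000)

# counts[i] is the number of times the value i was drawn.
counts = np.bincount(sample, minlength=10)

print(repr(counts))
```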

normal generates random values drawn from a normal distribution with the given mean and standard deviation



In [22]:

    
mu, sigma = 0, 0.1
samples = np.random.normal(mu, sigma, size=10000)

count, bins, ignored = plt.hist(samples, 30, normed=True)
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
         np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='b')

plt.title('Samples of numpy.random.normal')
plt.xlabel('sample')
plt.ylabel('count')









    Out[22]:





<matplotlib.text.Text at 0x4dc1d10>
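
As a numerical complement to the plot, the sample statistics should be close to the parameters passed to normal. This sketch assumes a fixed seed for repeatability and reuses the same mu and sigma.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is repeatable

mu, sigma = 0, 0.1
samples = np.random.normal(mu, sigma, size=10000)

sample_mean = samples.mean()
sample_std = samples.std()

# Fraction of samples within one standard deviation of the mean
# (roughly 68% for a normal distribution).
within_one_sigma = np.mean(np.abs(samples - mu) < sigma)

print(sample_mean, sample_std, within_one_sigma)
```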

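multivariate_normal generates random values drawn from a multivariate normal distribution, while randn generates values drawn from the standard normal distribution; an array of shape (n, d) from randn can be read as n samples from a d-dimensional standard normal distribution.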



In [23]:

    
def hist2d_from_array(array):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    hist, e = np.histogramdd(array)
    xe, ye = e

    num_of_elem = (len(xe) - 1) * (len(ye) - 1)
    # indexing='ij' keeps the grid in the same (x, y) order as hist
    xpos, ypos = np.meshgrid(xe[:-1]+0.25, ye[:-1]+0.25, indexing='ij')

    xpos = xpos.flatten()
    ypos = ypos.flatten()
    zpos = np.zeros(num_of_elem)
    dx = 0.5 * np.ones_like(zpos)
    dy = dx.copy()
    dz = hist.flatten()

    ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='#E24A33')
    ax.set_title('Samples of Normal Distribution')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('count')

Using the function defined above, the histogram of an array generated by randn is visualized below



In [24]:

    
r = np.random.randn(10000, 2)
hist2d_from_array(r)

The histogram of an array generated by multivariate_normal is visualized below (note the ranges of the values on the x and y axes)



In [25]:

    
mean = (10, 20)
cov = [[1, 0], [0, 1]]
r = np.random.multivariate_normal(mean, cov, 10000)
hist2d_from_array(r)
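
With an identity covariance matrix, as above, the result differs from randn only by the shift in the mean. The covariance argument matters once the off-diagonal entries are non-zero. The sketch below uses a correlation of 0.8, an arbitrary value chosen for illustration, and compares the estimated mean and covariance with the requested ones.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is repeatable

mean = (10, 20)
cov = [[1.0, 0.8], [0.8, 1.0]]  # illustrative correlated covariance
r = np.random.multivariate_normal(mean, cov, 10000)

# Estimates computed from the samples; rowvar=False treats columns as variables.
est_mean = r.mean(axis=0)
est_cov = np.cov(r, rowvar=False)

# With an identity covariance, shifting randn output gives the same distribution.
shifted = np.random.randn(10000, 2) + mean

print(repr(est_mean))
print(repr(est_cov))
```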