Advanced NumPy

1 Memory block


In [11]:
import numpy as np
x = np.array([1,2,3],dtype=np.int32)
x.data


Out[11]:
<read-write buffer for 0x109974990, size 12, offset 0 at 0x10997fab0>

In [12]:
str(x.data)


Out[12]:
'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'

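The dtype is np.int32, so each number is stored in four bytes. One byte is 8 bits, and each hexadecimal digit in the \x.. escapes above stands for 4 bits, so two hex digits make up one byte.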
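The byte arithmetic above can be checked directly. This is a minimal sketch that assumes Python 3, where `tobytes()` is the usual way to get the raw buffer. The explicit `'<i4'` conversion is an addition here so that the byte string is the same on any platform.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int32)

# Force little-endian storage so the raw bytes do not depend on the machine.
raw = x.astype('<i4').tobytes()

bytes_per_item = x.itemsize       # bytes used by one int32
bits_per_item = x.itemsize * 8    # 8 bits per byte
total_bytes = x.nbytes            # itemsize * number of elements

print(bytes_per_item, bits_per_item, total_bytes)
print(raw)
```

Each group of four bytes in `raw` holds one element, with the least significant byte first.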

View the location of the data in memory


In [13]:
x.__array_interface__['data'][0]


Out[13]:
4358785056

View detailed information about the data


In [16]:
x.__array_interface__


Out[16]:
{'data': (4358785056, False),
 'descr': [('', '<i4')],
 'shape': (3,),
 'strides': None,
 'typestr': '<i4',
 'version': 3}
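One way to see what the data pointer means is to compare it across a slice and a copy. The sketch below is illustrative only: the variable names are made up, and the actual addresses will differ on every run.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int32)

base_addr = x.__array_interface__['data'][0]

# A slice is a view: it points into the same memory block, shifted by one item.
view = x[1:]
view_addr = view.__array_interface__['data'][0]
offset = view_addr - base_addr

# A copy owns a new memory block, so its address differs from the original.
copy = x.copy()
copy_addr = copy.__array_interface__['data'][0]

print(offset, np.shares_memory(x, view), np.shares_memory(x, copy))
```

The offset of the view equals `x.itemsize`, since the slice starts one int32 element into the same block.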

2 Data types

  • type
    scalar type of the data, e.g. int8, int16
  • itemsize
    size in bytes of one element's data block
  • byteorder
    byte order: big-endian >, little-endian <, native =, not applicable |
  • fields
    sub-dtypes, if it is a structured data type
  • shape
    shape of the array, if it is a sub-array


In [35]:
np.dtype(int).type


Out[35]:
numpy.int64

In [36]:
np.dtype(int).byteorder


Out[36]:
'='

In [37]:
np.dtype(int).shape


Out[37]:
()
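The attributes listed above are easier to tell apart on dtypes that are not plain native integers. The following sketch uses a few illustrative dtypes. The field names `id` and `value` are invented for the example.

```python
import numpy as np

# Explicit big-endian 16-bit integer.
be16 = np.dtype('>i2')
print(be16.type, be16.itemsize)

# Single-byte types have no byte order, reported as '|'.
u8 = np.dtype(np.uint8)
print(u8.byteorder)

# Structured dtype: `fields` maps each field name to its sub-dtype and offset.
rec = np.dtype([('id', '<i4'), ('value', '<f8')])
print(rec.names, rec.itemsize)

# Sub-array dtype: `shape` is non-empty only in this case.
sub = np.dtype(('<i4', (2, 3)))
print(sub.shape, sub.itemsize)
```

For a plain scalar dtype such as `np.dtype(int)`, `fields` is `None` and `shape` is `()`, which matches the output shown above.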

3 Strides

The number of bytes to skip to get from one element to the next.


In [46]:
x = np.array([[1,2,3],
             [4,5,6],
             [7,8,9]],dtype=np.int8)
str(x.data)
x.strides


Out[46]:
(3, 1)

This means that moving to the next row skips 3 bytes, and moving to the next column skips 1 byte.


In [48]:
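# index (1, 2) with strides (3, 1): 1 row step of 3 bytes + 2 column steps of 1 byte
byte_offset = 3*1 + 1*2
# the byte offset equals the flat index here only because itemsize is 1 (int8)
x.flat[byte_offset]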


Out[48]:
6

In [49]:
x[1,2]


Out[49]:
6
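The same calculation can be written for any item size. The helper `byte_offset` below is a hypothetical name introduced for this sketch, and the conversion to a flat index assumes a C-contiguous array.

```python
import numpy as np

def byte_offset(arr, index):
    """Byte distance from the first element to arr[index], computed from the strides."""
    return sum(i * s for i, s in zip(index, arr.strides))

# Same values as above, but with 4-byte items, so the strides are (12, 4).
x = np.arange(1, 10, dtype=np.int32).reshape(3, 3)

off = byte_offset(x, (1, 2))       # 1*12 + 2*4
flat_index = off // x.itemsize     # valid only for a C-contiguous array
value = x.flat[flat_index]

# Transposing swaps the strides without moving any data.
t_strides = x.T.strides

print(x.strides, off, flat_index, value, t_strides)
```

With int32 the byte offset and the flat index no longer coincide, which is why the offset has to be divided by `itemsize`.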

4 Changing the data type


In [41]:
x = np.array([1,2,3,4],dtype=np.uint8)
str(x.data)


Out[41]:
'\x01\x02\x03\x04'

In [42]:
x.dtype='<i2'
x


Out[42]:
array([ 513, 1027], dtype=int16)

In [44]:
0x0201,0x0403


Out[44]:
(513, 1027)

Little-endian: the least significant byte comes first (on the left) in memory, so the bytes \x01\x02 are read as 0x0201.
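The same reinterpretation can be done without modifying the original array by using `ndarray.view`. The sketch below also reads the bytes as big-endian for comparison. The variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.uint8)

# Reinterpret the same four bytes as two 16-bit integers.
little = x.view('<i2')   # bytes 01 02 -> 0x0201, bytes 03 04 -> 0x0403
big = x.view('>i2')      # bytes 01 02 -> 0x0102, bytes 03 04 -> 0x0304

print(little.tolist(), big.tolist())

# Both views share memory with x; no bytes are copied.
print(np.shares_memory(x, little), np.shares_memory(x, big))
```

Only the interpretation of the bytes changes. The buffer itself is untouched, and `x` keeps its uint8 dtype.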

5 CPU cache effects


In [57]:
x = np.zeros((10000,))
y = np.zeros((10000*67,))[::67]
%timeit x.sum()


The slowest run took 5.01 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.19 µs per loop

In [58]:
%timeit y.sum()


The slowest run took 116.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 25.2 µs per loop

In [59]:
x.strides,y.strides


Out[59]:
((8,), (536,))
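A likely explanation for the gap is that the CPU fetches memory in cache-line-sized blocks. With a stride of 8 bytes, several consecutive elements arrive with each fetch, while with a stride of 536 bytes each element sits in a different block. The sketch below repeats the measurement outside IPython with the standard `timeit` module. The timings are machine-dependent, so no expected numbers are given.

```python
import timeit

import numpy as np

x = np.zeros((10000,))
y = np.zeros((10000 * 67,))[::67]   # same length, elements 536 bytes apart

# Copying the strided view packs its elements back into one dense block.
y_packed = np.ascontiguousarray(y)

def best_time(func, number=1000, repeat=5):
    """Best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

t_x = best_time(x.sum)
t_y = best_time(y.sum)
t_packed = best_time(y_packed.sum)

print(x.strides, y.strides, y_packed.strides)
print(t_x, t_y, t_packed)
```

If memory layout is the cause, `t_packed` should land close to `t_x` rather than `t_y`, since the packed copy has the same 8-byte stride as `x`.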
