In [28]:
from __future__ import print_function
import numpy as np
from keras.preprocessing import image

class Iterator() 

Creates an index_generator. Each call to next() on this generator yields the indices for one batch, as a tuple of the form

(index_array, current_index_value, current_batch_size)

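Parameters of class Iterator():

  1. seed: seeds the random number generator through np.random.seed(seed + self.total_batches_seen). Because total_batches_seen keeps growing, a new seed is used each time, while the whole sequence stays reproducible for a given seed.

  2. shuffle=True: all indices are shuffled before each epoch starts, i.e. np.random.permutation(N). A new permutation is drawn only after all N samples have been consumed. Each draw returns current_batch_size indices, so the last batch of an epoch can be smaller than batch_size.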
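
The batching behaviour described above can be sketched in plain NumPy. This is a simplified illustration, not the Keras source: the function name index_batches is hypothetical, and it reseeds once per epoch with seed + epoch instead of reseeding the global generator with seed + total_batches_seen.

```python
import numpy as np

def index_batches(n, batch_size, shuffle=True, seed=None):
    """Yield (index_array, current_index, current_batch_size) forever."""
    epoch = 0
    while True:
        # Assumption: one reproducible random state per epoch.
        rng = np.random.RandomState(None if seed is None else seed + epoch)
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            index_array = order[start:start + batch_size]
            # The last slice of an epoch may hold fewer than batch_size indices.
            yield index_array, start, len(index_array)
        epoch += 1

gen = index_batches(50, 14, shuffle=True, seed=123)
batches = [next(gen) for _ in range(5)]
print([(start, size) for _, start, size in batches])
# [(0, 14), (14, 14), (28, 14), (42, 8), (0, 14)]
```

With 50 samples and a batch size of 14, the fourth batch holds the remaining 8 indices and the fifth batch starts a new epoch at index 0, which matches the sizes printed by the cell below.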


In [42]:
n_samples = 50
batch_size = 14
iterator = image.Iterator(N=n_samples, batch_size=batch_size, shuffle=True, seed=123)
for ii in range(5):
    data = next(iterator.index_generator)
    print ('current_batch_size: {0}'.format(data[-1]), data)


current_batch_size: 14 (array([10, 13, 30, 46, 18,  0, 40, 12, 29,  8, 21, 47, 11, 41]), 0, 14)
current_batch_size: 14 (array([ 5,  1,  6, 27, 49, 24, 31, 15, 35, 26,  7, 20, 48,  3]), 14, 14)
current_batch_size: 14 (array([23, 44,  4, 16, 36, 14, 43, 25, 37, 39,  9, 32, 33, 22]), 28, 14)
current_batch_size: 8 (array([42, 19, 17, 38, 34, 28,  2, 45]), 42, 8)
current_batch_size: 14 (array([ 7, 15,  8, 24, 11, 21, 48,  6, 33,  0, 20, 22, 35, 43]), 0, 14)

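class NumpyArrayIterator()

Yields a batch of sample data and sample labels on each call. The samples are gathered according to the index_array produced by class Iterator(). The method ImageDataGenerator().flow() returns a class NumpyArrayIterator() instance.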
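
As a rough sketch of the gathering step, assuming the data and labels are NumPy arrays indexed along the first axis: gather_batch is a hypothetical helper, and the per-image random_transform() and standardize() calls of the real iterator are left out.

```python
import numpy as np

def gather_batch(x, y, index_array):
    """Select the samples and labels named by index_array (fancy indexing)."""
    return x[index_array], y[index_array]

rng = np.random.RandomState(0)
images = rng.uniform(size=(50, 32, 32, 3))
labels = rng.randint(10, size=50)

# Stand-in for one index_array produced by the index generator.
index_array = rng.permutation(50)[:14]
batch_x, batch_y = gather_batch(images, labels, index_array)
print(batch_x.shape, batch_y.shape)  # (14, 32, 32, 3) (14,)
```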


In [47]:
trn_images = np.random.uniform(size=(n_samples, 32, 32, 3))
trn_labels = np.random.randint(10, size=(n_samples))
img_generator = image.ImageDataGenerator()
numpy_iterator = image.NumpyArrayIterator(trn_images, trn_labels, img_generator, batch_size=batch_size, shuffle=True, seed=123)

for ii in range(5):
    batch_x, batch_y = numpy_iterator.next()
    print (batch_x.shape, batch_y.shape)


(14, 32, 32, 3) (14,)
(14, 32, 32, 3) (14,)
(14, 32, 32, 3) (14,)
(8, 32, 32, 3) (8,)
(14, 32, 32, 3) (14,)

class ImageDataGenerator()

methods:

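flow(): returns an instance of class NumpyArrayIterator()

flow_from_directory(): returns an instance of class DirectoryIterator()

standardize(x): standardizes the input array x, where x is the array of a single image

random_transform(x): applies a random transformation to the input array x. The transformation parameters are given when class ImageDataGenerator() is instantiated and include (rotation_range, height_shift_range, width_shift_range, shear_range, zoom_range, channel_shift_range, horizontal_flip, vertical_flip)

fit(X, rounds=1): computes the statistics (mean, std) of the training array X, where X holds all training images. When zca_whitening=True, it also computes principal_components. rounds controls augmentation of X before the statistics are computed: rounds=2 expands the data to twice its original size, using random_transform(). Note that fit() also takes an augment flag (False by default), and rounds only takes effect when augment=True.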
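
A minimal NumPy sketch of what fit() and standardize() amount to for feature-wise centering and scaling. The helper names and the eps value are assumptions for illustration; the statistics are taken per pixel over the sample axis, which is consistent with the (32, 32, 3) shapes reported for mean and std further down.

```python
import numpy as np

def fit_statistics(x):
    """Per-pixel mean and standard deviation over the sample axis."""
    return x.mean(axis=0), x.std(axis=0)

def standardize_image(img, mean, std, eps=1e-7):
    """Center and scale one image; eps (assumed value) avoids division by zero."""
    return (img - mean) / (std + eps)

rng = np.random.RandomState(0)
train = rng.uniform(size=(50, 32, 32, 3))

mean, std = fit_statistics(train)
print(mean.shape, std.shape)  # (32, 32, 3) (32, 32, 3)

# Standardize image by image, as the iterator does for each batch.
standardized = np.stack([standardize_image(img, mean, std) for img in train])
```

After this step every pixel position has approximately zero mean and unit standard deviation across the training set.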

In [54]:
img_generator = image.ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True, zca_whitening=True)
img_generator.fit(trn_images)
img_flow = img_generator.flow(trn_images, trn_labels)
for ii in range(5):
    batch_x, batch_y = img_flow.next()  # random_transform() and standardize() are applied to every image of each batch at this step
    print (batch_x.shape, batch_y.shape)


(32, 32, 32, 3) (32,)
(18, 32, 32, 3) (18,)
(32, 32, 32, 3) (32,)
(18, 32, 32, 3) (18,)
(32, 32, 32, 3) (32,)

In [57]:
print (img_generator.mean.shape, img_generator.std.shape, img_generator.principal_components.shape)


(32, 32, 3) (32, 32, 3) (3072, 3072)
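
The (3072, 3072) shape follows from flattening each image into 32 * 32 * 3 = 3072 features. The sketch below shows one common way to build a ZCA whitening matrix from the covariance of the flattened data. It is an illustration under assumptions (the zca_matrix name, the eps value, and the smaller 8 x 8 x 3 images used to keep the SVD cheap), not the library code.

```python
import numpy as np

def zca_matrix(x, eps=1e-6):
    """Return a (n_features, n_features) ZCA whitening matrix for x."""
    flat = x.reshape(x.shape[0], -1)            # (n_samples, n_features)
    flat = flat - flat.mean(axis=0)             # center each feature
    sigma = flat.T.dot(flat) / flat.shape[0]    # feature covariance
    u, s, _ = np.linalg.svd(sigma)
    # Rescale every principal direction by 1 / sqrt(variance + eps).
    return u.dot(np.diag(1.0 / np.sqrt(s + eps))).dot(u.T)

rng = np.random.RandomState(0)
small_images = rng.uniform(size=(50, 8, 8, 3))
components = zca_matrix(small_images)
print(components.shape)  # (192, 192), i.e. 8 * 8 * 3 features per image
```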

class DirectoryIterator()


In [64]:
import getpass
directory = '/home/'+getpass.getuser()+'/git_test/test_data/examples/dogs_cats'
# Read images from the class sub-directories of `directory` and normalize each sample individually
img_generator = image.ImageDataGenerator(samplewise_center=True, samplewise_std_normalization=True)
dir_iterator = image.DirectoryIterator(directory, img_generator, batch_size=batch_size, shuffle=True)


Found 53 images belonging to 2 classes.

In [65]:
for ii in range(5):
    batch_x, batch_y = dir_iterator.next()
    print (batch_x.shape, batch_y.shape)


(14, 256, 256, 3) (14, 2)
(14, 256, 256, 3) (14, 2)
(14, 256, 256, 3) (14, 2)
(11, 256, 256, 3) (11, 2)
(14, 256, 256, 3) (14, 2)
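
The label shape (14, 2) is consistent with one-hot encoding over the 2 classes, one class per sub-directory. The following sketch imitates the directory scan and label encoding using only the standard library and NumPy. The helper names, the extension list, and the throwaway file layout are all hypothetical, and no image is actually decoded.

```python
import os
import tempfile
import numpy as np

def scan_directory(directory, extensions=(".jpg", ".jpeg", ".png", ".bmp")):
    """Map each sub-directory to a class index and list its image files."""
    classes = sorted(d for d in os.listdir(directory)
                     if os.path.isdir(os.path.join(directory, d)))
    class_indices = {name: i for i, name in enumerate(classes)}
    filenames, labels = [], []
    for name in classes:
        for fname in sorted(os.listdir(os.path.join(directory, name))):
            if fname.lower().endswith(extensions):
                filenames.append(os.path.join(name, fname))
                labels.append(class_indices[name])
    return filenames, np.array(labels), class_indices

def one_hot(labels, n_classes):
    """Encode integer class labels as rows of a (n_samples, n_classes) array."""
    encoded = np.zeros((len(labels), n_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Build a small throwaway layout: root/cats/*.jpg and root/dogs/*.jpg (empty files).
root = tempfile.mkdtemp()
for cls, count in (("cats", 3), ("dogs", 2)):
    os.makedirs(os.path.join(root, cls))
    for i in range(count):
        open(os.path.join(root, cls, "img%d.jpg" % i), "w").close()

filenames, labels, class_indices = scan_directory(root)
batch_y = one_hot(labels, len(class_indices))
print("Found %d images belonging to %d classes." % (len(filenames), len(class_indices)))
print(batch_y.shape)  # (5, 2)
```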
