In [1]:
from __future__ import print_function

# to be able to see plots
%matplotlib inline  
import matplotlib.pyplot as plt

import numpy as np

import sys
sys.path.append("../tools")

from tools import collage

# Limit TensorFlow to a fraction of the GPU memory so the GPU can be shared.
# This is specific to TensorFlow and is not needed on dedicated machines.
gpu_memory_usage=0.8 
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = gpu_memory_usage
set_session(tf.Session(config=config))


Using TensorFlow backend.

Read dataset

A good corpus to start with is Wikipedia text. An easy starting version can be downloaded from http://www.cs.upc.edu/~nlp/wikicorpus/.


In [2]:
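# Read the corpus as raw bytes so the same code runs on Python 2 and 3.
with open('./en.wiki.txt', 'rb') as f:
    data = f.read()
data = data[:50000000]  # keep the first 50 MB

# Show a random 1000-byte excerpt; undecodable bytes are replaced for display only.
pos = np.random.randint(len(data)-1000)
print(data[pos:pos+1000].decode('utf-8', 'replace'))

# One uint8 value per byte; these are the input symbols of the network.
dataT = np.asarray(bytearray(data), dtype=np.uint8)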


 worn by the Javanese people of Indonesia.
Barong Tagalog, an embroidered formal garment of the Philippines;
Kimono, the traditional garments of Japan;
Ao dai, traditional garments of Vietnam;
Morning dress, a particular category of men's formal dress;
Topor, a type of conical headgear;
Seelai,Tamil Brides traditional formal wear;
traditionally worn by grooms as part of the Bengali Hindu wedding ceremony
Tuxedo;
Black tie, indicating dinner jacket in the UK;
White tie, indicating evening dress in the UK;
Sherwani, a long coat-like garment worn in South Asia;
Wedding veil;
Wedding dress;

Music.
Western weddings.
Music often played at western weddings includes a processional song for walking down the aisle (ex: Wedding March) and reception dance music. More at wedding music.

African weddings.
In traditional African weddings its a combining of two families.Traditional music throughout Africa is almost always functional; in other words, it is performed to mark a ritual such as a wedding.
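
The conversion at the end of the cell above stores the text as one uint8 value per byte, which is why the networks further down work with 256 input and output symbols. A minimal sketch of that round trip on a short string follows; the sample string, the variable names, and the choice of UTF-8 are assumptions made for illustration, not taken from the notebook.

```python
import numpy as np

# Hypothetical sample text; the last character is a non-ASCII "e with acute".
text = u"caf\u00e9"

# Encode to bytes (UTF-8 assumed), then view each byte as one uint8 symbol in 0..255.
encoded = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)

# Non-ASCII characters occupy more than one byte in UTF-8,
# so the array can be longer than the string.
print(len(text), encoded.shape[0])

# Decoding the bytes recovers the original text.
decoded = encoded.tobytes().decode("utf-8")
print(decoded == text)
```

Because the model predicts bytes rather than characters, a multi-byte character is only reproduced correctly when all of its bytes are generated in order.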

Create data generator

The generator creates mini-batches of contiguous sequences: sequence 1 in batch t starts where sequence 1 ended in batch t-1.

The reset() method restarts the sequences from new random positions.

This behavior is needed for stateful training, where the network carries its hidden state from one batch to the next.
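
The batching scheme described above can be sketched on a toy array. This is a simplified stand-in for illustration, not the notebook's generator class: the function name `contiguous_batches` and the toy sizes are assumptions.

```python
import numpy as np

def contiguous_batches(data, length, batch_size, steps):
    """Yield (inputs, targets); row i of each batch continues row i of the previous batch."""
    stride = len(data) // batch_size
    positions = np.arange(batch_size) * stride  # evenly spaced integer start positions
    for _ in range(steps):
        x = np.stack([data[p:p + length] for p in positions])
        # Targets are the inputs shifted by one element.
        y = np.stack([data[p + 1:p + length + 1] for p in positions])
        yield x, y
        # Advance every row by one sequence length, wrapping near the end of the data.
        positions = (positions + length) % (len(data) - 2 * length)

toy = np.arange(100, dtype=np.uint8)
(x0, y0), (x1, y1) = list(contiguous_batches(toy, length=4, batch_size=2, steps=2))
print(x0[0], x1[0])  # the second batch picks up where the first one stopped
```

With a stateful recurrent layer, the hidden state left after row i of one batch becomes the initial state for row i of the next batch, so this ordering lets the state span many batches.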


In [6]:
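class myGenerator(object):
    """Yields (inputs, targets) batches in which row i continues row i of the previous batch."""
    def __init__(self, data, length=48, batchSize=32):
        self.length = length
        self.batchSize = batchSize
        self.data = data
        # Evenly spaced integer start positions, one per sequence in the batch.
        self.positions = np.arange(batchSize) * (data.shape[0] // batchSize)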
    
    def __iter__(self):
        return self
    
    def __next__(self):
        return self.next()
    
    def next(self):
        d = []
        l = []
        for i in range(self.batchSize):
            p = self.positions[i]
            d.append(self.data[p:p+self.length])
            # Targets are the inputs shifted by one byte.
            l.append(self.data[p+1:p+self.length+1])
            # Advance by one sequence length and wrap around near the end of the data.
            self.positions[i] = (p + self.length) % (self.data.shape[0] - 2*self.length)

        return np.stack(d), np.stack(l).reshape(self.batchSize, self.length, 1)
    
    def reset(self):
        self.positions = np.random.randint(self.data.shape[0]-10*self.length, size=self.positions.size)
    
batchSize = 32    
length = 48
generator = myGenerator(dataT, length, batchSize)

Define net


In [7]:
from keras.layers import Embedding, CuDNNGRU, SimpleRNN, CuDNNLSTM
from keras.layers import Activation, BatchNormalization, Dense, Average, Maximum, Concatenate
from keras.models import Model
from keras import regularizers, initializers


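def get_GRU_network(input_data, layer_count, layer_dim, stateful=False):
    # Stacked GRU layers. From the second layer on, each GRU output is concatenated
    # with the element-wise average and maximum of all GRU outputs so far.
    net = Embedding(256, output_dim=32)(input_data)
    net = BatchNormalization()(net)
    previous = []
    for i in range(layer_count):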
        net = CuDNNGRU(layer_dim, return_sequences=True, stateful=stateful, 
                       kernel_regularizer=regularizers.l2(0.000001),
                       recurrent_regularizer=regularizers.l2(0.000001))(net)
        previous.append(net)
        if len(previous) > 1:
            net1 = Average()(previous)
            net2 = Maximum()(previous)
            net = Concatenate()([net1, net2, net])
            
        net = Dense(layer_dim)(net)
        net = BatchNormalization()(net)
        net = Activation('relu')(net)
        
    net = Dense(256)(net)
    
    net = Activation('softmax')(net)

    return net


def get_GRU_simple_network(input_data, layer_count, layer_dim, stateful=False):
    # Byte embedding, then layer_count GRU layers, each followed by batch normalization.
    net = Embedding(256, output_dim=32)(input_data)
    net = BatchNormalization()(net)
    for i in range(layer_count):
        net = CuDNNGRU(layer_dim, return_sequences=True, stateful=stateful)(net)
        net = BatchNormalization()(net)
       
    net = Dense(256)(net)
    net = Activation('softmax')(net)

    return net

In [8]:
from keras import optimizers
from keras.models import Model
from keras import losses
from keras import metrics
from keras.layers import Input

layerCount = 2
layerSize = 720

input_data = Input(batch_shape=(batchSize, length), name='data')
net = get_GRU_simple_network(input_data, layerCount, layerSize, stateful=True)
model = Model(inputs=[input_data], outputs=[net])

# Model for text generation: same architecture, but it processes
# one character at a time (batch size 1, sequence length 1).
input_data = Input(batch_shape=(1,1), name='data')
predictNet = get_GRU_simple_network(input_data, layerCount, layerSize, stateful=True)
predModel = Model(inputs=[input_data], outputs=[predictNet])

print('Model')
model.summary()


Model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
data (InputLayer)            (32, 48)                  0         
_________________________________________________________________
embedding_1 (Embedding)      (32, 48, 32)              8192      
_________________________________________________________________
batch_normalization_1 (Batch (32, 48, 32)              128       
_________________________________________________________________
cu_dnngru_1 (CuDNNGRU)       (32, 48, 720)             1628640   
_________________________________________________________________
batch_normalization_2 (Batch (32, 48, 720)             2880      
_________________________________________________________________
cu_dnngru_2 (CuDNNGRU)       (32, 48, 720)             3114720   
_________________________________________________________________
batch_normalization_3 (Batch (32, 48, 720)             2880      
_________________________________________________________________
dense_1 (Dense)              (32, 48, 256)             184576    
_________________________________________________________________
activation_1 (Activation)    (32, 48, 256)             0         
=================================================================
Total params: 4,942,016
Trainable params: 4,939,072
Non-trainable params: 2,944
_________________________________________________________________
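
The parameter counts in the summary above can be reproduced by hand. The sketch below uses the usual formulas for these layer types; the helper names are illustrative, and the GRU formula assumes two bias vectors per gate, which is the layout that matches the numbers printed by `model.summary()`.

```python
def embedding_params(vocab, dim):
    return vocab * dim

def batchnorm_params(channels):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable)
    return 4 * channels

def cudnn_gru_params(input_dim, units):
    # Three gates, each with an input kernel, a recurrent kernel and two bias vectors.
    return 3 * (units * (input_dim + units) + 2 * units)

def dense_params(input_dim, units):
    return input_dim * units + units

counts = [
    embedding_params(256, 32),   # embedding_1
    batchnorm_params(32),        # batch_normalization_1
    cudnn_gru_params(32, 720),   # cu_dnngru_1
    batchnorm_params(720),       # batch_normalization_2
    cudnn_gru_params(720, 720),  # cu_dnngru_2
    batchnorm_params(720),       # batch_normalization_3
    dense_params(720, 256),      # dense_1
]
total = sum(counts)
# Only the moving statistics of the three batch normalization layers are non-trainable.
non_trainable = 2 * (32 + 720 + 720)
print(counts, total, non_trainable)
```

Almost all parameters sit in the two GRU layers; the second one is larger because its input is the 720-dimensional output of the first rather than the 32-dimensional embedding.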

In [10]:
model.compile(
    loss=losses.sparse_categorical_crossentropy, 
    optimizer=optimizers.Adam(lr=0.0002, clipnorm=5., clipvalue=1), # clipnorm and clipvalue are experimental; there is no strong reason for these values
    metrics=[metrics.sparse_categorical_accuracy])
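
The training cell that follows resets the network state and the generator positions after a batch with probability 0.1. Under that rule the number of batches between resets follows a geometric distribution with mean 1/0.1 = 10, so a sequence runs for about 10 x 48 = 480 bytes on average before it restarts. A small simulation to check this arithmetic; the variable names and the fixed seed are illustrative.

```python
import numpy as np

reset_prob = 0.1   # the callback fires when np.random.rand() > 0.9
length = 48        # bytes per sequence in one batch

# Mean of a geometric distribution with success probability reset_prob.
expected_batches = 1.0 / reset_prob
expected_bytes = expected_batches * length

# Empirical check: number of batches up to and including each reset.
rng = np.random.RandomState(0)
runs = rng.geometric(reset_prob, size=100000)
print(expected_batches, expected_bytes, runs.mean())
```

A lower reset probability would give the network longer contexts to learn from, at the cost of visiting fewer distinct positions in the corpus per epoch.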

In [12]:
import keras
# This callback resets the network state and the generator positions
# with probability 10% after each batch.
class My_Callback(keras.callbacks.Callback):
    def on_batch_end(self, batch, logs=None):
        if np.random.rand() > 0.9:
            self.model.reset_states()
            generator.reset()
 
model.fit_generator(generator=generator, steps_per_epoch=1000, epochs=30, verbose=1, callbacks=[My_Callback()])


Epoch 1/30
1000/1000 [==============================] - 39s 39ms/step - loss: 2.3082 - sparse_categorical_accuracy: 0.4172
Epoch 2/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.8571 - sparse_categorical_accuracy: 0.4916
Epoch 3/30
1000/1000 [==============================] - 37s 37ms/step - loss: 1.7399 - sparse_categorical_accuracy: 0.5163
Epoch 4/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.6557 - sparse_categorical_accuracy: 0.5322
Epoch 5/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.6031 - sparse_categorical_accuracy: 0.5448
Epoch 6/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.5812 - sparse_categorical_accuracy: 0.5491
Epoch 7/30
1000/1000 [==============================] - 37s 37ms/step - loss: 1.5405 - sparse_categorical_accuracy: 0.5596
Epoch 8/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.5332 - sparse_categorical_accuracy: 0.5607
Epoch 9/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.5161 - sparse_categorical_accuracy: 0.5642
Epoch 10/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4923 - sparse_categorical_accuracy: 0.5697
Epoch 11/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4837 - sparse_categorical_accuracy: 0.5721
Epoch 12/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4759 - sparse_categorical_accuracy: 0.5737
Epoch 13/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4425 - sparse_categorical_accuracy: 0.5814
Epoch 14/30
1000/1000 [==============================] - 40s 40ms/step - loss: 1.4390 - sparse_categorical_accuracy: 0.5831
Epoch 15/30
1000/1000 [==============================] - 40s 40ms/step - loss: 1.4254 - sparse_categorical_accuracy: 0.5862
Epoch 16/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4293 - sparse_categorical_accuracy: 0.5861
Epoch 17/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4191 - sparse_categorical_accuracy: 0.5880
Epoch 18/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4069 - sparse_categorical_accuracy: 0.5914
Epoch 19/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.4030 - sparse_categorical_accuracy: 0.5915
Epoch 20/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.3973 - sparse_categorical_accuracy: 0.5925
Epoch 21/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.3954 - sparse_categorical_accuracy: 0.5941
Epoch 22/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.3888 - sparse_categorical_accuracy: 0.5955
Epoch 23/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.3845 - sparse_categorical_accuracy: 0.5971
Epoch 24/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.3794 - sparse_categorical_accuracy: 0.5981
Epoch 25/30
1000/1000 [==============================] - 39s 39ms/step - loss: 1.3686 - sparse_categorical_accuracy: 0.6009
Epoch 26/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.3680 - sparse_categorical_accuracy: 0.6010
Epoch 27/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.3737 - sparse_categorical_accuracy: 0.5990
Epoch 28/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.3554 - sparse_categorical_accuracy: 0.6042
Epoch 29/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.3600 - sparse_categorical_accuracy: 0.6032
Epoch 30/30
1000/1000 [==============================] - 38s 38ms/step - loss: 1.3484 - sparse_categorical_accuracy: 0.6056
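
The loss reported above is the average cross-entropy per predicted byte, measured with the natural logarithm. It is easier to interpret after conversion to bits per byte and to perplexity. A minimal sketch of that conversion, assuming the final-epoch value 1.3484 from the log above; note that this is a training loss, not a held-out measurement.

```python
import math

final_loss = 1.3484  # training loss reported for epoch 30 (nats per byte)

bits_per_byte = final_loss / math.log(2)  # change of base from nats to bits
perplexity = math.exp(final_loss)         # effective number of equally likely choices
uniform_loss = math.log(256)              # loss of a uniform guess over 256 byte values

print(round(bits_per_byte, 2), round(perplexity, 2), round(uniform_loss, 2))
```

This works out to roughly 1.95 bits per byte and a perplexity of about 3.85, compared with 8 bits per byte for a uniform guess over all 256 values.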

Generate text


In [13]:
model.save_weights('model.mod')
predModel.load_weights('model.mod', by_name=False)
predModel.reset_states()
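
The generation loop that follows raises the predicted distribution to the power 1.2 and renormalizes it before sampling. An exponent above 1 moves probability mass toward the most likely bytes; for a softmax output this is the same as sampling at a temperature of 1/1.2, about 0.83. A minimal sketch on a toy distribution; the function name `sharpen`, the toy probabilities, and the seed are assumptions for illustration.

```python
import numpy as np

def sharpen(probs, power=1.2):
    """Raise probabilities to a power and renormalize so that they sum to 1."""
    # Casting to float64 is a precaution: network outputs are usually float32.
    p = np.asarray(probs, dtype=np.float64) ** power
    return p / p.sum()

probs = np.array([0.5, 0.3, 0.2])  # toy stand-in for one softmax output
sharp = sharpen(probs)
print(probs, sharp)  # the largest entry grows, the smallest shrinks

# Draw one symbol index from the sharpened distribution.
rng = np.random.RandomState(0)
sample = rng.choice(sharp.size, p=sharp)
print(sample)
```

A power of 1 leaves the distribution unchanged, and larger powers approach greedy decoding, which always picks the most likely byte.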

In [19]:
last = np.zeros((1, 1), dtype=int)
last[0,0] = dataT[10]  # fallback seed, used when startString is empty
predModel.reset_states()
startString = u'Start the string with this'
if startString:
    # Feed the prompt one character at a time to build up the hidden state.
    startString = [ord(x) for x in startString]
    last[0,0] = startString[0]
    print(chr(last[0,0]), end='')
    for i in startString[1:]:
        pred = predModel.predict(last)
        last[0,0] = i
        print(chr(last[0,0]), end='')

for i in range(10000):
    pred = predModel.predict(last)

    # Raise the probabilities to a power > 1 to favor likely characters, then renormalize.
    p = pred[0,0]**1.2
    p /= p.sum()

    last[0,0] = np.random.choice(p.size, p=p)
    print(chr(last[0,0]), end='')
    
print()


Start the string with this new material objects or course critically attractive declarations and early powers.  The state of his army in Nova Scotia.

In 1942 the Language Hockey Days (420 km). Later, a clothing company based on the race of unebury. She has become or training pressure cartoons  up with Odin as a punk rocket.

On 1 Seoul Credits also studied lapse stations in the first time almost entire that where participated in the Bay and pretended the early 2001">
The also set up fuse of the passing rights and the working system entity. There were he published the wing in Port Star Wars who were ramined the string of the Katah (San Duble).







ENDOFARTICLE.
</doc>
<doc id="6606142" title="Nevakiah District" nonfiltered="923" processed="720" dbindex="1007056">

In October 1823 substrict has been streaking off.  Colonel and land visit the result for the two sites of Savoy broadcasts her to grow freedom of the painting and accts authorized quick product.

In 2003 the cartoon XI is remarkable. The Sun aged for Station 2, The Yenolon Sardt, Andrews of Charleston Bar, the strings in the All-Potter site rivers rather over 100 sects, as she telephered he won the station on how some received justifications of Azti, which he was accompting no recreating Carnivers.

At the end of 1993, having concentration on the option was working on an engineer politician.  Your Roman Power We Hot win economic and the another three left band friends who introduces the Air Forces.

In 1962 he was ten years afterwards. At the north time- almost 2006 incubation of 1540 hours. The town home bank.

The bank climps is the infantry stage, and can raises have been a fleet to any modern dictator.  Feudal goes to Marian the new English addition was subsidians (second basement year).

The title.
Middle Gandman's allegation in St. Buffing College, Tokyo learn a vapid rock and human (which then a vanion solving for his brother), the Lakes (also known as Michaelan) was the founder of Handeball, and then Columbia University ethelesson Natural History.

The press division proposed stations such as Leley Gardner, the first time, expanded the Sein City Promotion attacks.  Following the first time he follows the main critic of the season campaign to be found in the suburb of Puerto Rico fails. He has been also by her subsequent text on power.

 See also .
Gardners   a deceivation of series a series of has a coverage new various years.

In 1988 he was set up to the 2004 Canadian Association of the Army Advanced Maska concepts for driving Alfred Atta in the Earth. 

At the World Cadam was selected under 2,008 to the United States River. Catholics attracts Macera also provides "Source Fantastic After Saint-Secret" and "non-power one" and "Banhide".

A population was small printed manifold.  Shortly forsified the same time.

The destruction location was created by many marious work, close to part out of Biolight. Some paid is previously part of the rest of the teachers.

In the 1960s King Watch was making a particular collaboration.  The Black Air Mesos at the scan offense that was distinctively to have cheer at the court of the Korean early vote in Greek citizens of the Chinese.

Students at the early 1990s.
On the 20th century is the end of 2002 .

The Live Games.

Video San Savals-bests created by Maurice Hans National Annuars friends.

2005 Classics, Hungary for the United States.

Family Advents, stages, SFL the theories of the city of Franciscan's hockey role-pronection Days at the L�t Baniel Rebellions.
In 1994, among is the remaining special of the West Death.  Retro the fourth days after the Russian production of Commonwood About 14 hp (15,386). Some was support film with his Film and relieved the Horn Hunter.  Criticism of the Focket family.

The original language, the German station was still his including communities of the 1983 seasons of England buildings at CD series fulling. A thing was then falls with many other languages safecy of his flour of the Ottoman so.  He great general sports and sources.
Film indication (                  Full of War I, the confederation of Poland) and list of the rate of the club. The top regions who live was used to make the arisen up to the influence.

In contrast, only the new Columbia Film president of Bankneigh. On July 1978, his daughter wife he was a term at the UFL collection before his own aircraft to start OSLA Elections (1927), with a family-locomotive which insides the attack on the Mathematical Supreme Court of the Meisternatian rock band Home and to get BC hims, to date a love highways to study.

The "David Hurrican" cannot bank to their own them was missing in John Davis, the total area of 142.3% under the age of 18 land "82.2% under the age of 19, 5.2% from 28 to 64, and 1.4% had a female household in the courts who aged 25 gang in the French Revolution Party Markets and Steinburn, where working Gogical Exclavation is telled to Lance, starring in space history of the 'polenomia, International Pole.

A district, fixed west of the Poland Railway.









Command .
Ashes experiencile and developer with them in his activity in Western Adams Award, in 1520.

In 1887, Rick Clining was almost perioded in the USA.

ACE.

Notable Reviews are made on his first part of the Peerage of Airport for American actress providing pregnant period.

History.

Significant David Damon was charged that the peride just small notable man.


















ENDOFARTICLE.
</doc>
<doc id="586694" title="Berckin Rung" nonfiltered="7889" processed="8682" dbindex="1006162">





Radio has the newspaper scholars an expression related to the Sydney (1991).  The Division was a fictional character.

Francis Harris (1975) who played for whom he was a water success in corps or guitarists, except that the Indian Bank was the way to write a Meine Start's Commander Prison Official Theology of the Dutch concentration airport; symptoms from some back to French sleep, deployment games landing, he wrut also eventually having." Designed as a river, but since then the international choosing start from a Navy run during parts of Bryan, Fortsburg.  Another time his period for the prophet band in 1965, and British officials were to bridge.  However, it s doctor, not literature.

For instruction is until however, a fellow programme citizens of Lee Marriage. Show has been heather to lose than the top of the theories as a seat time.

Winhall Film Games.
How to boat an area for some of the combatants in American rivals, pleased by engineers of the Savoy Restrepia sances.

In addition the action rade unique commenced in 1983 by driving family, the Acts News And sold and processed a similar governor and was to set a loop single Van Hollywood, embies saying eating for the other of New York was released and marriage that it was about the people of the fourt of the character.  In late 1950s, Banco de Mexico societies are striving to favour with the first collection of stadium contracts. The story has a total of 22,442 people. 

In the village had most of the Councilian National Canadian Commission Commission Square, guests with the state of Ranch winning the Ashley Sign of the United Kingdom books from the Internal Lingual Deligi, Julius deteached #13 for the Madrid Act of 1998 in  2002; Rome and South Booth Cartoons.

David.
Semina of the God in the League Championship 2006.

In 1947, the top takes a path to receive moderate that are working on income an Arts of the Canadian product. 

Many of the United States Communications have been developed and powers to the banknotes that it was released the Wall of Fame. 

A pleasure related statement over the Safe exclusively shorts rivers, and home basis who he has been possible to preach his southweren parts of the German Columbia. 

Cannon came for the presence of its contract to the Longham River in 1823. Polish political particles when the new National Railways based on the Metropolitan Russian Francisco has strict purpose of reasons that I not the training is surrounded by range (or closed capitol).  This mapped players in death, all of which would have ever approved the series was such as "Bordean Bank". There are she noise the first character was in the time. The school drives and other four weapons, and slightly studyofs winter.  It is packed the playoffsis in service, Lichter and Mot�ch, Homereming and a status 2006 economy, in 1922, ahead 12.6 miles (25 km) centuries. Text, personal series were all closely theme in the strip was the can set up to 4 path (in which the Mexicano bookand). Small villages were born in two language and caring submarine war.

The company during his return to the piano series 308 papers fund not only eminent two men, asometimes may be a determined pain to what use.

Joshua Major Mcpera and Striks was wounded the same fate on November 2006. The station was formerly 1 direction. 

As a member of the University of St. in Tayaga, beginning on the Northwest Down the hardware task for the other nations (New Tegrentini) was awarded Williams, and in harbor the station of 1210 he was honored the Portugal Sport.

Accordings.
The other miracist manner was unlimited that they had taken place, but only in the camera.  On a population of 13,294 prints, it found at the time the capital of the combined structures.  Prortunity introduced in Boscom the Puerto Paris, Asis website;
The New Marine Website is a city in a manga of Krauberto Rose Town, where he became print along the Horst-facing with the stream of the Australian Affairs nation.  Starting, Communist in the successful Romanian to attachment a vaginal color suburb.

See also.
 National University of Singa is a small bank of highways at Earl of Winter, Colonel College, the Toots system of the court (former Sydney Macioni, Ivanax) (cavernal in the Archely-on-faction of royote basketback);

References.
 Saint Davis (1893-1520);
The Known's Over Weather Engineers Series, Cocaster, Human Video;
 Market Law Schlesey
80768300to-Mauchlif (Amendman);
 Mark Road 
