OVERVIEW

A deep LSTM language model trained on a large dataset of tweets to predict the next character at each time step of a string.

The input is a sequence of class labels representing the characters (cf. the twitter preprocessing ipynb), which the first layer transforms into a dense embedding. Variable-length sequences are padded with the zero index, which is masked. The embeddings are then fed into five stacked LSTM layers. No regularization such as dropout or batch normalization is applied: the dataset contains over 420 million tweets, so overfitting is not expected to be a problem.
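
To make the effect of masking concrete: padded positions carry index 0 in the input and an all-zero target vector, so they should not count towards the accuracy. The following is a minimal sketch, not code from this notebook. `masked_accuracy` and the toy arrays are hypothetical names introduced for illustration, and whether the accuracy reported by Keras itself excludes padded steps may depend on the Keras version.

```python
import numpy as np

def masked_accuracy(x_indices, y_onehot, pred_probs, pad_index=0):
    """Per-character accuracy over non-padded positions only (hypothetical helper)."""
    mask = x_indices != pad_index            # True where a real input character sits
    target = y_onehot.argmax(axis=-1)        # all-zero padding rows give argmax 0
    guess = pred_probs.argmax(axis=-1)
    correct = (target == guess) & mask
    return correct.sum() / float(mask.sum())

# Toy example: one sequence of three real characters padded to length five.
n_symbols = 8
x = np.array([[1, 5, 6, 0, 0]])
y = np.zeros((1, 5, n_symbols), dtype='uint8')
y[0, [0, 1, 2], [5, 6, 2]] = 1               # targets for the three real steps
pred = np.zeros((1, 5, n_symbols))
pred[0, [0, 1, 2, 3, 4], [5, 7, 2, 0, 0]] = 1.0   # the second step is predicted wrongly

acc_masked = masked_accuracy(x, y, pred)     # 2 of 3 real positions are correct
acc_naive = (y.argmax(-1) == pred.argmax(-1)).mean()  # padding counted as "correct"
```

In this toy case the naive figure is higher than the masked one, because the two padded positions trivially match.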

The final layer has one neuron per symbol of the alphabet, followed by a softmax activation. For every position i in the string, this yields a probability distribution over the character at position i+1 given the string up to position i. We can sample directly from this distribution and thus generate novel tweets.
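
Sampling from the softmax output can be sketched as follows. This is an illustration under stated assumptions, not the notebook's own generation code: `predict_next` is a hypothetical stand-in for the trained network that returns the next-character distribution for a given prefix, and `sample_index`, `generate` and the toy alphabet are names made up for this example. The draw uses cumulative sums and `bisect`, and the masking index is excluded from sampling.

```python
import bisect
import random

def sample_index(probs, rng=random):
    """Draw one index from a discrete distribution via its cumulative sums."""
    cumulative = []
    total = 0.0
    for p in probs:
        total += p
        cumulative.append(total)
    # Scaling by the total tolerates a softmax output that does not sum to exactly 1.
    r = rng.random() * total
    return min(bisect.bisect_right(cumulative, r), len(cumulative) - 1)

def generate(predict_next, index_to_char, start='\t', end='\n', max_len=162, pad_index=0):
    """Sample characters one at a time until the end symbol or max_len is reached.

    predict_next(text) is a hypothetical stand-in for the trained network: it
    returns the distribution over the next character given the text so far.
    """
    text = start
    while len(text) < max_len:
        probs = list(predict_next(text))
        probs[pad_index] = 0.0  # never emit the masking symbol
        c = index_to_char[sample_index(probs)]
        if c == end:
            break
        text += c
    return text[len(start):]

# Toy stand-in for the network: always continues "\t" -> "h" -> "i" -> "\n".
toy_alphabet = ['PAD', '\t', '\n', 'h', 'i']
toy_next = {'\t': 3, '\th': 4, '\thi': 2}

def toy_predict(text):
    probs = [0.0] * len(toy_alphabet)
    probs[toy_next[text]] = 1.0
    return probs

tweet = generate(toy_predict, toy_alphabet)  # the toy model can only produce "hi"
```

With a real model, `predict_next` would encode the prefix, run the network and return the softmax row of the last real position.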

For future experiments it would be interesting to look more closely at whether the hidden representations generated by comparable symbol-level networks can be used in related tasks, either as pure embeddings or in joint neural architectures. The embedding size was also chosen rather arbitrarily; various sizes should be compared against plain one-hot vectors, although I would not expect a large impact either way, since the current size is already rather large given the alphabet size.
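
A rough way to see why the embedding choice is unlikely to matter much is to count the weights it affects. The sketch below assumes a standard LSTM cell (four gates, each with input weights, recurrent weights and a bias, no peepholes) and the sizes used in this notebook (71 symbols, embedding size 32, 512 cells). `lstm_params` is a hypothetical helper, and the numbers are plain arithmetic rather than measured results.

```python
def lstm_params(n_in, n_units):
    """Weights of a standard LSTM layer: four gates with input, recurrent and bias terms."""
    return 4 * (n_in * n_units + n_units * n_units + n_units)

alphabet_size = 71
embedding_size = 32
n_lstm_cells = 512

# Embedding table plus the first LSTM layer reading the 32-dimensional embeddings.
with_embedding = alphabet_size * embedding_size + lstm_params(embedding_size, n_lstm_cells)
# First LSTM layer reading 71-dimensional one-hot vectors directly.
with_one_hot = lstm_params(alphabet_size, n_lstm_cells)
# Each of the four deeper LSTM layers reads the 512 outputs of the layer below.
deeper_layer = lstm_params(n_lstm_cells, n_lstm_cells)

# with_embedding == 1118432, with_one_hot == 1196032, deeper_layer == 2099200
```

Under these assumptions the two input variants differ by fewer than 80,000 weights, which is small next to the roughly 2.1 million weights in each deeper LSTM layer.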


In [38]:
import linereader
import numpy as np
import random
import string

In [39]:
datafile = 'dataset-twitter/cleaned_tweets.txt' # File with 1 "document", i.e. tweet, per line.
n_lines = 420012443 # Set to -1 to recount.

class DataGenerator(object):
    def __init__(self, f, test_size=512, cache_size=100000, n_lines=-1):
        self.f = linereader.dopen(f)
        if n_lines < 1:
            self.n_lines = 0
            for _ in self.f:
                self.n_lines += 1
        else:
            self.n_lines = n_lines
        assert self.n_lines > cache_size
        self.cache = []
        self.cache_size = cache_size
        self.test_size = test_size
        # Hold out the first test_size lines as a test set and prefix the start symbol.
        self.test_set = ['\t' + s for s in self.f.getlines(1, test_size)]
        self.__cache()
    def __del__(self):
        self.f.close()
    def __iter__(self):
        return self
    def __cache(self):
        linenum = random.randint(self.test_size + 1, self.n_lines - self.cache_size)
        lines = self.f.getlines(linenum, linenum + self.cache_size)
        self.cache.extend(lines)
        random.shuffle(self.cache)
    def next(self):
        if len(self.cache) == 0:
            self.__cache()
        return '\t' + self.cache.pop()
 
gen = DataGenerator(datafile, n_lines=n_lines)

In [41]:
tweet = gen.next()
print tweet
print len(gen.cache)


	black pretty blonde 25 years old looking for hot sex this evening http://tinyurl.com/oyvwb5

100000

In [42]:
alphabet = set(string.printable) - set(string.ascii_uppercase) - set(string.whitespace) - set(['`'])
alphabet = list(alphabet) + [' ', '\n', '\t'] # EOL and SOL
alphabet.sort()
alphabet = ['ZEROVECTOR'] + alphabet # The zero vector is used for masking sequences.
print alphabet, len(alphabet)

index_to_char = alphabet
# Map each symbol to its position in the alphabet.
char_to_index = dict((c, i) for i, c in enumerate(alphabet))
print char_to_index


['ZEROVECTOR', '\t', '\n', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~'] 71
{'\t': 1, '\n': 2, '!': 4, ' ': 3, '#': 6, '"': 5, '%': 8, '$': 7, "'": 10, '&': 9, ')': 12, '(': 11, '+': 14, '*': 13, '-': 16, ',': 15, '/': 18, '.': 17, '1': 20, '0': 19, '3': 22, '2': 21, '5': 24, '4': 23, '7': 26, '6': 25, '9': 28, '8': 27, ';': 30, ':': 29, '=': 32, '<': 31, '?': 34, '>': 33, '@': 35, 'x': 64, '[': 36, ']': 38, '\\': 37, '_': 40, '^': 39, 'a': 41, 'c': 43, 'b': 42, 'e': 45, 'd': 44, 'g': 47, 'f': 46, 'i': 49, 'h': 48, 'k': 51, 'j': 50, 'm': 53, 'l': 52, 'o': 55, 'n': 54, 'q': 57, 'p': 56, 's': 59, 'r': 58, 'u': 61, 't': 60, 'w': 63, 'v': 62, 'y': 65, 'ZEROVECTOR': 0, '{': 67, 'z': 66, '}': 69, '|': 68, '~': 70}

In [43]:
# string -> one hot vectors
def encode_vectors(tweet):
    arr = np.zeros((len(tweet), len(alphabet)), dtype='uint8')
    for i in xrange(len(tweet)):
        arr[i, char_to_index[tweet[i]]] = 1
    return arr

# string -> vocabulary (=alphabet) indices
def encode_indices(tweet):
    arr = np.zeros(len(tweet), dtype='uint8')
    for i in xrange(len(tweet)):
        arr[i] = char_to_index[tweet[i]]
    return arr

# index -> char
def _decode(idx):
    if idx != 0:
        return index_to_char[idx]
    else:
        return ''

# indices -> string
def decode_indices(arr):
    return ''.join(map(_decode, arr))

# one hot vectors -> string
def decode_vectors(arr):
    chars = []  # Named `chars` to avoid shadowing the imported `string` module.
    for i in xrange(arr.shape[0]):
        idx = np.argmax(arr[i])
        if idx == 0:
            # Index 0 is the masking symbol and has no printable character.
            continue
        c = index_to_char[idx]
        if c == '\n':
            # We only use this function to visualize predictions during training.
            # So we put a special symbol here to keep it visually aligned with the target.
            c = 'Ç'
        chars.append(c)
    return ''.join(chars)

tweet = gen.next()
print tweet, len(tweet)
arr = encode_indices(tweet)
tweet = decode_indices(arr)
print tweet, len(tweet)
arr = encode_vectors(tweet)
tweet = decode_vectors(arr)
print tweet, len(tweet)


	live feed - news of the absurd crew filming their 100th episode...they're on a boat! http://www.stickam.com/newsoftheabsurd
125
	live feed - news of the absurd crew filming their 100th episode...they're on a boat! http://www.stickam.com/newsoftheabsurd
125
	live feed - news of the absurd crew filming their 100th episode...they're on a boat! http://www.stickam.com/newsoftheabsurdÇ 126
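
The per-character loop used for one-hot encoding above can also be written as a single indexed assignment. The following is a sketch of that alternative, not a replacement tested inside this notebook. `one_hot` and the five-symbol toy alphabet are hypothetical and only serve to keep the example self-contained.

```python
import numpy as np

def one_hot(indices, n_classes):
    """Vectorized counterpart of a per-character one-hot loop (hypothetical helper)."""
    indices = np.asarray(indices)
    arr = np.zeros((len(indices), n_classes), dtype='uint8')
    arr[np.arange(len(indices)), indices] = 1   # one assignment sets every row's hot entry
    return arr

toy_alphabet = ['ZEROVECTOR', '\t', '\n', 'a', 'b']
toy_char_to_index = dict((c, i) for i, c in enumerate(toy_alphabet))

indices = [toy_char_to_index[c] for c in '\tab\n']
vectors = one_hot(indices, len(toy_alphabet))
roundtrip = ''.join(toy_alphabet[i] for i in vectors.argmax(axis=1))
```

Taking the argmax of each row recovers the original indices, so `roundtrip` equals the input string.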

In [6]:
import keras
from keras.layers import Dense, Activation, LSTM, Embedding
from keras.layers.wrappers import TimeDistributed
from keras.models import Sequential, load_model
from keras.optimizers import RMSprop
import bisect


Using Theano backend.
Using gpu device 0: GeForce GTX 950 (CNMeM is enabled with initial size: 1450 MB, cuDNN 5105)

In [7]:
load = False
# Going even wider and maybe deeper would probably not hurt, but training already takes quite a while and
# we are not trying to beat state-of-the-art results on prominent datasets here.
n_lstm_cells = 512
# Picked pretty randomly. Should be investigated closer, although I guess it's rather on the (too?) large side.
embedding_size = 32 
max_tweet_len = 162 # longest tweet is 161 + 1 for start symbol

if load:
    model = load_model("filename")
else:
    model = Sequential()
    model.add(Embedding(len(alphabet), embedding_size, mask_zero=True))
    # Five stacked LSTM layers; each returns the full sequence for the layer above.
    n_lstm_layers = 5
    for _ in xrange(n_lstm_layers):
        model.add(LSTM(n_lstm_cells, return_sequences=True))
    # TimeDistributed as we want to predict the next char at each step.
    model.add(TimeDistributed(Dense(len(alphabet)))) 
    model.add(Activation('softmax'))
    opt = RMSprop()
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])



In [44]:
batch_size = 32

def make_batch(gen=gen, batch_size=batch_size, sample_len=max_tweet_len):
    xs = np.zeros((batch_size, sample_len), dtype='uint8')
    ys = np.zeros((batch_size, sample_len, len(alphabet)), dtype='uint8')
    for i in xrange(batch_size):
        tweet = gen.next()
        length = len(tweet) - 1
        xs[i,:length] = encode_indices(tweet[:-1])
        ys[i,:length] = encode_vectors(tweet[1:])
    return xs,ys

def make_testset(s=gen.test_set, sample_len=max_tweet_len):
    ns = len(s)
    xs = np.zeros((ns, sample_len), dtype='uint8')
    ys = np.zeros((ns, sample_len, len(alphabet)), dtype='uint8')
    for i in xrange(ns):
        tweet = s[i]
        # TODO: refactor common code
        length = len(tweet) - 1
        xs[i,:length] = encode_indices(tweet[:-1])
        ys[i,:length] = encode_vectors(tweet[1:])
        ###
    return xs,ys

bx,by = make_batch(batch_size=1)
print bx
bx = bx[0]
by = by[0]
dx = decode_indices(bx)
dy = decode_vectors(by)
print len(dx), len(dy)
print dx
print dy

test_set = make_testset()
print test_set[0].shape
print test_set[1].shape
print decode_indices(test_set[0][0])
print decode_vectors(test_set[1][0])


[[ 1  6 63 45 42 21 17 19  3  6 63 45 42 22 17 19  3 41 58 43 48 55 59  3
  28 56 43 60 41 42 52 45 60  3 63 49 54 44 55 63 59  3 26  3 61 53 56 43
   3 52 41 61 54 43 48 45 59 29  3 60 48 61 58 59 44 41 65 15  3 50 61 54
  45  3 20 20 15  3 21 19 19 28  3 48 60 60 56 29 18 18 60 49 54 65 61 58
  52 17 43 55 53 18 53 24 61 48 66 25  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]]
108 109
	#web2.0 #web3.0 archos 9pctablet windows 7 umpc launches: thursday, june 11, 2009 http://tinyurl.com/m5uhz6
#web2.0 #web3.0 archos 9pctablet windows 7 umpc launches: thursday, june 11, 2009 http://tinyurl.com/m5uhz6Ç
(512, 162)
(512, 162, 71)
	@pensblog true! this series is not over we need a strong 3rd period get some goals & head back 2 da burg with confidence!
@pensblog true! this series is not over we need a strong 3rd period get some goals & head back 2 da burg with confidence!Ç
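
The input/target shift shared by the batch and test-set builders can be factored into one helper. This is a minimal sketch of such a refactoring, assuming already encoded index sequences. `fill_pair` is a hypothetical name, and the toy sizes below are not the ones used in this notebook.

```python
import numpy as np

def fill_pair(xs, ys, row, indices):
    """Write one encoded tweet into row `row` of a batch (hypothetical helper).

    Inputs are all symbols but the last, targets are all symbols but the first,
    so position t of the target holds the symbol that follows position t of the input.
    """
    indices = np.asarray(indices)
    length = len(indices) - 1
    xs[row, :length] = indices[:-1]
    ys[row, np.arange(length), indices[1:]] = 1

# Toy example: start symbol, two characters, end symbol, padded to length 6.
n_symbols = 5
xs = np.zeros((1, 6), dtype='uint8')
ys = np.zeros((1, 6, n_symbols), dtype='uint8')
fill_pair(xs, ys, 0, [1, 3, 4, 2])
```

Positions past the end of the tweet keep index 0 in `xs` and an all-zero vector in `ys`, which matches the padding convention used for masking.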

In [25]:
output_every_n = 500 # Give status update and eval on test data every n batches.
n_batches = 20000 
n_examples = 4

# Note that I do not keep track of total seen samples in the code.
# The model producing the following output actually trained on 30k batches of 32 samples each, i.e. 960k samples.

mu_acc_train = 0.0

for i in xrange(n_batches):
    xs,ys = make_batch()
    _,acc_train = model.train_on_batch(xs, ys)
    mu_acc_train += acc_train
    if (i+1) % output_every_n == 0:
        print "Batch %d of %d" % (i+1, n_batches)
        _,acc_test = model.evaluate(test_set[0], test_set[1], batch_size=batch_size)
        # Take some random examples from the test set and show the net output.
        # At each prediction step the net sees the string up to then.
        # As expected one can soon see that guessing a follow-up word is much trickier than word remainders.
        print "Example outputs:"
        examples = np.zeros((n_examples, max_tweet_len), dtype='uint8')
        # Use j for the inner loops so the batch counter i is not shadowed.
        # Examples are drawn with replacement, so a tweet can show up more than once.
        for j in xrange(n_examples):
            example = random.randint(0, test_set[0].shape[0] - 1)
            examples[j] = test_set[0][example]
        preds = model.predict(examples, batch_size=n_examples)
        for j in xrange(n_examples):
            print 'Target:     ' + decode_indices(examples[j])[1:]
            print 'Prediction: ' + decode_vectors(preds[j])
        print 'Train accuracy: %.3f' % (mu_acc_train / (1. * output_every_n))
        print 'Test  accuracy: %.3f' % (acc_test)
        print
        mu_acc_train = 0.0


Batch 500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     i swear this cat is going though the "terrible twos" or something. she definitely has a vendetta against charmin.
Prediction: @ jeear thes sor tn aoing torugh.ihe bmheminle soi t in some hing .hoe iocinitely sav t sertor e snain.t toinling.................................................
Target:     @onlymikomi for who?
Prediction: @jneinone   ior tea                                                                                                                                               
Target:     joshua redman - boogielastic - 04:49 pm visit www.radiotagr.com/wumr to tag this song
Prediction: @uieua aes on s hlskle    ic c h0010 am hidit hww.rodionare.com/ etbo#o bhk toes sung                                                                             
Target:     @paulafanx13 ohhhh rite. klkl. wonder what it is. if u find out please let me know. :) xxx
Prediction: @jarlihint 1 ih hh iegt .iio  .ihwteriihat i  ws .i  y gend iut traase rot me know  i)

Train accuracy: 0.288
Test  accuracy: 0.254

Batch 1000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @letmesowlove i listened to that on the way to school!!
Prediction: @miehansn eneri wokten d to thet tnethe soy to sehool  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     has a new found respect for cs ppl, how does one balance engagement with your users while not creating a new channel to monitor?
Prediction: @tv a sew bolnd oeatoct tor aoaarl  ttu to s tu  oadlnce hndige ent hith tour csers hiile iot aoeating t sew soancel ho mavtcor                                   
Target:     has 4 mins to downing tools & starting long weekend @ centre parks, yay! :-d
Prediction: @tv a monu to to nlng th  s a toart ng tiog tiek nd h holtealcark   to   h))

Target:     @champjones nothin
Prediction: @mhrrbeanes ho  ingggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg
Train accuracy: 0.287
Test  accuracy: 0.254

Batch 1500 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     i had a bunch of tiny succulents, now i have a fun way to use them: http://bit.ly/hjydv
Prediction: @ jav a bisch of thmy torkestnt   bot i have a srn tis to gse the   http://bit.ly/1blis

Target:     politico: new polls suggest deeds surge in va.: a surveyusa poll shows deeds leading the democratic pri.. http://tinyurl.com/ljjp53
Prediction: @hricicsl hew bosi  aepgest hesps tervercn tan  h bebpiy re hrsi heop  hoap  tiad ng ahe waaocratic haoc. http://tinyurl.com/mlkkky

Target:     @sebduggan ooh ooh, i haven't watched it yet...don't give it away :)
Prediction: @maaaieierdhho iho  i wave 't banched tt aot  . on't wove mt a ay a)

Target:     listini to miley cyrus the climb love that song and love her
Prediction: @osteng to tasey cyrus ahe sooebitaoe thet ioug wnd tooe ter                                                                                                      
Train accuracy: 0.288
Test  accuracy: 0.256

Batch 2000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     @clareberry "ninja warrior" the japanese show on g4 is even better. it's the original that "whipe out" ripped off.
Prediction: @shaiksoaty ihogka datnior  ioe sucon se atow if tr hs tver teiter  h  s ahe sniginal aoat isietp tft" heg er an  

Target:     durmindo 9 pra acordar agora o.o
Prediction: @omiing  t meo p hrdar d ora e                                                                                                                                    
Target:     rt @apeshit: @deathwishinc is having a killer sale... cds from $1 and 50% off vinyl. some converge cds too! http://www.deathwishinc.com/ ...
Prediction: @t @sllnhor: rsannheesh ng is teving a sidler thyes . hhs aoom t10ttd t   off tidt   ho e pomtertesoo  ao   http://bww.taatheesheng.com

Target:     rt @land_line_now: retail sales are up...home foreclosures and unemployment filings are down. more tonight on land line now.
Prediction: @t @saucomavg fiw  @tawil stles ane tpd . twe for vaosures tnd tnimployment toneng  tne toin  hare th ight hn tisd aave.how                                       
Train accuracy: 0.291
Test  accuracy: 0.257

Batch 2500 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     second day of the academy, great session by richard baraniuk, founder of connexions #moa09
Prediction: @onrnd tet af the sltdemi  toeat ttrsion ou tedhard clnrc  n  arrnd r af tomtectcn  htovn9                                                                        
Target:     @kimcchung this weather makes me wanna nyada mop a person with a mole on they face!
Prediction: @markohindeihas ieether iones me lanta boc   tovaonbirson tith t bovlyof the  hice 

Target:     hey is mattie out there?
Prediction: @ty @s tykten tnt ohe e                                                                                                                                           
Target:     laura's painting was returned!! also, boomerangs jp will open at 11 am this monday (6/15) instead of 10 am. thanks!
Prediction: @osre s crrnting tit aeaurned  !hnso  iutk ns de auetitl bpen tn h0:mt ooes sortay.ah 21) h  pead of t0 mt  hhetks 

Train accuracy: 0.293
Test  accuracy: 0.258

Batch 3000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @lessallan i just played "give it away" in response to your tweet
Prediction: @mieliiiesdi hust wuaned tmrre tt a ay  in teapondi to tou  cwietsssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
Target:     @xcarliex i'm only hiding it because they paid for my phone bill the other day :-/ don't think they'd be impressed if they knew about it lol
Prediction: @mahreeem i m sn y aating an aetause ihay hlyn tor te srone iutl ihe sther day t))

Target:     what a night, did not get @ sleep till 5am & had no sweet dreams, i wonder what my sub-concious is trying to tell me, hmmmmm
Prediction: @hat irgecht  tod yot got ttmoeep ahml t m a ttr ao soeet aaisms  i wander ihat iy sopm hmfesus as thuing to bhll me  iommm                                       
Target:     to nervosa x_x don't want too. kdding, i really want to.
Prediction: @hdtewvou  ad  han't want to   hik cg  i weally dant to                                                                                                           
Train accuracy: 0.294
Test  accuracy: 0.260

Batch 3500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @i_am_music_ que esta pasando lo siento
Prediction: @mnmm_aacic liuesmstabdert do ea qen te                                                                                                                           
Target:     @intentionalfoul lol thx for the rt :)
Prediction: @mnsertesnaleerrdiol iha wor the be!@)                                                                                                                            
Target:     joshua redman - boogielastic - 04:49 pm visit www.radiotagr.com/wumr to tag this song
Prediction: @uih a tev on c hlskae en ea c h0 30 pm aiait hww.radiotarr.com/ arboho thk thes mong                                                                             
Target:     @i_am_music_ que esta pasando lo siento
Prediction: @mnmm_aacic liuesmstabdert do ea qen te                                                                                                                           
Train accuracy: 0.294
Test  accuracy: 0.259

Batch 4000 of 20000
512/512 [==============================] - 9s     
Example outputs:
Target:     bus is pulling out now. we gotta be in la by 8 to check into the paragon.
Prediction: @esids artling aut tow .hh wot a be an ta ta t mo toeck otto the brrtden                                                                                          
Target:     @jessicaadrew yes!!!! give more grass!
Prediction: @sonsicasseewshoa  ! !iove mene treds                                                                                                                             
Target:     ponder this idea.. its not mine but it blew me away. http://tinyurl.com/oq5ob2
Prediction: @rlter shes sseas .h ' aot aene tut it weaw ae tnay  http://bwnyurl.com/mxvdw7

Target:     i just deposited money into my swiss bank account. #spymaster http://bit.ly/playspy
Prediction: @ just roaosited aoney into ay siins bank ancount. #spymaster http://bit.ly/playspy

Train accuracy: 0.293
Test  accuracy: 0.260

Batch 4500 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     @shootxo hey brother i need to give you a ring
Prediction: @saawteashey saather isweed to beve you t legg                                                                                                                    
Target:     @beautyfulsoul doubt it...pharrell gave her big break, she was in two of his music video as the lead...you aint gotta be in love for sex!
Prediction: @srnutylullont ionbl t   . oetlill iome mer aeg braak  ioe was tn thi mn tes sosic fideo.on ahe siade

Target:     http://twitpic.com/74jkx - i almost nutted on myself looking at these, robroy jumpoffs. wont find these at your mall lol
Prediction: @ttp://twitpic.com/74pyg - t wmsost demeed tn ty plf tolking ft the e  ieaeas auspssfe 

Target:     wafting in the air--the draw of caffeine--calling out from your vein. percolator--is full of promise--my thirsting lips--open with desire
Prediction: @otf ng tn the sir p-he-caemiaf toreeine  -orteng tnt toom tour sern  hlrfenater  -nttonl of teojose a-a swindtycg aiks  -fen tith tamige                         
Train accuracy: 0.292
Test  accuracy: 0.262

Batch 5000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     avocado score! 1.25 from the fruit guy!
Prediction: @nhiado aehre  h0   hoem the siest oai                                                                                                                            
Target:     rt @jessielouise: vote for @iamlittleboots holla!! http://bit.ly/tnmjh
Prediction: @t @jossicnavg:e: rite for ttnmcistee ook  htwoan !http://bit.ly/1yhsh                                                                                            
Target:     help me get to 10,000 downloads. http://www.zshare.net/audio/586534638f14e250/
Prediction: @typ te aot th s00 00 palnload   http://bww.tshare.net/audio/61991779997a3fa8/

Target:     @lessallan i just played "give it away" in response to your tweet
Prediction: @jaeeiuie di hust goayid tmore at a ay  i  teatonse to tour pwietsssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
Train accuracy: 0.295
Test  accuracy: 0.261

Batch 5500 of 20000
512/512 [==============================] - 9s     
Example outputs:
Target:     senior/principal geotechnical engineer - sussex http://twurl.nl/y1wfgh
Prediction: @oeaor caontipll conrsnh ochl sxtineer h haphed http://tiurl.nl/gh0uqf

Target:     is random , but tired...
Prediction: @  teidom t iut ihmed  .                                                                                                                                          
Target:     @jessicaadrew yes!!!! give more grass!
Prediction: @sossicalaae  hoa  !!!iove yere toeds !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     i need ctrl-f on paper articles.
Prediction: @ jeed aor   rin trrer sntisle                                                                                                                                    
Train accuracy: 0.298
Test  accuracy: 0.261

Batch 6000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     #stackb"50s of piffer the hood smell like harlem"(driveway)
Prediction: @jqark u    tf tect r soe sold ihollsoike ttrdey  aoineray                                                                                                        
Target:     10 things you can do with wordpress besides blogging http://cli.gs/xym1x8
Prediction: @0 mrings tou can go tith thrkpress bustdes taog ing ottp://bli.gs/zennmu

Target:     @tbusbey thanks! i will have to check that out.
Prediction: @mholiyl ihank   i wall bave th soeck ohet out                                                                                                                    
Target:     is it unethical to publish that "neural network" analysis concluded x when all you did was look at the data? my brain is a neural network...
Prediction: @  wn aplmtecal sh srtlichethet ssowton sewwork  hrdlysts oomteudes t hain irl toursod ths tiokiat the saya  ha floin is a sewtol mewwork .                       
Train accuracy: 0.300
Test  accuracy: 0.262

Batch 6500 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     @edyoung that is well put! 'beautiful collision' - love it!
Prediction: @mminnng ihat ws thll tlt  ise utyful somlenion so hioe tt!

Target:     @jbmendoza hey dont give me any ideas on the juicer sales. but i think it would be fun, squeezing the last dollar out of eveyone. paul sends
Prediction: @moaarte   iay to e teve ye a   odeas fn the bonce  stne  .iut i whink i 'wauld be aun  iouaezeng the cist taclar tft of tver ne  ilrl mtrs                       
Target:     now chilling in a random bar in t.wells. only on a coke tho as driving in a bit... is it bad that i feel at home here already?? haha!
Prediction: @ew phallin  an t ceidom tinrfn th . l   hnly tnet bome hoauin ioineng tn t cit  . h  tt tec aoet i weel lb tome.ttre inleady 

Target:     vision starting to shakin'
Prediction: @iditn ttartsng to teareng                                                                                                                                        
Train accuracy: 0.298
Test  accuracy: 0.261

Batch 7000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     @jessicaadrew yes!!!! give more grass!
Prediction: @sassicasgai shoa, ! !iore mere toeds                                                                                                                             
Target:     http://twitpic.com/74fef - @valkeff nan c sur lol, ds le genre impossibl de choisir donc jpren tout mdrrr
Prediction: @ttp://twitpic.com/74lii - imineeri hohgaoiap ee   io so cutte dn orsiolehe loaccen de dehu  s ao r he  rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Target:     asda story video game launches the new faction warfare player vs ... http://cli.gs/jxby9j
Prediction: @n   teare oideo -ames-annch s the sew sove on hisnare hranershi.s.. http://bli.gs/gqdnne

Target:     say it right | fashion translations - new york times http://viigo.im/rzk
Prediction: @oysi  ieght n trition soaisfation  a how york times http://biigo.im/rgh

Train accuracy: 0.297
Test  accuracy: 0.262

Batch 7500 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     @dougbenson doug benson - well you're wrong i'm totally high right now #thingsfamouspeoplesayaftersex
Prediction: @sangier tn io bhteae n i to l tou re neong wnm sroally aalh seght now atwengstamous erple yys terscx

Target:     @laisalves http://twitpic.com/74jfi - hahaha... they're beautiful, lah!!
Prediction: @sinrenea  http://twitpic.com/74otj - iahahah.. iha  re setutiful  aos 

Target:     rt @williamlyon_nca: for sale: 3br/2.5ba condo in elk grove, ca, $199,990 - http://postlets.com/res/2313237
Prediction: @t @minliemsenn ceaa tor tale  t0r 1   i conti sn tne haoue  ia  u209 000 h http://bistlets.com/res/201000

Target:     @youngq aaack! alright maybe i can catch it later 2nite at home - if u do it again (pretty please)....
Prediction: @soungq hwaah  inleght aonbe i wan solch tt aoser t dte tn ttme t h  y wontt alain iilosty slaase 

Train accuracy: 0.301
Test  accuracy: 0.261

Batch 8000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     is it unethical to publish that "neural network" analysis concluded x when all you did was look at the data? my brain is a neural network...
Prediction: @  tt aptmtecal to trtlich thet ttowto iaewworks irdlysts aomteuses t tiin trl tourcod tht aickiat the saye  ha booin is a bewtol aewwork 

Target:     @msali_sobb how work was today? miss me?
Prediction: @jisnlnbal yhtw aeuk iis ahoay? hyns ye 

Target:     competitions movie http://shoturl.us/5817/
Prediction: @hneutition  aavie http://thoturl.us/5815/

Target:     rt @buzzfeed: megan fox has toe thumbs. http://bit.ly/i6z1r - i have been complaining about this since first seeing her! vindication.
Prediction: @t @jrtzered: ryean fox nos ah nfherbs  http://tit.ly/1vrss

Train accuracy: 0.299
Test  accuracy: 0.261

Batch 8500 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     8th graders in graduation suits and dresses. precious overload!!!
Prediction: @ h soede s an teeneation atpt  fnd soias d  hlotious afer ord  !                                                                                                 
Target:     rt @indyweek: today's picks: [...] ipas doc "not yet rain" [...] -- yah! come out to the *free* screening in durham (ipas.org/ for info)
Prediction: @t @tnsinaek: rh ay s crct   t  .  h hn ae uatot tot aeini a  ." h  hoho hhme tut oo the sbaee  hhreensng an tariam ahnont.rg)shor atto.

Target:     @kait_o and i meant u worship shanx; sum1 who seeks style counsel for a tie and handkerchief by his own volition. hardly a "rough" character
Prediction: @matlea_ind i wean  t saukeip tookk  iocm ihe waems toale tomltelioor t fem tnd tevgsal oics ae tis swn tittgion  itvd y ansmewght aoengcterssssssssssssssssssssss
Target:     the #bible is full of god's promises.not one is dependent upon our performance. jerry bridges, holiness day by day http://tinyurl.com/d5kwsk
Prediction: @he stotle is ainl of too s arodose   o  tn  on aoarndidt tp n tnt crrsormance  hussy toinge   aawidg   ioysau tan attp://binyurl.com/yhjcsyeeeeeeeeeeeeeeeeeeeeee
Train accuracy: 0.320
Test  accuracy: 0.259

Batch 9000 of 20000
512/512 [==============================] - 9s     
Example outputs:
Target:     @ramit last update was in 2006. nothing new to rant about ? btw, agree with ur comment about rappers and waiters
Prediction: @mocenaiost nn ate iis tn t 09  io  ing ioe mh teid anout t

Target:     baby advertising. http://twitpic.com/74jne
Prediction: @ecy indirtiseng  ittp://bwitpic.com/p9gae

Target:     i have broken 1,000 twitter updates. paaaartaaaay!
Prediction: @ wave teeke  t0000 foetter asdates  ilraaa  saaa                                                                                                                 
Target:     to all rappers: no more dances plz!!!
Prediction: @h t l tedper   to more taycer tra                                                                                                                                
Train accuracy: 0.330
Test  accuracy: 0.259

Batch 9500 of 20000
512/512 [==============================] - 9s     
Example outputs:
Target:     @aolradio is this even legal?! can it be burned?
Prediction: @jntaedio i  thas wveritiaal   ion'y  we aesnid                                                                                                                   
Target:     june 14-19 you can attend vbs at either mt. hermon baptist 6:30 - 8:30 or berlin baptist 6-9.
Prediction: @use 11 10 -eu can sntend ti  an tnnher aav htreanetrn ast h  0 p h 00 pn 8e lin hor ost h06                                                                      
Target:     watching alladin wit the nephew
Prediction: @htch ng t l  ongtithmhe bewhew                                                                                                                                   
Target:     to do this friday: retro game night at orange county regional history center.
Prediction: @hdto thes wrieay  heawo hame ieght at tuange aonnty hesisnal ptgtory honter                                                                                      
Train accuracy: 0.331
Test  accuracy: 0.260

Batch 10000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     i had a bunch of tiny succulents, now i have a fun way to use them: http://bit.ly/hjydv
Prediction: @ hat t basch of thmy ahncesonts  iot i have a ben aiy to gse the   http://bit.ly/1emsn

Target:     yes they make those lol rt @ramabama: @lovejones83 leather shorts wtf. didnt know they made those. that dont even sound cool.
Prediction: @os ihe  aake mhese wio it @secinana: isivelanas 3 iotvher toopt  aif  iod ' wnow wha  hake mhese  ihat ioe' bven ke nd lool 

Target:     i had a bunch of tiny succulents, now i have a fun way to use them: http://bit.ly/hjydv
Prediction: @ hat t basch of thmy ahncesonts  iot i have a ben aiy to gse the   http://bit.ly/1emsn

Target:     aaahh!!! 2 1/2 hour left on my double (trouble) shift!! let's work this!!
Prediction: @naa h ! r m 2 aoorstift af ty bagbte sahauble  hoott  !hot s gark ooas !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Train accuracy: 0.335
Test  accuracy: 0.260

Batch 10500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @83inches @potatosays this morning i was fondly remembering representing spot @ the rochester pride festival! good times!!!
Prediction: @s03tbhestimapttonh   ihas iorning iswas tiltie aeaimber ng tealesentang toet t the seakelter arone aost val  ho d lhme  

Target:     @descargaoficial o mionzinho eh a britney!! uhuuuu!!!
Prediction: @saniandernfcial ihsas  adga is snpiodney  !:  hhuu!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     heehee ur funny
Prediction: @tylee i  drn y                                                                                                                                                   
Target:     all tyranny needs to gain a foothold is for people of good conscience to remain silent. thomas jefferson
Prediction: @nl thpa sy aewds to govn tnsertb lo tn tor teople tf tood tomtiionce to seaern tonvnt  theups iusf rson                                                          
Train accuracy: 0.334
Test  accuracy: 0.258

Batch 11000 of 20000
512/512 [==============================] - 8s     
Example outputs:
Target:     @diamondxgirl omg i know the water was awful. and then the tech was like go void it all i can't see anything but water.
Prediction: @sasnonda irl ihg i wnow iha soyer ias a eul  ind ihey iheysrahnias aike iootetc an all dnhan t wte t ything tet ihscr 

Target:     rt igoddess_day26 love all but willie &rob first! they are mad funny, keeps me cracking up(my favs. .willie, rob, que, mike & brian)
Prediction: @t @fnt yss :ov 9 iole tnl tut ihll a m eleaonst  hhe  are aadeaon y  biep  te toazk ng tp ty bavo  i.hnl a  iebe auis iane s teinn 

Target:     @wpstudios i had it working for the past 2 months , and then the error appeared yesterday, sent a ticket to realmac to get a fix.
Prediction: @simtadios i wav at tirk ng oor the wart t hinths a and ihey iheysnior inplarsd totterday  boed y lrmket to teadlaneth tet a brg 

Target:     am i a bad person if i unfriend in facebook anyone i don't recognize and who isn't attractive?
Prediction: @nai t bud aarson t  y wsdoeend wt trctbook fndmne?wnwon't keaornize tnd iee is 't i  eacteve 

Train accuracy: 0.336
Test  accuracy: 0.260

Batch 11500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     rt a new mental agility quiz could help detect alzheimer's disease more accurately than the traditional test, ca.. http://tinyurl.com/nc7o7p
Prediction: @t @nfew bomtil tnelety tuec aomnd balp mosactit l eam r s cescase have hncosately ahetkthe bruietional hoat  aol. http://binyurl.com/nhyyyy

Target:     visit paris and its people here http://tiny.cc/r1rpc!
Prediction: @iait marts hnd s   srrple aale http://binyucc/jqdhq

Target:     politico: new polls suggest deeds surge in va.: a surveyusa poll shows deeds leading the democratic pri.. http://tinyurl.com/ljjp53
Prediction: @rsicica  aew yosi  atpaestiiesps atpverhn tal. h feppiy l  hariuhtow  hesp  aiadeng the beaocratic croc. http://binyurl.com/nxuyyy

Target:     #throwbackthursday ok so who remembers zoobilee zoo???? http://www.antoniogenna.net/doppiaggio/telefilm/zoobileezoo.jpg :)
Prediction: @dhaowback hursday ih io ihe ieaimber  tom  ei  aom    ?http://bww.zndenic.rteencet/

Train accuracy: 0.321
Test  accuracy: 0.259

Batch 12000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     how to cheat on any test easy http://bit.ly/10zdwr swine flu h1n1 iphone air france flight father's day dallas
Prediction: @tw to dract tn t   oiat fvri tttp://bit.ly/16myyf

Target:     vision starting to shakin'
Prediction: @iaitn otar eng to teoreng                                                                                                                                        
Target:     @elypereiraa your not alone
Prediction: @mrlnhriyla  iou  mot t lne                                                                                                                                       
Target:     just in case you missed my celebrity retail therapy du jour. go shop yourselves sane! http://ow.ly/czdx
Prediction: @ust gn tare iou cags d me follbrity aecuilethe epy arebonrn hooteowptou  elfes aoyd 

Train accuracy: 0.305
Test  accuracy: 0.259

Batch 12500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @stro215 lol
Prediction: @maeen0  iol                                                                                                                                                      
Target:     tried #stardefense last night - ngmoco published another great game. now waiting for #rolando2
Prediction: @hypd tmeartodfade tiut night a hoa  i aobliched tndther sreat srme  how ietting for tceblndo 00000000000000000000000000000000000000000000000000000000000000000000
Target:     egg knob passage oil rubbed bronze http://bit.ly/qtxvb
Prediction: @vg ieowoaarsede anl henier hyikce http://bit.ly/1iewf

Target:     @kidhum and if an act has pull and they don't promote the show...how will their people ever know the show exists?
Prediction: @mamdenaind t  ynyimcuoas brtleo d ihe  ao 't waobite the stow  . ow aall yheyr waople wner tnow ihaysaow ivcsts?

Train accuracy: 0.308
Test  accuracy: 0.258

Batch 13000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     extra bay music festival tickets available: email me: matt@wceiradio.com
Prediction: @vcra sen aosic aoatival homkets hnailable  hxail ae  hont coalnonio com                                                                                          
Target:     aww love ya@ashleykeith
Prediction: @nw iove yo snhley isn                                                                                                                                            
Target:     good morning/afternoon tweets. hope all is well. what's new? fill me in?
Prediction: @ood norning snfer oon toiets  itwe inl ds toll  ihat s tow  ionmemy tn                                                                                           
Target:     from giving 2.0: nten webinar series roundup part 4 -- http://tinyurl.com/l7gz7t
Prediction: @rom toring a    te   tibsnar heaves aectd p arrt o h  http://binyurl.com/lusa37

Train accuracy: 0.310
Test  accuracy: 0.259

Batch 13500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @toilethumorok haha yeah right. thats not going to happen.
Prediction: @shnletherbre eitha ieah ieght .ihet  wot aoong to bavpen                                                                                                         
Target:     one day i will get a piece of art i commissioned. just not any time soon.
Prediction: @m  oay o wall bot a bacce of s tionwaueetsion d  iust tet tl  oime to n.                                                                                         
Target:     laura's painting was returned!! also, boomerangs jp will open at 11 am this monday (6/15) instead of 10 am. thanks!
Prediction: @otre s srrnting ais aeaurned   hnlo  iutk r   e au aill bnen tn t0 an ioes wortay aa02)) hn tead of s00hm..hhenks                                                
Target:     @davejmatthews the video with you was great! always love to see women pour paint on your head.
Prediction: @savijmatthews iha fideo ialh tou ahs aoeat! inlays wooe th see thren aasnsoarntian tout sear 

Train accuracy: 0.310
Test  accuracy: 0.260

Batch 14000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     si ammedusa su fanfulla (robe da nulla)
Prediction: @ondnaer a  de dal ac o ehec)rae moeua                                                                                                                            
Target:     my nigga verse start "i got a family to support part time wasnt cuttin it i knew for the rest of life i wasnt strugglin" lmfao ahah oh ish
Prediction: @y fegga iiryi torrtiti dot t brcily oi serport"tartyoome tit ' tot ingtt"tnwnow tor the seat of tike iswas ' touacgleng toaaooinhhaih m                          
Target:     @joeymcintyre can we send ya books from the uk ?
Prediction: @moelmacntyre ian ye aaed mo tey?s aoom the ss a                                                                                                                  
Target:     fiddling with my blog post: ( http://bit.ly/qu0q0 )
Prediction: @indling with my flog post: thhttp://bit.ly/1kyrd )

Train accuracy: 0.312
Test  accuracy: 0.260

Batch 14500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     :d thank you mami! that made me smile rt @chichiglacierz: #shoutout to @accidentaldiva for being my pumpkin pie thru thick n thin & bein ...
Prediction: @  ihetksyou fyna  ihet wake me teile ai @mhacaacaebee: : @foautout to tmlhooentell ca aor teing ao fhbp in aiccaoeo thesk aotheski se ngs..

Target:     lunch with abby then make plans with me.
Prediction: @onch with @ oy aoe  tyke mean  tith ty                                                                                                                           
Target:     hey guys your always welc0me in my room iam s3xy girl and h0t http://feeloncam.com
Prediction: @ty @oys iou  f lays ga lo u t  ty hoom an  ao   torl ind se  ietp://bfeliwtel.com////////////////////////////////////////////////////////////////////////////////
Target:     has no electricity!
Prediction: @th aotcxectric ty                                                                                                                                                
Train accuracy: 0.312
Test  accuracy: 0.260

Batch 15000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @garethslee oh well, thats life :(
Prediction: @mrbythehie ih yell  ihet  wike i)                                                                                                                                
Target:     looking to go to national museum of mexican art in chicago this weekend: http://tiny.cc/6itve
Prediction: @olking fo bo to tosional aesium of ty ican antion thinago aoes yeekend  http://binyucc/llbbl

Target:     @bostonmaggie we loooove donuts. that's what gives us the motivation to run.
Prediction: @mroson_ari r ihlaovkooe towets  ihet s whyt iores ip the sovioation th ten                                                                                       
Target:     buying a hardwood floor this year? make informed decisions. attend hardwood 101! wine, cheese & wisdom! july 24th 6-8pm. rsvp! 303-293-8600
Prediction: @ot ng a sord aod siior ahes wear  haye mt ormad feaisions  hn rnd ttsd ard a0   hil   aaecse a celhom  husy 20 h a 1 m  ht p  h   21222120                       
Train accuracy: 0.307
Test  accuracy: 0.261

Batch 15500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     ataque do msn 2.0 d=
Prediction: @n auande man a 0 he                                                                                                                                              
Target:     @descargaoficial o mionzinho eh a britney!! uhuuuu!!!
Prediction: @saltarrer ocial ihian  a  a ds snmiitaey  !h hhhuu!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     not sure about the walls of jericho (rebecca put it on). as jo just put it: "i don't think i'd drive to it. apart from off a cliff."
Prediction: @ow aure tbout the soyl  af torrcho aaeuelca aab tt on   hn iueoust tut tt  hw don't khink itl loive to tt "hnprt ooom tuf t loaef 

Target:     rt @ewerickson: holocaust shooter, like left wing bloggers, hates bush, israel, the war, christians, capitalism. the list goes on and on.
Prediction: @t @mdalrcastn: @twycaust mtowt r  aose tiat titdstaog er   atve  aysi  a  ael  ahe soym aaeistman   aanttal tm. hhe sift ooes tn t   tne                         
Train accuracy: 0.309
Test  accuracy: 0.260

Batch 16000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     aww love ya@ashleykeith
Prediction: @newiove to brhleysirn                                                                                                                                            
Target:     fiddling with my blog post: ( http://bit.ly/qu0q0 )
Prediction: @indling with ty blog post: hahttp://bit.ly/1rkad )

Target:     @ms_b_osazuwa i wanna ride jet ski's
Prediction: @ma_mrm_sn  a i dasta gege tur aoins                                                                                                                              
Target:     jfk john fucking kennick ! hahaha (:
Prediction: @u  auhn hock ng aieneek a hthahahiv                                                                                                                              
Train accuracy: 0.312
Test  accuracy: 0.260

Batch 16500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @83inches @potatosays this morning i was fondly remembering representing spot @ the rochester pride festival! good times!!!
Prediction: @s00tgoartimaltto t   ihas iavning iswan tortie aiaembering tiaoesentang toetsa the seakelter oaone aomtival  hood nime   !

Target:     done with all of my finals for junior year :d
Prediction: @on' with t l tf ty fargl  aor tusior aearsa)                                                                                                                     
Target:     ouch!
Prediction: @nth!                                                                                                                                                             
Target:     @msali_sobb how work was today? miss me?
Prediction: @sa_nlnbanryiew daud ies thoay? iyss ye                                                                                                                           
Train accuracy: 0.313
Test  accuracy: 0.261

Batch 17000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     ..and in other news, swine flu declared a level 6 pandemic. symptoms: fever, coughing, sore throat, rhinorrhea. hang on! that's me! *oink*
Prediction: @..nd i  tnher paw   btene flu peliires anfoael o hereamic  hesbtoms  hoeera aanna ng  aeur foaout. aeyno  aoad.hthdiou 

Target:     hello everyone!! eating pizza!!! love it
Prediction: @tylo tveryone   rvring aocza  ! hole tt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     @letmesowlove i listened to that on the way to school!!
Prediction: @maatesiu y e i wokten d to thet snethe bay th sehool  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     @greeneyedjessie i was referring to all women like that.. cheese mouth has been promoted to the official mascot! lol!
Prediction: @maegnly     s c i was teaerring to snl thran wike that  .iaecrecaavnh aav been arebitid to sae ctficeal sontat  iol

Train accuracy: 0.313
Test  accuracy: 0.259

Batch 17500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @xcarliex i'm only hiding it because they paid for my phone bill the other day :-/ don't think they'd be impressed if they knew about it lol
Prediction: @mohnlon  i m sn y tageng tn oefause ihe  aryd tor te fhone augl aoe lther cayso))

Target:     @shoebags hey, what channel's it on? i keep meaning to post schedules....
Prediction: @maaqsob  iay  ihat aaangel s cn in  i wnep tysn ng to slst tohodule   . .........................................................................................
Target:     @nexusz ik spacede em gisteren ook hard op dat broodje kipkerrie.. nice! chillze in de shop..
Prediction: @miwtst i  toene  ras aaatar n an  tthi dnetinakaiok ankan in a  ..hogk  haelii  an ta leiw ......................................................................
Target:     p.s the yeah yeah yeah's concert was pretty ballin' but mucho hot!
Prediction: @hs.ihe bear ioah ioah s somgert ias aretty sadl n  tut iysh  taw                                                                                                 
Train accuracy: 0.313
Test  accuracy: 0.261

Batch 18000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     dont stop dancing
Prediction: @ow' feop totcing                                                                                                                                                 
Target:     i swear this cat is going though the "terrible twos" or something. she definitely has a vendetta against charmin.
Prediction: @ juear thes won is aoing torugh.ihe wtharitle she   in something  hoe iocinitely aas t lerdor e tnain.t toanging                                                 
Target:     just kicked off her first project - 24/7 business dickaa!!
Prediction: @ust sidked aff tor brrst trofect t h0 2 hotiness cesk sa !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Target:     @greeneyedjessie i was referring to all women like that.. cheese mouth has been promoted to the official mascot! lol!
Prediction: @maegnbres   s c h was aeaerring to t l teran woke that  .ioecse iavsh ias aeen arebited to she stficeal sensat  hol

Train accuracy: 0.295
Test  accuracy: 0.264

Batch 18500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     blog: pirates of the caribbean 4 before the lone ranger: jerry bruckheimer who is currently working on the .. http://tinyurl.com/m29pdr
Prediction: @eog  trcates af the dor nbean h hesore hhe sangloetkers hutsy aeineeead r hii hs torrently sarking on the s. http://tinyurl.com/lwkngg

Target:     @urbanslang16 how was that? that's how it played out in my smutty mind anyway!
Prediction: @mnbanatonde  itw aas thet  ihet s wow i  woayid int tn ty hoag y aand.indways

Target:     @vivimero my son, christopher, studied in florence, and traveled to milano and loved it.
Prediction: @miciaela hy bon  ioeis opher  ioadyos in trorince  bnd ihyiel d to tenln  and tive  it.                                                                          
Target:     @ebongray im not doing anything that i know of on friday
Prediction: @mmonyaon i  got aoing t ything aoat i hnow if tuetaiday                                                                                                          
Train accuracy: 0.296
Test  accuracy: 0.266

Batch 19000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     @joeymcintyre can we send ya books from the uk ?
Prediction: @manymcintyre ian ye geed yo bayk  aoom the sn a                                                                                                                  
Target:     @dannymcfly im blaming you for being so cute and make me love mcfly!! :p
Prediction: @mavnygafly h  sooming mou aor teing ao sote tnd iake me sooe tecly  !i)

Target:     finally dun with school! :) yay me! summer summer summertime!!
Prediction: @inally gomntith tohool  h)

Target:     add rtm
Prediction: @nd te                                                                                                                                                            
Train accuracy: 0.299
Test  accuracy: 0.267

Batch 19500 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     another positive article from a social network people love to hate simply because of its name: http://digg.com/d1tuku
Prediction: @nyther srsttive trticle ooom t ctnial metworkifrrple tioe th save honple aeiause if tt  boted http://bigg.com/u1tytt

Target:     well, i'm all caught up with my work and now have nothing to do. thank god for my ipod and this big window, lmao
Prediction: @hlc  i m s wooorght tp tith ty bork and iot iave ao hing to do  ihetksyod tor te bphd tnd thes wet saldow  boao

Target:     well, i'm all caught up with my work and now have nothing to do. thank god for my ipod and this big window, lmao
Prediction: @hlc  i m s wooorght tp tith ty bork and iot iave ao hing to do  ihetksyod tor te bphd tnd thes wet saldow  boao

Target:     @exotic damn homie i thought we was tight like the uniform on a top flight..lol
Prediction: @mmptic iovn iemee iswhiught ie han ahrht toke the ssivorm of t crt sroght  .ol

Train accuracy: 0.299
Test  accuracy: 0.268

Batch 20000 of 20000
512/512 [==============================] - 7s     
Example outputs:
Target:     legitimate home business idea for cooks. http://bit.ly/811on via @addthis
Prediction: @otasimati come susiness ineashor aomli  http://bit.ly/1iwpn

Target:     @carolineidw haha ik heb nog wel plaats als ik de kasten eronder leegmaak :)
Prediction: @marlline neaithahi  deb da  kee deaat  dl  nn ha lanaen dn t e  diu ean eh)

Target:     hello view my special profile at http://tinyurl.com/nrvzfg laterz (you must register first)
Prediction: @tylo teew ty biecial paojile pt http://binyurl.com/m834jz

Target:     working on some fun stuff for the show and tampa bay on demand right now - details to come soon!! :)
Prediction: @hwking on tome oun ttaff tor the waow ond ihkea san an tamind tight now!h hoaails oo some oe n  !h)

Train accuracy: 0.302
Test  accuracy: 0.269

Note that the accuracy on Twitter data should not be compared with accuracies on datasets containing more structured or domain-specific text. Tweets exhibit many unusual characteristics, e.g. tags, URLs, misspellings, noise (such as non-English text), smileys and colloquial language, so a significantly lower result is to be expected. Many of these (and other) errors are also irrelevant from a generation perspective, e.g. picking the wrong one of several roughly equally plausible continuations.
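
A toy illustration of the last point, using made-up probabilities that are not taken from the model: when several next characters are about equally plausible, even a predictor that knows the true distribution exactly is right only as often as the most likely character occurs. The dictionary `next_char_probs` and the helper `best_possible_accuracy` are hypothetical names introduced only for this sketch.

```python
# Hypothetical next-character distribution after some prefix.
# The values are invented for illustration; they are not model output.
next_char_probs = {'y': 0.30, 't': 0.28, 'm': 0.22, 'i': 0.20}

def best_possible_accuracy(probs):
    """Expected accuracy of always predicting the single most likely character."""
    return max(probs.values())

print(best_possible_accuracy(next_char_probs))
```

Per-character accuracy is therefore bounded by how predictable the text itself is, which is one reason noisy, ambiguous tweets score lower than cleaner corpora.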


In [47]:
import bisect  # used by tochar_prob below

text_len = 500 # Upper bound. The net should output the end token much earlier, as the longest tweet has length 162.
n_samples = 5

# Draws a random sample from the net's softmax output and returns the corresponding char.
# beta can, by analogy with statistical mechanics, be seen as the inverse of the thermodynamic temperature.
# Setting it higher (> 1.0), i.e. lower temperature/entropy, tilts the sampling towards the more likely options.
# Setting it lower (< 1.0), i.e. higher entropy, gives relatively more weight to the less likely options.
# The theoretical extremes are beta->inf, which is identical to argmax, i.e. it always picks the most likely next
# char (greedy decoding, which does not necessarily yield the most likely sequence), and beta=0, which is like
# sampling from a uniform distribution, i.e. every sequence of a given length is equally likely.
def tochar_prob(output, beta):
    assert beta >= 0
    if beta != 1.0:
        output = output**beta
        output = output / output.sum()
    summed = np.cumsum(output)
    i = bisect.bisect(summed, random.random())
    # Rounding in the cumulative sum can leave summed[-1] slightly below 1.0; clamp to the last valid index.
    i = min(i, len(summed) - 1)
    if i == 0:
        return 'Ž' # Index 0 is the mask. Returned just so that we can catch it in the output; the net should never do that.
    return index_to_char[i]

# Sample tweets by feeding the net output back into itself.
# The net can also be seeded with an arbitrary string and continue from there.
# TODO: Support batch processing.
def sample_tweet(length, seed='\t', model=model, beta=1.5):
    assert length - len(seed) > 0
    seq = np.zeros((1, length), dtype='uint8')
    for i, char in enumerate(seed):
        seq[0, i] = char_to_index[char]
    string = list(seed)
    for i in xrange(len(seed), length):
        c = model.predict(seq[:,:i])[0,-1]
        c = tochar_prob(c, beta=beta)
        if c == '\n':
            break
        string.append(c)
        seq[0,i] = char_to_index[c]
    return ''.join(string).strip()

# n_samples unseeded samples (5 here).
for i in xrange(n_samples):
    print sample_tweet(text_len)
    print
    
# n_samples samples continued from a seed string.
for i in xrange(n_samples):
    print sample_tweet(text_len, "\tyo yo yo")
    print
    
# Feel free to experiment with different betas.


hanging out of the internet of the last day of my head

happy bday to my life in the end of the street market launches on a car (wk) http://plurk.com/p/106dys

http://bit.ly/ko1t - 2006 more info. @modernick that gets a little trip

say i am seriously one of the fan in the blog is still working on the scary of the same thing i have to drink them. that was own a hair.

@joelopen i think my day was up there... thanks da vaccination to having a drum thing along the computer.

yo yo you comment. just read that tattoo. won't see the storage on this shit!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

yo yo you guys never wonder what i had that way to meet and i will be done. what are you thinking? :( sent the strange ben. please did not come home shit with the practice for the feeds let's get your commenten

yo yo yo much lost in musica un een idea! estar a journo es me que sem para el poder hayar la ray todos not relecaste o me la salin o santos me... http://bit.ly/gfzqe

yo yo yo wat do saw ya back work!

yo yo you just won my damn in the mail i want to live a thing down the call of your tweet this summer.
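
The effect of beta in tochar_prob can be made concrete on a toy distribution. The sketch below applies the same transform (raise to the power beta, then renormalise) to a made-up four-symbol softmax output and reports the Shannon entropy of the result. The names `sharpen`, `entropy` and `toy` are hypothetical and exist only for this illustration; the probabilities are not taken from the trained net.

```python
import numpy as np

def sharpen(probs, beta):
    """Raise probabilities to the power beta and renormalise (the transform used in tochar_prob)."""
    scaled = np.asarray(probs, dtype=float) ** beta
    return scaled / scaled.sum()

def entropy(probs):
    """Shannon entropy in bits, ignoring zero entries."""
    p = probs[probs > 0]
    return float(-(p * np.log2(p)).sum())

toy = [0.5, 0.3, 0.15, 0.05]  # hypothetical softmax output over four symbols

for beta in (0.0, 0.5, 1.0, 1.5, 5.0):
    q = sharpen(toy, beta)
    print('beta=%.1f  probs=%s  entropy=%.3f bits' % (beta, np.round(q, 3), entropy(q)))
```

With beta=0 the result is the uniform distribution, with beta=1 the input is returned unchanged, and the entropy shrinks as beta grows, so larger values push the sampler towards greedy decoding.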


In [48]:
model.save('model-charlevel-twitter/32emb-5x512lstm-1M.h5')