markov



In [1]:
from __future__ import print_function, division

import sys
import string
import random

In [8]:
def process_file(filename, order=2):
    """Reads a file and performs Markov analysis.

    filename: string
    order: integer number of words in the prefix

    returns: map from prefix to list of possible suffixes.
    """
    for line in open(filename):
        for word in line.rstrip().split():
            process_word(word, order)

In [9]:
def process_word(word, order=2):
    """Processes each word.

    word: string
    order: integer

    During the first few iterations, all we do is store up the words; 
    after that we start adding entries to the dictionary.
    """
    global prefix
    if len(prefix) < order:
        prefix += (word,)
        return

    try:
        suffix_map[prefix].append(word)
    except KeyError:
        # if there is no entry for this prefix, make one
        suffix_map[prefix] = [word]

    prefix = shift(prefix, word)

In [16]:
def random_text(n=100, start=None):
    """Generates random wordsfrom the analyzed text.

    Starts with a random prefix from the dictionary.

    n: number of words to generate
    """
    # choose a random prefix (not weighted by frequency)
    if start is None:
        start = random.choice(list(suffix_map.keys()))
    
    for i in range(n):
        suffixes = suffix_map.get(start, None)
        if suffixes == None:
            # if the start isn't in map, we got to the end of the
            # original text, so we have to start again.
            random_text(n-i)
            return

        # choose a random suffix
        word = random.choice(suffixes)
        print(word, end=' ')
        start = shift(start, word)

In [17]:
def shift(t, word):
    """Forms a new tuple by removing the head and adding word to the tail.

    t: tuple of strings
    word: string

    Returns: tuple of strings
    """
    return t[1:] + (word,)

In [18]:
suffix_map = {}        # map from prefixes to a list of suffixes
prefix = ()            # current tuple of words

In [19]:
process_file('corpus2.txt')

In [22]:
random_text()


what we had. And we have sort of interesting, but when people make misstatements somebody has some, you know, I went through some that weren’t so elegant. But all I’m asking is one thing, you know Obama felt—President Obama felt it was his biggest problem is going to be Dreamers also. But there’s a big difference—first of all, there’s a big problem, and they were only going to be solved. China has been fantastic. The governor of Wisconsin has been a problem left on my desk, but it is, and I probably have a meeting, and then he’ll leave. He 

In [15]:
random_text()


for it, OK? There are not a large amount. I mean—think of this—I hate to say it but it’s the same wall that we’re always talking about. It’s—you know, wherever we need, we don’t make a good chance to make a deal on DACA, I really have gotten to like. And I know it’s a hoax. And yet, they are milking it to a fare-thee-well and I said, no, I can’t see. Now on the west side that killed eight people and badly, you heard me say yesterday, badly, badly wounded about 12. I mean some are in their presentations 

In [21]:
random_text()


media dislikes me. I mean he was such a lightning rod. And Steve, in the next election or the other, that problem is North Korea. I have a natural barrier that’s far stronger than even what I did, I was successful, successful, successful. I was successful, successful, successful. I was right because many things that we’ve done, including being a cheerleader for the infrastructure. That will include many things. One of the military. We’re making the military budget very substantially; you know politics, perhaps better than anybody, they’re unbelievable. They both endorsed me, the only time they’ve ever been in 

In [23]:
random_text()


Yeah, I do, I would—I would say—I don’t know who’s over here. You’re here, you’ve got a phone call from Russia. I didn’t fire him. But just so we understand. We need regulation, but we don’t make a fair deal for the infrastructure. That will include bridges, roadways, tunnels; it will be actually $1.8 trillion of investment in our infrastructure which will largely rebuild our infrastructure. That gets it built on time, on budget and the bottom line is that the people that don’t know who’s over here. You’re here, you’ve got a phone call from Russia. I didn’t fire 

In [24]:
random_text()


by Comcast and another one. Many are announcing today—and the ones that—that was never anticipated. That was just one of these. But—because I do ask you this, treat me fairly. We’ll do it—every month we’ll do one of the lottery has to be bipartisan. Well, I’d rather talk to you about that later because honestly, we’re doing separate of that will be actually $1.8 trillion spent. We have to have a lot of people working it’s going to be so—I’ve had another pledge that I’m bending. I think we have a thing called trade. And South Korea—brilliantly makes—we have a 

In [25]:
random_text()


country is all about. Yeah, Rex and I think we’ll have something on that. We’ll find out. But people do leave. You guys may leave but I don’t know of one politician in Washington—if you’re a politician and somebody called up that they have phony sources, when the sources don’t exist, yeah I think would be frankly a positive for our country made wealthy, but we have violent rivers that nobody goes near, we have cameras and we have sort of interesting, but when people make misstatements, like yourselves, but when people make misstatements somebody has some, you know, I 

In [26]:
random_text()


don’t know who’s over here. You’re here, you’ve got the wall is the same wall I’ve always talked about. I think we have companies pouring back into this country and you don’t know who’s there, you’ve got the wall will happen. And if you look—point, after point, after point—now we’ve had in the Oval Office who I never said the wall’s going to build them here. We have a very old report. Business, generally, manufacturing the same wall that we’re talking about or whatever it may be. Nobody even knows what it is, and I said, “he’s fired.” But the 

In [27]:
random_text()


want to tell the story of what’s happening with North Korea all of them put together, but they can pay for it? Mexico. Well they will make sure that no country including Russia can have anything to do with my win. Hope, just out of the most elegant debate—I thought it was a dead meeting. No, I never forget, when I fired, all these people, they all wanted him fired until I said, ‘We got to get worse. The cutting of regulation and all but— and we’re having exercises on the polls. You know, the United States has been extremely 

In [28]:
random_text()


all I see of these people that had just said, “You’ve got fire Comey”, they said, “oh, he’s wonderful—he’s wonderful, how could there be—now, I haven’t even heard that they’re looking at all these people that really want to make it smaller. The wall was always the best athlete, people don’t know me, that were never interviewed by—to me, you know. The man with the ICE agents and they should not have left me with that problem. That should have been found out if I were them I would like to ask a couple of questions like we want to 

In [29]:
random_text()


it’s the time. But look, hey, Gary may leave but I can fully understand why I have one meeting with—about Russia? And... Well allow—let me—(inaudible). So, they make up a television show. As you know, I went to the—I went to the—I went to the—I went to the employees—to millions and millions of employees. And AT&T started it, but I will terminate Nafta. OK? You know, we only have a thing called trade. And South Korea—brilliantly makes—we have a chance of making a deal on DACA, I would try. But the deputy, Rosenstein, who is in charge, he wrote a 

In [ ]: