In [4]:
import pandas as pd
import cPickle as pickle
import sqlite3
import numpy as np
import nltk.tokenize as tok

In [5]:
pathway = '../../data/labeledRedditComments.p'
# open the pickle in a with-block so the file handle is closed after loading
with open(pathway, 'rb') as f:
    df = pickle.load(f)

In [21]:
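# draw 20 random row positions in [0, len(df))
randNums = np.random.randint(low=0, high=len(df.index), size=20)

tokenizer = tok.PunktSentenceTokenizer()

for row in randNums:
    # .iloc selects by position, which matches how randNums was drawn
    comment = df.iloc[int(row)]
    body = comment['body']
    # split into sentences first, then into space-delimited words
    sentenceList = tokenizer.tokenize(body)
    wordList = []
    for sentence in sentenceList:
        wordList.extend(sentence.split(" "))
    print body
    print wordList
    print ""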


Headline: "GAMERGATE DEFACES FEDERAL RESERVE NOTE"
[u'Headline:', u'"GAMERGATE', u'DEFACES', u'FEDERAL', u'RESERVE', u'NOTE"']

Prader is scary and is one of the few conditions that people can't manage on their own. Of course you'd think the parents would be focused on being super nutritious & managing the food in the house and not only locking it up. They look like they could use some restriction themselves.
[u'Prader', u'is', u'scary', u'and', u'is', u'one', u'of', u'the', u'few', u'conditions', u'that', u'people', u"can't", u'manage', u'on', u'their', u'own.', u'Of', u'course', u"you'd", u'think', u'the', u'parents', u'would', u'be', u'focused', u'on', u'being', u'super', u'nutritious', u'&', u'managing', u'the', u'food', u'in', u'the', u'house', u'and', u'not', u'only', u'locking', u'it', u'up.', u'They', u'look', u'like', u'they', u'could', u'use', u'some', u'restriction', u'themselves.']

>Because it's not easier to just tell everyone himself?

Ok I see how it is. If He just tells everyone Himself He's kinda limiting the choices they have, which is limiting free will, which He can't do. Other than that I can't really answer that at this current time.

>Murder and adultery? That was not horrific. No shit.

Not at that specific point in history, no.

>which must have made god look like an extra gigantic asshole for slowly killing David's baby as revenge for his sin.

On the contrary. Even though some translations might say that God *caused* Davids baby to slowly die, or whatever, the Hebrew can easily be translated as "God *predicted* that Davids *actions* would cause his baby to die." Again there is a lot of debate on that subject but we're talking about prophets.

>I believe you. Because you said it. Whatever you say is true without any need to actually logically justify it. I can see why you enjoy the copy and pasting, helps ya get all the truth out there at once.

Well golly gee thanks buddy.

>Oh yea, speculatular job. The Israelites rejected Jesus splitting some of the Israelites off into a new religion all together. They have been scattered all across the world, they have undergone tragedy aftet tragedy, attempted genocide and recently a Jewish person on this subreddit referred to the state of Judaism dying as a culture as a "silent holocaust."

That's right. Just focus on the negative. I could list off all the bad things that has happened to any religion or ideology and it wouldn't help this discussion at all.

Back to you /u/potzdamn
[u'>Because', u"it's", u'not', u'easier', u'to', u'just', u'tell', u'everyone', u'himself?', u'Ok', u'I', u'see', u'how', u'it', u'is.', u'If', u'He', u'just', u'tells', u'everyone', u'Himself', u"He's", u'kinda', u'limiting', u'the', u'choices', u'they', u'have,', u'which', u'is', u'limiting', u'free', u'will,', u'which', u'He', u"can't", u'do.', u'Other', u'than', u'that', u'I', u"can't", u'really', u'answer', u'that', u'at', u'this', u'current', u'time.', u'>Murder', u'and', u'adultery?', u'That', u'was', u'not', u'horrific.', u'No', u'shit.', u'Not', u'at', u'that', u'specific', u'point', u'in', u'history,', u'no.', u'>which', u'must', u'have', u'made', u'god', u'look', u'like', u'an', u'extra', u'gigantic', u'asshole', u'for', u'slowly', u'killing', u"David's", u'baby', u'as', u'revenge', u'for', u'his', u'sin.', u'On', u'the', u'contrary.', u'Even', u'though', u'some', u'translations', u'might', u'say', u'that', u'God', u'*caused*', u'Davids', u'baby', u'to', u'slowly', u'die,', u'or', u'whatever,', u'the', u'Hebrew', u'can', u'easily', u'be', u'translated', u'as', u'"God', u'*predicted*', u'that', u'Davids', u'*actions*', u'would', u'cause', u'his', u'baby', u'to', u'die."', u'Again', u'there', u'is', u'a', u'lot', u'of', u'debate', u'on', u'that', u'subject', u'but', u"we're", u'talking', u'about', u'prophets.', u'>I', u'believe', u'you.', u'Because', u'you', u'said', u'it.', u'Whatever', u'you', u'say', u'is', u'true', u'without', u'any', u'need', u'to', u'actually', u'logically', u'justify', u'it.', u'I', u'can', u'see', u'why', u'you', u'enjoy', u'the', u'copy', u'and', u'pasting,', u'helps', u'ya', u'get', u'all', u'the', u'truth', u'out', u'there', u'at', u'once.', u'Well', u'golly', u'gee', u'thanks', u'buddy.', u'>Oh', u'yea,', u'speculatular', u'job.', u'The', u'Israelites', u'rejected', u'Jesus', u'splitting', u'some', u'of', u'the', u'Israelites', u'off', u'into', u'a', u'new', u'religion', u'all', u'together.', u'They', u'have', u'been', 
u'scattered', u'all', u'across', u'the', u'world,', u'they', u'have', u'undergone', u'tragedy', u'aftet', u'tragedy,', u'attempted', u'genocide', u'and', u'recently', u'a', u'Jewish', u'person', u'on', u'this', u'subreddit', u'referred', u'to', u'the', u'state', u'of', u'Judaism', u'dying', u'as', u'a', u'culture', u'as', u'a', u'"silent', u'holocaust."', u"That's", u'right.', u'Just', u'focus', u'on', u'the', u'negative.', u'I', u'could', u'list', u'off', u'all', u'the', u'bad', u'things', u'that', u'has', u'happened', u'to', u'any', u'religion', u'or', u'ideology', u'and', u'it', u"wouldn't", u'help', u'this', u'discussion', u'at', u'all.', u'Back', u'to', u'you', u'/u/potzdamn']

Your coil isn't a monthly fee it was a one time cost, i was talking specifically about reoccurring monthly fees.  So for a one time expense then....save your money?  Again why do you expect the NHS to pay for it.  Pay for it yourself.  When you become an adult stop expecting mommy and daddy to pay for you don't go running to the rest of the responsible people or the government.  Show some independence, pay for it yourself.  

It's funny you only look at it like you're entitled to have your stuff paid for by the NHS.  Do you have a cellphone? I bet you do, and I bet it costs hmmm around $50 a month.   You could come up with that money.  But you CHOOSE not to and you DEMAND that others pay for your luxuries.  You can afford to pay for it you just choose not to prioritize your spending towards it.  HUGE DIFFERENCE.   If you are piss poor, then it's more understandable to work something out but you're not.  You don't even know what it's like to really struggle or show restraint you live in a bubble.
[u'Your', u'coil', u"isn't", u'a', u'monthly', u'fee', u'it', u'was', u'a', u'one', u'time', u'cost,', u'i', u'was', u'talking', u'specifically', u'about', u'reoccurring', u'monthly', u'fees.', u'So', u'for', u'a', u'one', u'time', u'expense', u'then....save', u'your', u'money?', u'Again', u'why', u'do', u'you', u'expect', u'the', u'NHS', u'to', u'pay', u'for', u'it.', u'Pay', u'for', u'it', u'yourself.', u'When', u'you', u'become', u'an', u'adult', u'stop', u'expecting', u'mommy', u'and', u'daddy', u'to', u'pay', u'for', u'you', u"don't", u'go', u'running', u'to', u'the', u'rest', u'of', u'the', u'responsible', u'people', u'or', u'the', u'government.', u'Show', u'some', u'independence,', u'pay', u'for', u'it', u'yourself.', u"It's", u'funny', u'you', u'only', u'look', u'at', u'it', u'like', u"you're", u'entitled', u'to', u'have', u'your', u'stuff', u'paid', u'for', u'by', u'the', u'NHS.', u'Do', u'you', u'have', u'a', u'cellphone?', u'I', u'bet', u'you', u'do,', u'and', u'I', u'bet', u'it', u'costs', u'hmmm', u'around', u'$50', u'a', u'month.', u'You', u'could', u'come', u'up', u'with', u'that', u'money.', u'But', u'you', u'CHOOSE', u'not', u'to', u'and', u'you', u'DEMAND', u'that', u'others', u'pay', u'for', u'your', u'luxuries.', u'You', u'can', u'afford', u'to', u'pay', u'for', u'it', u'you', u'just', u'choose', u'not', u'to', u'prioritize', u'your', u'spending', u'towards', u'it.', u'HUGE', u'DIFFERENCE.', u'If', u'you', u'are', u'piss', u'poor,', u'then', u"it's", u'more', u'understandable', u'to', u'work', u'something', u'out', u'but', u"you're", u'not.', u'You', u"don't", u'even', u'know', u'what', u"it's", u'like', u'to', u'really', u'struggle', u'or', u'show', u'restraint', u'you', u'live', u'in', u'a', u'bubble.']

Tell my girlfriend so we can laugh as they dejectedly slink away.
[u'Tell', u'my', u'girlfriend', u'so', u'we', u'can', u'laugh', u'as', u'they', u'dejectedly', u'slink', u'away.']

The hamburgler is one of the worst people on the show. She had already saw her husband die of obesity at an early age. Then her son reaches 748 Ilbs. The whole time she tries to say she cooks healthy food. First meal back after James gets weight lost surgery, nothing like 3 sausages and BBQ sauce. 
[u'The', u'hamburgler', u'is', u'one', u'of', u'the', u'worst', u'people', u'on', u'the', u'show.', u'She', u'had', u'already', u'saw', u'her', u'husband', u'die', u'of', u'obesity', u'at', u'an', u'early', u'age.', u'Then', u'her', u'son', u'reaches', u'748', u'Ilbs.', u'The', u'whole', u'time', u'she', u'tries', u'to', u'say', u'she', u'cooks', u'healthy', u'food.', u'First', u'meal', u'back', u'after', u'James', u'gets', u'weight', u'lost', u'surgery,', u'nothing', u'like', u'3', u'sausages', u'and', u'BBQ', u'sauce.', u'']

Goes back to what confuses me the most...

When I go up a pant size, I have an "Oh Shit" moment and immediately start cutting. These planets have this happen repeatedly and DO NOTHING.

Granted, gaining muscle is a slow process, but I would argue becoming planetary is only slightly faster. 
[u'Goes', u'back', u'to', u'what', u'confuses', u'me', u'the', u'most...\n\nWhen', u'I', u'go', u'up', u'a', u'pant', u'size,', u'I', u'have', u'an', u'"Oh', u'Shit"', u'moment', u'and', u'immediately', u'start', u'cutting.', u'These', u'planets', u'have', u'this', u'happen', u'repeatedly', u'and', u'DO', u'NOTHING.', u'Granted,', u'gaining', u'muscle', u'is', u'a', u'slow', u'process,', u'but', u'I', u'would', u'argue', u'becoming', u'planetary', u'is', u'only', u'slightly', u'faster.', u'']

[deleted]
[u'[deleted]']

What have you got to loose by seeing a therapist? A few bucks and time? 
[u'What', u'have', u'you', u'got', u'to', u'loose', u'by', u'seeing', u'a', u'therapist?', u'A', u'few', u'bucks', u'and', u'time?', u'']

>  if cruelty is the means by which israel will survive then so be it

What is this talk of "survive", Israelis have some of the easiest lives in the world, top rate medical care, incredible economy, facing virtually no threat from the primitive rockets the neighbors launch.  Israelis have some of the longest easiest lives in the world, how do you manage to maintain a victim mentality through it all?  You live in a bubble and breathe your own fumes.
[u'>', u'', u'if', u'cruelty', u'is', u'the', u'means', u'by', u'which', u'israel', u'will', u'survive', u'then', u'so', u'be', u'it\n\nWhat', u'is', u'this', u'talk', u'of', u'"survive",', u'Israelis', u'have', u'some', u'of', u'the', u'easiest', u'lives', u'in', u'the', u'world,', u'top', u'rate', u'medical', u'care,', u'incredible', u'economy,', u'facing', u'virtually', u'no', u'threat', u'from', u'the', u'primitive', u'rockets', u'the', u'neighbors', u'launch.', u'Israelis', u'have', u'some', u'of', u'the', u'longest', u'easiest', u'lives', u'in', u'the', u'world,', u'how', u'do', u'you', u'manage', u'to', u'maintain', u'a', u'victim', u'mentality', u'through', u'it', u'all?', u'You', u'live', u'in', u'a', u'bubble', u'and', u'breathe', u'your', u'own', u'fumes.']

Can FPH just absolutely raid this Twitter account? Please? 
[u'Can', u'FPH', u'just', u'absolutely', u'raid', u'this', u'Twitter', u'account?', u'Please?', u'']

What options are left here for this government? Are we looking at a total collapse with an uprising/coup possible in the coming year? 

I know tensions are high, but what's the most likely outcome in the near future for this country?
[u'What', u'options', u'are', u'left', u'here', u'for', u'this', u'government?', u'Are', u'we', u'looking', u'at', u'a', u'total', u'collapse', u'with', u'an', u'uprising/coup', u'possible', u'in', u'the', u'coming', u'year?', u'I', u'know', u'tensions', u'are', u'high,', u'but', u"what's", u'the', u'most', u'likely', u'outcome', u'in', u'the', u'near', u'future', u'for', u'this', u'country?']

I have one extra protein bar than usual. And only cause I worked out.
[u'I', u'have', u'one', u'extra', u'protein', u'bar', u'than', u'usual.', u'And', u'only', u'cause', u'I', u'worked', u'out.']

"Drove my bike" - haha perfect! 
[u'"Drove', u'my', u'bike"', u'-', u'haha', u'perfect!', u'']

[deleted]
[u'[deleted]']

I don't want to date either but I guess the loose skin. 
[u'I', u"don't", u'want', u'to', u'date', u'either', u'but', u'I', u'guess', u'the', u'loose', u'skin.', u'']

Nah dude, you're all good! I totally get where you are coming from, too :) I think you have a point--with more awareness coming around to how much girls are harassed daily, I could see how the actual good guys do sometimes step in and defend her/tell the dude to lay off. Which I could totally see happening (and let me tell you, it's a godsend when it does happen; more often that not, the bigger guys are very handsy and pushy and aggressive even when you've told them no a couple times). 

And those "other females" are probably either gonna be hambeasts themselves and then, having witnessed that rejection, realize you're not interested in her cuuurrrrves and attack, OR they're fatty Mcbooboo's friends and basically have to defend her lardass. It's a lose-lose situation for everyone involved, really :/
[u'Nah', u'dude,', u"you're", u'all', u'good!', u'I', u'totally', u'get', u'where', u'you', u'are', u'coming', u'from,', u'too', u':)', u'I', u'think', u'you', u'have', u'a', u'point--with', u'more', u'awareness', u'coming', u'around', u'to', u'how', u'much', u'girls', u'are', u'harassed', u'daily,', u'I', u'could', u'see', u'how', u'the', u'actual', u'good', u'guys', u'do', u'sometimes', u'step', u'in', u'and', u'defend', u'her/tell', u'the', u'dude', u'to', u'lay', u'off.', u'Which', u'I', u'could', u'totally', u'see', u'happening', u'(and', u'let', u'me', u'tell', u'you,', u"it's", u'a', u'godsend', u'when', u'it', u'does', u'happen;', u'more', u'often', u'that', u'not,', u'the', u'bigger', u'guys', u'are', u'very', u'handsy', u'and', u'pushy', u'and', u'aggressive', u'even', u'when', u"you've", u'told', u'them', u'no', u'a', u'couple', u'times).', u'And', u'those', u'"other', u'females"', u'are', u'probably', u'either', u'gonna', u'be', u'hambeasts', u'themselves', u'and', u'then,', u'having', u'witnessed', u'that', u'rejection,', u'realize', u"you're", u'not', u'interested', u'in', u'her', u'cuuurrrrves', u'and', u'attack,', u'OR', u"they're", u'fatty', u"Mcbooboo's", u'friends', u'and', u'basically', u'have', u'to', u'defend', u'her', u'lardass.', u"It's", u'a', u'lose-lose', u'situation', u'for', u'everyone', u'involved,', u'really', u':/']

Not really. At least with my girlfriend, the breathing and the muscles spasming really amp up toward the end of it and you know it's there. Not to mention all the audible gasping and legs trembling. It's really not a subtle thing to notice, haha.
[u'Not', u'really.', u'At', u'least', u'with', u'my', u'girlfriend,', u'the', u'breathing', u'and', u'the', u'muscles', u'spasming', u'really', u'amp', u'up', u'toward', u'the', u'end', u'of', u'it', u'and', u'you', u'know', u"it's", u'there.', u'Not', u'to', u'mention', u'all', u'the', u'audible', u'gasping', u'and', u'legs', u'trembling.', u"It's", u'really', u'not', u'a', u'subtle', u'thing', u'to', u'notice,', u'haha.']

[Pig hearts are all the rage](https://www.youtube.com/watch?v=W_E6p-dpkDo)
[u'[Pig', u'hearts', u'are', u'all', u'the', u'rage](https://www.youtube.com/watch?v=W_E6p-dpkDo)']

I mentioned it to her in text, saying I had feelings for her, but I got the "I'm not looking for a relationship right now, doing that might jeopardize our friendship" but in person, no. Never had the guts.
[u'I', u'mentioned', u'it', u'to', u'her', u'in', u'text,', u'saying', u'I', u'had', u'feelings', u'for', u'her,', u'but', u'I', u'got', u'the', u'"I\'m', u'not', u'looking', u'for', u'a', u'relationship', u'right', u'now,', u'doing', u'that', u'might', u'jeopardize', u'our', u'friendship"', u'but', u'in', u'person,', u'no.', u'Never', u'had', u'the', u'guts.']
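
The printed token lists above show two side effects of splitting each sentence on a single space: empty strings (`u''`) where a comment ends in a space, and tokens such as `u'most...\n\nWhen'` where a newline sits between two words. A minimal sketch of one way to avoid both, assuming it is acceptable to treat any run of whitespace as a separator. `split_words` is a hypothetical helper, not part of the notebook:

```python
def split_words(sentence):
    """Split on any run of whitespace (spaces, tabs, newlines).

    str.split() with no argument never yields empty strings, unlike split(" ").
    """
    return sentence.split()


sample = u"Goes back to what confuses me the most...\n\nWhen I go up a pant size, "
words_space = sample.split(" ")   # same splitting as in the cell above
words_any = split_words(sample)   # whitespace-run splitting
```

With this change `most...` and `When` become separate tokens and the trailing empty string disappears, at the cost of losing the paragraph-break information that the `\n\n` carried.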


In [26]:
# interesting edge cases

# [deleted]
ex1 = [u'[deleted]']

# here u go m8 some [help](http://us.whales.org/issues/what-to-do-if-you-find-live-stranded-whale-or-dolphin)
ex2 = [u'here', u'u', u'go', u'm8', u'some', 
 u'[help](http://us.whales.org/issues/what-to-do-if-you-find-live-stranded-whale-or-dolphin)']

# *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit]
# (/message/compose/?to=/r/politics) if you have any questions or concerns.*
ex3 = [u'\nUnfortunately,', u'this', u'submission', u'has', u'been', u'removed.', u'Per', u'the', u'sidebar:\n\n*',
 u'**Rule', u'7', u'-**', u'Do', u'not', u'use', u'"BREAKING"', u'or', u'ALL', u'CAPS', u'in', u'titles.', u'The',
 u'ALL', u'CAPS', u'and', u"'Breaking'", u'rule', u'is', u'applied', u'even', u'when', u'the', u'actual', u'title',
 u'of', u'the', u'article', u'is', u'in', u'all', u'caps', u'or', u'contains', u'the', u'word', u"'Breaking'.", 
 u'This', u'rule', u'may', u'be', u'applied', u'to', u'other', u'single', u'word', u'declarative', u'and/or', 
 u'sensational', u'expressions,', u'such', u'as', u"'EXCLUSIVE:'", u'or', u"'HOT:'.", u'Please', u'resubmit',
 u'your', u'post', u'with', u'a', u'title', u'that', u'is', u'in', u'sentence', u'or', u'title', u'case.',
 u'[**More', u'Info.**](/r/politics/wiki/rulesandregs#wiki_follow_reddiquette.27s_title_instructions.)', u'*I',
 u'am', u'a', u'bot,', u'and', u'this', u'action', u'was', u'performed', u'automatically.', u'Please', u'[contact',
 u'the', u'moderators', u'of', u'this', u'subreddit](/message/compose/?to=/r/politics)', u'if', u'you', u'have', 
 u'any', u'questions', u'or', u'concerns.', u'*']

# I think the other thing we have to consider is the majority population of Reddie: White males between the ages of 
# 18-35 who have been raised by society, and their parents, to believe that everything should be handed to them,     
# everybody thinks they're special, and everything they think/say has validity. Even if they come from an abusive
# household, society tells them their abuse is valid, their emotions and insights, also valid.
# Way more "valid" than anyone else's.
# So as much as I think it's white people needing to "make progress", I also think it's worthwhile for POC to 
# recognize the narcissism they are socialized into. This isn't to give them pity. I certainly have no sympathy for 
# some jack-off on the internet raging about the fact that he can't get pity sex from a girl he stalked oh and by the 
# way, fuck black people; but it does remind me that I am not dealing with someone who has reasoning skills, the
# ability to empathize, or the desire to do so.
ex4 = [u'I', u'think', u'the', u'other', u'thing', u'we', u'have', u'to', u'consider', u'is', u'the', u'majority', 
 u'population', u'of', u'Reddie:', u'White', u'males', u'between', u'the', u'ages', u'of', u'18-35', u'who', 
 u'have', u'been', u'raised', u'by', u'society,', u'and', u'their', u'parents,', u'to', u'believe', u'that', 
 u'everything', u'should', u'be', u'handed', u'to', u'them,', u'everybody', u'thinks', u"they're", u'special,', 
 u'and', u'everything', u'they', u'think/say', u'has', u'validity.', u'Even', u'if', u'they', u'come', u'from',
 u'an', u'abusive', u'household,', u'society', u'tells', u'them', u'their', u'abuse', u'is', u'valid,', u'their',
 u'emotions', u'and', u'insights,', u'also', u'valid.', u'Way', u'more', u'"valid"', u'than', u'anyone', u"else's.",
 u'So', u'as', u'much', u'as', u'I', u'think', u"it's", u'white', u'people', u'needing', u'to', u'"make', 
 u'progress",', u'I', u'also', u'think', u"it's", u'worthwhile', u'for', u'POC', u'to', u'recognize', u'the',
 u'narcissism', u'they', u'are', u'socialized', u'into.', u'This', u"isn't", u'to', u'give', u'them', u'pity.', 
 u'I', u'certainly', u'have', u'no', u'sympathy', u'for', u'some', u'jack-off', u'on', u'the', u'internet',
 u'raging', u'about', u'the', u'fact', u'that', u'he', u"can't", u'get', u'pity', u'sex', u'from', u'a', u'girl', 
 u'he', u'stalked', u'oh', u'and', u'by', u'the', u'way,', u'fuck', u'black', u'people;', u'but', u'it', u'does', 
 u'remind', u'me', u'that', u'I', u'am', u'not', u'dealing', u'with', u'someone', u'who', u'has', u'reasoning', 
 u'skills,', u'the', u'ability', u'to', u'empathize,', u'or', u'the', u'desire', u'to', u'do', u'so.']

# You must have forgotten what sub you are in.  Anyone who is not fanatically liberal in every sense of the word is
# a fucking idiot.  If you believe in god you are a fucking idiot.  If you are rich, you are a bad person and 
# probably don't deserve it.  /r/politics
ex5 = [u'You', u'must', u'have', u'forgotten', u'what', u'sub', u'you', u'are', u'in.', u'Anyone', u'who', u'is', u'not',
 u'fanatically', u'liberal', u'in', u'every', u'sense', u'of', u'the', u'word', u'is', u'a', u'fucking', u'idiot.', 
 u'If', u'you', u'believe', u'in', u'god', u'you', u'are', u'a', u'fucking', u'idiot.', u'If', u'you', u'are', 
 u'rich,', u'you', u'are', u'a', u'bad', u'person', u'and', u'probably', u"don't", u'deserve', u'it.',
 u'/r/politics']

# >Because it's not easier to just tell everyone himself?
# Ok I see how it is. If He just tells everyone Himself He's kinda limiting the choices they have, which is limiting free will, which He can't do. Other than that I can't really answer that at this current time.
# >Murder and adultery? That was not horrific. No shit.
# Not at that specific point in history, no.
# >which must have made god look like an extra gigantic asshole for slowly killing David's baby as revenge for his sin.
# On the contrary. Even though some translations might say that God *caused* Davids baby to slowly die, or whatever, the Hebrew can easily be translated as "God *predicted* that Davids *actions* would cause his baby to die." Again there is a lot of debate on that subject but we're talking about prophets.
# >I believe you. Because you said it. Whatever you say is true without any need to actually logically justify it. I can see why you enjoy the copy and pasting, helps ya get all the truth out there at once.
# Well golly gee thanks buddy.
# >Oh yea, speculatular job. The Israelites rejected Jesus splitting some of the Israelites off into a new religion all together. They have been scattered all across the world, they have undergone tragedy aftet tragedy, attempted genocide and recently a Jewish person on this subreddit referred to the state of Judaism dying as a culture as a "silent holocaust."
# That's right. Just focus on the negative. I could list off all the bad things that has happened to any religion or ideology and it wouldn't help this discussion at all.
# Back to you /u/potzdamn
ex6 = [u'>Because', u"it's", u'not', u'easier', u'to', u'just', u'tell', u'everyone', u'himself?', u'Ok', u'I', 
 u'see', u'how', u'it', u'is.', u'If', u'He', u'just', u'tells', u'everyone', u'Himself', u"He's", u'kinda', 
 u'limiting', u'the', u'choices', u'they', u'have,', u'which', u'is', u'limiting', u'free', u'will,', u'which',
 u'He', u"can't", u'do.', u'Other', u'than', u'that', u'I', u"can't", u'really', u'answer', u'that', u'at', 
 u'this', u'current', u'time.', u'>Murder', u'and', u'adultery?', u'That', u'was', u'not', u'horrific.', 
 u'No', u'shit.', u'Not', u'at', u'that', u'specific', u'point', u'in', u'history,', u'no.', u'>which', 
 u'must', u'have', u'made', u'god', u'look', u'like', u'an', u'extra', u'gigantic', u'asshole', u'for', 
 u'slowly', u'killing', u"David's", u'baby', u'as', u'revenge', u'for', u'his', u'sin.', u'On', u'the', 
 u'contrary.', u'Even', u'though', u'some', u'translations', u'might', u'say', u'that', u'God', u'*caused*',
 u'Davids', u'baby', u'to', u'slowly', u'die,', u'or', u'whatever,', u'the', u'Hebrew', u'can', u'easily',
 u'be', u'translated', u'as', u'"God', u'*predicted*', u'that', u'Davids', u'*actions*', u'would', u'cause',
 u'his', u'baby', u'to', u'die."', u'Again', u'there', u'is', u'a', u'lot', u'of', u'debate', u'on', u'that',
 u'subject', u'but', u"we're", u'talking', u'about', u'prophets.', u'>I', u'believe', u'you.', u'Because',
 u'you', u'said', u'it.', u'Whatever', u'you', u'say', u'is', u'true', u'without', u'any', u'need', u'to', 
 u'actually', u'logically', u'justify', u'it.', u'I', u'can', u'see', u'why', u'you', u'enjoy', u'the', u'copy',
 u'and', u'pasting,', u'helps', u'ya', u'get', u'all', u'the', u'truth', u'out', u'there', u'at', u'once.', 
 u'Well', u'golly', u'gee', u'thanks', u'buddy.', u'>Oh', u'yea,', u'speculatular', u'job.', u'The', 
 u'Israelites', u'rejected', u'Jesus', u'splitting', u'some', u'of', u'the', u'Israelites', u'off', u'into', u'a',
 u'new', u'religion', u'all', u'together.', u'They', u'have', u'been', u'scattered', u'all', u'across', u'the',
 u'world,', u'they', u'have', u'undergone', u'tragedy', u'aftet', u'tragedy,', u'attempted', u'genocide', u'and',
 u'recently', u'a', u'Jewish', u'person', u'on', u'this', u'subreddit', u'referred', u'to', u'the', u'state', 
 u'of', u'Judaism', u'dying', u'as', u'a', u'culture', u'as', u'a', u'"silent', u'holocaust."', u"That's", 
 u'right.', u'Just', u'focus', u'on', u'the', u'negative.', u'I', u'could', u'list', u'off', u'all', u'the', 
 u'bad', u'things', u'that', u'has', u'happened', u'to', u'any', u'religion', u'or', u'ideology', u'and', u'it',
 u"wouldn't", u'help', u'this', u'discussion', u'at', u'all.', u'Back', u'to', u'you', u'/u/potzdamn']
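
Examples `ex2` and `ex3` contain Markdown links of the form `[text](target)`. Because the words are split on spaces first, the link text and its target end up glued together in one token, or spread over several tokens when the text itself contains spaces. A hedged sketch of unwrapping such links in the raw comment body before splitting. The pattern, the helper name `unwrap_markdown_links`, and the choice to emit `LINK_TAG` for every target are assumptions for illustration, and the pattern does not handle nested brackets or parentheses inside a URL:

```python
import re

# [text](target): text contains no ']' and target contains no ')' or whitespace
# (a simplifying assumption, not a full Markdown parser).
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def unwrap_markdown_links(body):
    """Replace each Markdown link with its visible text followed by ' LINK_TAG'."""
    return MD_LINK.sub(r"\1 LINK_TAG", body)


body = u"here u go m8 some [help](http://us.whales.org/issues/what-to-do-if-you-find-live-stranded-whale-or-dolphin)"
cleaned = unwrap_markdown_links(body)
```

This keeps the visible link text (`help`) as an ordinary word instead of discarding it together with the URL. A bare `[deleted]` is left untouched because it is not followed by a parenthesised target.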

In [22]:
# Punctuation marks to pad with spaces; '.', ',', '-' and '_' are not in the list
# and therefore stay attached to their word.
PUNCT_TO_SEPARATE = "!@#$%^&*()+=?'\"{}[]<>~`:;|\\/"

def seperatePunct(incomingString):
    """Return incomingString with a space on each side of every listed punctuation mark."""
    newstring = incomingString
    for punct in PUNCT_TO_SEPARATE:
        newstring = newstring.replace(punct, " " + punct + " ")
    return newstring
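
`seperatePunct` pads each listed punctuation mark with a space on both sides. The same effect can be sketched with a single regular-expression substitution. This is an alternative formulation, assuming the same set of characters is wanted; `separate_punct_re`, `PUNCT_CHARS` and `PUNCT_RE` are hypothetical names:

```python
import re

# Same characters that seperatePunct handles; '.', ',', '-' and '_' are not included.
PUNCT_CHARS = "!@#$%^&*()+=?'\"{}[]<>~`:;|\\/"
# re.escape makes characters such as ']', '^' and '\\' safe inside the character class.
PUNCT_RE = re.compile("([" + re.escape(PUNCT_CHARS) + "])")


def separate_punct_re(text):
    """Surround every character in PUNCT_CHARS with one space on each side."""
    return PUNCT_RE.sub(r" \1 ", text)
```

For example, `separate_punct_re("think/say")` gives `"think / say"`, while `"jack-off"` and `"titles."` come back unchanged because hyphens and periods are not in the character set.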

In [23]:
def hasNumbers(inputString):
    """Return True if inputString contains at least one digit."""
    return any(char.isdigit() for char in inputString)

In [33]:
def mytokenize(wordList):
    """Map space-split words to tokens, replacing links, subreddit and user
    references, and numbers with placeholder tags."""
    tokenizedList = []

    for word in wordList:

        # remove these substrings from the word
        word = word.replace('[deleted]', '')
        word = word.replace('&gt', '')

        # if link (http or https), replace with link tag
        if 'http://' in word or 'https://' in word:
            tokenizedList.append('LINK_TAG')
            continue

        # if reference to a subreddit, replace with subreddit tag
        if '/r/' in word:
            tokenizedList.append('SUBREDDIT_TAG')
            continue

        # if reference to a reddit user, replace with user tag
        if '/u/' in word:
            tokenizedList.append('USER_TAG')
            continue

        # if number, replace with number tag
        # m8 is a word, 82255-124] is a number
        if hasNumbers(word) and not any(char.isalpha() for char in word):
            tokenizedList.append('NUM_TAG')
            continue

        # separate punctuation and add the pieces to tokenizedList
        newwords = seperatePunct(word).split(" ")
        tokenizedList.extend(newwords)

    return tokenizedList
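
Because `seperatePunct` pads punctuation with spaces and the result is then split on a single space, the list returned by `mytokenize` can contain empty strings and whitespace-only tokens. A minimal sketch of a post-processing step, assuming such tokens carry no information for later stages; `drop_empty_tokens` is a hypothetical helper, not part of the notebook:

```python
def drop_empty_tokens(tokens):
    """Strip surrounding whitespace from each token and drop tokens that end up empty."""
    stripped = (token.strip() for token in tokens)
    return [token for token in stripped if token]


# " ' Breaking ' " is what seperatePunct produces for the word 'Breaking' in quotes.
raw = u" ' Breaking ' ".split(" ")
clean = drop_empty_tokens(raw + [u"\n\n", u"\nUnfortunately,"])
```

Applying it as `drop_empty_tokens(mytokenize(wordList))` would leave the tagging logic untouched and only tidy its output. Whether paragraph breaks such as `\n\n` should be dropped or kept as their own token depends on the downstream model, so treat this as one option.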

In [34]:
exList = [ex1, ex2, ex3, ex4, ex5, ex6]

for ex in exList:
    tokList = mytokenize(ex)
    print 'orig'
    print ex
    print 'new'
    print tokList
    print ""


orig
[u'[deleted]']
new
[u'']

orig
[u'here', u'u', u'go', u'm8', u'some', u'[help](http://us.whales.org/issues/what-to-do-if-you-find-live-stranded-whale-or-dolphin)']
new
[u'here', u'u', u'go', u'm8', u'some', 'LINK_TAG']

orig
[u'\nUnfortunately,', u'this', u'submission', u'has', u'been', u'removed.', u'Per', u'the', u'sidebar:\n\n*', u'**Rule', u'7', u'-**', u'Do', u'not', u'use', u'"BREAKING"', u'or', u'ALL', u'CAPS', u'in', u'titles.', u'The', u'ALL', u'CAPS', u'and', u"'Breaking'", u'rule', u'is', u'applied', u'even', u'when', u'the', u'actual', u'title', u'of', u'the', u'article', u'is', u'in', u'all', u'caps', u'or', u'contains', u'the', u'word', u"'Breaking'.", u'This', u'rule', u'may', u'be', u'applied', u'to', u'other', u'single', u'word', u'declarative', u'and/or', u'sensational', u'expressions,', u'such', u'as', u"'EXCLUSIVE:'", u'or', u"'HOT:'.", u'Please', u'resubmit', u'your', u'post', u'with', u'a', u'title', u'that', u'is', u'in', u'sentence', u'or', u'title', u'case.', u'[**More', u'Info.**](/r/politics/wiki/rulesandregs#wiki_follow_reddiquette.27s_title_instructions.)', u'*I', u'am', u'a', u'bot,', u'and', u'this', u'action', u'was', u'performed', u'automatically.', u'Please', u'[contact', u'the', u'moderators', u'of', u'this', u'subreddit](/message/compose/?to=/r/politics)', u'if', u'you', u'have', u'any', u'questions', u'or', u'concerns.', u'*']
new
[u'\nUnfortunately,', u'this', u'submission', u'has', u'been', u'removed.', u'Per', u'the', u'sidebar', u':', u'\n\n', u'*', u'', u'', u'*', u'', u'*', u'Rule', 'NUM_TAG', u'-', u'*', u'', u'*', u'', u'Do', u'not', u'use', u'', u'"', u'BREAKING', u'"', u'', u'or', u'ALL', u'CAPS', u'in', u'titles.', u'The', u'ALL', u'CAPS', u'and', u'', u"'", u'Breaking', u"'", u'', u'rule', u'is', u'applied', u'even', u'when', u'the', u'actual', u'title', u'of', u'the', u'article', u'is', u'in', u'all', u'caps', u'or', u'contains', u'the', u'word', u'', u"'", u'Breaking', u"'", u'.', u'This', u'rule', u'may', u'be', u'applied', u'to', u'other', u'single', u'word', u'declarative', u'and', u'/', u'or', u'sensational', u'expressions,', u'such', u'as', u'', u"'", u'EXCLUSIVE', u':', u'', u"'", u'', u'or', u'', u"'", u'HOT', u':', u'', u"'", u'.', u'Please', u'resubmit', u'your', u'post', u'with', u'a', u'title', u'that', u'is', u'in', u'sentence', u'or', u'title', u'case.', u'', u'[', u'', u'*', u'', u'*', u'More', 'SUBREDDIT_TAG', u'', u'*', u'I', u'am', u'a', u'bot,', u'and', u'this', u'action', u'was', u'performed', u'automatically.', u'Please', u'', u'[', u'contact', u'the', u'moderators', u'of', u'this', 'SUBREDDIT_TAG', u'if', u'you', u'have', u'any', u'questions', u'or', u'concerns.', u'', u'*', u'']

orig
[u'I', u'think', u'the', u'other', u'thing', u'we', u'have', u'to', u'consider', u'is', u'the', u'majority', u'population', u'of', u'Reddie:', u'White', u'males', u'between', u'the', u'ages', u'of', u'18-35', u'who', u'have', u'been', u'raised', u'by', u'society,', u'and', u'their', u'parents,', u'to', u'believe', u'that', u'everything', u'should', u'be', u'handed', u'to', u'them,', u'everybody', u'thinks', u"they're", u'special,', u'and', u'everything', u'they', u'think/say', u'has', u'validity.', u'Even', u'if', u'they', u'come', u'from', u'an', u'abusive', u'household,', u'society', u'tells', u'them', u'their', u'abuse', u'is', u'valid,', u'their', u'emotions', u'and', u'insights,', u'also', u'valid.', u'Way', u'more', u'"valid"', u'than', u'anyone', u"else's.", u'So', u'as', u'much', u'as', u'I', u'think', u"it's", u'white', u'people', u'needing', u'to', u'"make', u'progress",', u'I', u'also', u'think', u"it's", u'worthwhile', u'for', u'POC', u'to', u'recognize', u'the', u'narcissism', u'they', u'are', u'socialized', u'into.', u'This', u"isn't", u'to', u'give', u'them', u'pity.', u'I', u'certainly', u'have', u'no', u'sympathy', u'for', u'some', u'jack-off', u'on', u'the', u'internet', u'raging', u'about', u'the', u'fact', u'that', u'he', u"can't", u'get', u'pity', u'sex', u'from', u'a', u'girl', u'he', u'stalked', u'oh', u'and', u'by', u'the', u'way,', u'fuck', u'black', u'people;', u'but', u'it', u'does', u'remind', u'me', u'that', u'I', u'am', u'not', u'dealing', u'with', u'someone', u'who', u'has', u'reasoning', u'skills,', u'the', u'ability', u'to', u'empathize,', u'or', u'the', u'desire', u'to', u'do', u'so.']
new
[u'I', u'think', u'the', u'other', u'thing', u'we', u'have', u'to', u'consider', u'is', u'the', u'majority', u'population', u'of', u'Reddie', u':', u'', u'White', u'males', u'between', u'the', u'ages', u'of', 'NUM_TAG', u'who', u'have', u'been', u'raised', u'by', u'society,', u'and', u'their', u'parents,', u'to', u'believe', u'that', u'everything', u'should', u'be', u'handed', u'to', u'them,', u'everybody', u'thinks', u'they', u"'", u're', u'special,', u'and', u'everything', u'they', u'think', u'/', u'say', u'has', u'validity.', u'Even', u'if', u'they', u'come', u'from', u'an', u'abusive', u'household,', u'society', u'tells', u'them', u'their', u'abuse', u'is', u'valid,', u'their', u'emotions', u'and', u'insights,', u'also', u'valid.', u'Way', u'more', u'', u'"', u'valid', u'"', u'', u'than', u'anyone', u'else', u"'", u's.', u'So', u'as', u'much', u'as', u'I', u'think', u'it', u"'", u's', u'white', u'people', u'needing', u'to', u'', u'"', u'make', u'progress', u'"', u',', u'I', u'also', u'think', u'it', u"'", u's', u'worthwhile', u'for', u'POC', u'to', u'recognize', u'the', u'narcissism', u'they', u'are', u'socialized', u'into.', u'This', u'isn', u"'", u't', u'to', u'give', u'them', u'pity.', u'I', u'certainly', u'have', u'no', u'sympathy', u'for', u'some', u'jack-off', u'on', u'the', u'internet', u'raging', u'about', u'the', u'fact', u'that', u'he', u'can', u"'", u't', u'get', u'pity', u'sex', u'from', u'a', u'girl', u'he', u'stalked', u'oh', u'and', u'by', u'the', u'way,', u'fuck', u'black', u'people', u';', u'', u'but', u'it', u'does', u'remind', u'me', u'that', u'I', u'am', u'not', u'dealing', u'with', u'someone', u'who', u'has', u'reasoning', u'skills,', u'the', u'ability', u'to', u'empathize,', u'or', u'the', u'desire', u'to', u'do', u'so.']

orig
[u'You', u'must', u'have', u'forgotten', u'what', u'sub', u'you', u'are', u'in.', u'Anyone', u'who', u'is', u'not', u'fanatically', u'liberal', u'in', u'every', u'sense', u'of', u'the', u'word', u'is', u'a', u'fucking', u'idiot.', u'If', u'you', u'believe', u'in', u'god', u'you', u'are', u'a', u'fucking', u'idiot.', u'If', u'you', u'are', u'rich,', u'you', u'are', u'a', u'bad', u'person', u'and', u'probably', u"don't", u'deserve', u'it.', u'/r/politics']
new
[u'You', u'must', u'have', u'forgotten', u'what', u'sub', u'you', u'are', u'in.', u'Anyone', u'who', u'is', u'not', u'fanatically', u'liberal', u'in', u'every', u'sense', u'of', u'the', u'word', u'is', u'a', u'fucking', u'idiot.', u'If', u'you', u'believe', u'in', u'god', u'you', u'are', u'a', u'fucking', u'idiot.', u'If', u'you', u'are', u'rich,', u'you', u'are', u'a', u'bad', u'person', u'and', u'probably', u'don', u"'", u't', u'deserve', u'it.', 'SUBREDDIT_TAG']

orig
[u'&gt;Because', u"it's", u'not', u'easier', u'to', u'just', u'tell', u'everyone', u'himself?', u'Ok', u'I', u'see', u'how', u'it', u'is.', u'If', u'He', u'just', u'tells', u'everyone', u'Himself', u"He's", u'kinda', u'limiting', u'the', u'choices', u'they', u'have,', u'which', u'is', u'limiting', u'free', u'will,', u'which', u'He', u"can't", u'do.', u'Other', u'than', u'that', u'I', u"can't", u'really', u'answer', u'that', u'at', u'this', u'current', u'time.', u'&gt;Murder', u'and', u'adultery?', u'That', u'was', u'not', u'horrific.', u'No', u'shit.', u'Not', u'at', u'that', u'specific', u'point', u'in', u'history,', u'no.', u'&gt;which', u'must', u'have', u'made', u'god', u'look', u'like', u'an', u'extra', u'gigantic', u'asshole', u'for', u'slowly', u'killing', u"David's", u'baby', u'as', u'revenge', u'for', u'his', u'sin.', u'On', u'the', u'contrary.', u'Even', u'though', u'some', u'translations', u'might', u'say', u'that', u'God', u'*caused*', u'Davids', u'baby', u'to', u'slowly', u'die,', u'or', u'whatever,', u'the', u'Hebrew', u'can', u'easily', u'be', u'translated', u'as', u'"God', u'*predicted*', u'that', u'Davids', u'*actions*', u'would', u'cause', u'his', u'baby', u'to', u'die."', u'Again', u'there', u'is', u'a', u'lot', u'of', u'debate', u'on', u'that', u'subject', u'but', u"we're", u'talking', u'about', u'prophets.', u'&gt;I', u'believe', u'you.', u'Because', u'you', u'said', u'it.', u'Whatever', u'you', u'say', u'is', u'true', u'without', u'any', u'need', u'to', u'actually', u'logically', u'justify', u'it.', u'I', u'can', u'see', u'why', u'you', u'enjoy', u'the', u'copy', u'and', u'pasting,', u'helps', u'ya', u'get', u'all', u'the', u'truth', u'out', u'there', u'at', u'once.', u'Well', u'golly', u'gee', u'thanks', u'buddy.', u'&gt;Oh', u'yea,', u'speculatular', u'job.', u'The', u'Israelites', u'rejected', u'Jesus', u'splitting', u'some', u'of', u'the', u'Israelites', u'off', u'into', u'a', u'new', u'religion', u'all', u'together.', u'They', u'have', 
u'been', u'scattered', u'all', u'across', u'the', u'world,', u'they', u'have', u'undergone', u'tragedy', u'aftet', u'tragedy,', u'attempted', u'genocide', u'and', u'recently', u'a', u'Jewish', u'person', u'on', u'this', u'subreddit', u'referred', u'to', u'the', u'state', u'of', u'Judaism', u'dying', u'as', u'a', u'culture', u'as', u'a', u'"silent', u'holocaust."', u"That's", u'right.', u'Just', u'focus', u'on', u'the', u'negative.', u'I', u'could', u'list', u'off', u'all', u'the', u'bad', u'things', u'that', u'has', u'happened', u'to', u'any', u'religion', u'or', u'ideology', u'and', u'it', u"wouldn't", u'help', u'this', u'discussion', u'at', u'all.', u'Back', u'to', u'you', u'/u/potzdamn']
new
[u'', u';', u'Because', u'it', u"'", u's', u'not', u'easier', u'to', u'just', u'tell', u'everyone', u'himself', u'?', u'', u'Ok', u'I', u'see', u'how', u'it', u'is.', u'If', u'He', u'just', u'tells', u'everyone', u'Himself', u'He', u"'", u's', u'kinda', u'limiting', u'the', u'choices', u'they', u'have,', u'which', u'is', u'limiting', u'free', u'will,', u'which', u'He', u'can', u"'", u't', u'do.', u'Other', u'than', u'that', u'I', u'can', u"'", u't', u'really', u'answer', u'that', u'at', u'this', u'current', u'time.', u'', u';', u'Murder', u'and', u'adultery', u'?', u'', u'That', u'was', u'not', u'horrific.', u'No', u'shit.', u'Not', u'at', u'that', u'specific', u'point', u'in', u'history,', u'no.', u'', u';', u'which', u'must', u'have', u'made', u'god', u'look', u'like', u'an', u'extra', u'gigantic', u'asshole', u'for', u'slowly', u'killing', u'David', u"'", u's', u'baby', u'as', u'revenge', u'for', u'his', u'sin.', u'On', u'the', u'contrary.', u'Even', u'though', u'some', u'translations', u'might', u'say', u'that', u'God', u'', u'*', u'caused', u'*', u'', u'Davids', u'baby', u'to', u'slowly', u'die,', u'or', u'whatever,', u'the', u'Hebrew', u'can', u'easily', u'be', u'translated', u'as', u'', u'"', u'God', u'', u'*', u'predicted', u'*', u'', u'that', u'Davids', u'', u'*', u'actions', u'*', u'', u'would', u'cause', u'his', u'baby', u'to', u'die.', u'"', u'', u'Again', u'there', u'is', u'a', u'lot', u'of', u'debate', u'on', u'that', u'subject', u'but', u'we', u"'", u're', u'talking', u'about', u'prophets.', u'', u';', u'I', u'believe', u'you.', u'Because', u'you', u'said', u'it.', u'Whatever', u'you', u'say', u'is', u'true', u'without', u'any', u'need', u'to', u'actually', u'logically', u'justify', u'it.', u'I', u'can', u'see', u'why', u'you', u'enjoy', u'the', u'copy', u'and', u'pasting,', u'helps', u'ya', u'get', u'all', u'the', u'truth', u'out', u'there', u'at', u'once.', u'Well', u'golly', u'gee', u'thanks', u'buddy.', u'', u';', u'Oh', u'yea,', u'speculatular', 
u'job.', u'The', u'Israelites', u'rejected', u'Jesus', u'splitting', u'some', u'of', u'the', u'Israelites', u'off', u'into', u'a', u'new', u'religion', u'all', u'together.', u'They', u'have', u'been', u'scattered', u'all', u'across', u'the', u'world,', u'they', u'have', u'undergone', u'tragedy', u'aftet', u'tragedy,', u'attempted', u'genocide', u'and', u'recently', u'a', u'Jewish', u'person', u'on', u'this', u'subreddit', u'referred', u'to', u'the', u'state', u'of', u'Judaism', u'dying', u'as', u'a', u'culture', u'as', u'a', u'', u'"', u'silent', u'holocaust.', u'"', u'', u'That', u"'", u's', u'right.', u'Just', u'focus', u'on', u'the', u'negative.', u'I', u'could', u'list', u'off', u'all', u'the', u'bad', u'things', u'that', u'has', u'happened', u'to', u'any', u'religion', u'or', u'ideology', u'and', u'it', u'wouldn', u"'", u't', u'help', u'this', u'discussion', u'at', u'all.', u'Back', u'to', u'you', 'USER_TAG']
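
The orig/new pairs above show three substitutions: `18-35` becomes `NUM_TAG`, `/r/politics` becomes `SUBREDDIT_TAG`, and `/u/potzdamn` becomes `USER_TAG`. The code that produced them is not part of this output, so the following is only a minimal sketch of how such token-level tagging could be done. The regular expressions and the names `tag_token` and `tag_tokens` are assumptions made for illustration, not the notebook's actual rules. The sketch also does not reproduce the punctuation splitting or the empty-string tokens visible in the `new` lists.

```python
import re

# Hypothetical patterns (assumptions): the real notebook's rules are not shown.
USER_RE = re.compile(r'^/?u/[\w-]+$')             # e.g. /u/potzdamn
SUBREDDIT_RE = re.compile(r'^/?r/[\w-]+$')        # e.g. /r/politics
NUM_RE = re.compile(r'^\d+(?:[-.,:]\d+)*$')       # e.g. 18-35, 2014, 3.5


def tag_token(token):
    """Return a placeholder tag for user, subreddit, or numeric tokens."""
    if USER_RE.match(token):
        return 'USER_TAG'
    if SUBREDDIT_RE.match(token):
        return 'SUBREDDIT_TAG'
    if NUM_RE.match(token):
        return 'NUM_TAG'
    # Anything else passes through unchanged.
    return token


def tag_tokens(tokens):
    """Apply tag_token to every token in a list, preserving order."""
    return [tag_token(t) for t in tokens]


sample = ['ages', 'of', '18-35', 'in', '/r/politics', 'Back', 'to', 'you', '/u/potzdamn']
tagged = tag_tokens(sample)
# tagged == ['ages', 'of', 'NUM_TAG', 'in', 'SUBREDDIT_TAG',
#            'Back', 'to', 'you', 'USER_TAG']
```

Matching whole tokens (anchored with `^` and `$`) keeps ordinary words such as `jack-off` or `think/say` untouched. The trade-off is that a reference followed by punctuation, such as `/r/politics.`, would be missed unless punctuation is split off first, which the `new` lists above appear to do.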

