Select daily popular topics

Objective: For each day, select that day's popular topics by analyzing high-frequency terms in the day's news titles.

Last modified: 2017-10-15

Roadmap

  1. Build a news title doc for each day
  2. Find high-frequency words in each news title doc
  3. Write them out to CSV files for manual inspection
  4. Filter the news related to manually selected topics and regenerate the CSV files
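Step 4 is not reached in this section. As a rough illustration only, filtering news by a manually chosen topic could be done with a case-insensitive, whole-word keyword match on news_title. The topic names, keyword lists and the helper filter_topic_news below are hypothetical, not the notebook's actual selection.

```python
import re

import pandas as pd

# Hypothetical topic -> keyword lists; the real topics are picked by hand
# after inspecting the CSV files from step 3.
TOPIC_KEYWORDS = {
    'ferguson': ['ferguson', 'darren wilson'],
    'keystone': ['keystone'],
}


def filter_topic_news(news_df, keywords, col='news_title'):
    """Return the rows whose title contains any keyword as a whole word, ignoring case."""
    # A non-capturing group avoids pandas' "has match groups" warning in str.contains
    pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'
    mask = news_df[col].str.contains(pattern, case=False, regex=True, na=False)
    return news_df[mask]
```

Each filtered frame could then be written out with DataFrame.to_csv to regenerate the per-topic CSV files. Whole-word matching keeps a keyword such as "keystone" from firing inside longer unrelated words.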

Steps


In [1]:
"""
Initialization
"""

'''
Standard modules
'''
import os
import collections
from pprint import pprint

'''
Analysis modules
'''
import pandas as pd
import nltk


'''
Custom modules
'''
import config
import utilities

'''
Misc
'''
nb_name = '20171002-daheng-select_daily_popular_topics'

news_period_title_docs_pkl = os.path.join(config.TMP_DIR, '{}-{}'.format(nb_name, 'news-period-title_docs.sr.pkl'))
news_title_docs_high_freq_words_df_pkl = os.path.join(config.TMP_DIR, '{}-{}'.format(nb_name, 'news-title_docs-high_freq_words.df.pkl'))

Build news title docs for each day


In [2]:
if 1 == 1:
    '''
    Load the pickled news data for the selected period.
    '''
    news_period_df = pd.read_pickle(config.NEWS_PERIOD_DF_PKL)

In [3]:
"""
Print a single news title
"""
news_period_df.loc[3, 'news_title']


Out[3]:
'At least 4 dead in attack in Kabul, official says'

In [4]:
"""
Print complete news titles
"""
with pd.option_context('display.max_colwidth', 100):
    display(news_period_df[['news_collected_time', 'news_title']])


news_collected_time news_title
0 2014-11-18 Missouri's Nixon Declares State of Emergency Awaiting Grand Jury
1 2014-11-18 PEOPLE: Bill Cosby. Charles Manson, Solange Knowles and more!
2 2014-11-18 Ebola patient who died had received ZMapp late in his treatment
3 2014-11-18 At least 4 dead in attack in Kabul, official says
4 2014-11-18 Australia will not be at periphery of India's vision: Modi
5 2014-11-18 FBI: Violence could follow Ferguson indictment decision
6 2014-11-18 Four Killed in Palestinian Attack at Jerusalem Synagogue
7 2014-11-18 Mass murderer Charles Manson issued marriage license, may get hitched next ...
8 2014-11-18 News Guide: Texas' latest history textbook tussle
9 2014-11-18 Abdul-Rahman Kassig's parents mourn 'beloved son'
10 2014-11-18 Obama orders full review of US hostage policy
11 2014-11-18 Homeless Children in US: A parent-to-parent approach to help kids (+video)
12 2014-11-18 Alleged Bill Cosby victim has connection to Colorado
13 2014-11-18 Church of England approves women bishops
14 2014-11-18 Uber executive wants to dig into personal lives, discredit journalists who cover ...
15 2014-11-18 Suicide blast kills two at Kabul's foreign compound
16 2014-11-18 Answers to questions about the Ferguson grand jury
17 2014-11-18 Four Killed in Jerusalem Synagogue Complex
18 2014-11-18 Source: Charles Manson, fiance get marriage license
19 2014-11-18 Suicide Attack in Afghan Capital Kills 2
20 2014-11-18 Hong Kong Protesters Greet Court Officials With Indifference
21 2014-11-18 Europeans have prominent role in beheading video
22 2014-11-18 Deals Heat Up for Lawyers Like It's 1998: Business of Law
23 2014-11-18 Cupich set to become 9th archbishop of Chicago
24 2014-11-18 4 Israelis, 2 Palestinians killed in synagogue attack, Israeli police say
25 2014-11-18 Missouri Gov. Jay Nixon Declares State Of Emergency Ahead Of Grand Jury Decision
26 2014-11-18 French National Identified In Islamic State Beheading Video
27 2014-11-18 Suicide bombing near coalition base in Kabul kills 2 security officers
28 2014-11-18 Baseball notes, Nov. 17: Stanton gets record $325 million deal
29 2014-11-18 Surgeon dies of Ebola at Nebraska hospital after contracting disease in Sierra ...
... ... ...
38213 2015-04-14 The enduring images of Abraham Lincoln
38214 2015-04-14 Senate Approves a Bill on Changes to Medicare
38215 2015-04-14 Hillary Clinton is going after Wall Street
38216 2015-04-14 Lincoln's Assassination: 150 Years Later
38217 2015-04-14 SpaceX Launches Cargo Capsule, Fails to Nail Rocket Landing
38218 2015-04-14 UPDATE 4-Shooting at NC community college investigated as possible hate crime
38219 2015-04-14 Hillary's folksy, populist re-entry
38220 2015-04-14 The man who created the Lincoln we know
38221 2015-04-14 US promises stricter regulation on private security firms after Blackwater verdict
38222 2015-04-14 Cuba praises 'fair' US pledge on terrorism list
38223 2015-04-14 SEE IT: Former NYPD cop disarms gunman by ramming with police cruiser to ...
38224 2015-04-14 Jury Selected In Colorado Movie Theater Shooting Trial
38225 2015-04-14 The American Register – USS Oklahoma crew members remains to be exhumed
38226 2015-04-14 Killing investigated as hate crime; accused man makes allegations in court
38227 2015-04-14 April 14, 2015 in Falcon 9: Falcon 9 successfully launches, descends to off ...
38228 2015-04-14 Watch Arizona Police Car Ram Suspect, Ending Day-Long Crime Spree
38229 2015-04-14 USS Oklahoma sailors, Marines to be exhumed
38230 2015-04-14 Quintana leads White Sox to third straight victory
38231 2015-04-14 Senate approves bill changing how Medicare pays doctors
38232 2015-04-14 150 years ago, Abraham Lincoln was shot. Historians still argue over what happened next.
38233 2015-04-14 Dashboard camera shows Arizona police officer hitting armed suspect with cruiser
38234 2015-04-14 SCOTUS, Gov. Jay Nixon allows execution to go forward
38235 2015-04-14 SpaceX launches space station groceries, espresso maker
38236 2015-04-14 Cuba gave US assurances it will not support terrorism in future: US officials
38237 2015-04-14 Unknown USS Oklahoma sailors to be disinterred for possible identification
38238 2015-04-14 Events, exhibits mark 150 years since President Abraham Lincoln assassination
38239 2015-04-14 Attorneys ask US Supreme Court to halt Missouri execution
38240 2015-04-14 Cuba welcomes US move to drop island from terror list
38241 2015-04-14 Actress urges second opinion in cancer diagnosis
38242 2015-04-14 Best Boss Ever Aims to Raise Minimum Worker Pay to $70K per Year

37286 rows × 2 columns


In [5]:
"""
Group news by calendar day of news_collected_time and join each day's news titles with newlines
"""
news_titles_sr = news_period_df.resample('D', on='news_collected_time')['news_title'].apply(lambda x: '\n'.join(x))
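The resample call above buckets rows into calendar-day bins on news_collected_time and joins each bin's titles into one doc. The toy check below uses made-up titles but the notebook's column names; the helper build_daily_docs is hypothetical, and its trailing .fillna('') is an addition not present in the original cell, meant to normalise days without news to an empty string in case a pandas version returns NaN for empty bins.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
toy_df = pd.DataFrame({
    'news_collected_time': pd.to_datetime(
        ['2014-11-18 08:00', '2014-11-18 19:30', '2014-11-20 07:15']),
    'news_title': ['Title A', 'Title B', 'Title C'],
})


def build_daily_docs(news_df):
    """Concatenate each calendar day's titles into one newline-separated doc."""
    return (news_df
            .resample('D', on='news_collected_time')['news_title']
            .apply(lambda titles: '\n'.join(titles))
            # Days without any news still get a bin; make them ''
            .fillna(''))


daily_docs = build_daily_docs(toy_df)
```

Because resample produces one bin per day across the whole range, days with no collected news (for example 2014-12-15 and 2015-03-23 to 2015-03-28 in the series listing further down) appear as empty docs rather than disappearing.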

In [6]:
"""
Print a single news title doc
"""
print(news_titles_sr.iloc[0])
# news_titles_sr.loc['2015-01-01']


Missouri's Nixon Declares State of Emergency Awaiting Grand Jury
PEOPLE: Bill Cosby. Charles Manson, Solange Knowles and more!
Ebola patient who died had received ZMapp late in his treatment
At least 4 dead in attack in Kabul, official says
Australia will not be at periphery of India's vision: Modi
FBI: Violence could follow Ferguson indictment decision
Four Killed in Palestinian Attack at Jerusalem Synagogue
Mass murderer Charles Manson issued marriage license, may get hitched next ...
News Guide: Texas' latest history textbook tussle
Abdul-Rahman Kassig's parents mourn 'beloved son'
Obama orders full review of US hostage policy
Homeless Children in US: A parent-to-parent approach to help kids (+video)
Alleged Bill Cosby victim has connection to Colorado
Church of England approves women bishops
Uber executive wants to dig into personal lives, discredit journalists who cover ...
Suicide blast kills two at Kabul's foreign compound
Answers to questions about the Ferguson grand jury
Four Killed in Jerusalem Synagogue Complex
Source: Charles Manson, fiance get marriage license
Suicide Attack in Afghan Capital Kills 2
Hong Kong Protesters Greet Court Officials With Indifference
Europeans have prominent role in beheading video
Deals Heat Up for Lawyers Like It's 1998: Business of Law
Cupich set to become 9th archbishop of Chicago
4 Israelis, 2 Palestinians killed in synagogue attack, Israeli police say
Missouri Gov. Jay Nixon Declares State Of Emergency Ahead Of Grand Jury Decision
French National Identified In Islamic State Beheading Video
Suicide bombing near coalition base in Kabul kills 2 security officers
Baseball notes, Nov. 17: Stanton gets record $325 million deal
Surgeon dies of Ebola at Nebraska hospital after contracting disease in Sierra ...
Some Barricades Cleared From HK Protest Site
Pakistan ranks third among terror-hit countries
Gunfight at godman Rampal's ashram in Hisar, devotees take on cops
AP Exclusive: Charles Manson gets marriage license
India sixth worst affected country by terrorism in 2013: Report
Hong Kong authorities clear part of Admiralty protest site
Should Uber Fire Exec Who Suggested Investigating Reporters' Personal Lives?
'Cruel murder': Netanyahu, Kerry denounce terror attack on Jerusalem synagogue
Missouri Governor Activates National Guard Ahead of Ferguson Grand Jury Ruling
Taliban Suicide Attacker Kills 4 in Kabul
'Bill Cosby 77' Will Still Premiere on Netflix
Uber executive suggests 'digging up dirt' on media critics: BuzzFeed
Indian PM Modi Urges Greater Security, Economic Ties With Australia
Study Finds Alternative to Anti-Cholesterol Drug
Four dead in suspected Palestinian attack on Jerusalem synagogue
Answers to questions about the Ferguson grand jury
Charles Manson reluctantly applies for marriage license to wed girlfriend
UPDATE 3-Truck bomb kills two in attack on foreign base in Kabul
All 50 States Face Winter Whack; 5 Feet of Snow Forecast Near Buffalo
Deals Heat Up for Lawyers Like It's 1998: Business of Law
Israel: 'We will respond with a heavy hand' after synagogue attack kills 4
Missouri governor declares state of emergency in Ferguson, St. Louis region
Charles Manson reluctantly applies for marriage license to wed girlfriend
Supporters of oil pipeline scramble for last vote
Police Storm Ashram in India in Search of Guru
Modi, Abbott agree on closer cooperation on security and trade
Missouri governor declares state of emergency ahead of ruling on Ferguson shooting
Keystone Vote May Be Too Late to Help Democrat Hold Senate Seat
Deaths caused by terrorism rises by 61 percent, report shows
Truck bomb hits foreign base in Afghan capital, kills two
Toyota bets on hydrogen fuel cell technology
Followers of wanted Indian guru hold out against police
Uber exec proposed publishing journalists' personal secrets to fight bad press
Prime Minister Narendra Modi in Australia Parliament.
Can a Keystone pipeline vote help Mary Landrieu?
Small plane crashes into home near Chicago's Midway airport (PHOTOS)
Whoopi Goldberg defends Bill Cosby over rape allegations: 'I have a lot of ...
Haryana police storm godman Rampal's ashram, main gate damaged: Report
Child Homelessness Reaches Record High
Attackers storm Jerusalem synagogue, killing 4 worshippers
Deaths Linked to Terrorism Are Up 60 Percent, Study Finds
Russia sees no chance of breakthrough on Ukraine in German minister's visit
Small Plane Crashes Into Home Near Chicago's Midway Airport
Suicide bombers attack foreigner compound in Afghan capital, killing 2
Why liberals are turning on Bill Cosby over rape allegations
Japan's Abe Calls Early Election to Save His Grand Economic Plan
At Least 6 Killed at Jerusalem Synagogue
National Guard Prepares for More Ferguson Unrest
Global terrorism on rise: Fivefold increase in terror-related deaths since 2000 — RT News
Small Plane Crashes Into Chicago Home
Home Depot profit beats estimates as US job market improves
Indian PM jokingly accuses Tony Abbott of 'shirt-fronting' Australia
As Missouri awaits decision on police shooting, National Guard called in
'Terror' Deaths Up by 37%: Study
Uber rides into new PR storm over digging dirt on hostile press
State Board Mulls New History Textbooks « CBS Dallas / Fort Worth
Cupich set to become 9th archbishop of Chicago
FBI warns Ferguson grand jury decision 'will likely' lead to violence
Charles Manson gets marriage license
Small plane crashes into Chicago home, police say
Supporters Of Keystone Oil Pipeline Scramble For Last Vote - NewsOn6.com - Tulsa, OK - News, Weather, Video and Sports - KOTV.com
Lake-Effect Snow Pummels New York, Closes Thruway
Uber Exec Suggests Spending $1 Million To 'Dig Up Dirt' On Journalists: BuzzFeed
Japan leads world markets higher on stimulus hopes
Mary Landrieu scrambles for 60th vote for Keystone
Plane crashes into Chicago home, but elderly couple survives
Bernie Sanders Has Found His Grassroots Support Base: The Fake News Audience
House Democrats lash out at Nancy Pelosi
Lives lost to terrorism up by 61%, with 18000 dead
What explains the continuing fascination with Charles Manson?
NFL suspends Vikings RB Adrian Peterson without pay for remainder of the ...
Obama orders review of hostage policy - World News
NM Rep. Ben Ray Luján tapped to head Democratic campaign committee
US Producer Prices Rise 0.2 Percent in October
Upholding the Sanctity of Marriage for Charles Manson
NFL suspends Adrian Peterson without pay for at least rest of regular season
Small Cargo Plane Crashes Into Chicago Home
Lake-effect snow snarls Buffalo flights in wake of storm
Uber Just Stuck a Knife in the Republican Party's Heart
Jose Canseco says he's selling detached finger and digit-blasting gun
Jennifer Lawrence and the stars of 'The Hunger Games: Mockingjay Part 1' stun ...
Disappointment Becomes Global-Growth Norm as Japan Contracts
America's Disastrous History of Pipeline Accidents Shows Why the Keystone Vote Matters
AP Exclusive: Charles Manson Plans Prison Wedding
Missouri Gov. Jay Nixon issues state of emergency ahead of Ferguson grand jury decision
Peterson Suspended Without Pay for Rest of Season by NFL
And the 2014 Word of the Year Is…
Everything you need to know about the Bill Cosby sexual assault allegations
Former Slugger Jose Canseco Plans To Put His Finger On EBay
House Democrats re-elect Pelosi as minority leader
Keystone Supporters Hustle to Get 60 Yeses for Senate Vote on Tuesday
14 Questions And Answers About The Ferguson Grand Jury
Chicago plane crashes into home near Midway; 1 dead
Oxford names 'vape' 2014 Word of the Year
Uber Responds to BuzzFeed Report on Journalism Smear Campaign
Obama Orders Review Of U.S. Hostage Policy
Jerusalem Attack: A Look at the Victims
AP Exclusive: Charles Manson gets marriage license
Tie Keystone approval to bigger environmental goals
Myers: With Adrian Peterson ban, NFL commish gets it right
Jonathan Gruber's Obamacare comments, Ferguson grand jury, and more
Uber's Plan to Win Over the Press Backfires
President Orders Review of US Hostage Policies
Israel's 'Lone Wolf' Attacks Show Weapons Threat Hard to Track
GOP Vows to Pass Keystone Later If Bill Fails Now
National Guard prepares for more Ferguson unrest
Uber Exec in Hot Water After Suggesting Smear Campaign on Journalists
Where does Adrian Peterson's NFL career go from here?
Plane misses elderly couple by '8 inches' after crashing into home near Midway
Obama orders review of US policy on hostages
Oxford Dictionaries' 2014 Word of the Year is 'vape'
The new PEANUTS trailer is here And it's everything we ever wanted!!!
Israel's 'Lone Wolf' Attacks Show Weapons Threat Hard to Track
Louisiana's Landrieu Silent at Almost 70% of Energy Hearings
Obama order could protect thousands of illegal immigrants in Md. Va., report says
Browns waive running back Ben Tate
Bob Marley Named As Face of Global Marijuana Brand
Keystone backers scramble for last vote on bill, Boehner warns Obama against veto
NFL suspends Adrian Peterson for remainder of 2014 season
House Democrats Re-elect Pelosi as Minority Leader
Oxford chooses 'vape' as its 2014 Word of the Year
Obama Orders Review of Hostage Policy
Four rabbis killed in Jerusalem synagogue terror attack
Keystone XL chances dim in Senate as King says 'no'
What Charles Manson's Future Mother-in-Law Thinks About Wedding
Report: Alleged Officer Warns Ferguson 'If You Do Not Have a Gun, Get One'
Tech world calls out Uber for "thuggish" behavior
Liberals oppose Himes in House Democratic race
President Obama Orders Full Review of Hostage Negotiation Policy
Charles Manson Set to Tie the Knot With 26-Year-Old Woman
Ferguson Activists Prepare Havens for Post-Decision Protests
Toyota aims to replicate Prius success with fuel cell Mirai
Official Bob Marley Marijuana Is Coming
Three Americans Among Four Rabbis 'Slaughtered' in Jerusalem Synagogue
Senator Landrieu's Hail Mary goes beyond Keystone XL pipeline
Charles Manson fan insists she will marry the 80-year-old murderer
Uber Draws Fire After Executive Suggests Investigating Reporters
Earnest: "Old" Gruber Videos "Are Not Views That Are Shared By Anybody At The White House"
Crime and Inept Punishment: Sheriff Roger Goodell Is Barney Fife Once More
Vape is Oxford Dictionaries' Word Of The Year
Putin says US wants to subdue Russia
Top Republican floats new attack plan for Obama's immigration action
Dear relatives not in SoCal: We really don't miss your crazy snowstorms
Cost to Treat Ebola: $1 Million For Two Patients
Fla sees big rise of residents in US illegally
Uber can't sweep exec's revenge campaign under the car mat
The NFL Suspends Adrian Peterson, and the Sponsors Stay Quiet
Lake-effect snow pummels New York, closes Thruway
East Coast popular for immigrants in US illegally
'Vape' is English Word of the Year for 2014, Oxford Says
National Guard coming to help dig out from colossal effect storm
Pilot dies after small plane crashes into Chicago home
Who is Hannibal Buress, and why did he call Bill Cosby a "rapist"?
Obama will not change policy against paying ransom for hostages
Winter Whack: Nation Faces Arctic Chill; Almost 6 Feet of Snow Forecast Near Buffalo
Ferguson dilemma: Was calling up National Guard the right move?
Democrats Re-Elect Nancy Pelosi As House Minority Leader Amid Criticism Over 2014 Midterm Elections
Can Uber afford to have this many enemies?
Synagogue attack: Netanyahu vow in 'battle for Jerusalem'
Vape is word of the year for 2014
Louisiana Senate Seat Is Real Reward in Keystone Pipeline Vote
What May Happen to Officer Darren Wilson After Ferguson Grand Jury Decision
Bill Cosby hunkers down as scandal rages
Chicago plane crashes into home near Midway, pilot killed
NY agency aids more than 100 snow-stranded drivers
UPDATE 1-Seventh Sierra Leone doctor killed by Ebola -source
For Obama, Executive Action Will Not Be Limited to Immigration
Vape named as Oxford English Dictionary's 2014 word of the year
Gruber frequently visited White House
Here's Everything We Know (and Don't Know) About the Bill Cosby Rape ...
Obama orders review of US hostage policy
Uber CEO Apologies For Exec's 'Terrible' Suggestion That The Company ...
With beheading deaths of Americans, Obama orders review of US response to hostage takings
Bill Cosby Rape Accuser Joan Tarshis Reveals Details Of Horrifying Attack
House Democratic Leaders Hold Caucus Meeting To Elect Leaders For 114th Congress
U-Va. student Hannah Graham's death the result of 'homicidal violence,' officials ...
Adrian Peterson will not return this weekend against Green Bay [updated]
Palestinians kill five in Jerusalem synagogue attack
Keystone's Big Senate Test: A Search For One Vote
Missouri Gov. swears in Ferguson panel ahead of grand jury decision in shooting
LISTEN: Bill Cosby's 1969 riff on drugging women's drinks
Obama orders review of US hostage policy
Uber's vast trove of customer data is ripe for abuse
Senate Narrowly Defeats Keystone XL Bill
Oxford Dictionaries' word of 2014: Have you ever heard of it? - One News
What will Bill Cosby's legacy be?
Obama: 'Nowhere Near Out of the Woods' on Ebola
The Short List: Uber wants to silence journalists; Keystone bill fails; Peterson ...
Keystone Vote Falls Short in Senate
Jerusalem synagogue attack: 'Lone wolf' pattern seen in deadly assault
Western New York Snow Storm Could Set Records
Justice Department Probe Of Ferguson Police Could Spur Broad Change
Janice Dickinson accuses Bill Cosby of sexual assault during 1982 hotel meetup
West Africa "nowhere near out of woods" on Ebola: Obama - Xinhua
Obama orders review of US hostage policy
Tracy Morgan Still Struggling but "Fighting to Get Better" After Brain Injury ...
Will Kim Jong-un face mass crimes prosecution at The Hague? (+video)
Keystone Pipeline Fails to Get Through Senate
Tech lobby to keep tabs on NSA reform votes
Lawyer: Tracy Morgan Still Struggling With Severe Brain Injury
UPDATE 1-Seventh Sierra Leone doctor killed by Ebola -source
Quinn: Uber grapples with its aggressive image
Senate Narrowly Defeats Keystone XL Pipeline
Senate Republicans Block Sweeping Overhaul of NSA Program
Obama orders review of hostage policy
Ebola Researchers Race to Slow Epidemic
Sarah Lacy on Uber: I'm doing everything I can to keep my family safe
Police nab man sought after fatal NYC subway shove
UN panel calls for N.Korea referral to international court
Senate defeats Keystone XL pipeline
Middle East|Jewish Victims, All From One Jerusalem Street, Were a ...
Obama orders review of the policy on terrorist-related hostage cases
150 cars snowbound in early winter storm
Uber's plot to spy on reporter is latest controversy
UN Rights Committee Urges Court Referral for North Korea
Death of Virginia college student ruled a homicide
After Jerusalem attack, Netanyahu hopes 'PR porn' will win support abroad - Diplomacy and Defense Israel News
In all 50 states, it's below freezing in at least one spot
Uber CEO Apologizes For Exec's 'Terrible' Suggestion That The Company Investigate Journalists
Ebola crisis: Seventh Sierra Leone doctor dies from virus
United Nations Urges North Korea Prosecutions
Stupidity reconsidered
Ryan to chair tax panel, a possible 2016 platform
Senate fails to advance legislation on NSA reform
Uber executive stirs up privacy controversy
Janice Dickinson Says She Was Sexually Assaulted By Bill Cosby
UVA student Hannah Graham died from 'homicidal violence': medical examiner
Police nab man sought after fatal NYC subway shove
Ryan to Chair Tax Panel, a Possible 2016 Platform
In latest College Football Playoff rankings, Alabama rolls straight to the No. 1 spot
Senate Fails to Advance NSA Data Collection Overhaul Legislation
Storm blamed for at least 4 deaths in upstate New York
Report: Janice Dickinson accuses Bill Cosby of rape
Tracy Morgan Still Battling With Brain Injury
UN calls for probe of North Korea 'crimes against humanity'
Police question man in subway shoving death
Keystone Vote Unlikely to Change Odds for Mary Landrieu
Senate Republicans block bill: NSA will continue monitoring your calls
TV host Janice Dickinson latest Cosby accuser
Lawyer: Comedian Tracy Morgan Suffered Traumatic Brain Injury In NJ Tpke Crash
Video seems to show Ferguson officer in confrontation
North Korea: UN moves closer to ICC human rights probe
Defective Takata Airbag Grows Into Global Problem for Manufacturer
Report: Va. student's death was homicide
Winter Whack: Nation Faces Arctic Chill; 6 Feet of Snow Hit Buffalo Area
Fail Mary: Senate rejects Keystone bill
Palestinians kill five in Jerusalem synagogue attack
Attorney says actor Tracy Morgan struggling after crash: report
Court clears way for gay marriage in South Carolina
Subway Motorman Describes Deadly Train Push
Federal highway safety agency demands recall of cars with Takata air bags
North Korea reacts angrily after UN votes to probe 'crimes against humanity'
Early winter pummels much of country, strands motorists, emergency vehicles
Palestinians kill five in Jerusalem synagogue attack
Attorney says actor Tracy Morgan struggling after crash -report
UPDATE 3-US auto regulator seeks nationwide recall of Takata air bags
Death of Virginia college student ruled a homicide
Adrian Peterson May Be Suspended, But He's Unlikely To Lose Any Pay

In [7]:
"""
Print all news title docs
"""
with pd.option_context('display.max_colwidth', 130):
    print(news_titles_sr)


news_collected_time
2014-11-18    Missouri's Nixon Declares State of Emergency Awaiting Grand Jury\nPEOPLE: Bill Cosby. Charles Manson, Solange Knowles and more...
2014-11-19    Early winter pummels much of country, strands motorists, emergency vehicles\nAt the site of Jerusalem terror attack, no calls ...
2014-11-20    Americans brace for more icy temperatures and snow as ferocious storms linger\nREFILE-UPDATE 5-NBC, Netflix cancel Bill Cosby'...
2014-11-21    Obama unveils actions to spare some illegal immigrants\nTears, smiles in Nevada over US immigration reform some call bitterswe...
2014-11-22    Activists Rush to Help People Use Obama Immigration Plan\nOfficial: Ferguson grand jury still meeting\nFamily of NYC man kille...
2014-11-23    Mike Brown's Mom Urges Ferguson Protesters To Remain Peaceful\n6 things to watch for this holiday shopping season\nWork begins...
2014-11-24    Obama: Americans want 'new car smell' in 2016\nFormer DC Mayor Marion Barry Dies At 78 « CBS Baltimore\nOne Direction, Katy Pe...
2014-11-25    Could Obama choose a woman as next Defense secretary? One name tops list. (+video)\nWith No Immediate Prospect of Sanctions Re...
2014-11-26    Mississippi same-sex marriage ban overturned\nRain, snow could mess up plans for Thanksgiving travel across the nation\nLIVE: ...
2014-11-27    Ferguson shooting: Governor 'rejects calls for second jury'\nSpecial forces free eight hostages from Al-Qaeda in Yemen\nThanks...
2014-11-28    The Most Noteworthy 2014 Black Friday Deals\nFerguson Celebrates Thanksgiving Amidst Turmoil\nUK embassy car attacked in Kabul...
2014-11-29    'I pay you,' protesters chant at authorities as tensions return to Ferguson streets\nSource: 2 shot, 1 fatally, in Nordstrom o...
2014-11-30    Teen missing for four years found alive, hidden behind wall near Atlanta\nFormer New York Gov. Mario Cuomo, 82, Hospitalized «...
2014-12-01    Hong Kong Protests Close Down Government\nFewer shoppers and a decline in spending during Black Friday weekend\nPastors Join C...
2014-12-02    St. Louis Rams, Police Disagree Over 'Apology' for Players' Ferguson Gesture\nCongressional Aide Who Blasted Obama Daughters O...
2014-12-03    Another alleged Cosby victim claims he raped her at 15\nUS and Cuba Working On Solution to Free American Alan Gross From Cuban...
2014-12-04    Denmark world's least corrupt country: TI\nOpen doors to those displaced by Ruby, CBCP president asks churches, schools\nTexas...
2014-12-05    Protesters Swarm NYC Over Eric Garner Death For Second Night\nRussian historians left baffled by Putin's Crimean claims\nSuper...
2014-12-06    NASA's Orion Conquers Orbital Test as US Budget Debate Looms\nFour Injured in Michigan Amtrak Stabbing\nSwiss hostages escapes...
2014-12-07    NYPD Officer Daniel Pantaleo faces wrongful arrest lawsuits\nHostage Rescues Called Worth Trying Despite Many Failures\nRelief...
2014-12-08    Dire warning over pending released of CIA torture report\n10 Things to Know for Monday\nTyphoon Hagupit Weakens but the Risk o...
2014-12-09    'Unconscionable': Top Republicans lash out ahead of release of CIA report\n6 dead after plane crashes into Maryland home near ...
2014-12-10    Congress Deal to Avoid Shutdown Includes Victory for Big Banks\nNY police promise to rebuild trust as protests spread\nDuke an...
2014-12-11    Burning death inquiry eyes woman's last hours\nTIME names 'Person of Year,' Ebola survivor reacts\nUkraine urges Russia to rem...
2014-12-12    US House narrowly passes spending bill, averts government shutdown\nCenturies-old time capsule removed from State House\nUkrai...
2014-12-13    Tornado, mudslides triggered by powerful California storm\n8 dead, 100 missing in landslide in Indonesia\nMajor storm sweeps t...
2014-12-14    Thousands March Across Nation to Protest Police Killings of Black Men\n20 dead, 88 missing in Indonesian landslide\nJeb Bush t...
2014-12-15                                                                                                                                     
2014-12-16    Taliban Besiege Pakistan School, Leaving 145 Dead\nJeb Bush's decision to explore presidential bid scrambles the 2016 GOP fiel...
2014-12-17    Sony under attack from hackers and ex-employees\nFederal judge: Obama immigration actions 'unconstitutional'\nThe Senate Just ...
                                                                            ...                                                                
2015-03-16    Eccentric Durst arrested, says on tape, 'killed them all'\nMass protests present big challenge to Brazil's Rousseff\nVanuatu l...
2015-03-17    Relief Crews Try to Reach Cyclone Victims in Vanuatu\nWade scores 32 as Heat top Cavs\nFor friends of Susan Berman, Durst's ar...
2015-03-18    Netanyahu Pulls Ahead of Main Challenger Herzog in Israeli Elections\nPSU Fraternity Suspended\nMissouri executed an inmate wh...
2015-03-19    Netanyahu Starts Search for Partners After Election Win\nTunisian Parliament Calls Day of Solidarity After Deadly Attack\nSecr...
2015-03-20    Islamic State responsible for Tunisia museum attack\nUS will 're-assess' options after Netanyahu win: Obama\nEmails reveal WHO...
2015-03-21    Robert Durst lawyers: release him, you won't find anything\nUS sets first fracking rules since process fueled gas boom\nMan fo...
2015-03-22    Western powers stress unity in Iran talks, 'won't do bad deal'\nPresident Obama to 'reconsider' Israel relationship\nOrthodox ...
2015-03-23                                                                                                                                     
2015-03-24                                                                                                                                     
2015-03-25                                                                                                                                     
2015-03-26                                                                                                                                     
2015-03-27                                                                                                                                     
2015-03-28                                                                                                                                     
2015-03-29    Germanwings co-pilot examined for vision problems before fatal flight, reports say\nAs deadline looms, Iran nuke talks take on...
2015-03-30    Comicios regionales dan espacio político a la oposición boliviana\nQuedan varados en lo alto de una montaña rusa en Nueva York...
2015-03-31    Indianapolis Star protests law on Tuesday's cover\nTwo Former Feds Accused of Stealing Around $1 Million in Silk Road Bust\nMi...
2015-04-01    Buhari takes historic victory in Nigeria\nTalks for framework of Iran nuclear deal continue\nJoni Mitchell Rushed to Los Angel...
2015-04-02    California governor issues mandatory water cuts as snowpack hits record low\nNo breakthrough in Iran nuclear talks after all-n...
2015-04-03    Republicans uneasy over Iran nuke 'deal,' lawmakers demand say on any final agreement\nEaster events coming this weekend in th...
2015-04-04    Obama seeks to persuade Congress on Iran nuclear deal\nSarah Brady, wife of former White House Press Secretary James Br KCTV5\...
2015-04-05    Unusually quick total lunar eclipse dazzles skywatchers\nKenya mourns victims of Garissa al-Shabab attack\nSummit of the Ameri...
2015-04-06    Social media honor Kenya attack victims\n'Door closed' to US Ambassador to Prague\nCERN restarts 'Big Bang' Hadron Collider\nM...
2015-04-07    Duke surges past Wisconsin to win fifth NCAA basketball championship\nWorld's oldest person dies at age 116 in Arkansas\nJohn ...
2015-04-08    6 factions Rand Paul must court in Iowa, activists say\nChicago Mayor Rahm Emanuel wins second term\nSmall plane returning fro...
2015-04-09    Footage of police shooting makes a difference\nWhite House Supports Efforts to Ban 'Conversion Therapy' for Gay and ...\nPaul ...
2015-04-10    With Masters offering chance to clinch career Grand Slam, Rory McIlroy opens with steady 71\nRUDGLEY: Rand Paul and the future...
2015-04-11    Obama seeks to re-engage with Latin America at summit\nWhat Videos Show\nEyes of Texas switch from Ben Crenshaw playing his fi...
2015-04-12    Obama, Castro reach for thaw in relations with historic meeting\nClinton tries again to crack 'highest glass ceiling' with Whi...
2015-04-13    By the Numbers: When America Loved and Hated Hillary\nJordan Spieth's 2015 US Masters win helped by victory in Australia\nEart...
2015-04-14    Flight Bound For LA Locates Man Trapped In Cargo Hold After Emergency ...\nSharpton praises response to fatal SC police shooti...
Freq: D, Name: news_title, Length: 148, dtype: object
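The Series above holds one string per calendar day (`Freq: D`). Within each string, that day's titles are separated by newlines, as the printed `test_str` further down shows. Below is a rough stdlib sketch of the same grouping. It assumes the input is an iterable of `(datetime, title)` pairs, and `build_daily_title_docs` is a hypothetical name rather than the notebook's own code. Titles are bucketed by date and each bucket is joined with `'\n'`. Days with no titles do not appear in the result.

```python
from datetime import date, datetime


def build_daily_title_docs(records):
    """Group (datetime, title) pairs by calendar day.

    Hypothetical helper: returns {date: titles joined by newlines}, with days
    in chronological order and titles in their input order within each day.
    """
    grouped = {}
    for ts, title in records:
        grouped.setdefault(ts.date(), []).append(title)
    return {day: '\n'.join(grouped[day]) for day in sorted(grouped)}


# Toy records: the timestamps and titles are made up for illustration.
records = [
    (datetime(2014, 11, 18, 9, 0), 'Title A'),
    (datetime(2014, 11, 19, 8, 15), 'Title C'),
    (datetime(2014, 11, 18, 10, 30), 'Title B'),
]
docs = build_daily_title_docs(records)
print(docs[date(2014, 11, 18)].splitlines())  # ['Title A', 'Title B']
```

Because the separator is a newline, `str.splitlines()` recovers the individual titles from any daily doc.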

In [8]:
"""
Make tmp sr pickle
"""
if 0 == 1:  # disabled guard; change to 1 == 1 to regenerate the pickle
    news_titles_sr.to_pickle(news_period_title_docs_pkl)
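The `if 0 == 1:` / `if 1 == 1:` guards switch cells on and off by hand, so the pickle is regenerated only when needed. A small helper can make the same decision based on whether the file already exists. The sketch below uses the standard `pickle` module. `load_or_build` and its `rebuild` flag are hypothetical names. For a Series, pandas' own `Series.to_pickle` / `pd.read_pickle` (as used in this notebook) would work just as well.

```python
import os
import pickle


def load_or_build(pkl_path, build_fn, rebuild=False):
    """Return the object cached at pkl_path, or build it, save it, and return it.

    Hypothetical replacement for toggling `if 0 == 1:` / `if 1 == 1:` by hand.
    """
    if not rebuild and os.path.exists(pkl_path):
        with open(pkl_path, 'rb') as f:
            return pickle.load(f)
    obj = build_fn()
    with open(pkl_path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```

Passing `rebuild=True` forces a refresh after the upstream data changes.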

Find out high frequency words in each news title doc
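Once a daily doc has been reduced to a token list by the pre-processing below, finding its high-frequency words becomes a counting problem. Here is a minimal sketch with `collections.Counter`, which is already imported in the initialization cell. The function name `top_terms` and the `k` cutoff are assumptions, not the notebook's own code.

```python
from collections import Counter


def top_terms(tokens, k=10, case_insensitive=True):
    """Return the k most frequent tokens as (token, count) pairs.

    Hypothetical helper. With case_insensitive=True, tokens are lowercased
    first, so 'Attack' and 'attack' are counted together.
    """
    if case_insensitive:
        tokens = [t.lower() for t in tokens]
    return Counter(tokens).most_common(k)


tokens = ['Ferguson', 'grand', 'jury', 'Ferguson', 'Keystone', 'vote', 'Keystone', 'ferguson']
print(top_terms(tokens, k=2))  # [('ferguson', 3), ('keystone', 2)]
```

Headlines mix title case and sentence case ('Attack' vs 'attack' in the same day), so folding case usually gives a truer count. The trade-off is that entity capitalization is lost in the output.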


In [2]:
"""
Load tmp sr pickle for news title docs
"""
if 1 == 1:
    news_titles_sr = pd.read_pickle(news_period_title_docs_pkl)

In [3]:
test_str = news_titles_sr.iloc[0]

In [4]:
print(test_str)


Missouri's Nixon Declares State of Emergency Awaiting Grand Jury
PEOPLE: Bill Cosby. Charles Manson, Solange Knowles and more!
Ebola patient who died had received ZMapp late in his treatment
At least 4 dead in attack in Kabul, official says
Australia will not be at periphery of India's vision: Modi
FBI: Violence could follow Ferguson indictment decision
Four Killed in Palestinian Attack at Jerusalem Synagogue
Mass murderer Charles Manson issued marriage license, may get hitched next ...
News Guide: Texas' latest history textbook tussle
Abdul-Rahman Kassig's parents mourn 'beloved son'
Obama orders full review of US hostage policy
Homeless Children in US: A parent-to-parent approach to help kids (+video)
Alleged Bill Cosby victim has connection to Colorado
Church of England approves women bishops
Uber executive wants to dig into personal lives, discredit journalists who cover ...
Suicide blast kills two at Kabul's foreign compound
Answers to questions about the Ferguson grand jury
Four Killed in Jerusalem Synagogue Complex
Source: Charles Manson, fiance get marriage license
Suicide Attack in Afghan Capital Kills 2
Hong Kong Protesters Greet Court Officials With Indifference
Europeans have prominent role in beheading video
Deals Heat Up for Lawyers Like It's 1998: Business of Law
Cupich set to become 9th archbishop of Chicago
4 Israelis, 2 Palestinians killed in synagogue attack, Israeli police say
Missouri Gov. Jay Nixon Declares State Of Emergency Ahead Of Grand Jury Decision
French National Identified In Islamic State Beheading Video
Suicide bombing near coalition base in Kabul kills 2 security officers
Baseball notes, Nov. 17: Stanton gets record $325 million deal
Surgeon dies of Ebola at Nebraska hospital after contracting disease in Sierra ...
Some Barricades Cleared From HK Protest Site
Pakistan ranks third among terror-hit countries
Gunfight at godman Rampal's ashram in Hisar, devotees take on cops
AP Exclusive: Charles Manson gets marriage license
India sixth worst affected country by terrorism in 2013: Report
Hong Kong authorities clear part of Admiralty protest site
Should Uber Fire Exec Who Suggested Investigating Reporters' Personal Lives?
'Cruel murder': Netanyahu, Kerry denounce terror attack on Jerusalem synagogue
Missouri Governor Activates National Guard Ahead of Ferguson Grand Jury Ruling
Taliban Suicide Attacker Kills 4 in Kabul
'Bill Cosby 77' Will Still Premiere on Netflix
Uber executive suggests 'digging up dirt' on media critics: BuzzFeed
Indian PM Modi Urges Greater Security, Economic Ties With Australia
Study Finds Alternative to Anti-Cholesterol Drug
Four dead in suspected Palestinian attack on Jerusalem synagogue
Answers to questions about the Ferguson grand jury
Charles Manson reluctantly applies for marriage license to wed girlfriend
UPDATE 3-Truck bomb kills two in attack on foreign base in Kabul
All 50 States Face Winter Whack; 5 Feet of Snow Forecast Near Buffalo
Deals Heat Up for Lawyers Like It's 1998: Business of Law
Israel: 'We will respond with a heavy hand' after synagogue attack kills 4
Missouri governor declares state of emergency in Ferguson, St. Louis region
Charles Manson reluctantly applies for marriage license to wed girlfriend
Supporters of oil pipeline scramble for last vote
Police Storm Ashram in India in Search of Guru
Modi, Abbott agree on closer cooperation on security and trade
Missouri governor declares state of emergency ahead of ruling on Ferguson shooting
Keystone Vote May Be Too Late to Help Democrat Hold Senate Seat
Deaths caused by terrorism rises by 61 percent, report shows
Truck bomb hits foreign base in Afghan capital, kills two
Toyota bets on hydrogen fuel cell technology
Followers of wanted Indian guru hold out against police
Uber exec proposed publishing journalists' personal secrets to fight bad press
Prime Minister Narendra Modi in Australia Parliament.
Can a Keystone pipeline vote help Mary Landrieu?
Small plane crashes into home near Chicago's Midway airport (PHOTOS)
Whoopi Goldberg defends Bill Cosby over rape allegations: 'I have a lot of ...
Haryana police storm godman Rampal's ashram, main gate damaged: Report
Child Homelessness Reaches Record High
Attackers storm Jerusalem synagogue, killing 4 worshippers
Deaths Linked to Terrorism Are Up 60 Percent, Study Finds
Russia sees no chance of breakthrough on Ukraine in German minister's visit
Small Plane Crashes Into Home Near Chicago's Midway Airport
Suicide bombers attack foreigner compound in Afghan capital, killing 2
Why liberals are turning on Bill Cosby over rape allegations
Japan's Abe Calls Early Election to Save His Grand Economic Plan
At Least 6 Killed at Jerusalem Synagogue
National Guard Prepares for More Ferguson Unrest
Global terrorism on rise: Fivefold increase in terror-related deaths since 2000 — RT News
Small Plane Crashes Into Chicago Home
Home Depot profit beats estimates as US job market improves
Indian PM jokingly accuses Tony Abbott of 'shirt-fronting' Australia
As Missouri awaits decision on police shooting, National Guard called in
'Terror' Deaths Up by 37%: Study
Uber rides into new PR storm over digging dirt on hostile press
State Board Mulls New History Textbooks « CBS Dallas / Fort Worth
Cupich set to become 9th archbishop of Chicago
FBI warns Ferguson grand jury decision 'will likely' lead to violence
Charles Manson gets marriage license
Small plane crashes into Chicago home, police say
Supporters Of Keystone Oil Pipeline Scramble For Last Vote - NewsOn6.com - Tulsa, OK - News, Weather, Video and Sports - KOTV.com
Lake-Effect Snow Pummels New York, Closes Thruway
Uber Exec Suggests Spending $1 Million To 'Dig Up Dirt' On Journalists: BuzzFeed
Japan leads world markets higher on stimulus hopes
Mary Landrieu scrambles for 60th vote for Keystone
Plane crashes into Chicago home, but elderly couple survives
Bernie Sanders Has Found His Grassroots Support Base: The Fake News Audience
House Democrats lash out at Nancy Pelosi
Lives lost to terrorism up by 61%, with 18000 dead
What explains the continuing fascination with Charles Manson?
NFL suspends Vikings RB Adrian Peterson without pay for remainder of the ...
Obama orders review of hostage policy - World News
NM Rep. Ben Ray Luján tapped to head Democratic campaign committee
US Producer Prices Rise 0.2 Percent in October
Upholding the Sanctity of Marriage for Charles Manson
NFL suspends Adrian Peterson without pay for at least rest of regular season
Small Cargo Plane Crashes Into Chicago Home
Lake-effect snow snarls Buffalo flights in wake of storm
Uber Just Stuck a Knife in the Republican Party's Heart
Jose Canseco says he's selling detached finger and digit-blasting gun
Jennifer Lawrence and the stars of 'The Hunger Games: Mockingjay Part 1' stun ...
Disappointment Becomes Global-Growth Norm as Japan Contracts
America's Disastrous History of Pipeline Accidents Shows Why the Keystone Vote Matters
AP Exclusive: Charles Manson Plans Prison Wedding
Missouri Gov. Jay Nixon issues state of emergency ahead of Ferguson grand jury decision
Peterson Suspended Without Pay for Rest of Season by NFL
And the 2014 Word of the Year Is…
Everything you need to know about the Bill Cosby sexual assault allegations
Former Slugger Jose Canseco Plans To Put His Finger On EBay
House Democrats re-elect Pelosi as minority leader
Keystone Supporters Hustle to Get 60 Yeses for Senate Vote on Tuesday
14 Questions And Answers About The Ferguson Grand Jury
Chicago plane crashes into home near Midway; 1 dead
Oxford names 'vape' 2014 Word of the Year
Uber Responds to BuzzFeed Report on Journalism Smear Campaign
Obama Orders Review Of U.S. Hostage Policy
Jerusalem Attack: A Look at the Victims
AP Exclusive: Charles Manson gets marriage license
Tie Keystone approval to bigger environmental goals
Myers: With Adrian Peterson ban, NFL commish gets it right
Jonathan Gruber's Obamacare comments, Ferguson grand jury, and more
Uber's Plan to Win Over the Press Backfires
President Orders Review of US Hostage Policies
Israel's 'Lone Wolf' Attacks Show Weapons Threat Hard to Track
GOP Vows to Pass Keystone Later If Bill Fails Now
National Guard prepares for more Ferguson unrest
Uber Exec in Hot Water After Suggesting Smear Campaign on Journalists
Where does Adrian Peterson's NFL career go from here?
Plane misses elderly couple by '8 inches' after crashing into home near Midway
Obama orders review of US policy on hostages
Oxford Dictionaries' 2014 Word of the Year is 'vape'
The new PEANUTS trailer is here And it's everything we ever wanted!!!
Israel's 'Lone Wolf' Attacks Show Weapons Threat Hard to Track
Louisiana's Landrieu Silent at Almost 70% of Energy Hearings
Obama order could protect thousands of illegal immigrants in Md. Va., report says
Browns waive running back Ben Tate
Bob Marley Named As Face of Global Marijuana Brand
Keystone backers scramble for last vote on bill, Boehner warns Obama against veto
NFL suspends Adrian Peterson for remainder of 2014 season
House Democrats Re-elect Pelosi as Minority Leader
Oxford chooses 'vape' as its 2014 Word of the Year
Obama Orders Review of Hostage Policy
Four rabbis killed in Jerusalem synagogue terror attack
Keystone XL chances dim in Senate as King says 'no'
What Charles Manson's Future Mother-in-Law Thinks About Wedding
Report: Alleged Officer Warns Ferguson 'If You Do Not Have a Gun, Get One'
Tech world calls out Uber for "thuggish" behavior
Liberals oppose Himes in House Democratic race
President Obama Orders Full Review of Hostage Negotiation Policy
Charles Manson Set to Tie the Knot With 26-Year-Old Woman
Ferguson Activists Prepare Havens for Post-Decision Protests
Toyota aims to replicate Prius success with fuel cell Mirai
Official Bob Marley Marijuana Is Coming
Three Americans Among Four Rabbis 'Slaughtered' in Jerusalem Synagogue
Senator Landrieu's Hail Mary goes beyond Keystone XL pipeline
Charles Manson fan insists she will marry the 80-year-old murderer
Uber Draws Fire After Executive Suggests Investigating Reporters
Earnest: "Old" Gruber Videos "Are Not Views That Are Shared By Anybody At The White House"
Crime and Inept Punishment: Sheriff Roger Goodell Is Barney Fife Once More
Vape is Oxford Dictionaries' Word Of The Year
Putin says US wants to subdue Russia
Top Republican floats new attack plan for Obama's immigration action
Dear relatives not in SoCal: We really don't miss your crazy snowstorms
Cost to Treat Ebola: $1 Million For Two Patients
Fla sees big rise of residents in US illegally
Uber can't sweep exec's revenge campaign under the car mat
The NFL Suspends Adrian Peterson, and the Sponsors Stay Quiet
Lake-effect snow pummels New York, closes Thruway
East Coast popular for immigrants in US illegally
'Vape' is English Word of the Year for 2014, Oxford Says
National Guard coming to help dig out from colossal effect storm
Pilot dies after small plane crashes into Chicago home
Who is Hannibal Buress, and why did he call Bill Cosby a "rapist"?
Obama will not change policy against paying ransom for hostages
Winter Whack: Nation Faces Arctic Chill; Almost 6 Feet of Snow Forecast Near Buffalo
Ferguson dilemma: Was calling up National Guard the right move?
Democrats Re-Elect Nancy Pelosi As House Minority Leader Amid Criticism Over 2014 Midterm Elections
Can Uber afford to have this many enemies?
Synagogue attack: Netanyahu vow in 'battle for Jerusalem'
Vape is word of the year for 2014
Louisiana Senate Seat Is Real Reward in Keystone Pipeline Vote
What May Happen to Officer Darren Wilson After Ferguson Grand Jury Decision
Bill Cosby hunkers down as scandal rages
Chicago plane crashes into home near Midway, pilot killed
NY agency aids more than 100 snow-stranded drivers
UPDATE 1-Seventh Sierra Leone doctor killed by Ebola -source
For Obama, Executive Action Will Not Be Limited to Immigration
Vape named as Oxford English Dictionary's 2014 word of the year
Gruber frequently visited White House
Here's Everything We Know (and Don't Know) About the Bill Cosby Rape ...
Obama orders review of US hostage policy
Uber CEO Apologies For Exec's 'Terrible' Suggestion That The Company ...
With beheading deaths of Americans, Obama orders review of US response to hostage takings
Bill Cosby Rape Accuser Joan Tarshis Reveals Details Of Horrifying Attack
House Democratic Leaders Hold Caucus Meeting To Elect Leaders For 114th Congress
U-Va. student Hannah Graham's death the result of 'homicidal violence,' officials ...
Adrian Peterson will not return this weekend against Green Bay [updated]
Palestinians kill five in Jerusalem synagogue attack
Keystone's Big Senate Test: A Search For One Vote
Missouri Gov. swears in Ferguson panel ahead of grand jury decision in shooting
LISTEN: Bill Cosby's 1969 riff on drugging women's drinks
Obama orders review of US hostage policy
Uber's vast trove of customer data is ripe for abuse
Senate Narrowly Defeats Keystone XL Bill
Oxford Dictionaries' word of 2014: Have you ever heard of it? - One News
What will Bill Cosby's legacy be?
Obama: 'Nowhere Near Out of the Woods' on Ebola
The Short List: Uber wants to silence journalists; Keystone bill fails; Peterson ...
Keystone Vote Falls Short in Senate
Jerusalem synagogue attack: 'Lone wolf' pattern seen in deadly assault
Western New York Snow Storm Could Set Records
Justice Department Probe Of Ferguson Police Could Spur Broad Change
Janice Dickinson accuses Bill Cosby of sexual assault during 1982 hotel meetup
West Africa "nowhere near out of woods" on Ebola: Obama - Xinhua
Obama orders review of US hostage policy
Tracy Morgan Still Struggling but "Fighting to Get Better" After Brain Injury ...
Will Kim Jong-un face mass crimes prosecution at The Hague? (+video)
Keystone Pipeline Fails to Get Through Senate
Tech lobby to keep tabs on NSA reform votes
Lawyer: Tracy Morgan Still Struggling With Severe Brain Injury
UPDATE 1-Seventh Sierra Leone doctor killed by Ebola -source
Quinn: Uber grapples with its aggressive image
Senate Narrowly Defeats Keystone XL Pipeline
Senate Republicans Block Sweeping Overhaul of NSA Program
Obama orders review of hostage policy
Ebola Researchers Race to Slow Epidemic
Sarah Lacy on Uber: I'm doing everything I can to keep my family safe
Police nab man sought after fatal NYC subway shove
UN panel calls for N.Korea referral to international court
Senate defeats Keystone XL pipeline
Middle East|Jewish Victims, All From One Jerusalem Street, Were a ...
Obama orders review of the policy on terrorist-related hostage cases
150 cars snowbound in early winter storm
Uber's plot to spy on reporter is latest controversy
UN Rights Committee Urges Court Referral for North Korea
Death of Virginia college student ruled a homicide
After Jerusalem attack, Netanyahu hopes 'PR porn' will win support abroad - Diplomacy and Defense Israel News
In all 50 states, it's below freezing in at least one spot
Uber CEO Apologizes For Exec's 'Terrible' Suggestion That The Company Investigate Journalists
Ebola crisis: Seventh Sierra Leone doctor dies from virus
United Nations Urges North Korea Prosecutions
Stupidity reconsidered
Ryan to chair tax panel, a possible 2016 platform
Senate fails to advance legislation on NSA reform
Uber executive stirs up privacy controversy
Janice Dickinson Says She Was Sexually Assaulted By Bill Cosby
UVA student Hannah Graham died from 'homicidal violence': medical examiner
Police nab man sought after fatal NYC subway shove
Ryan to Chair Tax Panel, a Possible 2016 Platform
In latest College Football Playoff rankings, Alabama rolls straight to the No. 1 spot
Senate Fails to Advance NSA Data Collection Overhaul Legislation
Storm blamed for at least 4 deaths in upstate New York
Report: Janice Dickinson accuses Bill Cosby of rape
Tracy Morgan Still Battling With Brain Injury
UN calls for probe of North Korea 'crimes against humanity'
Police question man in subway shoving death
Keystone Vote Unlikely to Change Odds for Mary Landrieu
Senate Republicans block bill: NSA will continue monitoring your calls
TV host Janice Dickinson latest Cosby accuser
Lawyer: Comedian Tracy Morgan Suffered Traumatic Brain Injury In NJ Tpke Crash
Video seems to show Ferguson officer in confrontation
North Korea: UN moves closer to ICC human rights probe
Defective Takata Airbag Grows Into Global Problem for Manufacturer
Report: Va. student's death was homicide
Winter Whack: Nation Faces Arctic Chill; 6 Feet of Snow Hit Buffalo Area
Fail Mary: Senate rejects Keystone bill
Palestinians kill five in Jerusalem synagogue attack
Attorney says actor Tracy Morgan struggling after crash: report
Court clears way for gay marriage in South Carolina
Subway Motorman Describes Deadly Train Push
Federal highway safety agency demands recall of cars with Takata air bags
North Korea reacts angrily after UN votes to probe 'crimes against humanity'
Early winter pummels much of country, strands motorists, emergency vehicles
Palestinians kill five in Jerusalem synagogue attack
Attorney says actor Tracy Morgan struggling after crash -report
UPDATE 3-US auto regulator seeks nationwide recall of Takata air bags
Death of Virginia college student ruled a homicide
Adrian Peterson May Be Suspended, But He's Unlikely To Lose Any Pay

In [5]:
"""
Check stop-words
"""
stop_words = nltk.corpus.stopwords.words('english')
print(stop_words)
print('Count: {}'.format(len(stop_words)))


['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', 'couldn', 'didn', 'doesn', 'hadn', 'hasn', 'haven', 'isn', 'ma', 'mightn', 'mustn', 'needn', 'shan', 'shouldn', 'wasn', 'weren', 'won', 'wouldn']
Count: 153
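Two parts of the pre-processing cell below are worth hardening. First, `test_str` joins a whole day's titles, so a chunker can glue the last word of one title to the first word of the next. Second, a fixed punctuation list misses symbols and tokens that carry a leading or trailing quote. The sketch below chunks each title separately and checks stop words case-insensitively against a set. It also strips surrounding quote characters and drops tokens that contain no letters or digits. `toy_chunker` is a crude capitalization heuristic that stands in for `nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(title)))`; it is not NER. All function names here are hypothetical.

```python
QUOTE_CHARS = "'\"`"


def toy_chunker(title):
    """Stand-in for nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(title))).

    Crude heuristic, not NER: a run of two or more capitalized words becomes
    one chunk (a list of (word, tag) tuples); other words stay (word, tag)
    tuples. Every tag is a dummy 'X'.
    """
    chunks, run = [], []
    for word in title.split():
        if word[:1].isupper():
            run.append((word, 'X'))
            continue
        if run:
            chunks.append(run if len(run) > 1 else run[0])
            run = []
        chunks.append((word, 'X'))
    if run:
        chunks.append(run if len(run) > 1 else run[0])
    return chunks


def flatten_chunks(chunks):
    """Join each multi-word chunk into one token; keep plain tokens as-is."""
    return [c[0] if isinstance(c, tuple) else ' '.join(w for w, _ in c) for c in chunks]


def clean_tokens(tokens, stop_words):
    """Strip surrounding quotes, then drop stop words and symbol-only tokens."""
    stop = {w.lower() for w in stop_words}  # set: constant-time membership tests
    kept = []
    for tok in tokens:
        tok = tok.strip(QUOTE_CHARS)
        if not any(ch.isalnum() for ch in tok):
            continue  # e.g. '!', '``', '$', '«', or '' after stripping
        if tok.lower() in stop:
            continue
        kept.append(tok)
    return kept


def preprocess_day_doc(doc, chunker, stop_words):
    """Chunk each newline-separated title on its own, then clean all tokens."""
    tokens = []
    for title in doc.splitlines():
        tokens.extend(flatten_chunks(chunker(title)))
    return clean_tokens(tokens, stop_words)


doc = 'Australia welcomes Modi\nFBI warns of Ferguson violence'
# Whole doc at once: 'Modi' and 'FBI' are glued into one chunk.
print(flatten_chunks(toy_chunker(doc)))
# ['Australia', 'welcomes', 'Modi FBI', 'warns', 'of', 'Ferguson', 'violence']
print(preprocess_day_doc(doc, toy_chunker, ['of']))
# ['Australia', 'welcomes', 'Modi', 'FBI', 'warns', 'Ferguson', 'violence']
```

`nltk.Tree` subclasses `list`, so real `nltk.ne_chunk` output has the same shape that `flatten_chunks` expects. With NLTK, the chunker argument would be `lambda t: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(t)))`.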

In [6]:
"""
Pre-processing steps
"""
if 1 == 1:
    tokens = nltk.word_tokenize(test_str)
    
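    '''
    Recognize named entities (NE) in tokens and keep each one as a single token
    
    Note: NE recognition quality depends on the pre-trained model shipped with the nltk package.
    Because test_str joins all of one day's titles, a chunk can also span two adjacent titles.
    '''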
    chunks = nltk.ne_chunk(nltk.pos_tag(tokens))
    # Each element of chunks is either a (word, pos) tuple or a Tree() containing the parts of the chunk
    tokens = [chunk[0] if isinstance(chunk, tuple) else ' '.join(node[0] for node in chunk) for chunk in chunks]
    
    '''
    Remove non-alphabetical tokens
    '''
    # tokens = [token for token in tokens if token.isalpha()]
    
    '''
    Remove stop-words
    '''
    tokens = [token for token in tokens if token.lower() not in stop_words]
    
    '''
    Remove misc single punctuation tokens
    '''
    misc_punc_lst = [",", ":", "'", "'s", ";", ".", "?", "(", ")", "..."]
    tokens = [token for token in tokens if token not in misc_punc_lst]
    
    print(tokens)


['Missouri', 'Nixon Declares State', 'Emergency Awaiting Grand Jury', 'PEOPLE', 'Bill Cosby', 'Charles Manson', 'Solange Knowles', '!', 'Ebola', 'patient', 'died', 'received', 'ZMapp', 'late', 'treatment', 'least', '4', 'dead', 'attack', 'Kabul', 'official', 'says', 'Australia', 'periphery', 'India', 'vision', 'Modi FBI', 'Violence', 'could', 'follow', 'Ferguson', 'indictment', 'decision', 'Four', 'Killed', 'Palestinian', 'Attack', 'Jerusalem Synagogue Mass', 'murderer', 'Charles Manson', 'issued', 'marriage', 'license', 'may', 'get', 'hitched', 'next', 'News', 'Guide', 'Texas', 'latest', 'history', 'textbook', 'tussle', 'Abdul-Rahman', 'Kassig', 'parents', 'mourn', "'beloved", "son'", 'Obama', 'orders', 'full', 'review', 'US', 'hostage', 'policy', 'Homeless Children', 'US', 'parent-to-parent', 'approach', 'help', 'kids', '+video', 'Alleged Bill Cosby', 'victim', 'connection', 'Colorado Church', 'England', 'approves', 'women', 'bishops', 'Uber', 'executive', 'wants', 'dig', 'personal', 'lives', 'discredit', 'journalists', 'cover', 'Suicide', 'blast', 'kills', 'two', 'Kabul', 'foreign', 'compound', 'Answers', 'questions', 'Ferguson', 'grand', 'jury', 'Four', 'Killed', 'Jerusalem', 'Synagogue', 'Complex', 'Source', 'Charles Manson', 'fiance', 'get', 'marriage', 'license', 'Suicide Attack', 'Afghan', 'Capital', 'Kills', '2', 'Hong Kong', 'Protesters', 'Greet', 'Court', 'Officials', 'Indifference', 'Europeans', 'prominent', 'role', 'beheading', 'video', 'Deals', 'Heat Up', 'Lawyers', 'Like', '1998', 'Business', 'Law Cupich', 'set', 'become', '9th', 'archbishop', 'Chicago', '4', 'Israelis', '2', 'Palestinians', 'killed', 'synagogue', 'attack', 'Israeli', 'police', 'say', 'Missouri Gov', 'Jay Nixon Declares State', 'Of Emergency Ahead Of Grand Jury', 'Decision', 'French National Identified', 'Islamic', 'State', 'Beheading', 'Video', 'Suicide', 'bombing', 'near', 'coalition', 'base', 'Kabul', 'kills', '2', 'security', 'officers', 'Baseball', 'notes', 'Nov.', '17', 
'Stanton', 'gets', 'record', '$', '325', 'million', 'deal', 'Surgeon', 'dies', 'Ebola', 'Nebraska', 'hospital', 'contracting', 'disease', 'Sierra', 'Barricades', 'Cleared', 'HK Protest Site Pakistan', 'ranks', 'third', 'among', 'terror-hit', 'countries', 'Gunfight', 'godman', 'Rampal', 'ashram', 'Hisar', 'devotees', 'take', 'cops', 'AP', 'Exclusive', 'Charles Manson', 'gets', 'marriage', 'license', 'India', 'sixth', 'worst', 'affected', 'country', 'terrorism', '2013', 'Report', 'Hong Kong', 'authorities', 'clear', 'part', 'Admiralty', 'protest', 'site', 'Uber', 'Fire', 'Exec', 'Suggested', 'Investigating', 'Reporters', 'Personal', 'Lives', "'Cruel", 'murder', 'Netanyahu', 'Kerry', 'denounce', 'terror', 'attack', 'Jerusalem', 'synagogue', 'Missouri Governor Activates National Guard Ahead', 'Ferguson Grand Jury Ruling Taliban Suicide Attacker', 'Kills', '4', 'Kabul', "'Bill", 'Cosby', '77', 'Still', 'Premiere', 'Netflix Uber', 'executive', 'suggests', "'digging", 'dirt', 'media', 'critics', 'BuzzFeed Indian', 'PM', 'Modi', 'Urges', 'Greater', 'Security', 'Economic Ties With Australia Study Finds', 'Alternative', 'Anti-Cholesterol', 'Drug', 'Four', 'dead', 'suspected', 'Palestinian', 'attack', 'Jerusalem', 'synagogue', 'Answers', 'questions', 'Ferguson', 'grand', 'jury', 'Charles Manson', 'reluctantly', 'applies', 'marriage', 'license', 'wed', 'girlfriend', 'UPDATE', '3-Truck', 'bomb', 'kills', 'two', 'attack', 'foreign', 'base', 'Kabul', '50', 'States Face', 'Winter', 'Whack', '5', 'Feet', 'Snow Forecast Near Buffalo Deals Heat Up', 'Lawyers', 'Like', '1998', 'Business', 'Law Israel', "'We", 'respond', 'heavy', 'hand', 'synagogue', 'attack', 'kills', '4', 'Missouri', 'governor', 'declares', 'state', 'emergency', 'Ferguson', 'St. 
Louis', 'region', 'Charles Manson', 'reluctantly', 'applies', 'marriage', 'license', 'wed', 'girlfriend', 'Supporters', 'oil', 'pipeline', 'scramble', 'last', 'vote', 'Police', 'Storm', 'Ashram', 'India', 'Search', 'Guru Modi', 'Abbott', 'agree', 'closer', 'cooperation', 'security', 'trade', 'Missouri', 'governor', 'declares', 'state', 'emergency', 'ahead', 'ruling', 'Ferguson', 'shooting', 'Keystone Vote', 'May', 'Late', 'Help', 'Democrat Hold', 'Senate Seat', 'Deaths', 'caused', 'terrorism', 'rises', '61', 'percent', 'report', 'shows', 'Truck', 'bomb', 'hits', 'foreign', 'base', 'Afghan', 'capital', 'kills', 'two', 'Toyota', 'bets', 'hydrogen', 'fuel', 'cell', 'technology', 'Followers', 'wanted', 'Indian', 'guru', 'hold', 'police', 'Uber', 'exec', 'proposed', 'publishing', 'journalists', 'personal', 'secrets', 'fight', 'bad', 'press', 'Prime', 'Minister', 'Narendra Modi', 'Australia', 'Parliament', 'Keystone', 'pipeline', 'vote', 'help', 'Mary Landrieu', 'Small', 'plane', 'crashes', 'home', 'near', 'Chicago', 'Midway', 'airport', 'PHOTOS', 'Whoopi Goldberg', 'defends', 'Bill Cosby', 'rape', 'allegations', "'I", 'lot', 'Haryana', 'police', 'storm', 'godman', 'Rampal', 'ashram', 'main', 'gate', 'damaged', 'Report', 'Child', 'Homelessness', 'Reaches', 'Record', 'High', 'Attackers', 'storm', 'Jerusalem', 'synagogue', 'killing', '4', 'worshippers', 'Deaths', 'Linked', 'Terrorism Are Up', '60', 'Percent', 'Study Finds Russia', 'sees', 'chance', 'breakthrough', 'Ukraine', 'German', 'minister', 'visit', 'Small Plane Crashes', 'Into Home', 'Near', 'Chicago', 'Midway Airport Suicide', 'bombers', 'attack', 'foreigner', 'compound', 'Afghan', 'capital', 'killing', '2', 'liberals', 'turning', 'Bill Cosby', 'rape', 'allegations', 'Japan', 'Abe Calls', 'Early', 'Election', 'Save', 'Grand', 'Economic', 'Plan', 'Least', '6', 'Killed', 'Jerusalem Synagogue National Guard', 'Prepares', 'Ferguson Unrest Global', 'terrorism', 'rise', 'Fivefold', 'increase', 'terror-related', 'deaths', 
'since', '2000', '—', 'RT', 'News', 'Small', 'Plane', 'Crashes Into Chicago Home Home Depot', 'profit', 'beats', 'estimates', 'US', 'job', 'market', 'improves', 'Indian', 'PM', 'jokingly', 'accuses', 'Tony Abbott', "'shirt-fronting", 'Australia', 'Missouri', 'awaits', 'decision', 'police', 'shooting', 'National Guard', 'called', "'Terror", 'Deaths Up', '37', '%', 'Study Uber', 'rides', 'new', 'PR', 'storm', 'digging', 'dirt', 'hostile', 'press', 'State', 'Board Mulls New History Textbooks', '«', 'CBS', 'Dallas', '/', 'Fort', 'Worth', 'Cupich', 'set', 'become', '9th', 'archbishop', 'Chicago', 'FBI', 'warns', 'Ferguson', 'grand', 'jury', 'decision', "'will", 'likely', 'lead', 'violence', 'Charles Manson', 'gets', 'marriage', 'license', 'Small', 'plane', 'crashes', 'Chicago', 'home', 'police', 'say', 'Supporters', 'Keystone Oil Pipeline Scramble For Last Vote', '-', 'NewsOn6.com', '-', 'Tulsa', 'OK', '-', 'News', 'Weather', 'Video', 'Sports', '-', 'KOTV.com', 'Lake-Effect', 'Snow', 'Pummels New York', 'Closes Thruway', 'Uber', 'Exec', 'Suggests', 'Spending', '$', '1', 'Million', "'Dig", 'Dirt', 'Journalists', 'BuzzFeed Japan', 'leads', 'world', 'markets', 'higher', 'stimulus', 'hopes', 'Mary Landrieu', 'scrambles', '60th', 'vote', 'Keystone Plane', 'crashes', 'Chicago', 'home', 'elderly', 'couple', 'survives', 'Bernie Sanders Has', 'Found', 'Grassroots Support', 'Base', 'Fake News Audience', 'House', 'Democrats', 'lash', 'Nancy Pelosi', 'Lives', 'lost', 'terrorism', '61', '%', '18000', 'dead', 'explains', 'continuing', 'fascination', 'Charles Manson', 'NFL', 'suspends', 'Vikings RB Adrian Peterson', 'without', 'pay', 'remainder', 'Obama', 'orders', 'review', 'hostage', 'policy', '-', 'World News', 'NM', 'Rep.', 'Ben', 'Ray', 'Luján', 'tapped', 'head', 'Democratic', 'campaign', 'committee', 'US', 'Producer', 'Prices', 'Rise', '0.2', 'Percent', 'October', 'Upholding', 'Sanctity', 'Marriage', 'Charles Manson', 'NFL', 'suspends', 'Adrian Peterson', 'without', 'pay', 
'least', 'rest', 'regular', 'season', 'Small Cargo Plane Crashes Into Chicago Home', 'Lake-effect', 'snow', 'snarls', 'Buffalo', 'flights', 'wake', 'storm', 'Uber', 'Stuck', 'Knife', 'Republican Party', 'Heart Jose Canseco', 'says', 'selling', 'detached', 'finger', 'digit-blasting', 'gun', 'Jennifer Lawrence', 'stars', "'The", 'Hunger', 'Games', 'Mockingjay Part', '1', 'stun', 'Disappointment', 'Becomes', 'Global-Growth', 'Norm', 'Japan', 'Contracts', 'America', 'Disastrous History', 'Pipeline Accidents Shows', 'Keystone Vote Matters', 'AP', 'Exclusive', 'Charles Manson', 'Plans', 'Prison Wedding Missouri', 'Gov', 'Jay Nixon', 'issues', 'state', 'emergency', 'ahead', 'Ferguson', 'grand', 'jury', 'decision', 'Peterson', 'Suspended', 'Without', 'Pay', 'Rest', 'Season', 'NFL', '2014', 'Word', 'Year', 'Is…', 'Everything', 'need', 'know', 'Bill Cosby', 'sexual', 'assault', 'allegations', 'Former', 'Slugger Jose Canseco', 'Plans', 'Put', 'Finger', 'EBay House', 'Democrats', 're-elect', 'Pelosi', 'minority', 'leader', 'Keystone Supporters Hustle', 'Get', '60', 'Yeses', 'Senate Vote', 'Tuesday', '14', 'Questions', 'Answers', 'Ferguson Grand Jury Chicago', 'plane', 'crashes', 'home', 'near', 'Midway', '1', 'dead', 'Oxford', 'names', "'vape", '2014', 'Word', 'Year', 'Uber', 'Responds', 'BuzzFeed Report', 'Journalism Smear Campaign Obama Orders Review Of U.S', 'Hostage', 'Policy Jerusalem', 'Attack', 'Look', 'Victims', 'AP', 'Exclusive', 'Charles Manson', 'gets', 'marriage', 'license', 'Tie Keystone', 'approval', 'bigger', 'environmental', 'goals', 'Myers', 'Adrian Peterson', 'ban', 'NFL', 'commish', 'gets', 'right', 'Jonathan Gruber', 'Obamacare', 'comments', 'Ferguson', 'grand', 'jury', 'Uber', 'Plan', 'Win', 'Press', 'Backfires', 'President', 'Orders Review', 'US', 'Hostage Policies Israel', "'Lone", 'Wolf', 'Attacks Show Weapons Threat Hard', 'Track', 'GOP Vows', 'Pass Keystone', 'Later', 'Bill Fails', 'National Guard', 'prepares', 'Ferguson', 'unrest', 'Uber', 'Exec', 
'Hot', 'Water', 'Suggesting', 'Smear Campaign', 'Journalists', 'Adrian Peterson', 'NFL', 'career', 'go', 'Plane', 'misses', 'elderly', 'couple', "'8", 'inches', 'crashing', 'home', 'near', 'Midway', 'Obama', 'orders', 'review', 'US', 'policy', 'hostages', 'Oxford Dictionaries', '2014', 'Word', 'Year', "'vape'", 'new', 'PEANUTS', 'trailer', 'everything', 'ever', 'wanted', '!', '!', '!', 'Israel', "'Lone", 'Wolf', 'Attacks Show Weapons Threat Hard', 'Track Louisiana', 'Landrieu Silent', 'Almost', '70', '%', 'Energy Hearings Obama', 'order', 'could', 'protect', 'thousands', 'illegal', 'immigrants', 'Md', 'Va.', 'report', 'says', 'Browns', 'waive', 'running', 'back', 'Ben Tate Bob Marley Named', 'Face', 'Global Marijuana Brand Keystone', 'backers', 'scramble', 'last', 'vote', 'bill', 'Boehner', 'warns', 'Obama', 'veto', 'NFL', 'suspends', 'Adrian Peterson', 'remainder', '2014', 'season', 'House', 'Democrats', 'Re-elect', 'Pelosi', 'Minority', 'Leader', 'Oxford', 'chooses', "'vape", '2014', 'Word', 'Year Obama Orders Review', 'Hostage Policy Four', 'rabbis', 'killed', 'Jerusalem', 'synagogue', 'terror', 'attack', 'Keystone XL', 'chances', 'dim', 'Senate', 'King', 'says', "'no'", 'Charles Manson', 'Future', 'Mother-in-Law', 'Thinks', 'Wedding', 'Report', 'Alleged', 'Officer', 'Warns', 'Ferguson', "'If", 'Gun', 'Get', "One'", 'Tech', 'world', 'calls', 'Uber', '``', 'thuggish', "''", 'behavior', 'Liberals', 'oppose', 'Himes', 'House', 'Democratic', 'race', 'President', 'Obama Orders Full Review', 'Hostage Negotiation Policy', 'Charles Manson Set', 'Tie', 'Knot With 26-Year-Old', 'Woman', 'Ferguson Activists Prepare Havens', 'Post-Decision', 'Protests', 'Toyota', 'aims', 'replicate', 'Prius', 'success', 'fuel', 'cell', 'Mirai Official Bob Marley Marijuana', 'Coming', 'Three Americans Among Four Rabbis', "'Slaughtered", 'Jerusalem', 'Synagogue', 'Senator', 'Landrieu', 'Hail Mary', 'goes', 'beyond', 'Keystone XL', 'pipeline', 'Charles Manson', 'fan', 'insists', 'marry', 
'80-year-old', 'murderer', 'Uber', 'Draws', 'Fire', 'Executive', 'Suggests', 'Investigating', 'Reporters', 'Earnest', '``', 'Old', "''", 'Gruber', 'Videos', '``', 'Views', 'Shared', 'Anybody At The', 'White House', "''", 'Crime', 'Inept Punishment', 'Sheriff Roger Goodell Is Barney Fife', 'Vape', 'Oxford Dictionaries', 'Word', 'Year Putin', 'says', 'US', 'wants', 'subdue', 'Russia', 'Top', 'Republican', 'floats', 'new', 'attack', 'plan', 'Obama', 'immigration', 'action', 'Dear', 'relatives', 'SoCal', 'really', "n't", 'miss', 'crazy', 'snowstorms', 'Cost', 'Treat Ebola', '$', '1', 'Million', 'Two', 'Patients', 'Fla', 'sees', 'big', 'rise', 'residents', 'US', 'illegally', 'Uber', 'ca', "n't", 'sweep', 'exec', 'revenge', 'campaign', 'car', 'mat', 'NFL Suspends Adrian Peterson', 'Sponsors Stay Quiet', 'Lake-effect', 'snow', 'pummels', 'New York', 'closes', 'Thruway East Coast', 'popular', 'immigrants', 'US', 'illegally', "'Vape", 'English Word', 'Year', '2014', 'Oxford Says National Guard', 'coming', 'help', 'dig', 'colossal', 'effect', 'storm', 'Pilot', 'dies', 'small', 'plane', 'crashes', 'Chicago', 'home', 'Hannibal Buress', 'call', 'Bill Cosby', '``', 'rapist', "''", 'Obama', 'change', 'policy', 'paying', 'ransom', 'hostages', 'Winter', 'Whack', 'Nation', 'Faces', 'Arctic Chill', 'Almost', '6', 'Feet', 'Snow Forecast Near Buffalo Ferguson', 'dilemma', 'calling', 'National Guard', 'right', 'move', 'Democrats', 'Re-Elect', 'Nancy Pelosi', 'House', 'Minority', 'Leader', 'Amid Criticism', '2014', 'Midterm', 'Elections', 'Uber', 'afford', 'many', 'enemies', 'Synagogue', 'attack', 'Netanyahu', 'vow', "'battle", "Jerusalem'", 'Vape', 'word', 'year', '2014', 'Louisiana Senate Seat', 'Real Reward', 'Keystone', 'Pipeline', 'Vote', 'May', 'Happen', 'Officer Darren Wilson', 'Ferguson Grand Jury', 'Decision', 'Bill Cosby', 'hunkers', 'scandal', 'rages', 'Chicago', 'plane', 'crashes', 'home', 'near', 'Midway', 'pilot', 'killed', 'NY', 'agency', 'aids', '100', 'snow-stranded', 
'drivers', 'UPDATE', '1-Seventh', 'Sierra Leone', 'doctor', 'killed', 'Ebola -source', 'Obama', 'Executive Action', 'Limited', 'Immigration Vape', 'named', 'Oxford English Dictionary', '2014', 'word', 'year', 'Gruber', 'frequently', 'visited', 'White House', 'Everything', 'Know', "n't", 'Know', 'Bill Cosby Rape', 'Obama', 'orders', 'review', 'US', 'hostage', 'policy', 'Uber', 'CEO', 'Apologies', 'Exec', "'Terrible", 'Suggestion', 'Company', 'beheading', 'deaths', 'Americans', 'Obama', 'orders', 'review', 'US', 'response', 'hostage', 'takings', 'Bill Cosby Rape Accuser Joan Tarshis Reveals Details', 'Of Horrifying Attack', 'House', 'Democratic Leaders', 'Hold', 'Caucus Meeting', 'Elect', 'Leaders For 114th', 'Congress', 'U-Va.', 'student', 'Hannah Graham', 'death', 'result', "'homicidal", 'violence', 'officials', 'Adrian Peterson', 'return', 'weekend', 'Green Bay', '[', 'updated', ']', 'Palestinians', 'kill', 'five', 'Jerusalem', 'synagogue', 'attack', 'Keystone', 'Big', 'Senate', 'Test', 'Search', 'One', 'Vote', 'Missouri', 'Gov', 'swears', 'Ferguson', 'panel', 'ahead', 'grand', 'jury', 'decision', 'shooting', 'LISTEN', 'Bill Cosby', '1969', 'riff', 'drugging', 'women', 'drinks', 'Obama', 'orders', 'review', 'US', 'hostage', 'policy', 'Uber', 'vast', 'trove', 'customer', 'data', 'ripe', 'abuse', 'Senate Narrowly Defeats Keystone', 'XL', 'Bill Oxford Dictionaries', 'word', '2014', 'ever', 'heard', '-', 'One', 'News', 'Bill Cosby', 'legacy', 'Obama', "'Nowhere", 'Near', 'Woods', 'Ebola', 'Short', 'List', 'Uber', 'wants', 'silence', 'journalists', 'Keystone', 'bill', 'fails', 'Peterson', 'Keystone Vote Falls Short', 'Senate Jerusalem', 'synagogue', 'attack', "'Lone", 'wolf', 'pattern', 'seen', 'deadly', 'assault', 'Western New York Snow', 'Storm', 'Could', 'Set', 'Records', 'Justice Department Probe Of Ferguson', 'Police', 'Could', 'Spur', 'Broad Change', 'Janice', 'Dickinson', 'accuses', 'Bill Cosby', 'sexual', 'assault', '1982', 'hotel', 'meetup', 'West Africa', 
'``', 'nowhere', 'near', 'woods', "''", 'Ebola', 'Obama', '-', 'Xinhua Obama', 'orders', 'review', 'US', 'hostage', 'policy', 'Tracy Morgan', 'Still', 'Struggling', '``', 'Fighting', 'Get', 'Better', "''", 'Brain Injury', 'Will Kim', 'Jong-un', 'face', 'mass', 'crimes', 'prosecution', 'The Hague', '+video', 'Keystone Pipeline Fails', 'Get', 'Senate Tech', 'lobby', 'keep', 'tabs', 'NSA', 'reform', 'votes', 'Lawyer', 'Tracy Morgan', 'Still', 'Struggling', 'Severe Brain Injury', 'UPDATE', '1-Seventh', 'Sierra Leone', 'doctor', 'killed', 'Ebola', '-source', 'Quinn', 'Uber', 'grapples', 'aggressive', 'image', 'Senate Narrowly Defeats Keystone', 'XL', 'Pipeline', 'Senate', 'Republicans', 'Block Sweeping Overhaul', 'NSA Program Obama', 'orders', 'review', 'hostage', 'policy', 'Ebola Researchers Race', 'Slow', 'Epidemic Sarah Lacy', 'Uber', "'m", 'everything', 'keep', 'family', 'safe', 'Police', 'nab', 'man', 'sought', 'fatal', 'NYC', 'subway', 'shove', 'UN', 'panel', 'calls', 'N.Korea', 'referral', 'international', 'court', 'Senate', 'defeats', 'Keystone XL', 'pipeline', 'Middle', 'East|Jewish', 'Victims', 'All From One Jerusalem Street', 'Obama', 'orders', 'review', 'policy', 'terrorist-related', 'hostage', 'cases', '150', 'cars', 'snowbound', 'early', 'winter', 'storm', 'Uber', 'plot', 'spy', 'reporter', 'latest', 'controversy', 'UN', 'Rights Committee Urges Court Referral', 'North Korea', 'Death', 'Virginia', 'college', 'student', 'ruled', 'homicide', 'Jerusalem', 'attack', 'Netanyahu', 'hopes', "'PR", 'porn', 'win', 'support', 'abroad', '-', 'Diplomacy', 'Defense Israel News', '50', 'states', 'freezing', 'least', 'one', 'spot', 'Uber', 'CEO', 'Apologizes', 'Exec', "'Terrible", 'Suggestion', 'Company Investigate Journalists Ebola', 'crisis', 'Seventh', 'Sierra Leone', 'doctor', 'dies', 'virus', 'United Nations Urges North Korea Prosecutions', 'Stupidity', 'reconsidered', 'Ryan', 'chair', 'tax', 'panel', 'possible', '2016', 'platform', 'Senate', 'fails', 'advance', 
'legislation', 'NSA', 'reform', 'Uber', 'executive', 'stirs', 'privacy', 'controversy', 'Janice', 'Dickinson Says', 'Sexually', 'Assaulted', 'Bill Cosby', 'UVA', 'student', 'Hannah Graham', 'died', "'homicidal", 'violence', 'medical', 'examiner', 'Police', 'nab', 'man', 'sought', 'fatal', 'NYC', 'subway', 'shove', 'Ryan', 'Chair Tax Panel', 'Possible', '2016', 'Platform', 'latest', 'College Football Playoff', 'rankings', 'Alabama', 'rolls', 'straight', '1', 'spot', 'Senate', 'Fails', 'Advance NSA Data Collection Overhaul', 'Legislation', 'Storm', 'blamed', 'least', '4', 'deaths', 'upstate', 'New York', 'Report', 'Janice', 'Dickinson', 'accuses', 'Bill Cosby', 'rape', 'Tracy Morgan', 'Still', 'Battling', 'Brain Injury UN', 'calls', 'probe', 'North Korea', "'crimes", "humanity'", 'Police', 'question', 'man', 'subway', 'shoving', 'death', 'Keystone Vote Unlikely', 'Change Odds', 'Mary Landrieu', 'Senate', 'Republicans', 'block', 'bill', 'NSA', 'continue', 'monitoring', 'calls', 'TV', 'host', 'Janice', 'Dickinson', 'latest', 'Cosby', 'accuser', 'Lawyer', 'Comedian', 'Tracy Morgan Suffered Traumatic Brain Injury', 'NJ Tpke Crash Video', 'seems', 'show', 'Ferguson', 'officer', 'confrontation', 'North Korea', 'UN', 'moves', 'closer', 'ICC', 'human', 'rights', 'probe', 'Defective Takata Airbag Grows Into Global Problem', 'Manufacturer Report', 'Va.', 'student', 'death', 'homicide', 'Winter', 'Whack', 'Nation', 'Faces', 'Arctic Chill', '6', 'Feet', 'Snow Hit Buffalo Area Fail', 'Mary', 'Senate', 'rejects', 'Keystone', 'bill', 'Palestinians', 'kill', 'five', 'Jerusalem', 'synagogue', 'attack', 'Attorney', 'says', 'actor', 'Tracy Morgan', 'struggling', 'crash', 'report', 'Court', 'clears', 'way', 'gay', 'marriage', 'South Carolina', 'Subway', 'Motorman', 'Describes', 'Deadly', 'Train', 'Push Federal', 'highway', 'safety', 'agency', 'demands', 'recall', 'cars', 'Takata', 'air', 'bags', 'North Korea', 'reacts', 'angrily', 'UN', 'votes', 'probe', "'crimes", "humanity'", 'Early', 
'winter', 'pummels', 'much', 'country', 'strands', 'motorists', 'emergency', 'vehicles', 'Palestinians', 'kill', 'five', 'Jerusalem', 'synagogue', 'attack', 'Attorney', 'says', 'actor', 'Tracy Morgan', 'struggling', 'crash', '-report', 'UPDATE', '3-US', 'auto', 'regulator', 'seeks', 'nationwide', 'recall', 'Takata', 'air', 'bags', 'Death', 'Virginia', 'college', 'student', 'ruled', 'homicide', 'Adrian Peterson', 'May', 'Suspended', 'Unlikely', 'Lose', 'Any Pay']

In [7]:
"""
Count token frequencies and print the 50 most common tokens
"""
if 1 == 1:
    token_counter = collections.Counter(tokens)
    print(str(token_counter.most_common(50)))


[('Uber', 20), ('attack', 15), ('Charles Manson', 13), ('Obama', 13), ('US', 13), ('Ferguson', 12), ('Bill Cosby', 11), ('Jerusalem', 10), ('synagogue', 10), ('2014', 10), ('orders', 9), ('review', 9), ('policy', 9), ('marriage', 8), ('hostage', 8), ('Chicago', 8), ('-', 8), ('Senate', 8), ('says', 7), ('license', 7), ('home', 7), ('4', 6), ('grand', 6), ('jury', 6), ('near', 6), ('crashes', 6), ('storm', 6), ('NFL', 6), ('Adrian Peterson', 6), ('``', 6), ("''", 6), ('Missouri', 5), ('Ebola', 5), ('Kabul', 5), ('decision', 5), ('kills', 5), ('killed', 5), ('police', 5), ('gets', 5), ('Exec', 5), ('Police', 5), ('Keystone', 5), ('plane', 5), ('1', 5), ('House', 5), ('Word', 5), ('student', 5), ('Tracy Morgan', 5), ('!', 4), ('least', 4)]
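The counts above still include punctuation-only tokens that slipped past the filter ('-', '``', "''", '!'), and they split one word across capitalizations ('police' and 'Police' each get 5). The sketch below shows one possible cleanup, assuming that merging case variants is acceptable for picking topics. It counts tokens case-insensitively, reports each word under its most common spelling (ties go to the spelling seen first), and skips tokens that contain no letter or digit. normalized_counts is a hypothetical helper and is not part of the notebook.

```python
import collections


def normalized_counts(tokens, top_n=50):
    """Hypothetical helper: case-insensitive token counts, punctuation dropped.

    Returns a list of (label, count) pairs, where label is the most frequent
    original spelling of each case-folded token.
    """
    # Keep only tokens with at least one letter or digit; this drops
    # punctuation-only tokens such as '-', '``', "''", '!' and '$'.
    kept = [t for t in tokens if any(ch.isalnum() for ch in t)]

    # Count case-insensitively...
    key_counts = collections.Counter(t.lower() for t in kept)

    # ...but remember how often each original spelling occurs, so the
    # result can use the most common spelling as its label.
    spellings = collections.defaultdict(collections.Counter)
    for t in kept:
        spellings[t.lower()][t] += 1

    return [(spellings[key].most_common(1)[0][0], count)
            for key, count in key_counts.most_common(top_n)]


print(normalized_counts(['police', 'Police', 'Police', '-', '``', 'Uber']))
```

In the notebook this would be called as normalized_counts(tokens). Case folding can also merge words that differ in meaning. The daily results include a capitalized 'Turkey' around Thanksgiving, for example, so it is worth spot-checking merged entries.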

In [2]:
%%time
"""
Warp up previous pre-process steps in function and apply to all news title docs.
Put resluts in df.
"""
def count_high_freq_words(news_title_doc):
    """
    param new_title_doc: a string of news title doc
    return: a string of high frequency words in doc
    """
    tokens = nltk.word_tokenize(news_title_doc)
    
    chunks = nltk.ne_chunk(nltk.pos_tag(tokens))
    # Each element of chunks is either a (word, pos) tuple or a Tree() containing the parts of the chunk
    tokens = [chunk[0] if isinstance(chunk, tuple) else ' '.join(node[0] for node in chunk) for chunk in chunks]
    
    stop_words = nltk.corpus.stopwords.words('english')
    tokens = [token for token in tokens if token.lower() not in stop_words]
    
    misc_punc_lst = [",", ":", "'", "'s", ";", ".", "?", "(", ")", "..."]
    tokens = [token for token in tokens if token not in misc_punc_lst]
    
    token_counter = collections.Counter(tokens)
    return str(token_counter.most_common(50))

if 0 == 1:
    results_dict = {}
    
    for ind_val, sr_val in news_titles_sr.iteritems():
        results_dict[ind_val] = count_high_freq_words(sr_val)
        
    high_freq_words_sr = pd.Series(results_dict)


CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 15.3 µs
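Each day's titles are joined into one string before tokenizing, so a named-entity chunk can run across a title boundary. The token list above contains chunks such as 'Ferguson Grand Jury Chicago' and 'Journalism Smear Campaign Obama Orders Review Of U.S', and each appears to span the end of one title and the start of the next. The sketch below counts title by title instead. It assumes that titles are separated by newlines, which is how the news_title_doc column separates them. The function name and the tokenize_title parameter are hypothetical. tokenize_title stands for the word_tokenize, pos_tag, ne_chunk and filtering steps of count_high_freq_words applied to a single title. To keep the sketch runnable without NLTK models, str.split is passed in its place.

```python
import collections


def count_high_freq_words_per_title(news_title_doc, tokenize_title, top_n=50):
    """Hypothetical variant of count_high_freq_words that tokenizes per title.

    tokenize_title: any callable mapping one title string to a list of tokens.
    Returns the same string format as count_high_freq_words.
    """
    counter = collections.Counter()
    # Handle titles one at a time so that no chunk can join the last words of
    # one title to the first words of the next.
    for title in news_title_doc.split('\n'):
        if title.strip():
            counter.update(tokenize_title(title))
    return str(counter.most_common(top_n))


doc = "Uber exec apologizes\nUber CEO responds\n"
print(count_high_freq_words_per_title(doc, str.split))
```

Returning str(...) keeps the output format identical to count_high_freq_words, so both versions could fill high_freq_words_sr interchangeably.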

In [9]:
with pd.option_context('display.max_rows', 150, 'display.max_colwidth', 130):
    print(high_freq_words_sr)


2014-11-18    [('Uber', 20), ('attack', 15), ('Charles Manson', 13), ('Obama', 13), ('US', 13), ('Ferguson', 12), ('Bill Cosby', 11), ('Jeru...
2014-11-19    [('Obama', 19), ('Man', 18), ('Bill Cosby', 15), ('immigration', 15), ("'Sexiest", 12), ('found', 10), ("Alive'", 9), ('Missou...
2014-11-20    [('Obama', 23), ('Mike Nichols', 14), ('immigration', 13), ('Buffalo', 13), ('FSU', 12), ('Bill Cosby', 11), ('dies', 10), ('3...
2014-11-21    [('Obama', 41), ('immigration', 20), ('Iran', 17), ('nuclear', 13), ('Ferguson', 13), ('Buffalo', 11), ('police', 10), ('kille...
2014-11-22    [('Obama', 33), ('immigration', 16), ('Afghanistan', 16), ('28', 14), ('US', 13), ('Japan', 13), ('role', 11), ('bus', 11), ('...
2014-11-23    [('Obama', 19), ('Iran', 15), ('Ferguson', 14), ('Marion Barry', 14), ('45', 13), ('immigration', 12), ('grand', 12), ('jury',...
2014-11-24    [('Iran', 22), ('police', 18), ('2016', 17), ('nuclear', 16), ('talks', 16), ('Cleveland', 15), ('Ferguson', 14), ('boy', 14),...
2014-11-25    [('US', 17), ('Thanksgiving', 16), ('Ferguson', 16), ('Iran', 16), ('FDA', 15), ('Hong Kong', 15), ('Obama', 14), ('rules', 11...
2014-11-26    [('Thanksgiving', 46), ('Hong Kong', 18), ('Obama', 18), ('Ferguson', 16), ('Police', 15), ('US', 14), ('EPA', 13), ('police',...
2014-11-27    [('Thanksgiving', 44), ('British', 16), ('Ferguson', 15), ('shooting', 13), ('OPEC', 12), ('leader', 11), ('Day', 11), ('attac...
2014-11-28    [('Friday', 42), ('Black', 31), ('Ferguson', 15), ('EU', 14), ('Thanksgiving', 13), ('Turkey', 13), ('boys', 12), ('David Came...
2014-11-29    [('Saturday', 17), ('Friday', 15), ('Ferguson', 13), ('Small Business', 13), ('Turkey', 12), ('attack', 12), ('Black', 11), ('...
2014-11-30    [('police', 21), ('Hong Kong', 16), ('Obama', 15), ('Monday', 14), ('Police', 14), ('missing', 11), ('years', 11), ('found', 1...
2014-12-01    [('Obama', 23), ('Cyber', 15), ('Monday', 13), ('Ferguson', 12), ('online', 12), ('resigns', 12), ('attack', 11), ('says', 10)...
2014-12-02    [('Obama', 33), ('Ashton Carter', 11), ('36', 10), ('Bill Cosby', 10), ('Lebanon', 10), ('says', 9), ('Senate', 9), ('State', ...
2014-12-03    [('US', 24), ('Iraq', 20), ('Iran', 17), ('State', 14), ('brains', 14), ('Islamic', 13), ('says', 13), ('100', 13), ('Texas', ...
2014-12-04    [('NASA', 16), ('police', 16), ('US', 15), ('Orion', 13), ('Cleveland', 12), ('Yemen', 11), ('launch', 11), ('Police', 11), ('...
2014-12-05    [('Police', 14), ('US', 10), ('says', 9), ('Obama', 9), ('death', 9), ('Philippines', 9), ('Ashton Carter', 9), ('officer', 8)...
2014-12-06    [('US', 25), ('Philippines', 14), ('Afghanistan', 14), ('Amtrak', 10), ('profiling', 10), ('Obama', 10), ('Yemen', 10), ('kill...
2014-12-07    [('US', 17), ('Pearl Harbor', 13), ('attack', 12), ('Uber', 11), ('missing', 10), ('Philippines', 10), ('killed', 10), ('drive...
2014-12-08    [('US', 24), ('dead', 19), ('3', 13), ("n't", 13), ('police', 12), ('Uber', 11), ('fire', 11), ('Obama', 10), ('William', 10),...
2014-12-09    [('Obama', 17), ('Uber', 17), ('plane', 14), ('Police', 12), ('crash', 12), ('report', 11), ('Maryland', 11), ('CIA', 10), ('R...
2014-12-10    [('CIA', 15), ('Obama', 13), ('Palestinian', 13), ('minister', 12), ('police', 11), ('death', 10), ('dies', 10), ('$', 10), ('...
2014-12-11    [('$', 18), ('CIA', 15), ('found', 13), ('storm', 13), ('California', 13), ('Palestinian', 11), ('3', 10), ('car', 9), ('Flori...
2014-12-12    [('Obama', 12), ('bill', 11), ('storm', 10), ('says', 10), ('Palestinian', 10), ('$', 10), ('shot', 9), ('NFL', 9), ('US', 8),...
2014-12-13    [('Police', 24), ('police', 19), ('school', 15), ('shooting', 15), ('missing', 14), ('arrest', 13), ('landslide', 12), ('Indon...
2014-12-14    [('police', 17), ('Japan', 10), ('climate', 9), ('march', 8), ('Abe', 7), ('US', 7), ('Senate', 7), ('deal', 7), ('killed', 7)...
2014-12-15                                                                                                                                   []
2014-12-16    [('Taliban', 3), ('2016', 3), ('menorah', 3), ('lighting', 3), ('says', 3), ('Obama', 3), ('$', 3), ('1.1', 3), ('145', 2), ('...
2014-12-17    [('Pakistan', 16), ('Jeb Bush', 15), ('US', 14), ('Obama', 13), ('Sony', 12), ('school', 12), ('Cuba', 11), ('attack', 10), ('...
2014-12-18    [('US', 5), ('Iraq', 4), ('say', 4), ('leaders', 3), ('calls', 3), ('White House', 3), ('fence', 3), ('Obama', 3), ('officials...
2014-12-19    [('Obama', 32), ('US', 23), ('Pakistan', 19), ('Cuba', 18), ('Sony', 16), ('children', 15), ('killed', 12), ('attack', 11), ('...
2014-12-20    [('US', 20), ('Sony', 15), ('Cuba', 14), ('Obama', 12), ('Pakistan', 12), ('says', 10), ('Taliban', 10), ('school', 8), ('Gaza...
2014-12-21    [('Obama', 12), ('police', 10), ('US', 10), ('home', 9), ('school', 9), ('Police', 9), ('North Korea', 8), ('Muhammad Ali', 8)...
2014-12-22    [('Christmas', 12), ('Police', 11), ('police', 10), ('charges', 9), ('North Korea', 8), ('US', 8), ('hearing', 8), ('10', 8), ...
2014-12-23    [('North Korea', 15), ('Internet', 13), ('Christmas', 12), ('shooting', 12), ('guilty', 12), ('US', 11), ('crash', 11), ('sieg...
2014-12-24    [('Christmas', 19), ('Police', 14), ('Sony', 11), ('India', 11), ('North Korea', 10), ('Christmas Eve', 10), ('blood', 9), ('P...
2014-12-25    [('Christmas', 29), ('attack', 16), ('Day', 13), ('CDC', 12), ('hospital', 11), ('Obama', 11), ('police', 11), ('Sony', 10), (...
2014-12-26    [('US', 17), ('Christmas', 13), ('attack', 12), ('Pakistan', 12), ('tsunami', 11), ('10', 11), ("'The", 11), ('Interview', 10)...
2014-12-27    [('Internet', 15), ('US', 13), ('North Korea', 13), ('Obama', 11), ('Christmas', 10), ('calls', 8), ('blames', 8), ('riding', ...
2014-12-28    [('US', 18), ('police', 17), ('Afghanistan', 16), ('ferry', 15), ('killed', 14), ('officer', 13), ('fire', 12), ('Police', 12)...
2014-12-29    [('Obama', 18), ('US', 14), ('wedding', 11), ('golf', 10), ('Missing', 9), ('GOP', 8), ('attack', 8), ('couple', 8), ('ferry',...
2014-12-30    [('2015', 6), ('Ebola', 5), ('Rep.', 4), ('announces', 4), ('resignation', 4), ('guilty', 4), ('Penalties', 4), ('Michael Grim...
2014-12-31    [('Eve', 5), ('2015', 5), ('New', 4), ('Year', 4), ('flu', 4), ('New Year', 4), ('celebration', 3), ('year', 3), ('Times', 3),...
2015-01-01    [('New', 16), ('Shanghai', 15), ('retrial', 12), ('killed', 11), ('Egypt', 11), ('says', 11), ('police', 11), ('Year', 10), ('...
2015-01-02    [('AirAsia', 13), ('found', 13), ('man', 12), ('bodies', 11), ('new', 11), ('New Year', 10), ('death', 9), ('team', 8), ('Mari...
2015-01-03    [('US', 30), ('dies', 21), ('AirAsia', 17), ('Israel', 13), ('crash', 12), ('4', 12), ('plane', 11), ('Congress', 11), ('Pales...
2015-01-04    [('US', 16), ('new', 13), ('crash', 12), ('sanctions', 11), ('funeral', 11), ('dies', 11), ('plane', 11), ('North Korea', 10),...
2015-01-05    [('dead', 12), ('$', 11), ('police', 11), ('US', 10), ('crash', 10), ('says', 9), ('CES', 9), ('2', 8), ('fund', 7), ('5', 7),...
2015-01-06    [('Gov', 15), ('2', 14), ('McDonnell', 11), ('Jeb Bush', 10), ('SpaceX', 9), ('Snow', 9), ('officers', 8), ('shot', 8), ('poli...
2015-01-07    [('FBI', 24), ('NAACP', 16), ('plane', 14), ('shooting', 14), ('AirAsia', 13), ('brother', 13), ('Police', 13), ('Yemen', 12),...
2015-01-08    [('California', 18), ('People', 15), ('$', 15), ('attack', 13), ('bridge', 12), ('gras', 11), ('Obama', 11), ('says', 10), ('G...
2015-01-09    [('Obama', 13), ('community', 13), ('college', 12), ('bid', 12), ('2024', 12), ('US', 12), ('president', 11), ('Boston', 11), ...
2015-01-10    [('Obama', 28), ('college', 14), ('AirAsia', 14), ('SpaceX', 14), ('arrested', 12), ('George Zimmerman', 11), ('Nigeria', 10),...
2015-01-11    [('Paris', 31), ('AirAsia', 15), ('attack', 13), ('Golden Globes', 13), ('2015', 12), ('Nigeria', 12), ('France', 10), ('rally...
2015-01-12    [('Paris', 20), ('Obama', 20), ('US', 16), ('AirAsia', 15), ('says', 14), ('data', 12), ('retrieve', 9), ('black', 9), ('rally...
2015-01-13    [('Obama', 28), ('State', 20), ('Paris', 15), ('US', 15), ('2016', 11), ('Senate', 11), ('GOP', 11), ('says', 10), ('new', 10)...
2015-01-14    [('Obama', 20), ('US', 16), ('attack', 16), ('bus', 11), ('Paris', 11), ('says', 10), ('officials', 10), ('station', 10), ('Bo...
2015-01-15    [('Obama', 19), ('Oscar', 16), ('attack', 13), ('nominations', 12), ('man', 10), ('Yosemite', 10), ('Paris', 10), ('Ohio', 9),...
2015-01-16    [('police', 19), ('Paris', 16), ('Obama', 15), ('2', 13), ('2014', 13), ('call', 11), ('2016', 11), ('US', 11), ('plan', 10), ...
2015-01-17    [('police', 16), ('dead', 13), ('2014', 12), ('2016', 12), ('Pope', 11), ('Obama', 11), ('Florida', 11), ('shooting', 11), ('y...
2015-01-18    [('Obama', 19), ('Florida', 13), ('State', 13), ('Indonesia', 11), ('dead', 11), ('Delaware', 11), ('President', 9), ('Ukraine...
2015-01-19    [('State', 21), ('1', 18), ('King', 17), ('%', 17), ('Union', 15), ('Obama', 13), ("'American", 13), ('Patriots', 12), ('Snipe...
2015-01-20    [('State', 37), ('Union', 17), ('Obama', 17), ('says', 16), ('trial', 13), ('hostages', 12), ('collapse', 11), ('Japanese', 11...
2015-01-21    [('State', 28), ('Obama', 27), ('Union', 16), ('2016', 12), ('says', 12), ('US', 12), ('leader', 11), ("n't", 11), ('Yemen', 1...
2015-01-22    [('says', 15), ('Senate', 14), ('Yemen', 14), ("n't", 14), ('US', 14), ('Obama', 12), ('bill', 10), ('abortion', 9), ('sex', 9...
2015-01-23    [('Doomsday', 10), ('King', 9), ('Saudi', 9), ('Clock', 9), ('closer', 9), ('YouTube', 8), ('Obama', 7), ('midnight', 7), ("'A...
2015-01-24    [('Obama', 23), ('Saudi', 21), ('2016', 21), ('India', 17), ('visit', 16), ('ISIS', 14), ('Ernie Banks', 12), ('king', 11), ('...
2015-01-25    [('Obama', 31), ('Egypt', 21), ('anniversary', 19), ('India', 18), ('killed', 16), ('uprising', 15), ('city', 13), ('Ukraine',...
2015-01-26    [('White House', 18), ('India', 13), ('Obama', 12), ('killed', 10), ('drone', 9), ('grounds', 9), ('Christie', 8), ('Earth', 8...
2015-01-27    [('2015', 7), ('drone', 6), ('Blizzard', 6), ('India', 5), ('New England', 4), ('Obama', 4), ('New York', 4), ('storm', 4), ('...
2015-01-28    [('Jordan', 10), ('2015', 9), ('Super Bowl', 7), ('pilot', 6), ('New', 5), ('Michelle Obama', 5), ('puppy', 5), ('nominee', 5)...
2015-01-29    [('murder', 12), ('says', 12), ('Obama', 12), ('State', 11), ('trial', 11), ('hostage', 10), ('Aaron Hernandez', 10), ('blast'...
2015-01-30    [('2016', 12), ('NFL', 12), ('killed', 11), ('Super Bowl', 10), ('death', 10), ('mosque', 10), ('Patriots', 10), ('attacks', 9...
2015-01-31    [('State', 17), ('Super Bowl', 14), ('Islamic', 14), ('Kobani', 11), ('Patriots', 10), ('says', 10), ('US', 10), ('ISIS', 10),...
2015-02-01    [('Super Bowl', 28), ('Bowl', 19), ('Super', 16), ('Seahawks', 13), ('2015', 13), ('parents', 13), ('Ukraine', 12), ('Patriots...
2015-02-02    [('Super Bowl', 34), ('Obama', 26), ('US', 12), ('Seahawks', 10), ('budget', 10), ('Christie', 10), ('XLIX', 9), ('show', 9), ...
2015-02-03    [('Obama', 17), ('$', 14), ('crash', 11), ('could', 10), ('-', 10), ('Harper Lee', 10), ('murder', 9), ('budget', 9), ('cause'...
2015-02-04    [('crash', 18), ('Jordan', 17), ('Obama', 13), ('immigration', 12), ('pilot', 11), ('Jeb Bush', 11), ('plane', 10), ('Metro-No...
2015-02-05    [('plane', 13), ('Ukraine', 13), ('Sniper', 11), ('Obama', 10), ('Anthem', 10), ('ISIS', 10), ('Jordan', 10), ("'American", 10...
2015-02-06    [('Obama', 20), ('US', 19), ('ISIS', 13), ('says', 10), ('Islamic', 10), ('State', 10), ('hostage', 10), ('killed', 9), ('Dala...
2015-02-07    [('US', 18), ('Ukraine', 17), ('Nigeria', 17), ('hostage', 14), ('6', 14), ('killed', 14), ('Baghdad', 13), ('State', 12), ('s...
2015-02-08    [('Powerball', 20), ('says', 19), ('$', 19), ('shooting', 15), ('crash', 15), ('jackpot', 14), ('Ukraine', 13), ('dead', 12), ...
2015-02-09    [('2015', 15), ('Grammys', 15), ('HSBC', 14), ('police', 13), ('Egypt', 12), ('Sam Smith', 11), ('violence', 10), ('US', 10), ...
2015-02-10    [('US', 15), ('death', 14), ('murder', 14), ('Jeb Bush', 13), ('marriage', 13), ('charged', 13), ('Obama', 12), ('Alabama', 11...
2015-02-11    [('Jon Stewart', 15), ('Obama', 15), ('Ukraine', 15), ('US', 12), ('ISIS', 12), ("'American", 12), ('Sniper', 12), ('Yemen', 1...
2015-02-12    [('Obama', 18), ('Ukraine', 17), ('Sniper', 17), ("'American", 15), ('$', 13), ('Bob Simon', 11), ('slain', 10), ('2016', 10),...
2015-02-13    [('Obama', 27), ('David Carr', 18), ('New', 13), ('police', 11), ('US', 10), ("n't", 10), ('Alabama', 9), ('attack', 9), ('Ore...
2015-02-14    [('Obama', 19), ('shooting', 16), ('Ukraine', 13), ('plot', 11), ('Valentine', 11), ('says', 10), ('Day', 10), ('Halifax', 9),...
2015-02-15    [('Ukraine', 17), ('police', 14), ('State', 12), ('Police', 11), ('Obama', 11), ('ceasefire', 11), ('Islamic', 11), ('rules', ...
2015-02-16    [('Greek', 15), ('dies', 14), ('Libya', 13), ('FAA', 10), ('Ukraine', 10), ('Copenhagen', 10), ('talks', 10), ('rules', 9), ('...
2015-02-17    [('Ukraine', 14), ('US', 13), ('train', 9), ('oil', 9), ('Obama', 9), ('immigration', 9), ('Haiti', 9), ('Judge', 8), ('Islami...
2015-02-18    [('Obama', 21), ('Wednesday', 13), ('immigration', 10), ('US', 10), ('Nicki Minaj', 10), ('Ukraine', 9), ('says', 8), ('Ash', ...
2015-02-19    [('Obama', 24), ('US', 18), ('Vanilla Ice', 10), ("n't", 10), ('burglary', 9), ('outbreak', 9), ('2', 9), ('trial', 9), ('Osca...
2015-02-20    [('Obama', 11), ("O'Reilly", 10), ('US', 10), ('Vegas', 9), ('rage', 9), ('says', 8), ('UK', 8), ('killing', 8), ('Ukraine', 8...
2015-02-21    [('US', 18), ('Afghanistan', 14), ('says', 11), ('chief', 10), ('winter', 9), ('Winter', 8), ('Malcolm X', 8), ('Dubai', 7), (...
2015-02-22    [('2015', 23), ('Oscars', 22), ('Syria', 17), ('ferry', 14), ('says', 11), ('Obama', 9), ('Ukraine', 9), ('Turkish', 8), ("n't...
2015-02-23    [('Oscars', 35), ('2015', 23), ('US', 21), ('Obama', 19), ('Oscar', 18), ('says', 16), ('attacks', 15), ("'Birdman", 14), ('Pa...
2015-02-24    [('Syria', 16), ('VA', 15), ('Obama', 14), ('US', 14), ('shooting', 13), ('Christians', 13), ('Alaska', 12), ('Czech', 12), ('...
2015-02-25    [("'American", 15), ('Sniper', 15), ('Obama', 15), ('US', 15), ('Top', 14), ('neutrality', 13), ('Southwest', 13), ('Model', 1...
2015-02-26    [('CPAC', 19), ('John', 16), ("'Jihadi", 14), ('DC', 11), ('FCC', 10), ('2016', 10), ('pot', 9), ('State', 9), ('GOP', 9), ('C...
2015-02-27    [("'Jihadi", 19), ('US', 17), ('John', 17), ('dress', 16), ('death', 13), ('blogger', 13), ('Leonard Nimoy', 13), ('dies', 12)...
2015-02-28    [('kills', 12), ('7', 12), ('says', 11), ('Mexico', 10), ('Congress', 10), ('Missouri', 10), ('Leonard Nimoy', 10), ('drug', 9...
2015-03-01    [('dies', 10), ('Minnie Minoso', 10), ('CPAC', 8), ('snow', 8), ('Netanyahu', 8), ('2', 8), ('California', 7), ('Rand Paul', 7...
2015-03-02    [('says', 18), ('Tikrit', 16), ('police', 15), ('State', 13), ('man', 12), ('Islamic', 12), ('Iraq', 12), ('Iran', 11), ('Russ...
2015-03-03    [('Clinton', 14), ('Netanyahu', 14), ('Iran', 12), ('John', 12), ("'Jihadi", 11), ('California', 10), ('Chile', 10), ('State',...
2015-03-04    [('US', 14), ('says', 14), ('gay', 13), ('Supreme Court', 12), ('Obamacare', 12), ('Clinton', 12), ('Ferguson', 11), ('trial',...
2015-03-05    [('US', 18), ('Boston', 13), ('Ringling', 13), ('New York', 10), ('back', 10), ('State', 9), ('South Korea', 9), ('Ferguson', ...
2015-03-06    [('police', 22), ('Ferguson', 20), ('attack', 16), ('Selma', 12), ('says', 11), ('US', 11), ('ancient', 10), ('shot', 10), ('C...
2015-03-07    [('Obama', 18), ('MH370', 14), ('Police', 13), ('Selma', 13), ('police', 13), ('Ferguson', 12), ('White House', 12), ('Mali', ...
2015-03-08    [('MH370', 14), ('police', 12), ('Iran', 11), ('Obama', 10), ('Selma', 10), ('Nemtsov', 10), ('Clinton', 9), ('Sunday', 9), ('...
2015-03-09    [('Iran', 15), ('Apple Watch', 14), ('deal', 13), ('Apple', 13), ('Clinton', 13), ('$', 12), ('Wisconsin', 11), ('crash', 11),...
2015-03-10    [('crash', 26), ('Iran', 15), ('Clinton', 14), ('Obama', 13), ('Argentina', 13), ('helicopter', 13), ('Oklahoma', 12), ('car',...
2015-03-11    [('video', 19), ('says', 15), ('firing', 14), ('squad', 14), ('Iran', 14), ("'Blurred", 13), ('Clinton', 12), ('Ferguson', 11)...
2015-03-12    [('Police', 18), ('crash', 17), ('Ferguson', 12), ('Secret Service', 11), ('police', 10), ('US', 10), ('State', 10), ('2', 10)...
2015-03-13    [('Obama', 19), ('Ferguson', 17), ('Vanuatu', 13), ('says', 12), ('President', 11), ('Secret Service', 10), ('White House', 10...
2015-03-14    [('Myanmar', 16), ('Israel', 16), ('CIA', 13), ('Vanuatu', 12), ('-', 12), ('dead', 12), ('3', 11), ('ferry', 11), ('Netanyahu...
2015-03-15    [('Iran', 17), ('Vanuatu', 16), ('Ferguson', 12), ('Brazil', 11), ('Israel', 10), ('President', 10), ('Police', 9), ('Cyclone ...
2015-03-16    [('says', 19), ('Vanuatu', 15), ('Robert Durst', 15), ('Police', 15), ('officers', 12), ('Hillary Clinton', 12), ('Ferguson', ...
2015-03-17    [('Day', 37), ('Patrick', 21), ('St.', 17), ('TV', 14), ('Robert Durst', 13), ('Apple', 12), ('Secret Service', 12), ('Chris B...
2015-03-18    [('Israel', 14), ('marriage', 13), ('Fed', 13), ('White House', 13), ('nude', 12), ('Penn State', 12), ('says', 12), ('gay', 1...
2015-03-19    [('Netanyahu', 17), ('shooting', 16), ('attack', 15), ('Iran', 15), ('eclipse', 14), ('killed', 12), ('arrest', 12), ('Israel'...
2015-03-20    [('eclipse', 17), ('Obama', 14), ('Iran', 14), ('fracking', 14), ('rules', 13), ('Mississippi', 11), ('talks', 10), ('Yemen', ...
2015-03-21    [('Obama', 24), ('airport', 14), ('attack', 14), ('US', 13), ('Yemen', 13), ('7', 12), ('New Orleans', 11), ('Princeton', 10),...
2015-03-22    [('NCAA', 6), ('US', 4), ('tournament', 4), ('President', 3), ('Obama', 3), ('Villanova', 3), ('seed', 3), ('Tournament', 3), ...
2015-03-23                                                                                                                                   []
2015-03-24                                                                                                                                   []
2015-03-25                                                                                                                                   []
2015-03-26                                                                                                                                   []
2015-03-27                                                                                                                                   []
2015-03-28                                                                                                                                   []
2015-03-29    [('Iran', 29), ('crash', 22), ('says', 18), ('law', 15), ('talks', 14), ('nuclear', 13), ('found', 13), ('Indiana', 12), ('Hal...
2015-03-30    [('Iran', 25), ('nuclear', 19), ('Indiana', 19), ('talks', 15), ('Obama', 15), ('crash', 12), ('law', 12), ('deal', 11), ('Dea...
2015-03-31    [('Iran', 28), ('law', 14), ('prosecutor', 14), ('nuclear', 13), ('talks', 13), ('hostage', 13), ('$', 12), ('deadline', 12), ...
2015-04-01    [('Iran', 17), ('dead', 16), ('dies', 15), ('found', 15), ('crash', 14), ('117', 13), ('world', 13), ('Nigeria', 12), ('oldest...
2015-04-02    [('Iran', 27), ('charges', 13), ('Indiana', 13), ('attack', 13), ('Kenya', 12), ('dead', 12), ('Florida', 12), ('Sen.', 11), (...
2015-04-03    [('Iran', 24), ('Easter', 16), ('66', 16), ('sea', 15), ('days', 14), ('deal', 12), ('Kenya', 12), ('charged', 12), ('death', ...
2015-04-04    [('Iran', 21), ('years', 20), ('nuclear', 16), ('18', 16), ('deal', 14), ('Yemen', 14), ('Easter', 13), ('Kenya', 12), ('dies'...
2015-04-05    [('Easter', 33), ('Iran', 18), ('deal', 15), ('Wisconsin', 12), ('Kentucky', 10), ('Christians', 10), ('Kenya', 9), ('attack',...
2015-04-06    [('Iran', 20), ('Rolling', 19), ('Stone', 18), ('trial', 16), ('2016', 16), ('Obama', 15), ('bombing', 14), ('rape', 13), ('Ed...
2015-04-07    [('Obama', 16), ('7', 15), ('plane', 15), ('Iran', 14), ('White House', 13), ('crash', 13), ('Duke', 12), ('Rand Paul', 12), (...
2015-04-08    [('Obama', 17), ('Rand Paul', 16), ('US', 15), ('Ferguson', 11), ('Afghan', 11), ('White House', 10), ('20', 9), ('years', 9),...
2015-04-09    [('Obama', 20), ('deal', 16), ('shooting', 14), ('Cuba', 14), ('US', 13), ('says', 11), ('Masters', 11), ('nuclear', 10), ('Po...
2015-04-10    [('Masters', 17), ('police', 14), ('shooting', 13), ('man', 12), ('Obama', 11), ('Hillary Clinton', 10), ('Pakistan', 10), ('L...
2015-04-11    [('Obama', 24), ('Iran', 20), ('Clinton', 19), ('police', 18), ('Castro', 15), ('Pakistan', 10), ('says', 10), ('10', 10), ('d...
2015-04-12    [('Iran', 17), ('police', 17), ('shooting', 16), ('charges', 15), ('Masters', 13), ('says', 12), ('Egypt', 12), ('Armenian', 1...
2015-04-13    [('Iran', 20), ('shooting', 17), ('2016', 16), ('Obama', 13), ('Marco Rubio', 12), ('Police', 11), ('Clinton', 11), ('Hillary'...
2015-04-14    [('Hillary Clinton', 18), ('Iran', 17), ('shooting', 16), ('Clinton', 15), ('Marco Rubio', 13), ('year', 13), ('cargo', 11), (...
dtype: object

In [10]:
"""
Combine daily title docs and their high freq words into one df, then cache it as a tmp pickle
"""
if 0 == 1:
    news_title_docs_high_freq_words_df = pd.concat([news_titles_sr, high_freq_words_sr], axis=1)
    news_title_docs_high_freq_words_df.columns = ['news_title_doc', 'high_freq_words']

    news_title_docs_high_freq_words_df.to_pickle(news_title_docs_high_freq_words_df_pkl)
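The cell above aligns the two per-day Series on their date index with `pd.concat(..., axis=1)` and then pickles the result. Below is a minimal self-contained sketch of that step. The toy Series stand in for `news_titles_sr` and `high_freq_words_sr`, and a temporary directory stands in for `config.TMP_DIR`. All of these are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy stand-ins for news_titles_sr / high_freq_words_sr.
days = pd.date_range('2014-11-18', periods=3, freq='D', name='news_collected_time')
titles_sr = pd.Series(['A\nB', '', 'C'], index=days)
words_sr = pd.Series([[('A', 1), ('B', 1)], [], [('C', 1)]], index=days)

# concat along axis=1 aligns both Series on their shared DatetimeIndex.
df = pd.concat([titles_sr, words_sr], axis=1)
df.columns = ['news_title_doc', 'high_freq_words']

# Round-trip through a pickle inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, 'demo.df.pkl')
    df.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)

print(restored.shape)  # (3, 2)
```

A pickle keeps the `(word, count)` tuples as Python objects, which a CSV round trip does not. That is presumably why the notebook caches intermediate results this way.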

In [2]:
"""
Check results
"""
news_title_docs_high_freq_words_df = pd.read_pickle(news_title_docs_high_freq_words_df_pkl)
with pd.option_context('display.max_colwidth', 100):
    display(news_title_docs_high_freq_words_df)


news_title_doc high_freq_words
news_collected_time
2014-11-18 Missouri's Nixon Declares State of Emergency Awaiting Grand Jury\nPEOPLE: Bill Cosby. Charles Ma... [('Uber', 20), ('attack', 15), ('Charles Manson', 13), ('Obama', 13), ('US', 13), ('Ferguson', 1...
2014-11-19 Early winter pummels much of country, strands motorists, emergency vehicles\nAt the site of Jeru... [('Obama', 19), ('Man', 18), ('Bill Cosby', 15), ('immigration', 15), ("'Sexiest", 12), ('found'...
2014-11-20 Americans brace for more icy temperatures and snow as ferocious storms linger\nREFILE-UPDATE 5-N... [('Obama', 23), ('Mike Nichols', 14), ('immigration', 13), ('Buffalo', 13), ('FSU', 12), ('Bill ...
2014-11-21 Obama unveils actions to spare some illegal immigrants\nTears, smiles in Nevada over US immigrat... [('Obama', 41), ('immigration', 20), ('Iran', 17), ('nuclear', 13), ('Ferguson', 13), ('Buffalo'...
2014-11-22 Activists Rush to Help People Use Obama Immigration Plan\nOfficial: Ferguson grand jury still me... [('Obama', 33), ('immigration', 16), ('Afghanistan', 16), ('28', 14), ('US', 13), ('Japan', 13),...
2014-11-23 Mike Brown's Mom Urges Ferguson Protesters To Remain Peaceful\n6 things to watch for this holida... [('Obama', 19), ('Iran', 15), ('Ferguson', 14), ('Marion Barry', 14), ('45', 13), ('immigration'...
2014-11-24 Obama: Americans want 'new car smell' in 2016\nFormer DC Mayor Marion Barry Dies At 78 « CBS Bal... [('Iran', 22), ('police', 18), ('2016', 17), ('nuclear', 16), ('talks', 16), ('Cleveland', 15), ...
2014-11-25 Could Obama choose a woman as next Defense secretary? One name tops list. (+video)\nWith No Imme... [('US', 17), ('Thanksgiving', 16), ('Ferguson', 16), ('Iran', 16), ('FDA', 15), ('Hong Kong', 15...
2014-11-26 Mississippi same-sex marriage ban overturned\nRain, snow could mess up plans for Thanksgiving tr... [('Thanksgiving', 46), ('Hong Kong', 18), ('Obama', 18), ('Ferguson', 16), ('Police', 15), ('US'...
2014-11-27 Ferguson shooting: Governor 'rejects calls for second jury'\nSpecial forces free eight hostages ... [('Thanksgiving', 44), ('British', 16), ('Ferguson', 15), ('shooting', 13), ('OPEC', 12), ('lead...
2014-11-28 The Most Noteworthy 2014 Black Friday Deals\nFerguson Celebrates Thanksgiving Amidst Turmoil\nUK... [('Friday', 42), ('Black', 31), ('Ferguson', 15), ('EU', 14), ('Thanksgiving', 13), ('Turkey', 1...
2014-11-29 'I pay you,' protesters chant at authorities as tensions return to Ferguson streets\nSource: 2 s... [('Saturday', 17), ('Friday', 15), ('Ferguson', 13), ('Small Business', 13), ('Turkey', 12), ('a...
2014-11-30 Teen missing for four years found alive, hidden behind wall near Atlanta\nFormer New York Gov. M... [('police', 21), ('Hong Kong', 16), ('Obama', 15), ('Monday', 14), ('Police', 14), ('missing', 1...
2014-12-01 Hong Kong Protests Close Down Government\nFewer shoppers and a decline in spending during Black ... [('Obama', 23), ('Cyber', 15), ('Monday', 13), ('Ferguson', 12), ('online', 12), ('resigns', 12)...
2014-12-02 St. Louis Rams, Police Disagree Over 'Apology' for Players' Ferguson Gesture\nCongressional Aide... [('Obama', 33), ('Ashton Carter', 11), ('36', 10), ('Bill Cosby', 10), ('Lebanon', 10), ('says',...
2014-12-03 Another alleged Cosby victim claims he raped her at 15\nUS and Cuba Working On Solution to Free ... [('US', 24), ('Iraq', 20), ('Iran', 17), ('State', 14), ('brains', 14), ('Islamic', 13), ('says'...
2014-12-04 Denmark world's least corrupt country: TI\nOpen doors to those displaced by Ruby, CBCP president... [('NASA', 16), ('police', 16), ('US', 15), ('Orion', 13), ('Cleveland', 12), ('Yemen', 11), ('la...
2014-12-05 Protesters Swarm NYC Over Eric Garner Death For Second Night\nRussian historians left baffled by... [('Police', 14), ('US', 10), ('says', 9), ('Obama', 9), ('death', 9), ('Philippines', 9), ('Asht...
2014-12-06 NASA's Orion Conquers Orbital Test as US Budget Debate Looms\nFour Injured in Michigan Amtrak St... [('US', 25), ('Philippines', 14), ('Afghanistan', 14), ('Amtrak', 10), ('profiling', 10), ('Obam...
2014-12-07 NYPD Officer Daniel Pantaleo faces wrongful arrest lawsuits\nHostage Rescues Called Worth Trying... [('US', 17), ('Pearl Harbor', 13), ('attack', 12), ('Uber', 11), ('missing', 10), ('Philippines'...
2014-12-08 Dire warning over pending released of CIA torture report\n10 Things to Know for Monday\nTyphoon ... [('US', 24), ('dead', 19), ('3', 13), ("n't", 13), ('police', 12), ('Uber', 11), ('fire', 11), (...
2014-12-09 'Unconscionable': Top Republicans lash out ahead of release of CIA report\n6 dead after plane cr... [('Obama', 17), ('Uber', 17), ('plane', 14), ('Police', 12), ('crash', 12), ('report', 11), ('Ma...
2014-12-10 Congress Deal to Avoid Shutdown Includes Victory for Big Banks\nNY police promise to rebuild tru... [('CIA', 15), ('Obama', 13), ('Palestinian', 13), ('minister', 12), ('police', 11), ('death', 10...
2014-12-11 Burning death inquiry eyes woman's last hours\nTIME names 'Person of Year,' Ebola survivor react... [('$', 18), ('CIA', 15), ('found', 13), ('storm', 13), ('California', 13), ('Palestinian', 11), ...
2014-12-12 US House narrowly passes spending bill, averts government shutdown\nCenturies-old time capsule r... [('Obama', 12), ('bill', 11), ('storm', 10), ('says', 10), ('Palestinian', 10), ('$', 10), ('sho...
2014-12-13 Tornado, mudslides triggered by powerful California storm\n8 dead, 100 missing in landslide in I... [('Police', 24), ('police', 19), ('school', 15), ('shooting', 15), ('missing', 14), ('arrest', 1...
2014-12-14 Thousands March Across Nation to Protest Police Killings of Black Men\n20 dead, 88 missing in In... [('police', 17), ('Japan', 10), ('climate', 9), ('march', 8), ('Abe', 7), ('US', 7), ('Senate', ...
2014-12-15 []
2014-12-16 Taliban Besiege Pakistan School, Leaving 145 Dead\nJeb Bush's decision to explore presidential b... [('Taliban', 3), ('2016', 3), ('menorah', 3), ('lighting', 3), ('says', 3), ('Obama', 3), ('$', ...
2014-12-17 Sony under attack from hackers and ex-employees\nFederal judge: Obama immigration actions 'uncon... [('Pakistan', 16), ('Jeb Bush', 15), ('US', 14), ('Obama', 13), ('Sony', 12), ('school', 12), ('...
... ... ...
2015-03-16 Eccentric Durst arrested, says on tape, 'killed them all'\nMass protests present big challenge t... [('says', 19), ('Vanuatu', 15), ('Robert Durst', 15), ('Police', 15), ('officers', 12), ('Hillar...
2015-03-17 Relief Crews Try to Reach Cyclone Victims in Vanuatu\nWade scores 32 as Heat top Cavs\nFor frien... [('Day', 37), ('Patrick', 21), ('St.', 17), ('TV', 14), ('Robert Durst', 13), ('Apple', 12), ('S...
2015-03-18 Netanyahu Pulls Ahead of Main Challenger Herzog in Israeli Elections\nPSU Fraternity Suspended\n... [('Israel', 14), ('marriage', 13), ('Fed', 13), ('White House', 13), ('nude', 12), ('Penn State'...
2015-03-19 Netanyahu Starts Search for Partners After Election Win\nTunisian Parliament Calls Day of Solida... [('Netanyahu', 17), ('shooting', 16), ('attack', 15), ('Iran', 15), ('eclipse', 14), ('killed', ...
2015-03-20 Islamic State responsible for Tunisia museum attack\nUS will 're-assess' options after Netanyahu... [('eclipse', 17), ('Obama', 14), ('Iran', 14), ('fracking', 14), ('rules', 13), ('Mississippi', ...
2015-03-21 Robert Durst lawyers: release him, you won't find anything\nUS sets first fracking rules since p... [('Obama', 24), ('airport', 14), ('attack', 14), ('US', 13), ('Yemen', 13), ('7', 12), ('New Orl...
2015-03-22 Western powers stress unity in Iran talks, 'won't do bad deal'\nPresident Obama to 'reconsider' ... [('NCAA', 6), ('US', 4), ('tournament', 4), ('President', 3), ('Obama', 3), ('Villanova', 3), ('...
2015-03-23 []
2015-03-24 []
2015-03-25 []
2015-03-26 []
2015-03-27 []
2015-03-28 []
2015-03-29 Germanwings co-pilot examined for vision problems before fatal flight, reports say\nAs deadline ... [('Iran', 29), ('crash', 22), ('says', 18), ('law', 15), ('talks', 14), ('nuclear', 13), ('found...
2015-03-30 Comicios regionales dan espacio político a la oposición boliviana\nQuedan varados en lo alto de ... [('Iran', 25), ('nuclear', 19), ('Indiana', 19), ('talks', 15), ('Obama', 15), ('crash', 12), ('...
2015-03-31 Indianapolis Star protests law on Tuesday's cover\nTwo Former Feds Accused of Stealing Around $1... [('Iran', 28), ('law', 14), ('prosecutor', 14), ('nuclear', 13), ('talks', 13), ('hostage', 13),...
2015-04-01 Buhari takes historic victory in Nigeria\nTalks for framework of Iran nuclear deal continue\nJon... [('Iran', 17), ('dead', 16), ('dies', 15), ('found', 15), ('crash', 14), ('117', 13), ('world', ...
2015-04-02 California governor issues mandatory water cuts as snowpack hits record low\nNo breakthrough in ... [('Iran', 27), ('charges', 13), ('Indiana', 13), ('attack', 13), ('Kenya', 12), ('dead', 12), ('...
2015-04-03 Republicans uneasy over Iran nuke 'deal,' lawmakers demand say on any final agreement\nEaster ev... [('Iran', 24), ('Easter', 16), ('66', 16), ('sea', 15), ('days', 14), ('deal', 12), ('Kenya', 12...
2015-04-04 Obama seeks to persuade Congress on Iran nuclear deal\nSarah Brady, wife of former White House P... [('Iran', 21), ('years', 20), ('nuclear', 16), ('18', 16), ('deal', 14), ('Yemen', 14), ('Easter...
2015-04-05 Unusually quick total lunar eclipse dazzles skywatchers\nKenya mourns victims of Garissa al-Shab... [('Easter', 33), ('Iran', 18), ('deal', 15), ('Wisconsin', 12), ('Kentucky', 10), ('Christians',...
2015-04-06 Social media honor Kenya attack victims\n'Door closed' to US Ambassador to Prague\nCERN restarts... [('Iran', 20), ('Rolling', 19), ('Stone', 18), ('trial', 16), ('2016', 16), ('Obama', 15), ('bom...
2015-04-07 Duke surges past Wisconsin to win fifth NCAA basketball championship\nWorld's oldest person dies... [('Obama', 16), ('7', 15), ('plane', 15), ('Iran', 14), ('White House', 13), ('crash', 13), ('Du...
2015-04-08 6 factions Rand Paul must court in Iowa, activists say\nChicago Mayor Rahm Emanuel wins second t... [('Obama', 17), ('Rand Paul', 16), ('US', 15), ('Ferguson', 11), ('Afghan', 11), ('White House',...
2015-04-09 Footage of police shooting makes a difference\nWhite House Supports Efforts to Ban 'Conversion T... [('Obama', 20), ('deal', 16), ('shooting', 14), ('Cuba', 14), ('US', 13), ('says', 11), ('Master...
2015-04-10 With Masters offering chance to clinch career Grand Slam, Rory McIlroy opens with steady 71\nRUD... [('Masters', 17), ('police', 14), ('shooting', 13), ('man', 12), ('Obama', 11), ('Hillary Clinto...
2015-04-11 Obama seeks to re-engage with Latin America at summit\nWhat Videos Show\nEyes of Texas switch fr... [('Obama', 24), ('Iran', 20), ('Clinton', 19), ('police', 18), ('Castro', 15), ('Pakistan', 10),...
2015-04-12 Obama, Castro reach for thaw in relations with historic meeting\nClinton tries again to crack 'h... [('Iran', 17), ('police', 17), ('shooting', 16), ('charges', 15), ('Masters', 13), ('says', 12),...
2015-04-13 By the Numbers: When America Loved and Hated Hillary\nJordan Spieth's 2015 US Masters win helped... [('Iran', 20), ('shooting', 17), ('2016', 16), ('Obama', 13), ('Marco Rubio', 12), ('Police', 11...
2015-04-14 Flight Bound For LA Locates Man Trapped In Cargo Hold After Emergency ...\nSharpton praises resp... [('Hillary Clinton', 18), ('Iran', 17), ('shooting', 16), ('Clinton', 15), ('Marco Rubio', 13), ...

148 rows × 2 columns
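Several days in the table above have an empty `high_freq_words` list: 2014-12-15, and 2015-03-23 through 2015-03-28. The helper below lists such gap days so they can be checked before manual inspection. The helper name `empty_days` and the toy data are hypothetical.

```python
import pandas as pd


def empty_days(df, col='high_freq_words'):
    """Return the index labels whose word list in `col` is empty."""
    # len() of each cell is 0 for an empty list.
    mask = df[col].map(len) == 0
    return df.index[mask.to_numpy()]


# Hypothetical toy frame with two gap days.
days = pd.date_range('2015-03-22', periods=4, freq='D', name='news_collected_time')
demo = pd.DataFrame({'high_freq_words': [[('a', 6)], [], [], [('b', 29)]]}, index=days)
gaps = empty_days(demo)
print(list(gaps.strftime('%Y-%m-%d')))  # ['2015-03-23', '2015-03-24']
```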


In [3]:
with pd.option_context('display.max_colwidth', 130):
    print(news_title_docs_high_freq_words_df['high_freq_words'])


news_collected_time
2014-11-18    [('Uber', 20), ('attack', 15), ('Charles Manson', 13), ('Obama', 13), ('US', 13), ('Ferguson', 12), ('Bill Cosby', 11), ('Jeru...
2014-11-19    [('Obama', 19), ('Man', 18), ('Bill Cosby', 15), ('immigration', 15), ("'Sexiest", 12), ('found', 10), ("Alive'", 9), ('Missou...
2014-11-20    [('Obama', 23), ('Mike Nichols', 14), ('immigration', 13), ('Buffalo', 13), ('FSU', 12), ('Bill Cosby', 11), ('dies', 10), ('3...
2014-11-21    [('Obama', 41), ('immigration', 20), ('Iran', 17), ('nuclear', 13), ('Ferguson', 13), ('Buffalo', 11), ('police', 10), ('kille...
2014-11-22    [('Obama', 33), ('immigration', 16), ('Afghanistan', 16), ('28', 14), ('US', 13), ('Japan', 13), ('role', 11), ('bus', 11), ('...
2014-11-23    [('Obama', 19), ('Iran', 15), ('Ferguson', 14), ('Marion Barry', 14), ('45', 13), ('immigration', 12), ('grand', 12), ('jury',...
2014-11-24    [('Iran', 22), ('police', 18), ('2016', 17), ('nuclear', 16), ('talks', 16), ('Cleveland', 15), ('Ferguson', 14), ('boy', 14),...
2014-11-25    [('US', 17), ('Thanksgiving', 16), ('Ferguson', 16), ('Iran', 16), ('FDA', 15), ('Hong Kong', 15), ('Obama', 14), ('rules', 11...
2014-11-26    [('Thanksgiving', 46), ('Hong Kong', 18), ('Obama', 18), ('Ferguson', 16), ('Police', 15), ('US', 14), ('EPA', 13), ('police',...
2014-11-27    [('Thanksgiving', 44), ('British', 16), ('Ferguson', 15), ('shooting', 13), ('OPEC', 12), ('leader', 11), ('Day', 11), ('attac...
2014-11-28    [('Friday', 42), ('Black', 31), ('Ferguson', 15), ('EU', 14), ('Thanksgiving', 13), ('Turkey', 13), ('boys', 12), ('David Came...
2014-11-29    [('Saturday', 17), ('Friday', 15), ('Ferguson', 13), ('Small Business', 13), ('Turkey', 12), ('attack', 12), ('Black', 11), ('...
2014-11-30    [('police', 21), ('Hong Kong', 16), ('Obama', 15), ('Monday', 14), ('Police', 14), ('missing', 11), ('years', 11), ('found', 1...
2014-12-01    [('Obama', 23), ('Cyber', 15), ('Monday', 13), ('Ferguson', 12), ('online', 12), ('resigns', 12), ('attack', 11), ('says', 10)...
2014-12-02    [('Obama', 33), ('Ashton Carter', 11), ('36', 10), ('Bill Cosby', 10), ('Lebanon', 10), ('says', 9), ('Senate', 9), ('State', ...
2014-12-03    [('US', 24), ('Iraq', 20), ('Iran', 17), ('State', 14), ('brains', 14), ('Islamic', 13), ('says', 13), ('100', 13), ('Texas', ...
2014-12-04    [('NASA', 16), ('police', 16), ('US', 15), ('Orion', 13), ('Cleveland', 12), ('Yemen', 11), ('launch', 11), ('Police', 11), ('...
2014-12-05    [('Police', 14), ('US', 10), ('says', 9), ('Obama', 9), ('death', 9), ('Philippines', 9), ('Ashton Carter', 9), ('officer', 8)...
2014-12-06    [('US', 25), ('Philippines', 14), ('Afghanistan', 14), ('Amtrak', 10), ('profiling', 10), ('Obama', 10), ('Yemen', 10), ('kill...
2014-12-07    [('US', 17), ('Pearl Harbor', 13), ('attack', 12), ('Uber', 11), ('missing', 10), ('Philippines', 10), ('killed', 10), ('drive...
2014-12-08    [('US', 24), ('dead', 19), ('3', 13), ("n't", 13), ('police', 12), ('Uber', 11), ('fire', 11), ('Obama', 10), ('William', 10),...
2014-12-09    [('Obama', 17), ('Uber', 17), ('plane', 14), ('Police', 12), ('crash', 12), ('report', 11), ('Maryland', 11), ('CIA', 10), ('R...
2014-12-10    [('CIA', 15), ('Obama', 13), ('Palestinian', 13), ('minister', 12), ('police', 11), ('death', 10), ('dies', 10), ('$', 10), ('...
2014-12-11    [('$', 18), ('CIA', 15), ('found', 13), ('storm', 13), ('California', 13), ('Palestinian', 11), ('3', 10), ('car', 9), ('Flori...
2014-12-12    [('Obama', 12), ('bill', 11), ('storm', 10), ('says', 10), ('Palestinian', 10), ('$', 10), ('shot', 9), ('NFL', 9), ('US', 8),...
2014-12-13    [('Police', 24), ('police', 19), ('school', 15), ('shooting', 15), ('missing', 14), ('arrest', 13), ('landslide', 12), ('Indon...
2014-12-14    [('police', 17), ('Japan', 10), ('climate', 9), ('march', 8), ('Abe', 7), ('US', 7), ('Senate', 7), ('deal', 7), ('killed', 7)...
2014-12-15                                                                                                                                   []
2014-12-16    [('Taliban', 3), ('2016', 3), ('menorah', 3), ('lighting', 3), ('says', 3), ('Obama', 3), ('$', 3), ('1.1', 3), ('145', 2), ('...
2014-12-17    [('Pakistan', 16), ('Jeb Bush', 15), ('US', 14), ('Obama', 13), ('Sony', 12), ('school', 12), ('Cuba', 11), ('attack', 10), ('...
                                                                            ...                                                                
2015-03-16    [('says', 19), ('Vanuatu', 15), ('Robert Durst', 15), ('Police', 15), ('officers', 12), ('Hillary Clinton', 12), ('Ferguson', ...
2015-03-17    [('Day', 37), ('Patrick', 21), ('St.', 17), ('TV', 14), ('Robert Durst', 13), ('Apple', 12), ('Secret Service', 12), ('Chris B...
2015-03-18    [('Israel', 14), ('marriage', 13), ('Fed', 13), ('White House', 13), ('nude', 12), ('Penn State', 12), ('says', 12), ('gay', 1...
2015-03-19    [('Netanyahu', 17), ('shooting', 16), ('attack', 15), ('Iran', 15), ('eclipse', 14), ('killed', 12), ('arrest', 12), ('Israel'...
2015-03-20    [('eclipse', 17), ('Obama', 14), ('Iran', 14), ('fracking', 14), ('rules', 13), ('Mississippi', 11), ('talks', 10), ('Yemen', ...
2015-03-21    [('Obama', 24), ('airport', 14), ('attack', 14), ('US', 13), ('Yemen', 13), ('7', 12), ('New Orleans', 11), ('Princeton', 10),...
2015-03-22    [('NCAA', 6), ('US', 4), ('tournament', 4), ('President', 3), ('Obama', 3), ('Villanova', 3), ('seed', 3), ('Tournament', 3), ...
2015-03-23                                                                                                                                   []
2015-03-24                                                                                                                                   []
2015-03-25                                                                                                                                   []
2015-03-26                                                                                                                                   []
2015-03-27                                                                                                                                   []
2015-03-28                                                                                                                                   []
2015-03-29    [('Iran', 29), ('crash', 22), ('says', 18), ('law', 15), ('talks', 14), ('nuclear', 13), ('found', 13), ('Indiana', 12), ('Hal...
2015-03-30    [('Iran', 25), ('nuclear', 19), ('Indiana', 19), ('talks', 15), ('Obama', 15), ('crash', 12), ('law', 12), ('deal', 11), ('Dea...
2015-03-31    [('Iran', 28), ('law', 14), ('prosecutor', 14), ('nuclear', 13), ('talks', 13), ('hostage', 13), ('$', 12), ('deadline', 12), ...
2015-04-01    [('Iran', 17), ('dead', 16), ('dies', 15), ('found', 15), ('crash', 14), ('117', 13), ('world', 13), ('Nigeria', 12), ('oldest...
2015-04-02    [('Iran', 27), ('charges', 13), ('Indiana', 13), ('attack', 13), ('Kenya', 12), ('dead', 12), ('Florida', 12), ('Sen.', 11), (...
2015-04-03    [('Iran', 24), ('Easter', 16), ('66', 16), ('sea', 15), ('days', 14), ('deal', 12), ('Kenya', 12), ('charged', 12), ('death', ...
2015-04-04    [('Iran', 21), ('years', 20), ('nuclear', 16), ('18', 16), ('deal', 14), ('Yemen', 14), ('Easter', 13), ('Kenya', 12), ('dies'...
2015-04-05    [('Easter', 33), ('Iran', 18), ('deal', 15), ('Wisconsin', 12), ('Kentucky', 10), ('Christians', 10), ('Kenya', 9), ('attack',...
2015-04-06    [('Iran', 20), ('Rolling', 19), ('Stone', 18), ('trial', 16), ('2016', 16), ('Obama', 15), ('bombing', 14), ('rape', 13), ('Ed...
2015-04-07    [('Obama', 16), ('7', 15), ('plane', 15), ('Iran', 14), ('White House', 13), ('crash', 13), ('Duke', 12), ('Rand Paul', 12), (...
2015-04-08    [('Obama', 17), ('Rand Paul', 16), ('US', 15), ('Ferguson', 11), ('Afghan', 11), ('White House', 10), ('20', 9), ('years', 9),...
2015-04-09    [('Obama', 20), ('deal', 16), ('shooting', 14), ('Cuba', 14), ('US', 13), ('says', 11), ('Masters', 11), ('nuclear', 10), ('Po...
2015-04-10    [('Masters', 17), ('police', 14), ('shooting', 13), ('man', 12), ('Obama', 11), ('Hillary Clinton', 10), ('Pakistan', 10), ('L...
2015-04-11    [('Obama', 24), ('Iran', 20), ('Clinton', 19), ('police', 18), ('Castro', 15), ('Pakistan', 10), ('says', 10), ('10', 10), ('d...
2015-04-12    [('Iran', 17), ('police', 17), ('shooting', 16), ('charges', 15), ('Masters', 13), ('says', 12), ('Egypt', 12), ('Armenian', 1...
2015-04-13    [('Iran', 20), ('shooting', 17), ('2016', 16), ('Obama', 13), ('Marco Rubio', 12), ('Police', 11), ('Clinton', 11), ('Hillary'...
2015-04-14    [('Hillary Clinton', 18), ('Iran', 17), ('shooting', 16), ('Clinton', 15), ('Marco Rubio', 13), ('year', 13), ('cargo', 11), (...
Freq: D, Name: high_freq_words, Length: 148, dtype: object
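The printed lists are cut off at the display width. Spreading each day's leading terms into separate columns can make manual inspection easier. The sketch below assumes each list is already sorted by descending count, as in the output above. `top_k_table` and the toy data are hypothetical.

```python
import pandas as pd


def top_k_table(words_sr, k=3):
    """Spread each day's (word, count) list into k 'word (count)' columns."""
    rows = {}
    for day, pairs in words_sr.items():
        # Assumes pairs are already sorted by descending count.
        cells = ['{} ({})'.format(word, count) for word, count in pairs[:k]]
        cells += [''] * (k - len(cells))  # pad days with fewer than k words
        rows[day] = cells
    return pd.DataFrame.from_dict(
        rows, orient='index', columns=['top{}'.format(i + 1) for i in range(k)]
    )


# Hypothetical toy data: one normal day and one empty day.
days = pd.date_range('2014-11-26', periods=2, freq='D')
words_sr = pd.Series([[('x', 5), ('y', 2)], []], index=days)
table = top_k_table(words_sr, k=3)
print(table)
```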

Write out to csv files for manual inspection


In [4]:
"""
Write out all news titles to csv file
"""
if 0 == 1:
    news_titles_csv_file = os.path.join(config.HR_DIR, 'news_titles.csv')
    news_period_df = pd.read_pickle(config.NEWS_PERIOD_DF_PKL)
    
    news_period_df.to_csv(path_or_buf=news_titles_csv_file, columns=['news_collected_time', 'news_title'], sep='\t', header=True, index=True)
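Because of `sep='\t'`, these files are tab-separated despite the `.csv` extension, so reading one back also needs `sep='\t'`. The round-trip sketch below uses an in-memory buffer and hypothetical toy titles.

```python
import io

import pandas as pd

# Hypothetical toy rows with the same two columns as news_period_df.
df = pd.DataFrame({
    'news_collected_time': pd.to_datetime(['2014-11-18', '2014-11-19']),
    'news_title': ['First toy title', 'Second toy title, with a comma'],
})

buf = io.StringIO()
df.to_csv(buf, columns=['news_collected_time', 'news_title'], sep='\t', header=True, index=True)

# Reading back needs the same tab separator; column 0 holds the written row index.
buf.seek(0)
back = pd.read_csv(buf, sep='\t', index_col=0, parse_dates=['news_collected_time'])
print(back)
```

A tab separator also means titles that contain commas are written without any CSV quoting.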

In [5]:
"""
Write out news title high freq words to csv file
"""
if 0 == 1:
    news_title_high_freq_words_csv_file = os.path.join(config.HR_DIR, 'news_title_high_freq_words.csv')
    news_title_docs_high_freq_words_df = pd.read_pickle(news_title_docs_high_freq_words_df_pkl)
    
    news_title_docs_high_freq_words_df.to_csv(path_or_buf=news_title_high_freq_words_csv_file, columns=['high_freq_words'], sep='\t', header=True, index=True)
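`to_csv` writes each `high_freq_words` list as its `str()` representation, so the column comes back from `read_csv` as plain strings. One way to recover the `(word, count)` tuples is to parse each cell with `ast.literal_eval`. The round-trip sketch below uses hypothetical toy data and an in-memory buffer. It includes a word with an embedded quote to show that CSV quoting survives the round trip.

```python
import ast
import io

import pandas as pd

# Hypothetical toy frame shaped like news_title_docs_high_freq_words_df.
days = pd.date_range('2014-11-18', periods=2, freq='D', name='news_collected_time')
df = pd.DataFrame({'high_freq_words': [[("'quoted", 1), ('found', 2)], []]}, index=days)

buf = io.StringIO()
df.to_csv(buf, columns=['high_freq_words'], sep='\t', header=True, index=True)

# literal_eval turns each "[('word', n), ...]" string back into a list of tuples.
buf.seek(0)
back = pd.read_csv(buf, sep='\t', index_col=0, parse_dates=True,
                   converters={'high_freq_words': ast.literal_eval})
print(back['high_freq_words'].tolist())
```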

In [3]:
%%time
"""
Iteratively remove news related to manually selected topics and compile a new list of candidate topics.
"""
iter_num = 'v2'

filtered_news_titles_csv_file = os.path.join(config.HR_DIR, 'news_titles.{}.csv'.format(iter_num))
filtered_news_title_high_freq_words_csv_file = os.path.join(config.HR_DIR, 'news_title_high_freq_words.{}.csv'.format(iter_num))

if 0 == 1:
    '''
    Filter out news related to topics that have already been manually selected
    '''
    filtered_news_dct_lst = []
    
    news_period_df = pd.read_pickle(config.NEWS_PERIOD_DF_PKL)
    
    count = 0
    for row_ind, row in news_period_df.iterrows():
        news_dict = {'news_collected_time': row['news_collected_time'], 'news_title': row['news_title']}
        match_topic = False

        for topic_ind, topic in enumerate(config.MANUALLY_SELECTED_TOPICS_LST):
            if utilities.news_title_match(row['news_title'], topic['keywords_lst'], verbose=False):
                match_topic = True
                count += 1
                break
        
        if not match_topic:
            filtered_news_dct_lst.append(news_dict)
    
    print('{} out of {} news titles filtered out.'.format(count, news_period_df['news_id'].count()))
    filtered_news_df = pd.DataFrame(filtered_news_dct_lst)
    
    '''
    Group news by day of news_collected_time and concatenate news_titles
    '''
    news_titles_sr = filtered_news_df.resample('D', on='news_collected_time')['news_title'].apply(lambda x: '\n'.join(x))
    
    '''
    Re-count high freq words for each filtered daily title doc
    '''
    results_dict = {}
    
    for ind_val, sr_val in news_titles_sr.items():  # Series.iteritems() was removed in pandas 2.0
        results_dict[ind_val] = count_high_freq_words(sr_val)
        
    high_freq_words_sr = pd.Series(results_dict)
    
    news_title_docs_high_freq_words_df = pd.concat([news_titles_sr, high_freq_words_sr], axis=1)
    news_title_docs_high_freq_words_df.columns = ['news_title_doc', 'high_freq_words']
    
    """
    Write out to csv files
    """
    filtered_news_df.to_csv(path_or_buf=filtered_news_titles_csv_file, columns=['news_collected_time', 'news_title'], sep='\t', header=True, index=True)
    news_title_docs_high_freq_words_df.to_csv(path_or_buf=filtered_news_title_high_freq_words_csv_file, columns=['high_freq_words'], sep='\t', header=True, index=True)


4142 out of 37286 news titles filtered out.
CPU times: user 4min 25s, sys: 808 ms, total: 4min 25s
Wall time: 4min 26s
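The cell above loops over 37,286 titles with `iterrows` and calls `utilities.news_title_match` once per topic. The run took about 4.5 minutes of wall time. Below is a vectorized sketch of the same filtering step. It rests on a loudly labeled assumption: a title matches when it contains any single keyword from any topic's `keywords_lst` as a case-insensitive whole token. The real `utilities.news_title_match` is not shown and may use a different rule, so the filtered counts could differ from the 4142 reported above. `filter_out_topics` and the toy data are hypothetical.

```python
import re

import pandas as pd


def filter_out_topics(news_df, topics, title_col='news_title'):
    """Return (rows matching no topic keyword, number of rows dropped).

    ASSUMPTION: a title matches when it contains any single keyword from any
    topic's 'keywords_lst' as a case-insensitive whole token. The notebook's
    utilities.news_title_match is not shown and may use a different rule.
    """
    keywords = [kw for topic in topics for kw in topic['keywords_lst']]
    if not keywords:
        return news_df.copy(), 0
    # One alternation regex. re.escape keeps characters such as '.' literal,
    # and the lookarounds stop matches inside longer words.
    pattern = r'(?<!\w)(?:' + '|'.join(re.escape(kw) for kw in keywords) + r')(?!\w)'
    matched = news_df[title_col].str.contains(pattern, case=False, regex=True, na=False)
    return news_df.loc[~matched].copy(), int(matched.sum())


# Hypothetical toy titles and topics.
news = pd.DataFrame({'news_title': [
    'Obama unveils immigration plan',
    'Thanksgiving travel snarled by snow',
    'Sen. Paul speaks in Iowa',
]})
topics = [{'keywords_lst': ['Thanksgiving']}, {'keywords_lst': ['Sen.']}]
kept, n_dropped = filter_out_topics(news, topics)
print(kept['news_title'].tolist(), n_dropped)  # ['Obama unveils immigration plan'] 2
```

Folding all keywords into one pattern runs a single regex scan per title instead of a Python-level loop over every topic. That loop is likely where most of the original cell's running time goes.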