Spring 2017 Data Bootcamp Final Project by Colleen Jin dj928, Yingying Chen yc1875

Analysis On Relation Between News Sentiment And Market Portfolio

In this project, we use two data sets to explore whether media sentiment can serve as an indicator for the financial sector. For the financial data, we use the daily return of the market index (^GSPC), which is a good indicator of market fluctuation; for media sentiment, we use summarized information on news pieces from the 10 most popular news outlets, chosen because of their stronger influence in shaping people's perception of world events.

Both data sets are real-time: the source files reflect the current moment and must be reloaded each time the analysis is performed. The sentiment analysis library returns a polarity score (-1.0 to 1.0) and a subjectivity score (0.0 to 1.0) for each news story. Using these quantified sentiment scores, we juxtapose the two time series, observe whether they are correlated, and search for potential causality. For example, we may test the hypothesis that when polarity among the daily news posts is higher (i.e., more positive), the financial market is more likely to rise that same day. The rest of the notebook walks through the analysis step by step.
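One simple way to check such a hypothesis is to compute the Pearson correlation between the daily mean polarity and the same-day index return. The sketch below is only an illustration under stated assumptions: the helper name `pearson` and the two short lists are made-up placeholders, not data collected in this notebook.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for illustration only (NOT real observations).
daily_polarity = [0.10, -0.05, 0.20, 0.00, 0.15]     # mean headline polarity per day
daily_return = [0.004, -0.002, 0.006, 0.001, 0.003]  # same-day index return

r = pearson(daily_polarity, daily_return)
print(round(r, 3))
```

A coefficient close to +1 would be consistent with the hypothesis, but correlation alone does not establish causality.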

Modules used in this notebook:

  1. TextBlob: a library that provides an API for common natural language processing (NLP) tasks, including part-of-speech tagging, noun phrase extraction, sentiment analysis, classification and translation.
  2. Non-Parametric Regression: a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data.
  3. WordCloud: a library that generates word-cloud images from text.

Data sources:

  1. News API: We use a news API provided by NewsAPI.org to load real-time news headlines (as JSON metadata), then apply methods mainly from Python's TextBlob module to conduct sentiment analysis. We selected 10 publishers by popularity (please see the ranking of news press here).
  2. S&P 500 index opening and closing prices, obtained from Yahoo Finance.
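The notebook does not spell out how the daily return is derived from the opening and closing prices. A minimal sketch, assuming the intraday definition (close - open) / open; the function name `daily_return` and the prices are hypothetical and are not actual S&P 500 quotes.

```python
def daily_return(open_price, close_price):
    """Intraday return: relative change from the open to the close."""
    if open_price <= 0:
        raise ValueError("open price must be positive")
    return (close_price - open_price) / open_price

# Hypothetical prices for illustration only (not actual S&P 500 quotes).
quotes = [
    {"date": "day 1", "open": 100.0, "close": 101.0},
    {"date": "day 2", "open": 101.0, "close": 100.5},
]
returns = {q["date"]: daily_return(q["open"], q["close"]) for q in quotes}
print(returns)
```

Another common convention is the close-to-close return, which compares each close with the previous day's close; which one suits the analysis depends on whether overnight news should count toward the same day.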

In [69]:
%matplotlib inline                     
# import necessary packages
import pandas as pd                    
import matplotlib.pyplot as plt        
from pandas_datareader import data
from datetime import datetime
import numpy as np
from textblob import TextBlob
import csv

from wordcloud import WordCloud,ImageColorGenerator
#from scipy.misc import imread
import string

PART 1: NEWS COLLECTION - pd.read_json()

We use pd.read_json() to import real-time news information (the top 10 posts from each publisher). The news items are first stored as separate dataframes and then combined into one collective dataframe. (News API powered by NewsAPI.org)
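The request URLs differ only in the source name, so they can be built programmatically. The sketch below assumes the same v1 endpoint and query parameters that appear in the cell that follows; `build_url` and `YOUR_API_KEY` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://newsapi.org/v1/articles"
API_KEY = "YOUR_API_KEY"  # placeholder; substitute your own NewsAPI key

def build_url(source, api_key=API_KEY, sort_by="top"):
    """Return the v1 articles URL for one news source."""
    query = urlencode({"source": source, "sortBy": sort_by, "apiKey": api_key})
    return f"{BASE_URL}?{query}"

sources = ["cnn", "the-new-york-times", "bbc-news"]
urls = {name: build_url(name) for name in sources}
print(urls["cnn"])

# Each URL could then be passed to pd.read_json(url), and the resulting
# dataframes combined with pd.concat(frames, ignore_index=True).
```

Keeping the key in one variable also makes it easier to avoid repeating it in every request line.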

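The news press consists of The Wall Street Journal, CNN, The New York Times, The Washington Post, BBC News, ABC News (Australia), the Financial Times, Bloomberg and The Economist. (The Google News request is commented out in the cell below.)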


In [70]:
cnn = pd.read_json('https://newsapi.org/v1/articles?source=cnn&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518')
nyt= pd.read_json('https://newsapi.org/v1/articles?source=the-new-york-times&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518')
wsp=pd.read_json('https://newsapi.org/v1/articles?source=the-washington-post&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518')
bbc=pd.read_json("https://newsapi.org/v1/articles?source=bbc-news&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
abc=pd.read_json("https://newsapi.org/v1/articles?source=abc-news-au&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
#google = pd.read_json(" https://newsapi.org/v1/articles?source=google-news&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
ft = pd.read_json("https://newsapi.org/v1/articles?source=financial-times&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
bloomberg = pd.read_json("https://newsapi.org/v1/articles?source=bloomberg&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
economist = pd.read_json("https://newsapi.org/v1/articles?source=the-economist&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
wsj = pd.read_json("https://newsapi.org/v1/articles?source=the-wall-street-journal&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")

In [71]:
total = [wsj, cnn, nyt, wsp, bbc, abc, ft, bloomberg, economist]
total1 = pd.concat(total, ignore_index=True)
total1


Out[71]:
articles sortBy source status
0 {'title': 'James Comey Sought More Resources f... top the-wall-street-journal ok
1 {'title': 'Trump Fires FBI Director James Come... top the-wall-street-journal ok
2 {'title': 'Comey Firing Casts Harsh Spotlight ... top the-wall-street-journal ok
3 {'title': 'As the FBI Reels, Candidates Emerge... top the-wall-street-journal ok
4 {'title': 'Trump’s Firing of Comey Fans Partis... top the-wall-street-journal ok
5 {'title': 'Donald Trump Seeks to Mute Outcry f... top the-wall-street-journal ok
6 {'title': 'Senate Committee Subpoenas Document... top the-wall-street-journal ok
7 {'title': 'Snapchat Parent Posts $2.2 Billion ... top the-wall-street-journal ok
8 {'title': 'U.S. to Expand Intelligence Coopera... top the-wall-street-journal ok
9 {'title': 'Whole Foods Overhauls Board; Vows B... top the-wall-street-journal ok
10 {'title': '4 ways Trump miscalculated the Come... top cnn ok
11 {'title': 'Source close to Comey says there we... top cnn ok
12 {'title': 'Tapper: The real reasons Trump fire... top cnn ok
13 {'title': 'First on CNN: Comey sends farewell ... top cnn ok
14 {'title': 'WH: Comey tossed 'stick of dynamite... top cnn ok
15 {'title': 'Comey committed 'atrocities,' Sarah... top cnn ok
16 {'title': 'Rod Rosenstein: Trump's unlikely ha... top cnn ok
17 {'title': 'Senate intelligence committee subpo... top cnn ok
18 {'title': 'Europe view: American democracy isn... top cnn ok
19 {'title': 'Comey firing sends shockwaves throu... top cnn ok
20 {'title': 'F.B.I. Director James Comey Is Fire... top the-new-york-times ok
21 {'title': 'Days Before Firing, Comey Asked for... top the-new-york-times ok
22 {'title': 'Updates and Reactions to F.B.I. Dir... top the-new-york-times ok
23 {'title': 'Opinion | Trump’s Firing of Comey I... top the-new-york-times ok
24 {'title': 'Jimmy Kimmel Responds to Critics Ov... top the-new-york-times ok
25 {'title': 'In Trump’s Firing of James Comey, E... top the-new-york-times ok
26 {'title': 'Why Everything We Know About Salt M... top the-new-york-times ok
27 {'title': 'How Homeownership Became the Engine... top the-new-york-times ok
28 {'title': 'The Birth of a Mother', 'author': '... top the-new-york-times ok
29 {'title': 'How a 23-Year-Old With Mild Anxiety... top the-new-york-times ok
... ... ... ... ...
60 {'title': 'Defiant Trump courts Russia as prob... top financial-times ok
61 {'title': 'Comey dismissal unfolds in uniquely... top financial-times ok
62 {'title': 'Lavrov delivers a barbed script for... top financial-times ok
63 {'title': 'Little-known prosecutor under scrut... top financial-times ok
64 {'title': 'China ‘New Silk Road’ investment fe... top financial-times ok
65 {'title': 'Global ETF assets reach $4tn', 'aut... top financial-times ok
66 {'title': 'Snap shares slump as debut earnings... top financial-times ok
67 {'title': 'Whole Foods replaces chairman and f... top financial-times ok
68 {'title': 'Cable cowboy John Malone views a ne... top financial-times ok
69 {'title': 'Toshiba technology unit sale grows ... top financial-times ok
70 {'title': 'Flynn Subpoenaed in Russia Probe by... top bloomberg ok
71 {'title': 'After Comey, Justice Must Be Served... top bloomberg ok
72 {'title': 'What Happens to Trump-Russia Probe ... top bloomberg ok
73 {'title': 'Mobius Says Low Market Volatility I... top bloomberg ok
74 {'title': 'Aetna Is Latest Health Insurer to Q... top bloomberg ok
75 {'title': 'Uber Greyball Investigation Expands... top bloomberg ok
76 {'title': 'United Directors Sued Over Ousted C... top bloomberg ok
77 {'title': 'Whole Foods Names Panera CEO to Boa... top bloomberg ok
78 {'title': 'DeVos Booed Loudly by Graduates at ... top bloomberg ok
79 {'title': 'Boeing Halts Flights of 737 Max on ... top bloomberg ok
80 {'title': 'By attacking Kurdish allies of Amer... top the-economist ok
81 {'title': 'President Trump abruptly sacks the ... top the-economist ok
82 {'title': 'Moon Jae-in wins South Korea’s pres... top the-economist ok
83 {'title': 'Why are Russian opposition leaders’... top the-economist ok
84 {'title': '“Girlboss” is another disappointing... top the-economist ok
85 {'title': 'The Tory and Labour parties fail to... top the-economist ok
86 {'title': 'A mixed April for United Airlines',... top the-economist ok
87 {'title': 'Mumbai plans the world’s tallest st... top the-economist ok
88 {'title': 'The Kushners put controversial inve... top the-economist ok
89 {'title': 'Apocalyptic fiction and the Doomsda... top the-economist ok

90 rows × 4 columns

Some values may be missing in the articles column. For example, if a news piece from BBC has no author information, the entry shows None where the author should have been. We therefore convert NoneType entries to strings, so that every entry can be passed to TextBlob (which accepts only strings) and later collected with the .append() method when displaying the sentiment analysis results.
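The same replacement can be written as a small helper that handles several fields in one pass. This is a sketch, not the notebook's own code: `fill_missing` is a hypothetical name, and the record below is only shaped like one entry of the articles column.

```python
def fill_missing(article, fields=("title", "description"), placeholder="None"):
    """Return a copy of the article dict with missing fields set to a placeholder string."""
    cleaned = dict(article)
    for field in fields:
        if cleaned.get(field) is None:
            cleaned[field] = placeholder
    return cleaned

# Hypothetical record shaped like one entry of the 'articles' column.
raw = {"title": "Global ETF assets reach $4tn", "description": None, "author": None}
clean = fill_missing(raw)
print(clean["description"], type(clean["description"]))
```

Returning a copy leaves the original record untouched, which makes it possible to tell real text apart from the placeholder later on.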


In [72]:
# replace missing descriptions with the string 'None'
for article in total1['articles']:
    if article['description'] is None:
        article['description'] = 'None'

# confirm that every description is now a string
for article in total1['articles']:
    print(type(article['description']))
# now all entries are of type string, regardless of whether there is real content.


<class 'str'>
<class 'str'>
<class 'str'>
...
(90 lines in total, all <class 'str'>)

In [73]:
# replace missing titles with the string 'None'
for article in total1['articles']:
    if article['title'] is None:
        article['title'] = 'None'

# confirm that every title is now a string
for article in total1['articles']:
    print(type(article['title']))
# now all entries are of type string, regardless of whether there is real content.


<class 'str'>
<class 'str'>
<class 'str'>
...
(90 lines in total, all <class 'str'>)

Contents of the column named articles are of dict type; each row contains information including author, title, description, url, urlToImage and publishedAt, among which title is selected for main analysis.
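Because each entry is a dict with the same keys, the column can also be expanded into one dataframe column per key. The sketch below uses pd.json_normalize on two made-up records that only mimic the structure described above; the values are placeholders.

```python
import pandas as pd

# Two records shaped like entries of the 'articles' column (placeholder values).
records = [
    {"author": "A", "title": "First headline", "description": "First summary",
     "url": "https://example.com/1", "urlToImage": None,
     "publishedAt": "2017-05-10T16:17:00Z"},
    {"author": None, "title": "Second headline", "description": None,
     "url": "https://example.com/2", "urlToImage": None,
     "publishedAt": "2017-05-10T17:00:00Z"},
]

# pd.json_normalize turns a list of dicts into a dataframe, one column per key.
flat = pd.json_normalize(records)
print(flat[["publishedAt", "title"]])
```

In this notebook the equivalent call could be `pd.json_normalize(total1['articles'].tolist())`, which would make the title column directly available for analysis.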


In [74]:
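# write the news posts into a .csv file
n_rows = len(total1.index)
articles = total1['articles']
# append mode keeps rows from earlier runs (the header row is written again on
# each run); the with-block closes the file so every row is flushed to disk
with open('result.csv', 'a', newline='') as csv_file:
    result = csv.writer(csv_file)
    result.writerow(['PublishedAt', 'Title', 'description'])
    for i in range(n_rows):
        line = [articles[i]['publishedAt'], articles[i]['title'], articles[i]['description']]
        result.writerow(line)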

# print the first item in the 'articles' series as an example.
articles[0]


Out[74]:
{'author': 'Kristina Peterson',
 'description': 'Former Federal Bureau of Investigation Director James Comey asked the Justice Department last week for more resources for the agency’s investigation into Russian interference in the 2016 election, a U.S. official said.',
 'publishedAt': '2017-05-10T16:17:00Z',
 'title': 'James Comey Sought More Resources for FBI’s Russia Probe Before Being Fired',
 'url': 'https://www.wsj.com/articles/james-comey-had-requested-more-money-for-fbi-s-russia-investigation-before-being-fired-u-s-official-1494433061',
 'urlToImage': 'https://si.wsj.net/public/resources/images/BN-TJ559_TRUMPC_TOP_20170510115539.jpg'}

In [75]:
# type of each entry in the 'articles' column is 'dict'
type(articles[0])


Out[75]:
dict

In [76]:
# keys of the 'dict' variables are 'author', 'publishedAt', 'urlToImage', 'description', 'title', 'url'
articles[0].keys()


Out[76]:
dict_keys(['title', 'author', 'description', 'url', 'urlToImage', 'publishedAt'])

The tags property performs part-of-speech tagging (for example, NNP stands for a singular proper noun).


In [77]:
blob = TextBlob(str(articles[0]['title']))
blob.tags


Out[77]:
[('James', 'NNP'),
 ('Comey', 'NNP'),
 ('Sought', 'NNP'),
 ('More', 'NNP'),
 ('Resources', 'NNPS'),
 ('for', 'IN'),
 ('FBI’s', 'NNP'),
 ('Russia', 'NNP'),
 ('Probe', 'NNP'),
 ('Before', 'IN'),
 ('Being', 'NNP'),
 ('Fired', 'VBD')]
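The tagged pairs can be filtered by tag, for instance to pull out the proper nouns of a headline. The sketch below works on the (word, tag) pairs copied from the output above; `words_with_tag_prefix` is a hypothetical helper name.

```python
# (word, tag) pairs copied from the blob.tags output above.
tags = [
    ("James", "NNP"), ("Comey", "NNP"), ("Sought", "NNP"), ("More", "NNP"),
    ("Resources", "NNPS"), ("for", "IN"), ("FBI’s", "NNP"), ("Russia", "NNP"),
    ("Probe", "NNP"), ("Before", "IN"), ("Being", "NNP"), ("Fired", "VBD"),
]

def words_with_tag_prefix(tagged, prefix):
    """Keep the words whose part-of-speech tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

proper_nouns = words_with_tag_prefix(tags, "NNP")  # matches NNP and NNPS
print(proper_nouns)
```

Note that in this title-case headline the tagger labels words such as "Sought", "More" and "Being" as proper nouns, so capitalization in headlines can distort the tags.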

A loop prints all the news titles, which are later used for sentiment analysis.


In [78]:
i = 0
while i < n_rows:
    blob = TextBlob(articles[i]['title'])
    print(1 + i, ". ", blob, sep = "")
    i += 1


1. James Comey Sought More Resources for FBI’s Russia Probe Before Being Fired
2. Trump Fires FBI Director James Comey: Live Coverage of the Fallout
3. Comey Firing Casts Harsh Spotlight on Rod Rosenstein
4. As the FBI Reels, Candidates Emerge to Run Agency
5. Trump’s Firing of Comey Fans Partisan Flames in Congress
6. Donald Trump Seeks to Mute Outcry from Firing of James Comey
7. Senate Committee Subpoenas Documents from Mike Flynn in Russia Probe
8. Snapchat Parent Posts $2.2 Billion Loss in First Quarterly Report; Stock Plunges
9. U.S. to Expand Intelligence Cooperation With Turkey
10. Whole Foods Overhauls Board; Vows Big Changes
11. 4 ways Trump miscalculated the Comey firing
12. Source close to Comey says there were 2 reasons the FBI director was fired
13. Tapper: The real reasons Trump fired Comey - CNN Video
14. First on CNN: Comey sends farewell letter to friends and agents
15. WH: Comey tossed 'stick of dynamite' into DOJ - CNN Video
16. Comey committed 'atrocities,' Sarah Huckabee Sanders says
17. Rod Rosenstein: Trump's unlikely hatchet man
18. Senate intelligence committee subpoenas Michael Flynn
19. Europe view: American democracy isn't as strong as you think
20. Comey firing sends shockwaves through FBI rank-and-file
21. F.B.I. Director James Comey Is Fired by Trump
22. Days Before Firing, Comey Asked for More Resources for Russia Inquiry
23. Updates and Reactions to F.B.I. Director Comey’s Firing
24. Opinion | Trump’s Firing of Comey Is All About the Russia Inquiry
25. Jimmy Kimmel Responds to Critics Over Health Care
26. In Trump’s Firing of James Comey, Echoes of Watergate
27. Why Everything We Know About Salt May Be Wrong
28. How Homeownership Became the Engine of American Inequality
29. The Birth of a Mother
30. How a 23-Year-Old With Mild Anxiety and a Charmed Life Became the Lying, Sobbing, Lovesick Toast of Broadway
31. Here’s how an independent investigation into Trump and Russia would happen
32. Presence of Russian photographer in Oval Office raises alarms
33. Why Trump expected only applause when he told Comey, ‘You’re fired.’
34. Analysis | Mitch McConnell just shut down any hopes Democrats had of an independent Russia investigation
35. Analysis | Bob Woodward on Trump-Watergate comparisons: &#8216;Let&#8217;s see what the evidence is&#8217;
36. Furor over Comey firing grows with news that he sought resources for Russia investigation before his dismissal
37. Senate Intelligence Committee subpoenas documents from Flynn in Russia probe
38. The Daily 202: Firing FBI director Comey is already backfiring on Trump. It’s only going to get worse.
39. The weird moment on Colbert’s show that captured our political whiplash
40. Wait, ‘Can He Do That?’
41. General election 2017: Labour manifesto draft leaked
42. Trump 'considered firing Comey since taking office'
43. Trump Russia meeting: Lavrov praises Trump and Tillerson after talks
44. Drayton Manor: Park to stay closed after Evha Jannath's death
45. Women charged with terror offences and conspiracy to murder
46. HIV life expectancy 'near normal' thanks to new drugs
47. Mobile phone row driver runs cyclist over
48. Mobile phone row driver runs cyclist over
49. 'Love rival' Cardiff woman guilty over crash death
50. Snap shares slide as growth slows
51. 'They will be thanking me': Trump defends firing of FBI chief Comey
52. So the budget is out: What happens now?
53. Live: Morrison stands ground as fight brews over bank tax
54. Keep temporary tax on the wealthy, Shorten to argue in budget reply
55. Handy tips to avoid being locked up by US Immigration
56. Lawyer still haunted by client's execution in Scottish prison's 'hanging shed'
57. Classroom becomes isolation ward as spectre of cholera haunts South Sudan
58. How not to blow your top when you're on the phone to your telco
59. 'I struggle to make $100': Taxi drivers' desperation over Uber
60. Can Australians learn to love local fish?
61. Defiant Trump courts Russia as probe calls grow
62. Comey dismissal unfolds in uniquely Trumpian way
63. Lavrov delivers a barbed script for the Americans
64. Little-known prosecutor under scrutiny over FBI sacking
65. China ‘New Silk Road’ investment fell in 2016
66. Global ETF assets reach $4tn
67. Snap shares slump as debut earnings miss forecasts
68. Whole Foods replaces chairman and finance chief
69. Cable cowboy John Malone views a new landscape
70. Toshiba technology unit sale grows more complex
71. Flynn Subpoenaed in Russia Probe by Senate Intelligence Panel
72. After Comey, Justice Must Be Served
73. What Happens to Trump-Russia Probe After Comey: QuickTake Q&A
74. Mobius Says Low Market Volatility Is Tied to Social Media
75. Aetna Is Latest Health Insurer to Quit Obamacare Markets
76. Uber Greyball Investigation Expands to Multiple U.S. Cities
77. United Directors Sued Over Ousted CEO's Severance Package
78. Whole Foods Names Panera CEO to Board as It Faces Down Jana
79. DeVos Booed Loudly by Graduates at Historically Black College
80. Boeing Halts Flights of 737 Max on Fault in GE-Safran Engine
81. By attacking Kurdish allies of America, Turkey risks confrontation
82. President Trump abruptly sacks the head of the FBI
83. Moon Jae-in wins South Korea’s presidential elections by a landslide
84. Why are Russian opposition leaders’ faces turning green?
85. “Girlboss” is another disappointing take on female entrepreneurship
86. The Tory and Labour parties fail to face the realities of Brexit
87. A mixed April for United Airlines
88. Mumbai plans the world’s tallest statue
89. The Kushners put controversial investor visas in the spotlight
90. Apocalyptic fiction and the Doomsday Clock

The descriptions of all 90 news posts are printed below in the same way; they can improve the accuracy of our sentiment analysis by providing more words on the same topic as the titles.


In [79]:
j = 0
while j < n_rows:
    blob1 = TextBlob(str(articles[j]['description']))
    print(1 + j, ". ", blob1, sep = "")
    j += 1


1. Former Federal Bureau of Investigation Director James Comey asked the Justice Department last week for more resources for the agency’s investigation into Russian interference in the 2016 election, a U.S. official said.
2. Trump Fires FBI Director James Comey: Live Coverage of the Fallout
3. The firing of James Comey has cast a harsh spotlight on Deputy Attorney General Rod Rosenstein, who is less than two weeks into a job that he reached with bipartisan Senate support.
4. The Justice Department moved on Wednesday to find a temporary successor for fired FBI Director James Comey, as Attorney General Jeff Sessions and his top deputy interviewed five candidates amid continuing fallout over the controversial dismissal.
5. President Trump’s firing of FBI Director James Comey thrust a debate over the appointment of a special prosecutor to the forefront of the Senate’s agenda, complicating an already halting effort to pass a health-care bill and craft a tax overhaul this year.
6. President Trump weighed in publicly for the first time on his firing of FBI Director James Comey while top Senate Democrats questioned the timing of the ouster of the man who was investigating Trump campaign aides’ ties to Russia.
7. The leaders of the Senate Intelligence Committee said they had requested the information in late April, but the former national security adviser had declined to cooperate.
8. Snap Inc. reported a $2.21 billion loss in its first quarter as a publicly traded company, magnifying the uphill battle the parent of Snapchat faces in establishing a profitable business while competing with social-media giants like Facebook and Twitter.
9. The U.S. is beefing up joint intelligence efforts with Turkey to help that government better target terrorists in the region, in an apparent bid to alleviate Turkish anxieties as the Pentagon implements a plan to arm Kurdish forces operating inside Syria.
10. Whole Foods is dramatically reshaping its board in an effort to show it is open to change after an activist investor last month publicly urged the organic-grocery chain to explore a sale and speed up its turnaround efforts.
11. Donald Trump has been president for 110 days. In that time, he has fired an acting attorney general, his national security adviser, dozens of federal prosecutors, including one who was investigating him, and, on Tuesday night, the director of the FBI, James Comey.
12. There are two reasons why President Donald Trump fired James Comey, according to a source close to the now-former FBI director:
13. CNN's Jake Tapper says one of the reasons President Donald Trump fired James Comey was that the former FBI director would not give him assurance of personal loyalty.
14. Former FBI Director James Comey, who was fired Tuesday by President Donald Trump, on Wednesday sent a letter to friends and agents. Here is the text of that letter, which was obtained by CNN.
15. When asked about why President Trump was moved to fire James Comey when he previously praised him, Deputy press secretary Sarah Huckabee Sanders said that circumstances change when becoming president and throwing "a stick of dynamite" in the Department of Justice is a problem that can't be ignored.
16. FBI Director James Comey committed "atrocities" when investigating Hillary Clinton's emails, deputy White House press secretary Sarah Huckabee Sanders said Wednesday.
17. Deputy Attorney General Rod Rosenstein emerged this week as perhaps the most unlikely character in the politically charged drama over the firing of FBI Director James Comey.
18. The Senate intelligence committee Wednesday issued a subpoena to former National Security Adviser Michael Flynn for documents regarding his interactions with Russian officials.
19. To Europeans, Trump's firing of James Comey is proof that even American democracy is not immune from the threat of authoritarian rule, writes Kate Maltby.
20. News of James Comey's firing Tuesday night sent shockwaves through the FBI, where the dismissal of the generally well-liked bureau director immediately impacted the thousands of agents nationwide.
21. President Trump abruptly terminated Mr. Comey, who was leading an investigation into whether Mr. Trump’s advisers colluded with Russia to influence the election.
22. Separately, the Senate Intelligence Committee accelerated its inquiry, issuing a subpoena to Michael T. Flynn, President Trump’s former national security adviser.
23. Mr. Comey’s dismissal drew immediate rebukes from Democrats, who worried about the ramifications for the F.B.I. investigation of Russian meddling in the election.
24. The president has now decisively crippled the F.B.I.’s ability to carry out an investigation of him and his associates.
25. Mr. Kimmel’s monologue about his son last week came with a political message. Last night he joked that “it was insensitive” to say all children should have health care.
26. Not since Watergate has a president dismissed the person leading an investigation bearing on him, and the dismissal drew instant comparisons to the Saturday Night Massacre.
27. Research on Russian cosmonauts suggests that salt makes you hungry but not thirsty, and may help burn calories.
28. An enormous entitlement in the tax code props up home prices — and overwhelmingly benefits the wealthy and the upper middle class.
29. Becoming a mother is one of the most significant physical and psychological changes a woman will ever experience.
30. Ben Platt wrecks himself onstage in “Dear Evan Hansen.” Surviving it takes practice — and has made him a favorite to win a Tony Award.
31. Sorting out the &ldquo;special prosecutors&rdquo; from the &ldquo;independent commissions.&rdquo;
32. Former U.S. intelligence officials flagged a potential security breach.
33. A president who operates in the moment sees no profit in considering the lessons and contradictions of the past.
34. Besides frustrating Senate GOP leaders into giving in, Democrats have little to no leverage to force Congress to start an independent investigation into Trump-Russia ties.
35. &quot;He can do whatever he wants, within perhaps reasonable limits, so he's got the power.&quot;
36. Republicans and Democrats said Comey&rsquo;s dismissal will frustrate bipartisan efforts to investigate Russian interference in the 2016 election.
37. Flynn had declined the committee&rsquo;s request for materials it wanted to review.
38. As GOP support cracks, POTUS will likely come to regret this blunder
39. Was Comey the bad guy? Or Trump? The grounds are forever shifting under our feet these days.
40. A podcast that explores the powers and limitations of the American presidency
41. The document includes plans to nationalise parts of the energy industry and scrap tuition fees.
42. Democratic senators say the FBI director was seeking more resources for his Trump-Russia probe.
43. Russia's foreign minister said the talks were dominated by improving relations and the war in Syria.
44. Drayton Manor Theme Park will shut for a second day as Evha Jannath's family call for full inquiry.
45. A woman shot in a police raid is among three charged with terror offences and conspiracy to murder.
46. Newer medications have fewer side effects and are more efficient at stopping the virus.
47. A motorist who chased a cyclist then deliberately crashed into him, sending him flying into a tree, has been jailed for three years.
48. A motorist who chased a cyclist then deliberately crashed into him, sending him flying into a tree, has been jailed for three years.
49. Sophie Taylor died when her car hit a block of flats as she was chased by Melissa Pesticcio.
50. Snapchat's number of daily active users rose just 5% to 166 million in the first quarter of 2017.
51. US President Donald Trump defends his firing of FBI director James Comey.
52. We take you through the options for the Government wanting to avoid its budget turning into a bunch of zombies.
53. Treasury officials will brief the nation's biggest banks today about the Government's proposed new $6 billion levy. Follow live.
54. Opposition Leader Bill Shorten will use his budget reply speech to argue the Federal Government should maintain a temporary tax on high-income earners.
55. Overnight Canberra man Baxter Reid was released from US detention more than a week after overstaying his visa by 90 minutes, prompting the question: How easy is it to run afoul of US Immigration?
56. Most people have a moment in their working life that shapes them. For retired Scottish defence lawyer Len Murray, it was the hanging of a young client.
57. In a school classroom in drought-hit South Sudan, parents and their children lie on a bare concrete floor as they wait to find out if they will be among the first victims of a feared cholera outbreak.
58. Consumer advocates say the almost 66,000 complaints to Australia's telecommunications watchdog in the second of half last year are just the tip of the iceberg. Here's what to do if you're having problems with your telephone or internet provider.
59. Perth taxi drivers say they face losing their homes and their retirement plans have been shattered by the rise of Uber, as they cling desperately to the hope a government review may provide them with compensation.
60. Australians are opting for farmed or overseas-caught fish over local species because they simply don't recognise it.
61. None
62. None
63. None
64. None
65. None
66. None
67. None
68. None
69. None
70. None
71. The Senate Intelligence Committee has subpoenaed documents from Michael Flynn, President Donald Trump’s fired national security adviser, in a sign the bipartisan probe will continue full speed ahead one day after Trump terminated FBI Director James Comey.
72. Congress needs to get serious about holding the president accountable.
73. At the moment, the criminal probe into Russia’s meddling in the 2016 U.S. presidential election has all the stability of the Steinbrenner-era Yankees. With the U.S. attorney general self-sidelined, and the FBI director freshly fired, a makeshift lineup of law-enforcement officials now oversees an inquiry that has implications for American foreign policy, American politics and the Trump presidency. Calls for an outside prosecutor are getting louder.
74. Mark Mobius has a left-field theory on why volatility in global stock markets is so low.
75. Aetna Inc. will leave the few remaining states where it had been selling Obamacare plans next year, making it the latest health insurer to pull out of the health law as Republicans attack the program as failing and work to dismantle it.
76. The city of Portland, Oregon, is starting its own investigation of Uber Technologies Inc.’s use of software to evade regulators while a U.S. Justice Department criminal probe continues in that city along with Philadelphia and Austin, Texas, according to officials.
77. United Continental Holdings Inc. directors were sued by a pension fund for granting a $37 million severance package to the carrier’s former chief executive officer, who was ousted in a bribery scandal.
78. Whole Foods Market Inc., the ailing organic-grocery chain, is doing what it can to avoid a fight with activist investor Jana Partners.
79. A week after President Donald Trump suggested his administration might cut $25 million in capital funding to historically black colleges and universities, Education Secretary Betsy DeVos told Bethune-Cookman University’s class of 2017 that she “is fully committed to your success” -- a line that was met by deafening boos from the students and guests.
80. Boeing Co. said it would temporarily suspend flights of its new 737 Max jetliner because of a potential manufacturing flaw in the engines, marring the commercial debut for the fastest-selling plane in company history.
81. In both Syria and Iraq, the danger is mounting
82. The White House has changed its tune about James Comey, with far-reaching consequences
83. As remarkable is how well the country’s conservatives did
84. Critics of the Kremlin are being splashed with zelyonka, a green liquid
85. Compelling stories about successful businesswomen are scarce
86. The UK will inevitably become a less attractive home for business after Brexit. The problem can be tackled, but only if politicians face up to reality
87. Flyers, it seems, have short memories
88. But it won’t come cheap
89. But the EB-5 visa programme should be reformed and expanded, not scrapped
90. As we edge closer to catastrophe, should we expect more doom-laden literature?

PART 2: WORD CLOUD

A word cloud of news titles gives a direct and vivid impression of the most frequently discussed topics in today's news reports. The topics, people, and events that appear most often in the top news pieces are drawn in the largest font, so they stand out at a glance.

In a visually pleasant way, a word cloud gives us a hint for the news sentiment of the day.

Code adapted from https://github.com/amueller/word_cloud/blob/master/examples/simple.py
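To make the idea behind the word cloud concrete, here is a minimal sketch of the word counting that drives font size. It uses only the standard library. The sample titles and the tiny stop-word set are made up for illustration, and the WordCloud class does its own tokenizing and stop-word filtering internally, so this is not its actual implementation.

```python
import re
from collections import Counter

# Hypothetical sample titles, not taken from result.csv
sample_titles = [
    "Senate committee opens probe",
    "Markets calm as probe continues",
    "Probe widens, markets shrug",
]

# A tiny illustrative stop-word set (an assumption, not WordCloud's own list)
stop_words = {"as", "the", "a", "of"}

def word_frequencies(titles):
    """Count lowercase words across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

frequencies = word_frequencies(sample_titles)
# The most frequent words are the ones a word cloud would draw largest
print(frequencies.most_common(2))
```

In this toy input, "probe" appears three times and "markets" twice, so those two words would dominate the picture.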


In [80]:
# write the third column of result.csv into a text file called entire_text.txt
import csv

list_of_text = []
with open('result.csv', 'r') as csv_file:
    for row in csv.reader(csv_file):
        # append a space so words from adjacent rows do not run together
        list_of_text.append(row[2] + ' ')
# the with block closes the file so all text is on disk before it is read back
with open('entire_text.txt', 'w') as texts:
    texts.writelines(list_of_text)

In [81]:
# read the whole text file back and close it afterwards
with open("entire_text.txt", 'r') as text_file:
    text = text_file.read()
wordcloud = WordCloud().generate(text)

In [82]:
#display the generated image
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")


Out[82]:
(-0.5, 399.5, 199.5, -0.5)

In [83]:
# cap the number of words, raise max_font_size and change the background color to white
wordcloud = WordCloud(max_words=200,background_color='white',max_font_size=100).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()


PART 3: SENTIMENT ANALYSIS

We use the .sentiment property from TextBlob to calculate the polarity and subjectivity of each title. The property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where negative values indicate negative sentiment and positive values indicate positive sentiment. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.


In [84]:
# a loop to show sentiment analysis results for each of the n_rows titles (90 in this run)
n = 0
while n < n_rows:
    print(TextBlob(articles[n]['title']).sentiment)
    n += 1


Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.13636363636363635, subjectivity=0.5)
Sentiment(polarity=-0.2, subjectivity=0.7)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.25, subjectivity=0.3333333333333333)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.1, subjectivity=0.25)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.2, subjectivity=0.30000000000000004)
Sentiment(polarity=0.25, subjectivity=0.3333333333333333)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.5, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.21666666666666665, subjectivity=0.36666666666666664)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.5, subjectivity=0.9)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.3333333333333333, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.125)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.05, subjectivity=0.7)
Sentiment(polarity=-0.07777777777777779, subjectivity=0.20694444444444446)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.13333333333333333, subjectivity=0.5333333333333333)
Sentiment(polarity=-0.25, subjectivity=0.55)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.05000000000000002, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.1, subjectivity=0.1)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.1465909090909091, subjectivity=0.4261363636363636)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.8)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.13636363636363635, subjectivity=0.5)
Sentiment(polarity=0.5, subjectivity=1.0)
Sentiment(polarity=0.6, subjectivity=0.9)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.25, subjectivity=0.3)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.375, subjectivity=1.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.13636363636363635, subjectivity=0.45454545454545453)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.2, subjectivity=0.4)
Sentiment(polarity=0.13636363636363635, subjectivity=0.45454545454545453)
Sentiment(polarity=0.1, subjectivity=0.45)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.016666666666666666, subjectivity=0.18333333333333332)
Sentiment(polarity=0.5, subjectivity=0.9)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.022222222222222213, subjectivity=0.34444444444444444)
Sentiment(polarity=-0.033333333333333326, subjectivity=0.6166666666666667)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.125, subjectivity=1.0)
Sentiment(polarity=0.3, subjectivity=0.2)
Sentiment(polarity=-0.1, subjectivity=0.15)
Sentiment(polarity=-0.3, subjectivity=0.4333333333333333)
Sentiment(polarity=-0.5, subjectivity=0.29999999999999993)
Sentiment(polarity=0.0, subjectivity=0.25)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.55, subjectivity=0.95)
Sentiment(polarity=0.0, subjectivity=0.0)

The .sentiment property from the TextBlob module returns its results as namedtuples, whose fields can be read by name (for example, .polarity and .subjectivity). To reuse the scores in later cells, we store all the results from our sentiment tests on the news titles in a list named tests_title.
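As a minimal sketch of working with namedtuple results, the block below builds a stand-in Sentiment type with the standard library instead of calling TextBlob, so it runs without the news data. The stand-in type and the variable names are assumptions made for illustration. The three values are copied from the first three lines of the printed output above.

```python
from collections import namedtuple

# Stand-in for the namedtuple shown in the output above (same field names)
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Values copied from the first three lines of the printed output
tests = [
    Sentiment(polarity=0.5, subjectivity=0.5),
    Sentiment(polarity=0.13636363636363635, subjectivity=0.5),
    Sentiment(polarity=-0.2, subjectivity=0.7),
]

# Fields can be read by name or by position
first = tests[0]
print(first.polarity, first[1])

# List comprehensions pull each field into its own list
polarities = [t.polarity for t in tests]
subjectivities = [t.subjectivity for t in tests]
print(polarities)
print(subjectivities)
```

The same comprehension pattern would apply to a list of real TextBlob results, since only the field names matter.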


In [85]:
N = 0
tests_title = []

while N < n_rows:
    tests_title.append(TextBlob(articles[N]['title']).sentiment)
    N += 1

We create a list named list_polarity_title to store polarity scores for news titles.


In [86]:
list_polarity_title = [] # this list contains all titles polarity scores.

for test in tests_title:
    list_polarity_title.append(test.polarity)

Similarly, we create a list of subjectivity scores for news titles.


In [87]:
list_subjectivity_title = [] # this list contains all titles subjectivity scores.

for test in tests_title:
    list_subjectivity_title.append(test.subjectivity)

'description'

We use the .sentiment property again to calculate the polarity and subjectivity of each description. As mentioned above, analyzing the descriptions as well as the titles makes the final results more versatile and hopefully more accurate.


In [88]:
m = 0
while m < n_rows:
    print(TextBlob(articles[m]['description']).sentiment)
    m += 1


Sentiment(polarity=0.125, subjectivity=0.14166666666666666)
Sentiment(polarity=0.13636363636363635, subjectivity=0.5)
Sentiment(polarity=-0.10555555555555556, subjectivity=0.4222222222222222)
Sentiment(polarity=0.3666666666666667, subjectivity=0.65)
Sentiment(polarity=0.35714285714285715, subjectivity=0.5714285714285714)
Sentiment(polarity=0.25, subjectivity=0.3)
Sentiment(polarity=-0.15, subjectivity=0.3)
Sentiment(polarity=0.125, subjectivity=0.19999999999999998)
Sentiment(polarity=0.275, subjectivity=0.425)
Sentiment(polarity=0.05, subjectivity=0.25833333333333336)
Sentiment(polarity=0.02500000000000001, subjectivity=0.25)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.15)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.14166666666666666, subjectivity=0.5083333333333334)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.012500000000000011, subjectivity=0.4)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.05000000000000002, subjectivity=0.5)
Sentiment(polarity=-0.125, subjectivity=1.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.07777777777777778)
Sentiment(polarity=0.0, subjectivity=0.6666666666666666)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.2, subjectivity=0.58)
Sentiment(polarity=0.265, subjectivity=0.4935714285714286)
Sentiment(polarity=0.65, subjectivity=0.7)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.5)
Sentiment(polarity=-0.25, subjectivity=0.25)
Sentiment(polarity=-0.19583333333333333, subjectivity=0.5083333333333333)
Sentiment(polarity=0.2, subjectivity=0.35)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=1.0)
Sentiment(polarity=-0.6999999999999998, subjectivity=0.6666666666666666)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=-0.125, subjectivity=0.125)
Sentiment(polarity=0.175, subjectivity=0.275)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.17916666666666667, subjectivity=0.47083333333333327)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0909090909090909, subjectivity=0.4292929292929293)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.4666666666666667, subjectivity=0.6666666666666667)
Sentiment(polarity=0.3, subjectivity=0.45)
Sentiment(polarity=0.15, subjectivity=0.24444444444444446)
Sentiment(polarity=-0.05555555555555555, subjectivity=0.07777777777777778)
Sentiment(polarity=-0.6, subjectivity=1.0)
Sentiment(polarity=0.0, subjectivity=0.17857142857142858)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.35, subjectivity=0.55)
Sentiment(polarity=-0.3333333333333333, subjectivity=0.6666666666666666)
Sentiment(polarity=-0.024999999999999998, subjectivity=0.24642857142857144)
Sentiment(polarity=0.0, subjectivity=0.15)
Sentiment(polarity=0.09999999999999999, subjectivity=0.3333333333333333)
Sentiment(polarity=0.06666666666666665, subjectivity=0.55)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.2, subjectivity=0.4)
Sentiment(polarity=0.06666666666666667, subjectivity=0.21666666666666667)
Sentiment(polarity=0.04545454545454545, subjectivity=0.48484848484848486)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.75, subjectivity=0.75)
Sentiment(polarity=-0.2, subjectivity=0.3)
Sentiment(polarity=0.525, subjectivity=0.7749999999999999)
Sentiment(polarity=0.15833333333333335, subjectivity=0.7666666666666666)
Sentiment(polarity=0.0, subjectivity=0.3)
Sentiment(polarity=0.4, subjectivity=0.7)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.5, subjectivity=0.5)

In [89]:
M = 0
tests_description = []

while M < n_rows:
    tests_description.append(TextBlob(articles[M]['description']).sentiment)
    M += 1

We create a list of polarity scores for news descriptions by appending each polarity score to the list named list_polarity_description.


In [90]:
list_polarity_description = [] # this list contains all descriptions' polarity scores.

for test in tests_description:
    list_polarity_description.append(test.polarity)

In the same way, we create a list of subjectivity scores for news descriptions.


In [91]:
list_subjectivity_description = [] # this list contains all descriptions' subjectivity scores.

for test in tests_description:
    list_subjectivity_description.append(test.subjectivity)

Now we have four lists of data:

  1. list_polarity_title
  2. list_subjectivity_title
  3. list_polarity_description
  4. list_subjectivity_description

We convert the four lists of data into one dataframe for drawing plots.


In [92]:
total_score = [list_polarity_title, list_subjectivity_title, list_polarity_description, list_subjectivity_description]
labels = ['T_polarity', 'T_subjectivity', 'D_polarity', 'D_subjectivity']
df = pd.DataFrame.from_records(total_score, index = labels)
df


Out[92]:
0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
T_polarity 0.500000 0.136364 -0.200000 0.000000 0.000000 0.00 0.00 0.250000 0.000 0.100000 ... 0.0 -0.125 0.30 -0.10 -0.300000 -0.500000 0.00 0.0 0.55 0.0
T_subjectivity 0.500000 0.500000 0.700000 0.000000 0.000000 0.00 0.00 0.333333 0.000 0.250000 ... 0.0 1.000 0.20 0.15 0.433333 0.300000 0.25 0.0 0.95 0.0
D_polarity 0.125000 0.136364 -0.105556 0.366667 0.357143 0.25 -0.15 0.125000 0.275 0.050000 ... 0.0 0.000 0.75 -0.20 0.525000 0.158333 0.00 0.4 0.00 0.5
D_subjectivity 0.141667 0.500000 0.422222 0.650000 0.571429 0.30 0.30 0.200000 0.425 0.258333 ... 0.0 0.000 0.75 0.30 0.775000 0.766667 0.30 0.7 0.00 0.5

4 rows × 90 columns

We transpose the dataframe to make it compatible with the .plot() method.


In [93]:
df = df.transpose() 
df


Out[93]:
T_polarity T_subjectivity D_polarity D_subjectivity
0 0.500000 0.500000 0.125000 0.141667
1 0.136364 0.500000 0.136364 0.500000
2 -0.200000 0.700000 -0.105556 0.422222
3 0.000000 0.000000 0.366667 0.650000
4 0.000000 0.000000 0.357143 0.571429
5 0.000000 0.000000 0.250000 0.300000
6 0.000000 0.000000 -0.150000 0.300000
7 0.250000 0.333333 0.125000 0.200000
8 0.000000 0.000000 0.275000 0.425000
9 0.100000 0.250000 0.050000 0.258333
10 0.000000 0.000000 0.025000 0.250000
11 0.000000 0.000000 0.000000 0.000000
12 0.200000 0.300000 0.000000 0.150000
13 0.250000 0.333333 0.000000 0.000000
14 0.000000 0.000000 0.141667 0.508333
15 0.000000 0.000000 0.000000 0.000000
16 -0.500000 0.500000 0.012500 0.400000
17 0.000000 0.000000 0.000000 0.000000
18 0.216667 0.366667 0.000000 0.000000
19 0.000000 0.000000 0.050000 0.500000
20 0.000000 0.000000 -0.125000 1.000000
21 0.500000 0.500000 0.000000 0.000000
22 0.000000 0.000000 0.000000 0.000000
23 0.000000 0.000000 0.000000 0.000000
24 0.000000 0.000000 0.000000 0.077778
25 0.000000 0.000000 0.000000 0.666667
26 -0.500000 0.900000 0.000000 0.000000
27 0.000000 0.000000 0.200000 0.580000
28 0.000000 0.000000 0.265000 0.493571
29 0.333333 0.500000 0.650000 0.700000
... ... ... ... ...
60 0.000000 0.000000 0.000000 0.000000
61 0.375000 1.000000 0.000000 0.000000
62 0.000000 0.000000 0.000000 0.000000
63 0.000000 0.000000 0.000000 0.000000
64 0.136364 0.454545 0.000000 0.000000
65 0.000000 0.000000 0.000000 0.000000
66 0.000000 0.000000 0.000000 0.000000
67 0.200000 0.400000 0.000000 0.000000
68 0.136364 0.454545 0.000000 0.000000
69 0.100000 0.450000 0.000000 0.000000
70 0.000000 0.000000 0.350000 0.550000
71 0.000000 0.000000 -0.333333 0.666667
72 0.000000 0.000000 -0.025000 0.246429
73 0.016667 0.183333 0.000000 0.150000
74 0.500000 0.900000 0.100000 0.333333
75 0.000000 0.000000 0.066667 0.550000
76 0.000000 0.000000 0.000000 0.000000
77 0.022222 0.344444 0.200000 0.400000
78 -0.033333 0.616667 0.066667 0.216667
79 0.000000 0.000000 0.045455 0.484848
80 0.000000 0.000000 0.000000 0.000000
81 -0.125000 1.000000 0.000000 0.000000
82 0.300000 0.200000 0.750000 0.750000
83 -0.100000 0.150000 -0.200000 0.300000
84 -0.300000 0.433333 0.525000 0.775000
85 -0.500000 0.300000 0.158333 0.766667
86 0.000000 0.250000 0.000000 0.300000
87 0.000000 0.000000 0.400000 0.700000
88 0.550000 0.950000 0.000000 0.000000
89 0.000000 0.000000 0.500000 0.500000

90 rows × 4 columns


In [94]:
# this plot shows scores for all 90 news posts.
df.plot()


Out[94]:
<matplotlib.axes._subplots.AxesSubplot at 0x11c812208>

-Analysis by news press

Taken individually, the 90 news posts are not very informative. For a better perspective, we group the scores by the press they belong to, under the assumption that posts from the same press are more likely to share a uniform tone. Each press contributes 10 consecutive posts, so we create a list named new_T_polarity to store the sum of the polarity scores of news titles for each press. Then we do the same for the subjectivity scores.
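The grouping step can be sketched in plain Python as summing consecutive blocks of scores. The helper name sum_by_press and the toy scores are hypothetical. The helper expects a plain list (such as list_polarity_title) in which every press occupies the same number of consecutive positions.

```python
def sum_by_press(scores, posts_per_press=10):
    """Sum consecutive blocks of scores, one block per press."""
    return [
        sum(scores[start:start + posts_per_press])
        for start in range(0, len(scores), posts_per_press)
    ]

# Hypothetical scores for two presses with three posts each
example_scores = [0.5, 0.0, -0.25, 0.25, 0.25, 0.0]
print(sum_by_press(example_scores, posts_per_press=3))
```

Because presses with more subjective or more polarized wording accumulate larger sums, dividing each sum by the block size would give a per-post average that is easier to compare if the block sizes ever differ.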


In [95]:
c_T_polarity = df['T_polarity']
new_T_polarity = []
B = 0
C = 0
while B < n_rows:
    total = 0  # running sum for the current press (avoids shadowing the built-in sum)
    while C < B + 10:
        total += c_T_polarity[C]
        C += 1
    new_T_polarity.append(total)
    B += 10
new_T_polarity
# The presses are in this order: wsj, cnn, nyt, wsp, guardian, abc, ft, bloomberg and economist.


Out[95]:
[0.78636363636363626,
 0.16666666666666666,
 0.33333333333333331,
 -0.51111111111111107,
 0.096590909090909116,
 1.9863636363636363,
 0.94772727272727275,
 0.50555555555555565,
 -0.17499999999999993]

In [96]:
c_T_subjectivity = df['T_subjectivity']
new_T_subjectivity = []
D = 0
E = 0
while D < n_rows:
    total = 0
    while E < D + 10:
        total += c_T_subjectivity[E]
        E += 1
    new_T_subjectivity.append(total)
    D += 10
new_T_subjectivity


Out[96]:
[2.2833333333333332,
 1.5,
 1.8999999999999999,
 2.115277777777778,
 1.8261363636363637,
 3.1999999999999997,
 2.7590909090909093,
 2.0444444444444443,
 3.2833333333333332]

In [97]:
c_D_polarity = df['D_polarity']
new_D_polarity = []
F = 0
G = 0
while F < n_rows:
    total = 0
    while G < F + 10:
        total += c_D_polarity[G]
        G += 1
    new_D_polarity.append(total)
    F += 10
new_D_polarity


Out[97]:
[1.4296176046176046,
 0.22916666666666671,
 0.98999999999999999,
 -0.94583333333333308,
 1.2291666666666667,
 0.35202020202020201,
 0.0,
 0.47045454545454546,
 2.1333333333333337]

In [98]:
c_D_subjectivity = df['D_subjectivity']
new_D_subjectivity = []
H = 0
I = 0
while H < n_rows:
    total = 0
    while I < H + 10:
        total += c_D_subjectivity[I]
        I += 1
    new_D_subjectivity.append(total)
    H += 10
new_D_subjectivity


Out[98]:
[3.7686507936507936,
 1.8083333333333336,
 3.518015873015873,
 3.2749999999999999,
 1.8708333333333331,
 3.0467532467532465,
 0.0,
 3.597943722943723,
 4.0916666666666668]

In [99]:
total_score_bypublishhouse = [new_T_polarity, new_T_subjectivity, new_D_polarity, new_D_subjectivity]
df1 = pd.DataFrame.from_records(total_score_bypublishhouse, index = labels)
df1


Out[99]:
0 1 2 3 4 5 6 7 8
T_polarity 0.786364 0.166667 0.333333 -0.511111 0.096591 1.986364 0.947727 0.505556 -0.175000
T_subjectivity 2.283333 1.500000 1.900000 2.115278 1.826136 3.200000 2.759091 2.044444 3.283333
D_polarity 1.429618 0.229167 0.990000 -0.945833 1.229167 0.352020 0.000000 0.470455 2.133333
D_subjectivity 3.768651 1.808333 3.518016 3.275000 1.870833 3.046753 0.000000 3.597944 4.091667

In [100]:
# change the column labels to the press names.
new_columns = ['wsj', 'cnn', 'nyt', 'wsp', 'guardian', 'abc', 'ft', 'bloomberg', 'economist']
df1.columns = new_columns
df1


Out[100]:
wsj cnn nyt wsp guardian abc ft bloomberg economist
T_polarity 0.786364 0.166667 0.333333 -0.511111 0.096591 1.986364 0.947727 0.505556 -0.175000
T_subjectivity 2.283333 1.500000 1.900000 2.115278 1.826136 3.200000 2.759091 2.044444 3.283333
D_polarity 1.429618 0.229167 0.990000 -0.945833 1.229167 0.352020 0.000000 0.470455 2.133333
D_subjectivity 3.768651 1.808333 3.518016 3.275000 1.870833 3.046753 0.000000 3.597944 4.091667

Graph for scores by news press


In [101]:
#colors = [(x/10.0, x/20.0, 0.75) for x in range(n_rows)]

df1.plot(kind = 'bar', legend = True, figsize = (15, 2), colormap='Paired', grid = True)

# place the legend above the subplot, expanding it to use the full width.
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=10, mode="expand", borderaxespad=0.)


Out[101]:
<matplotlib.legend.Legend at 0x11c9aaeb8>

In [102]:
bar_color = 'orange'

row = df1.iloc[0]
row.plot(kind = 'bar', title = "Polarity for news titles by news press", color = bar_color, grid = True)


Out[102]:
<matplotlib.axes._subplots.AxesSubplot at 0x11ca04ef0>

-Analysis by date

We have loaded news titles and descriptions over 2 weeks and stored them in a csv file called all_news.csv. We then calculate an average news polarity score for each day and graph the daily scores to see how they change over time.
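The per-day averaging can be sketched with the standard library alone. The rows below are hypothetical (date, polarity) pairs in the same layout as the scored news rows, and the function name daily_mean_polarity is made up for this example. The notebook itself does this step with pandas.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (date, polarity) rows
rows = [
    ("4/11/17", -0.25),
    ("4/11/17", 0.25),
    ("4/12/17", 0.0),
    ("4/12/17", 0.5),
    ("4/12/17", 0.25),
]

def daily_mean_polarity(scored_rows):
    """Group polarity scores by date and average each group."""
    by_date = defaultdict(list)
    for date, polarity in scored_rows:
        by_date[date].append(polarity)
    return {date: mean(values) for date, values in by_date.items()}

daily = daily_mean_polarity(rows)
print(daily)
```

Note that a day with many neutral headlines (polarity 0.0) pulls its average toward zero, which is one reason the daily scores in this notebook stay close to zero.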


In [103]:
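news_file = open('all_news.csv', 'r', encoding="ISO-8859-1")
contents = csv.reader(news_file)
result_file = open('entire_result.csv', 'w', newline='')
result = csv.writer(result_file)

In [104]:
result.writerow(['Date','polarity'])
for row in contents:
    comment = row[2]
    blob = TextBlob(comment)
    polarity = blob.sentiment.polarity
    line = [row[0],polarity]
    result.writerow(line)
# close both files so every row is flushed to disk before entire_result.csv is read back
news_file.close()
result_file.close()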

In [105]:
data = pd.read_csv('entire_result.csv')
data


Out[105]:
Date polarity
0 3/24/17 -0.046528
1 3/24/17 -0.046528
2 4/9/17 0.000000
3 4/11/17 -0.250000
4 4/11/17 0.078571
5 4/12/17 0.000000
6 4/12/17 -0.400000
7 4/12/17 0.000000
8 4/12/17 0.250000
9 4/12/17 0.000000
10 4/12/17 0.033333
11 4/13/17 0.000000
12 4/13/17 -0.100000
13 4/13/17 -0.050000
14 4/13/17 0.500000
15 4/13/17 -0.200000
16 4/13/17 -0.062500
17 4/13/17 0.111111
18 4/13/17 0.100000
19 4/13/17 0.500000
20 4/13/17 0.112121
21 4/13/17 -0.100000
22 4/13/17 0.000000
23 4/13/17 0.187500
24 4/13/17 -0.011111
25 4/13/17 0.250000
26 4/13/17 -0.350000
27 4/13/17 0.000000
28 4/13/17 -0.031250
29 4/13/17 0.350000
... ... ...
909 5/3/17 -0.050000
910 5/3/17 0.137273
911 5/3/17 0.000000
912 5/3/17 0.128788
913 5/3/17 0.069444
914 5/3/17 -0.100000
915 5/3/17 0.000000
916 5/3/17 -0.100000
917 5/3/17 0.200000
918 5/3/17 0.000000
919 5/3/17 0.000000
920 5/3/17 -0.075000
921 5/3/17 0.160000
922 5/3/17 -0.100000
923 5/3/17 0.000000
924 5/3/17 0.166667
925 5/3/17 -0.250000
926 5/3/17 0.000000
927 5/3/17 0.250000
928 5/3/17 -0.050000
929 5/3/17 0.000000
930 5/3/17 0.203333
931 5/3/17 -0.305556
932 5/3/17 0.000000
933 5/3/17 0.000000
934 5/3/17 0.000000
935 5/3/17 0.000000
936 5/4/17 0.350000
937 5/4/17 -0.281818
938 5/4/17 0.000000

939 rows × 2 columns


In [106]:
#group the data by date
data=data.groupby('Date', as_index=False)['polarity'].mean()  
#convert column "Date" to a date data type 
data['Date'] = pd.to_datetime(data['Date'])
#sort the data by date ascending
data=data.sort_values(by="Date", axis=0, ascending=True, inplace=False, kind='quicksort')
data


Out[106]:
Date polarity
0 2017-03-24 -0.046528
10 2017-04-09 0.000000
1 2017-04-11 -0.085714
2 2017-04-12 -0.019444
3 2017-04-13 0.041044
4 2017-04-14 0.032893
5 2017-04-15 0.035714
6 2017-04-27 -0.021875
7 2017-04-28 0.077340
8 2017-04-29 0.006742
9 2017-04-30 0.019901
11 2017-05-01 0.025641
12 2017-05-02 0.032199
13 2017-05-03 -0.000268
14 2017-05-04 0.022727

Graph for scores by date


In [107]:
data.plot(x="Date", y="polarity", kind = 'bar', title='Polarity for news titles by date', grid = True, color = 'orange')


Out[107]:
<matplotlib.axes._subplots.AxesSubplot at 0x11caa12e8>

PART 4: S&P 500 INDEX

We use the yahoo_finance module in Python to load S&P 500 index data, which we later compare with the sentiment analysis of the news posts.


In [108]:
from yahoo_finance import Share

# '^GSPC' is the market symbol for the S&P 500 Index
yahoo = Share('^GSPC')
print(yahoo.get_open())


2401.58

In [109]:
print(yahoo.get_price())


2396.92

In [110]:
print(yahoo.get_trade_datetime())


2017-05-09 20:38:00 UTC+0000

In [111]:
from pprint import pprint
pprint(yahoo.get_historical('2017-04-09', '2017-05-09'))


[{'Adj_Close': '2396.919922',
  'Close': '2396.919922',
  'Date': '2017-05-09',
  'High': '2403.870117',
  'Low': '2392.439941',
  'Open': '2401.580078',
  'Symbol': '%5eGSPC',
  'Volume': '3653590000'},
 {'Adj_Close': '2399.379883',
  'Close': '2399.379883',
  'Date': '2017-05-08',
  'High': '2401.360107',
  'Low': '2393.919922',
  'Open': '2399.939941',
  'Symbol': '%5eGSPC',
  'Volume': '3429440000'},
 {'Adj_Close': '2399.290039',
  'Close': '2399.290039',
  'Date': '2017-05-05',
  'High': '2399.290039',
  'Low': '2389.379883',
  'Open': '2392.370117',
  'Symbol': '%5eGSPC',
  'Volume': '3540140000'},
 {'Adj_Close': '2389.52002',
  'Close': '2389.52002',
  'Date': '2017-05-04',
  'High': '2391.429932',
  'Low': '2380.350098',
  'Open': '2389.790039',
  'Symbol': '%5eGSPC',
  'Volume': '4362540000'},
 {'Adj_Close': '2388.129883',
  'Close': '2388.129883',
  'Date': '2017-05-03',
  'High': '2389.820068',
  'Low': '2379.75',
  'Open': '2386.50',
  'Symbol': '%5eGSPC',
  'Volume': '3893990000'},
 {'Adj_Close': '2391.169922',
  'Close': '2391.169922',
  'Date': '2017-05-02',
  'High': '2392.929932',
  'Low': '2385.820068',
  'Open': '2391.050049',
  'Symbol': '%5eGSPC',
  'Volume': '3813680000'},
 {'Adj_Close': '2388.330078',
  'Close': '2388.330078',
  'Date': '2017-05-01',
  'High': '2394.48999',
  'Low': '2384.830078',
  'Open': '2388.50',
  'Symbol': '%5eGSPC',
  'Volume': '3199240000'},
 {'Adj_Close': '2384.199951',
  'Close': '2384.199951',
  'Date': '2017-04-28',
  'High': '2393.679932',
  'Low': '2382.360107',
  'Open': '2393.679932',
  'Symbol': '%5eGSPC',
  'Volume': '3718270000'},
 {'Adj_Close': '2388.77002',
  'Close': '2388.77002',
  'Date': '2017-04-27',
  'High': '2392.100098',
  'Low': '2382.679932',
  'Open': '2389.699951',
  'Symbol': '%5eGSPC',
  'Volume': '4098460000'},
 {'Adj_Close': '2387.449951',
  'Close': '2387.449951',
  'Date': '2017-04-26',
  'High': '2398.159912',
  'Low': '2386.780029',
  'Open': '2388.97998',
  'Symbol': '%5eGSPC',
  'Volume': '4105920000'},
 {'Adj_Close': '2388.610107',
  'Close': '2388.610107',
  'Date': '2017-04-25',
  'High': '2392.47998',
  'Low': '2381.149902',
  'Open': '2381.51001',
  'Symbol': '%5eGSPC',
  'Volume': '3995240000'},
 {'Adj_Close': '2374.149902',
  'Close': '2374.149902',
  'Date': '2017-04-24',
  'High': '2376.97998',
  'Low': '2369.189941',
  'Open': '2370.330078',
  'Symbol': '%5eGSPC',
  'Volume': '3690650000'},
 {'Adj_Close': '2348.689941',
  'Close': '2348.689941',
  'Date': '2017-04-21',
  'High': '2356.179932',
  'Low': '2344.51001',
  'Open': '2354.73999',
  'Symbol': '%5eGSPC',
  'Volume': '3503360000'},
 {'Adj_Close': '2355.840088',
  'Close': '2355.840088',
  'Date': '2017-04-20',
  'High': '2361.370117',
  'Low': '2340.909912',
  'Open': '2342.689941',
  'Symbol': '%5eGSPC',
  'Volume': '3647420000'},
 {'Adj_Close': '2338.169922',
  'Close': '2338.169922',
  'Date': '2017-04-19',
  'High': '2352.629883',
  'Low': '2335.050049',
  'Open': '2346.790039',
  'Symbol': '%5eGSPC',
  'Volume': '3519900000'},
 {'Adj_Close': '2342.189941',
  'Close': '2342.189941',
  'Date': '2017-04-18',
  'High': '2348.350098',
  'Low': '2334.540039',
  'Open': '2342.530029',
  'Symbol': '%5eGSPC',
  'Volume': '3269840000'},
 {'Adj_Close': '2349.01001',
  'Close': '2349.01001',
  'Date': '2017-04-17',
  'High': '2349.139893',
  'Low': '2332.51001',
  'Open': '2332.620117',
  'Symbol': '%5eGSPC',
  'Volume': '2824710000'},
 {'Adj_Close': '2328.949951',
  'Close': '2328.949951',
  'Date': '2017-04-13',
  'High': '2348.26001',
  'Low': '2328.949951',
  'Open': '2341.97998',
  'Symbol': '%5eGSPC',
  'Volume': '3143890000'},
 {'Adj_Close': '2344.929932',
  'Close': '2344.929932',
  'Date': '2017-04-12',
  'High': '2352.719971',
  'Low': '2341.179932',
  'Open': '2352.149902',
  'Symbol': '%5eGSPC',
  'Volume': '3196950000'},
 {'Adj_Close': '2353.780029',
  'Close': '2353.780029',
  'Date': '2017-04-11',
  'High': '2355.219971',
  'Low': '2337.25',
  'Open': '2353.919922',
  'Symbol': '%5eGSPC',
  'Volume': '3117420000'},
 {'Adj_Close': '2357.159912',
  'Close': '2357.159912',
  'Date': '2017-04-10',
  'High': '2366.370117',
  'Low': '2351.50',
  'Open': '2357.159912',
  'Symbol': '%5eGSPC',
  'Volume': '2785410000'}]
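The project description refers to the daily return of the index, while the historical output above lists price levels. As a minimal sketch, simple daily returns can be derived from consecutive closing values. The three closes are copied from the output above and converted from strings to floats. The function name daily_returns is hypothetical and is not part of yahoo_finance.

```python
# Closing values copied from the historical output above, oldest first
closes = [
    ("2017-05-03", 2388.129883),
    ("2017-05-04", 2389.52002),
    ("2017-05-05", 2399.290039),
]

def daily_returns(dated_closes):
    """Return (date, simple return) pairs, where return = close / previous close - 1."""
    returns = []
    for (_, previous), (date, current) in zip(dated_closes, dated_closes[1:]):
        returns.append((date, current / previous - 1))
    return returns

for date, value in daily_returns(closes):
    print(date, round(value, 6))
```

The first date has no return because there is no previous close to compare against, so n closes yield n - 1 returns.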

We create a .csv file called yahoo.csv to store the financial data each time it is loaded.


In [119]:
from yahoo_finance import Share
yahoo = Share('^GSPC')
dataset = yahoo.get_historical('2017-04-27','2017-05-09')
# the with block closes yahoo.csv so all rows are flushed before it is read back
with open('yahoo.csv', 'w', newline='') as csv_file:
    result = csv.writer(csv_file)
    result.writerow(['Date','Low','High'])
    for record in dataset:
        result.writerow([record['Date'], record['Low'], record['High']])

In [120]:
yahoo = pd.read_csv('yahoo.csv')
yahoo


Out[120]:
Date Low High
0 2017-05-09 2392.439941 2403.870117
1 2017-05-08 2393.919922 2401.360107
2 2017-05-05 2389.379883 2399.290039
3 2017-05-04 2380.350098 2391.429932
4 2017-05-03 2379.750000 2389.820068
5 2017-05-02 2385.820068 2392.929932
6 2017-05-01 2384.830078 2394.489990
7 2017-04-28 2382.360107 2393.679932
8 2017-04-27 2382.679932 2392.100098

In [121]:
#convert column "Date" to a date data type
yahoo['Date'] = pd.to_datetime(yahoo['Date'])
#sort the data by date ascending
yahoo=yahoo.sort_values(by="Date", axis=0, ascending=True, inplace=False, kind='quicksort')
yahoo


Out[121]:
Date Low High
8 2017-04-27 2382.679932 2392.100098
7 2017-04-28 2382.360107 2393.679932
6 2017-05-01 2384.830078 2394.489990
5 2017-05-02 2385.820068 2392.929932
4 2017-05-03 2379.750000 2389.820068
3 2017-05-04 2380.350098 2391.429932
2 2017-05-05 2389.379883 2399.290039
1 2017-05-08 2393.919922 2401.360107
0 2017-05-09 2392.439941 2403.870117

In [122]:
type(data['Date'])
type(yahoo['Date'])


Out[122]:
pandas.core.series.Series

PART 5: CORRELATION BETWEEN NEWS POLARITY AND S&P 500


In [123]:
#join yahoo and data together on "Date"
result = pd.merge(data, yahoo,on='Date')
result


Out[123]:
Date polarity Low High
0 2017-04-27 -0.021875 2382.679932 2392.100098
1 2017-04-28 0.077340 2382.360107 2393.679932
2 2017-05-01 0.025641 2384.830078 2394.489990
3 2017-05-02 0.032199 2385.820068 2392.929932
4 2017-05-03 -0.000268 2379.750000 2389.820068
5 2017-05-04 0.022727 2380.350098 2391.429932
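Only six rows survive the merge because pd.merge performs an inner join by default: a date is kept only if it appears in both tables. Weekend dates with news but no trading, and trading days after the last news date, drop out. The sketch below reproduces that logic with plain dictionaries. It uses a subset of the dates and values shown in the tables above, and the dictionary names are made up for this example.

```python
# Dates with a polarity score (a subset of the grouped news data above)
polarity_by_date = {
    "2017-04-27": -0.021875, "2017-04-28": 0.077340,
    "2017-04-29": 0.006742, "2017-04-30": 0.019901,
    "2017-05-01": 0.025641, "2017-05-02": 0.032199,
    "2017-05-03": -0.000268, "2017-05-04": 0.022727,
}
# Dates with index data (trading days from the yahoo table above)
low_by_date = {
    "2017-04-27": 2382.679932, "2017-04-28": 2382.360107,
    "2017-05-01": 2384.830078, "2017-05-02": 2385.820068,
    "2017-05-03": 2379.750000, "2017-05-04": 2380.350098,
    "2017-05-05": 2389.379883, "2017-05-08": 2393.919922,
    "2017-05-09": 2392.439941,
}

# An inner join keeps only the keys present in both tables
shared_dates = sorted(polarity_by_date.keys() & low_by_date.keys())
joined = [(d, polarity_by_date[d], low_by_date[d]) for d in shared_dates]
print(len(joined))
for row in joined:
    print(row)
```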

In [124]:
result_len = len(result)

In [125]:
yahoo.plot(x="Date",figsize=(6, 2),title='Yahoo Finance')
data.plot(x='Date',figsize=(6, 2),title='News Title Polarity')


Out[125]:
<matplotlib.axes._subplots.AxesSubplot at 0x11c8e8b00>

Estimate correlation between polarity scores and S&P500 index
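To show what the correlation coefficient measures, here is a minimal sketch that computes the Pearson correlation by hand: the co-variation of the two series divided by the product of their spreads. The helper name pearson is hypothetical. The input values are copied from the merged table above, where they are rounded to six decimals, so the result agrees with numpy.corrcoef only approximately.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Values copied from the merged table above (six overlapping days)
polarity_values = [-0.021875, 0.077340, 0.025641, 0.032199, -0.000268, 0.022727]
low_values = [2382.679932, 2382.360107, 2384.830078,
              2385.820068, 2379.750000, 2380.350098]

r = pearson(polarity_values, low_values)
print(round(r, 3))
```

With only six observations, a single unusual day can move the coefficient substantially, so the value is better read as a description of this sample than as evidence of a stable relationship.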


In [126]:
import numpy
low=result['Low']
high=result['High']
polarity=result['polarity']
numpy.corrcoef(low, polarity) 
#with only six overlapping days, the sample correlation between news polarity and the daily low is positive (about 0.21), but the sample is too small to draw a firm conclusion


Out[126]:
array([[ 1.        ,  0.21469213],
       [ 0.21469213,  1.        ]])

In [127]:
numpy.corrcoef(high, polarity)


Out[127]:
array([[ 1.        ,  0.54956514],
       [ 0.54956514,  1.        ]])

In [128]:
numpy.corrcoef(high, low)


Out[128]:
array([[ 1.        ,  0.77905387],
       [ 0.77905387,  1.        ]])

In [129]:
#a scatterplot of news polarity against the daily low of the S&P 500 index from Yahoo
result.plot.scatter(x="polarity", y="Low")


Out[129]:
<matplotlib.axes._subplots.AxesSubplot at 0x11bdc8fd0>

A parametric estimation of the S&P 500 daily low by news polarity
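The parametric approach assumes a fixed functional form, here a straight line, and estimates its parameters from the data. As a minimal sketch of that idea, the block below fits a line by ordinary least squares using only the standard library. The helper name fit_line and the toy data are made up for illustration. The notebook itself relies on seaborn for the fit and the plot.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 1 + 2x
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

Applied to the merged data, xs would be the polarity column and ys the daily low, and the slope would estimate how many index points the low changes per unit of polarity.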


In [130]:
#a parametric estimation of the S&P 500 daily low by news polarity
import seaborn as sns
#lmplot plots the data with a fitted regression line through it.
sns.lmplot(x="polarity", y="Low", data=result, ci=95) #ci is the size of the confidence interval in percent


Out[130]:
<seaborn.axisgrid.FacetGrid at 0x11669c5f8>

A non-parametric estimation of the S&P 500 daily low by news polarity
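Unlike a straight-line fit, a non-parametric estimate lets the data determine the shape of the curve. As far as we understand, spatial averaging corresponds to local-constant (Nadaraya-Watson) regression: the prediction at a point is a weighted average of the observed values, with nearby observations weighted more heavily. The sketch below is our own simplified version with a Gaussian kernel and a fixed bandwidth. It is not pyqt_fit's implementation, and the function name, toy data, and bandwidth are assumptions chosen for illustration.

```python
from math import exp

def nadaraya_watson(xs, ys, x0, bandwidth):
    """Local-constant estimate at x0: a Gaussian-weighted average of ys."""
    weights = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical toy data; the bandwidth is an arbitrary choice for illustration
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]
for x0 in (0.5, 1.5, 2.5):
    print(x0, round(nadaraya_watson(xs, ys, x0, bandwidth=0.5), 3))
```

A smaller bandwidth makes the curve follow individual points more closely, while a larger one smooths it toward the overall mean. With only a handful of observations, the choice of bandwidth largely determines the shape of the result.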


In [131]:
import pyqt_fit.nonparam_regression as smooth
from pyqt_fit import npr_methods

In [132]:
k0 = smooth.NonParamRegression(polarity, low, method=npr_methods.SpatialAverage())
k0.fit()
grid = np.r_[-0.05:0.05:0.01]
plt.plot(grid, k0(grid), label="Spatial Averaging", linewidth=2)
plt.legend(loc='best')


Out[132]:
<matplotlib.legend.Legend at 0x119a3ab00>