In this project, we use two sets of data to draw insights into how media sentiment can serve as an indicator for the financial sector. For the financial data, we use the daily returns of the market index (^GSPC), a good indicator of market fluctuation; for media sentiment, we use summarized information from news pieces published by nine of the most popular press outlets, chosen for their strong influence in shaping people's perception of world events.
Both sets of data are real-time, which means the source files reflect the current moment and need to be loaded each time the analysis is performed. The sentiment analysis library returns a polarity score (-1.0 to 1.0) and a subjectivity score (0.0 to 1.0) for each news story. Using these quantified sentiment scores, we juxtapose the two time series, observe whether they present any correlation, and search for potential causality. For example, we may test the hypothesis that when polarity among the daily news posts is higher (i.e., more positive), the financial market is more likely to rise that same day. The rest of the notebook is a step-by-step walkthrough.
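As a quick illustration of what the sentiment library returns (a minimal sketch using TextBlob, which is imported below; the headline is made up for illustration):
from textblob import TextBlob
s = TextBlob("Stocks rally as investors cheer upbeat earnings").sentiment
print(s)  # a Sentiment namedtuple: polarity in [-1.0, 1.0], subjectivity in [0.0, 1.0]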
In [69]:
%matplotlib inline
# import necessary packages
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data
from datetime import datetime
import numpy as np
from textblob import TextBlob
import csv
from wordcloud import WordCloud,ImageColorGenerator
#from scipy.misc import imread
import string
We use pd.read_json() to import real-time news information (the top 10 posts from each publisher). These news items are stored as separate dataframes and then combined into one collective dataframe. (News API powered by NewsAPI.org)
The news outlets consist of The Wall Street Journal, CNN, The New York Times, The Washington Post, BBC News, ABC News (Australia), the Financial Times, Bloomberg, and The Economist.
In [70]:
cnn = pd.read_json('https://newsapi.org/v1/articles?source=cnn&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518')
nyt= pd.read_json('https://newsapi.org/v1/articles?source=the-new-york-times&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518')
wsp=pd.read_json('https://newsapi.org/v1/articles?source=the-washington-post&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518')
bbc=pd.read_json("https://newsapi.org/v1/articles?source=bbc-news&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
abc=pd.read_json("https://newsapi.org/v1/articles?source=abc-news-au&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
#google = pd.read_json(" https://newsapi.org/v1/articles?source=google-news&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
ft = pd.read_json("https://newsapi.org/v1/articles?source=financial-times&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
bloomberg = pd.read_json("https://newsapi.org/v1/articles?source=bloomberg&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
economist = pd.read_json("https://newsapi.org/v1/articles?source=the-economist&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
wsj = pd.read_json("https://newsapi.org/v1/articles?source=the-wall-street-journal&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518")
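The nine requests above could equally be made in a loop; a sketch assuming the same endpoint and API key:
# one request per source id, collected into a list of dataframes
base = 'https://newsapi.org/v1/articles?source={}&sortBy=top&apiKey=bdc0623102e94a7586137f02a51e0518'
sources = ['the-wall-street-journal', 'cnn', 'the-new-york-times', 'the-washington-post',
           'bbc-news', 'abc-news-au', 'financial-times', 'bloomberg', 'the-economist']
frames = [pd.read_json(base.format(s)) for s in sources]  # one dataframe per outlet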
In [71]:
total = [wsj, cnn, nyt, wsp, bbc, abc, ft, bloomberg, economist]
total1 = pd.concat(total, ignore_index=True)
total1
Out[71]:
Some values may be missing in the articles column. For example, if a news piece from BBC has no description, the field contains None instead of a string. We therefore convert NoneType entries to the string 'None', because TextBlob expects string input and would raise an error on None when we run sentiment analysis later.
In [72]:
k = 0
while k < len(total1):
    if total1['articles'][k]['description'] is None:
        total1['articles'][k]['description'] = 'None'
    k += 1
j = 0
while j < len(total1):
    print(type(total1['articles'][j]['description']))
    j += 1
# now all entries are of type string, regardless of whether they hold real content.
In [73]:
l = 0
while l < len(total1):
    if total1['articles'][l]['title'] is None:
        total1['articles'][l]['title'] = 'None'
    l += 1
p = 0
while p < len(total1):
    print(type(total1['articles'][p]['title']))
    p += 1
# now all entries are of type string, regardless of whether they hold real content.
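The two cells above could be collapsed into a single pass; a minimal sketch over the same total1 dataframe:
# replace None titles/descriptions with the string 'None' in one loop
# ('or' also catches empty strings, which is harmless here)
for art in total1['articles']:
    art['title'] = art['title'] or 'None'
    art['description'] = art['description'] or 'None'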
Each entry in the column named articles is a dict; each row contains information including author, title, description, url, urlToImage, and publishedAt, of which title is selected for the main analysis.
In [74]:
# write the news posts into a .csv file
# ('a' opens in append mode, so repeated runs accumulate rows in result.csv)
n_rows = len(total1.index)
articles = total1['articles']
result = csv.writer(open('result.csv','a'))
result.writerow(['PublishedAt','Title','description'])
for i in range(0,n_rows):
    line = [articles[i]['publishedAt'],articles[i]['title'],articles[i]['description']]
    result.writerow(line)
# print the first item in the 'articles' series as an example.
articles[0]
Out[74]:
In [75]:
# type of each entry in the 'articles' column is 'dict'
type(articles[0])
Out[75]:
In [76]:
# keys of the 'dict' variables are 'author', 'publishedAt', 'urlToImage', 'description', 'title', 'url'
articles[0].keys()
Out[76]:
The .tags property performs part-of-speech tagging (for example, NNP stands for a singular proper noun).
In [77]:
blob = TextBlob(str(articles[0]['title']))
blob.tags
Out[77]:
A loop prints all the news titles, which are later used for sentiment analysis.
In [78]:
i = 0
while i < n_rows:
    blob = TextBlob(articles[i]['title'])
    print(1 + i, ". ", blob, sep = "")
    i += 1
All descriptions of the news posts are printed in the same way as above; they improve the accuracy of our sentiment analysis by providing more words on the same topics as the titles.
In [79]:
j = 0
while j < n_rows:
    blob1 = TextBlob(str(articles[j]['description']))
    print(1 + j, ". ", blob1, sep = "")
    j += 1
A word cloud of news titles gives a direct and vivid impression of the most frequently discussed topics in today's news reports. The topics, people, and events that prevail among the top news pieces appear in the largest fonts, occupy the center space, and display the most salient colors.
In a visually pleasant way, a word cloud hints at the news sentiment of the day.
In [80]:
# write the descriptions from the csv file into a txt file called entire_text.txt
contents = csv.reader(open('result.csv','r'))
texts = open('entire_text.txt','w')
list_of_text = []
for row in contents:
    line = row[2].encode('utf-8')
    line = str(line.decode())
    list_of_text.append(line)
texts.writelines(list_of_text)
texts.close()  # flush the buffer so the next cell reads the complete file
In [81]:
text=open("entire_text.txt",'r')
text=text.read()
wordcloud = WordCloud().generate(text)
In [82]:
#display the generated image
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
Out[82]:
In [83]:
# increase max_words and max_font_size, and change the background color to white
wordcloud = WordCloud(max_words=200,background_color='white',max_font_size=100).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
We use the .sentiment property from TextBlob to calculate the polarity and subjectivity of each title.
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
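Each result can also be accessed by field name; a minimal sketch with a made-up sentence:
s = TextBlob("The market looks surprisingly strong today").sentiment
print(s.polarity, s.subjectivity)  # the two fields of the Sentiment namedtuple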
In [84]:
# a loop to show sentiment analysis results for all the titles
n = 0
while n < n_rows:
    print(TextBlob(articles[n]['title']).sentiment)
    n += 1
From the TextBlob module, the .sentiment property returns its results as namedtuples. To keep these results available for later processing, we collect them in a list named tests_title that stores the sentiment of every news title.
In [85]:
N = 0
tests_title = []
while N < n_rows:
    tests_title.append(TextBlob(articles[N]['title']).sentiment)
    N += 1
We create a list named list_polarity_title to store polarity scores for news titles.
In [86]:
list_polarity_title = [] # this list contains all titles' polarity scores.
for test in tests_title:
    list_polarity_title.append(test.polarity)
Similarly, we create a list of subjectivity scores for news titles.
In [87]:
list_subjectivity_title = [] # this list contains all titles' subjectivity scores.
for test in tests_title:
    list_subjectivity_title.append(test.subjectivity)
We use the .sentiment property again to calculate the polarity and subjectivity of each description. As mentioned above, analyzing the descriptions makes the final results richer and hopefully more accurate.
In [88]:
m = 0
while m < n_rows:
    print(TextBlob(articles[m]['description']).sentiment)
    m += 1
In [89]:
M = 0
tests_description = []
while M < n_rows:
    tests_description.append(TextBlob(articles[M]['description']).sentiment)
    M += 1
We create a list of polarity scores for news descriptions by appending each polarity score to the list named list_polarity_description.
In [90]:
list_polarity_description = [] # this list contains all descriptions' polarity scores.
for test in tests_description:
    list_polarity_description.append(test.polarity)
In the same way, we create a list of subjectivity scores for the news descriptions.
In [91]:
list_subjectivity_description = [] # this list contains all descriptions' subjectivity scores.
for test in tests_description:
    list_subjectivity_description.append(test.subjectivity)
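The four score lists above can also be built in one pass with list comprehensions over the same tests_title and tests_description lists; a sketch:
# equivalent one-liners for the four append loops above
list_polarity_title = [t.polarity for t in tests_title]
list_subjectivity_title = [t.subjectivity for t in tests_title]
list_polarity_description = [t.polarity for t in tests_description]
list_subjectivity_description = [t.subjectivity for t in tests_description]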
Now we have four lists of data: title polarity, title subjectivity, description polarity, and description subjectivity.
We convert these four lists into one dataframe for plotting.
In [92]:
total_score = [list_polarity_title, list_subjectivity_title, list_polarity_description, list_subjectivity_description]
labels = ['T_polarity', 'T_subjectivity', 'D_polarity', 'D_subjectivity']
df = pd.DataFrame.from_records(total_score, index = labels)
df
Out[92]:
We transpose the dataframe to make it compatible with the .plot() method.
In [93]:
df = df.transpose()
df
Out[93]:
In [94]:
# this plot shows scores for all the news posts.
df.plot()
Out[94]:
Apparently, the individual news posts standing alone don't convey much information. For a better perspective, we group the scores by the press outlet they belong to, under the assumption that posts from the same outlet are much more likely to carry a uniform tone. We create a list named new_T_polarity to store the sum of the title polarity scores for each outlet (the API returns posts in blocks of 10 per outlet). Then we repeat the same operation for the subjectivity scores.
In [95]:
c_T_polarity = df['T_polarity']
new_T_polarity = []
B = 0
C = 0
while B < n_rows:
    subtotal = 0
    while C < B + 10:
        subtotal += c_T_polarity[C]
        C += 1
    new_T_polarity.append(subtotal)
    B += 10
new_T_polarity
# The press are in the order: wsj, cnn, nyt, wsp, bbc, abc, ft, bloomberg and economist.
Out[95]:
In [96]:
c_T_subjectivity = df['T_subjectivity']
new_T_subjectivity = []
D = 0
E = 0
while D < n_rows:
    subtotal = 0
    while E < D + 10:
        subtotal += c_T_subjectivity[E]
        E += 1
    new_T_subjectivity.append(subtotal)
    D += 10
new_T_subjectivity
Out[96]:
In [97]:
c_D_polarity = df['D_polarity']
new_D_polarity = []
F = 0
G = 0
while F < n_rows:
    subtotal = 0
    while G < F + 10:
        subtotal += c_D_polarity[G]
        G += 1
    new_D_polarity.append(subtotal)
    F += 10
new_D_polarity
Out[97]:
In [98]:
c_D_subjectivity = df['D_subjectivity']
new_D_subjectivity = []
H = 0
I = 0
while H < n_rows:
    subtotal = 0
    while I < H + 10:
        subtotal += c_D_subjectivity[I]
        I += 1
    new_D_subjectivity.append(subtotal)
    H += 10
new_D_subjectivity
Out[98]:
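All four per-outlet aggregates can also be computed in one step with pandas; a sketch that, like the loops above, assumes the posts arrive in order, 10 per outlet:
# label each block of 10 consecutive posts with an outlet index, then sum per block
press_sums = df.groupby(df.index // 10).sum()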
In [99]:
total_score_bypublishhouse = [new_T_polarity, new_T_subjectivity, new_D_polarity, new_D_subjectivity]
df1 = pd.DataFrame.from_records(total_score_bypublishhouse, index = labels)
df1
Out[99]:
In [100]:
# change the column labels to press houses (same order as the concatenation above).
new_columns = ['wsj', 'cnn', 'nyt', 'wsp', 'bbc', 'abc', 'ft', 'bloomberg', 'economist']
df1.columns = new_columns
df1
Out[100]:
In [101]:
#colors = [(x/10.0, x/20.0, 0.75) for x in range(n_rows)]
df1.plot(kind = 'bar', legend = True, figsize = (15, 2), colormap='Paired', grid = True)
# place the legend above the subplot and use the full expanded width.
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=10, mode="expand", borderaxespad=0.)
Out[101]:
In [102]:
bar_color = 'orange'
row = df1.iloc[0]
row.plot(kind = 'bar', title = "Polarity for news titles by news press", color = bar_color, grid = True)
Out[102]:
In [103]:
# all_news.csv holds the news posts collected so far (date in column 0, description in column 2)
contents = csv.reader(open('all_news.csv','r', encoding = "ISO-8859-1"))
result = csv.writer(open('entire_result.csv','w'))
In [104]:
# score the polarity of each post and write one (Date, polarity) row per post
result.writerow(['Date','polarity'])
for row in contents:
    comment = row[2]
    blob = TextBlob(comment)
    polarity = blob.sentiment.polarity
    line = [row[0],polarity]
    result.writerow(line)
In [105]:
data = pd.read_csv('entire_result.csv')
data
Out[105]:
In [106]:
#group the data by date
data=data.groupby('Date', as_index=False)['polarity'].mean()
#convert column "Date" to a date data type
data['Date'] = pd.to_datetime(data['Date'])
#sort the data by date ascending
data=data.sort_values(by="Date", axis=0, ascending=True, inplace=False, kind='quicksort')
data
Out[106]:
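The read-group-sort sequence above can also be written as a single chain; a sketch:
data = (pd.read_csv('entire_result.csv')
          .groupby('Date', as_index=False)['polarity'].mean()      # mean polarity per day
          .assign(Date=lambda d: pd.to_datetime(d['Date']))        # parse dates
          .sort_values('Date'))                                    # ascending by date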
In [107]:
data.plot(x='Date',kind = 'bar',title='Polarity for news titles by date',grid = True, color = 'orange')
Out[107]:
In [108]:
from yahoo_finance import Share
# '^GSPC' is the ticker symbol for the S&P 500 Index
yahoo = Share('^GSPC')
print(yahoo.get_open())
In [109]:
print(yahoo.get_price())
In [110]:
print(yahoo.get_trade_datetime())
In [111]:
from pprint import pprint
pprint(yahoo.get_historical('2017-04-09', '2017-05-09'))
We create a .csv file called yahoo.csv to store the financial data upon each import.
In [119]:
from yahoo_finance import Share
yahoo = Share('^GSPC')
dataset = yahoo.get_historical('2017-04-27','2017-05-09')
result = csv.writer(open('yahoo.csv','w'))
result.writerow(['Date','Low','High'])
for i in range(0,len(dataset)):
    line = [dataset[i]['Date'],dataset[i]['Low'],dataset[i]['High']]
    result.writerow(line)
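The same file could be produced more directly with pandas; a sketch assuming the same dataset list of dicts:
# build a dataframe from the records, keep only the three columns, and write the csv
pd.DataFrame(dataset)[['Date', 'Low', 'High']].to_csv('yahoo.csv', index=False)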
In [120]:
yahoo = pd.read_csv('yahoo.csv')
yahoo
Out[120]:
In [121]:
#convert column "Date" to a date data type
yahoo['Date'] = pd.to_datetime(yahoo['Date'])
#sort the data by date ascending
yahoo=yahoo.sort_values(by="Date", axis=0, ascending=True, inplace=False, kind='quicksort')
yahoo
Out[121]:
In [122]:
type(data['Date'])
type(yahoo['Date'])
Out[122]:
In [123]:
# join yahoo and data together on "Date" (pd.merge defaults to an inner join,
# so dates present in only one table, e.g. non-trading weekends, are dropped)
result = pd.merge(data, yahoo,on='Date')
result
Out[123]:
In [124]:
result_len = len(result)
In [125]:
yahoo.plot(x="Date",figsize=(6, 2),title='Yahoo Finance')
data.plot(x='Date',figsize=(6, 2),title='News Title Polarity')
Out[125]:
In [126]:
import numpy
low=result['Low']
high=result['High']
polarity=result['polarity']
numpy.corrcoef(low, polarity)
# from the data we have, news polarity and the S&P 500 index appear to be positively correlated
Out[126]:
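numpy.corrcoef reports only the correlation matrix; for a rough significance check, scipy also returns a p-value (a sketch, assuming scipy is installed):
from scipy import stats
r, p = stats.pearsonr(low, polarity)  # Pearson r and two-sided p-value
print(r, p)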
In [127]:
numpy.corrcoef(high, polarity)
Out[127]:
In [128]:
numpy.corrcoef(high, low)
Out[128]:
In [129]:
# a scatterplot of news polarity against the S&P 500 daily low
result.plot.scatter(x="polarity", y="Low")
Out[129]:
In [130]:
# a parametric estimation of the S&P 500 daily low as a function of news polarity
import seaborn as sns
# lmplot plots the data with a fitted regression line through it.
sns.lmplot(x="polarity", y="Low", data=result, ci=95) # ci is the size of the confidence interval, in percent
Out[130]:
In [131]:
import pyqt_fit.nonparam_regression as smooth
from pyqt_fit import npr_methods
In [132]:
k0 = smooth.NonParamRegression(polarity, low, method=npr_methods.SpatialAverage())
k0.fit()
grid = np.r_[-0.05:0.05:0.01]
plt.plot(grid, k0(grid), label="Spatial Averaging", linewidth=2)
plt.legend(loc='best')
Out[132]:
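pyqt_fit can be difficult to install on recent Python versions; a LOWESS smoother from statsmodels gives a comparable nonparametric fit (a sketch, assuming statsmodels is available):
from statsmodels.nonparametric.smoothers_lowess import lowess
smoothed = lowess(low, polarity, frac=0.6)  # returns (x, fitted y) pairs sorted by x
plt.plot(smoothed[:, 0], smoothed[:, 1], label="LOWESS", linewidth=2)
plt.legend(loc='best')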