examineArticle

Description: Splits the article at the given URL into sentences, runs NLP analysis on it, and PRINTS the important parameters.

To use, insert %run 'examineArticle.ipynb' after your import statement.

then call examineArticle("URL")

URL MUST BE A STRING

This prints

{"title", "language", "authors", "keywords", "date", "top image", "videos", "summary", "polarity", "subjectivity"} AS WELL AS A SENTENCE BY SENTENCE BREAKDOWN OF SENTIMENT!
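Since the URL has to be a string, a small guard can catch mistakes before anything gets downloaded. The sketch below is an assumption and not part of this notebook: `check_url` is a hypothetical helper that rejects non-strings and anything that does not look like an http(s) address.

```python
from urllib.parse import urlparse


def check_url(link):
    """Hypothetical guard (not in the notebook): return link if it looks usable."""
    if not isinstance(link, str):
        raise TypeError("URL must be a string, got " + type(link).__name__)
    parts = urlparse(link)
    # Require an explicit scheme and a host, e.g. http://www.npr.org/...
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an http(s) URL: " + repr(link))
    return link
```

It could then wrap the call, e.g. examineArticle(check_url("http://...")), so a bad argument fails early with a readable message.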

Setting Up


In [1]:
from textblob import TextBlob
import nltk.data
from nltk.tokenize import word_tokenize, sent_tokenize

#NLTK RESOURCE DOWNLOADING
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

#PARSER
from newspaper import Article
import newspaper

#set tokenizer model
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

Declare URL, Download, and Parse Article

Analyze Article


In [2]:
def examineArticle(LINK):
    #download, parse, and run NLP on the article
    a = Article(str(LINK)) # LINK is coerced to str; no language is passed, so newspaper's default applies
    a.download()
    a.parse()
    a.nlp()

    #Quality of life bold delimiters
    b = "\033[1m"
    endb = "\033[0;0m"

    print(b + "START OF ARTICLE - START OF ARTICLE - START OF ARTICLE - START OF ARTICLE - START OF ARTICLE" + endb)

    #Overall sentiment trackers
    count = 0
    overallScore = [0.0,0.0]

    print("\n-----\n" + b + "METADATA" + endb + "\n-----")
    print(b + "Title: " + endb, end="")
    print(a.title)
    print(b + "Language: " + endb, end="")
    print(getattr(a,"meta_lang"))
    print(b + "Author(s): " + endb, end="")
    print(a.authors)
    print(b + "Keywords: " + endb, end="")
    print(a.keywords)
    print(b + "Date: " + endb, end="")
    print(a.publish_date)
    print(b + "Top Image: " + endb, end="")
    print(a.top_image)
    print(b + "Videos: " + endb, end="")
    print(a.movies)

    print("\n-----\n" + b + "SUMMARY" + endb + "\n-----")
    print(a.summary)

    #Tokenise the article once and wrap each sentence in a TextBlob
    sentences = [TextBlob(sentence) for sentence in tokenizer.tokenize(a.text)]
    
    #Analyze each sentence
    print("\n-----\n" + b + "ANALYSIS" + endb + "\n-----")
    for index in range(len(sentences)):
        
        analysis = sentences[index]
        
        #Prep overall analysis tracker; a sentence scoring 0.0 on EITHER polarity or subjectivity is not counted, it is merged into the next sentence instead
        if analysis.sentiment.polarity != 0.0 and analysis.sentiment.subjectivity != 0.0:
            count += 1
            overallScore[0] += analysis.sentiment.polarity
            overallScore[1] += analysis.sentiment.subjectivity
            # analysis.correct() #Correct misspelt words !!! IF YOU ACTIVATE THIS IT'LL BE SLOW
            #print the sentence, then its sentiment scores in bold
            print(analysis + b)
            print("Polarity: " + "{0:.5f}".format(analysis.sentiment.polarity), end="    ")
            print("Subjectivity: " + "{0:.5f}".format(analysis.sentiment.subjectivity) + endb)
            print(endb + "-----")
        else:
            try:
                #Carry this sentence forward so it is scored together with the next one
                sentences[index + 1] = sentences[index] + " " + sentences[index + 1]
            except IndexError: #In the case where the last sentence has nothing left to merge into
                print("LAST SENTENCE")
                print(analysis)
                print("Polarity: " + "{0:.5f}".format(analysis.sentiment.polarity), end="    ")
                print("Subjectivity: " + "{0:.5f}".format(analysis.sentiment.subjectivity) + endb)

    #Guarding against divisions by 0
    if count == 0:
        count = 1

    print("\n-----\n" + b + "OVERALL SENTIMENT" + endb + "\n-----")
    #print(TextBlob(a.text).sentiment)
    print(b + "Polarity: " + endb, end="")
    print("{0:.5f}".format(overallScore[0]/count), end="   |   ")
    print(b + "Subjectivity: " + endb, end="")
    print("{0:.5f}".format(overallScore[1]/count), end="")
    print(endb + "\n")
    print(b + "END OF ARTICLE - END OF ARTICLE - END OF ARTICLE - END OF ARTICLE - END OF ARTICLE" + endb)
    print("--------------------------------------------------------")
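
The merge-and-average logic inside the ANALYSIS loop can be easier to follow on its own. Below is a minimal sketch of it, assuming a pluggable scorer: `merge_and_score` and `toy_score` are hypothetical names that do not exist in this notebook, and `toy_score` is only a stand-in for `TextBlob(text).sentiment`, not a real sentiment model.

```python
from typing import Callable, List, Tuple

Score = Tuple[float, float]  # (polarity, subjectivity)


def merge_and_score(sentences: List[str], score: Callable[[str], Score]):
    """Mimic the ANALYSIS loop: fold zero-scoring sentences into the next one."""
    chunks = []   # (text, polarity, subjectivity) for every counted chunk
    pending = ""  # text carried forward from sentences that were not counted
    for sentence in sentences:
        text = (pending + " " + sentence) if pending else sentence
        polarity, subjectivity = score(text)
        if polarity != 0.0 and subjectivity != 0.0:
            chunks.append((text, polarity, subjectivity))
            pending = ""
        else:
            pending = text  # merged into whatever comes next
    count = len(chunks) or 1  # same division-by-zero guard as above
    overall = (sum(c[1] for c in chunks) / count,
               sum(c[2] for c in chunks) / count)
    return chunks, overall, pending


def toy_score(text: str) -> Score:
    """Hypothetical scorer: +0.5 per 'good', -0.5 per 'bad'."""
    words = text.lower().replace(".", "").split()
    polarity = 0.5 * words.count("good") - 0.5 * words.count("bad")
    subjectivity = 0.5 if polarity else 0.0
    return polarity, subjectivity


chunks, overall, leftover = merge_and_score(
    ["Updated today.", "The plan is good.", "Costs are bad.", "The end."],
    toy_score,
)
```

Here "Updated today." scores zero, so it is glued onto "The plan is good." and the pair is scored as one chunk; "The end." has nothing to merge into and is returned as `leftover`, which corresponds to the LAST SENTENCE case in the function above.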

In [3]:
examineArticle("http://www.npr.org/sections/health-shots/2017/10/13/557541856/halt-in-subsidies-for-health-insurers-expected-to-drive-up-costs-for-middle-clas")


START OF ARTICLE - START OF ARTICLE - START OF ARTICLE - START OF ARTICLE - START OF ARTICLE

-----
METADATA
-----
Title: Halt In Subsidies For Health Insurers Expected To Drive Up Costs For Middle Class
Language: en
Author(s): ['Alison Kodjak']
Keywords: ['middle', 'law', 'rise', 'insurers', 'end', 'subsidies', 'costs', 'tax', 'aca', 'class', 'decision', 'insurance', 'health', 'premiums', 'drive', 'payments', 'halt', 'expected']
Date: 2017-10-13 00:00:00
Top Image: https://media.npr.org/assets/img/2017/10/13/trump_aca_stuff_wide-ad15a847d0d179459d7b727f73c37945112956de.jpg?s=1400
Videos: []

-----
SUMMARY
-----
Halt In Subsidies For Health Insurers Expected To Drive Up Costs For Middle ClassEnlarge this image toggle caption Alex Wong/Getty Images Alex Wong/Getty ImagesUpdated at 11:29 a.m.
ETPresident Trump's decision Thursday to end subsidy payments to health insurance companies is expected to raise premiums for middle-class families and cost the federal government hundreds of billions of dollars.
So if the government doesn't reimburse the insurers, they'll make up the money by charging higher premiums for coverage.
So when premiums rise, those tax credits rise in tandem.
House Republicans opposed to the health law sued then-President Barack Obama, saying the payments were illegal because Congress hadn't appropriated money for them.

-----
ANALYSIS
-----
Halt In Subsidies For Health Insurers Expected To Drive Up Costs For Middle Class

Enlarge this image toggle caption Alex Wong/Getty Images Alex Wong/Getty Images

Updated at 11:29 a.m.
Polarity: -0.05000    Subjectivity: 0.20000
-----
ET

President Trump's decision Thursday to end subsidy payments to health insurance companies is expected to raise premiums for middle-class families and cost the federal government hundreds of billions of dollars.
Polarity: -0.10000    Subjectivity: 0.40000
-----
The administration said it would stop reimbursing insurers for discounts on co-payments and deductibles that they are required by law to offer to low-income consumers. The reimbursements are known as cost-sharing reduction payments, or CSRs. Insurance companies still have to give the discounts to low-income customers. So if the government doesn't reimburse the insurers, they'll make up the money by charging higher premiums for coverage.
Polarity: 0.25000    Subjectivity: 0.50000
-----
The subsidy cut was the second swipe the White House took at the Affordable Care Act insurance markets Thursday in what many critics say is a deliberate campaign to destabilize them in hopes of forcing Congress to repeal the law.
Polarity: 0.16667    Subjectivity: 0.16667
-----
"Ending the CSR payments is another sign that President Trump is doing what he can to undermine the stability of the individual market under the ACA," wrote Tim Jost, professor emeritus of law at Washington and Lee University who contributes to the Health Affairs blog. The decision will most directly affect middle-class families who buy their own insurance without financial help from the government.
Polarity: 0.24000    Subjectivity: 0.46000
-----
Consumers who earn more than 400 percent of the federal poverty level — an individual with income of about $48,000 or a family of four that makes more than $98,400 — will likely see their costs for coverage rise next year by an average of about 20 percent nationwide.
Polarity: 0.14167    Subjectivity: 0.46667
-----
People with lower incomes will be unaffected since the ACA, also known as Obamacare, provides government subsidies — in the form of tax credits — that ensure their out-of-pocket insurance costs remain stable.
Polarity: -0.05000    Subjectivity: 0.10000
-----
So when premiums rise, those tax credits rise in tandem. "We now know what Trumpcare looks like, and it's pretty ugly," said Ezekiel Emanuel, an oncologist who chairs the Department of Medical Ethics and Health Policy at the University of Pennsylvania.
Polarity: -0.15000    Subjectivity: 0.66667
-----
"The people who are particularly going to hurt are the people who don't get any subsidies.
Polarity: 0.16667    Subjectivity: 0.33333
-----
They just have to buy their own insurance," Emanuel, one of the architects of the ACA, told Morning Edition on Friday.
Polarity: 0.60000    Subjectivity: 1.00000
-----
Ironically, the decision to end the $7 billion-a-year cost-sharing payments is likely to cost the federal government more than making them — nearly $200 billion over 10 years, according to the Congressional Budget Office.
Polarity: 0.20000    Subjectivity: 0.63333
-----
That is because the ACA requires that premiums don't exceed a set percentage of a person's income. So as premiums set by insurance companies rise over time, the government has to boost its tax credits so the cost to the consumer remains the same. Last year, about 85 percent of people who bought Obamacare insurance got a tax credit, according to the Centers for Medicare and Medicaid Services. The cost-sharing payments have been at the center of a political battle over the Affordable Care Act since before Trump took office.
Polarity: -0.02500    Subjectivity: 0.09792
-----
House Republicans opposed to the health law sued then-President Barack Obama, saying the payments were illegal because Congress hadn't appropriated money for them.
Polarity: -0.50000    Subjectivity: 0.50000
-----
A judge agreed but allowed the administration to continue making the payments during an appeal. The president has been threatening all year to cut off the payments, which he refers to as a bailout. He repeated that characterization on Twitter on Friday. Insurers have been left to wonder, month to month, whether they will receive them. The White House cited that legal dispute when it announced the end of the payments late Thursday.
Polarity: -0.02500    Subjectivity: 0.20000
-----
"The bailout of insurance companies through these unlawful payments is yet another example of how the previous administration abused taxpayer dollars and skirted the law to prop up a broken system," the White House statement said.
Polarity: -0.18889    Subjectivity: 0.18889
-----
The decision to end the cost-sharing payments came just hours after Trump signed an executive order hoping to make it easier for people and small businesses to buy cheaper health insurance policies through trade groups and professional associations.
Polarity: -0.07500    Subjectivity: 0.25000
-----
Those plan would likely have fewer benefits and appeal to healthier, younger people. The two moves Thursday could split the health care market. People with low-incomes or expensive medical conditions like diabetes would stay in the ACA marketplaces, while healthier, wealthier people would go elsewhere for coverage.
Polarity: -0.12500    Subjectivity: 0.42500
-----
The result would be higher costs for people who need health care most.
Polarity: 0.37500    Subjectivity: 0.50000
-----

-----
OVERALL SENTIMENT
-----
Polarity: 0.04728   |   Subjectivity: 0.39380

END OF ARTICLE - END OF ARTICLE - END OF ARTICLE - END OF ARTICLE - END OF ARTICLE
--------------------------------------------------------
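
As a sanity check, the OVERALL SENTIMENT figures can be recomputed by hand from the per-chunk scores printed in the ANALYSIS section. The snippet below is just that arithmetic: the numbers are copied from the output above (already rounded to five places), and the variable names are made up for this check.

```python
# (polarity, subjectivity) pairs copied from the ANALYSIS output above
scores = [
    (-0.05000, 0.20000), (-0.10000, 0.40000), (0.25000, 0.50000),
    (0.16667, 0.16667), (0.24000, 0.46000), (0.14167, 0.46667),
    (-0.05000, 0.10000), (-0.15000, 0.66667), (0.16667, 0.33333),
    (0.60000, 1.00000), (0.20000, 0.63333), (-0.02500, 0.09792),
    (-0.50000, 0.50000), (-0.02500, 0.20000), (-0.18889, 0.18889),
    (-0.07500, 0.25000), (-0.12500, 0.42500), (0.37500, 0.50000),
]

count = len(scores)  # 18 counted chunks
overall_polarity = sum(p for p, _ in scores) / count
overall_subjectivity = sum(s for _, s in scores) / count

print("Polarity: " + "{0:.5f}".format(overall_polarity))
print("Subjectivity: " + "{0:.5f}".format(overall_subjectivity))
```

This gives 0.04728 and 0.39380, matching the printed totals, so the overall score is a plain average over the 18 counted chunks rather than over every sentence in the article.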