In [1]:
import pandas as pd
from datetime import datetime
import dateutil
import matplotlib.pyplot as plt
from IPython.display import display, HTML
import re
from urllib.parse import urlparse
import json
In [2]:
data = pd.read_csv('../data/in/native_ad_data.csv')
In [3]:
data.head()
Out[3]:
As a side note, the headlines from zergnet all contain newlines we need to get rid of, and the headline appears to have been concatenated with the provider name. Let's clean those up.
In [4]:
# Trim whitespace and drop the concatenated provider text (everything from the first capital letter that follows a lowercase letter)
data['headline'] = data['headline'].apply(lambda x: re.sub(r'(?<=[a-z])\.?([A-Z](.*))', '', x.strip()))
data.head()
Out[4]:
OK, that's better.
The img_file column values also have ./imgs/ prefixed to each file name. Let's get rid of that:
In [5]:
# Strip the ./imgs/ path prefix from each image file name
data['img_file'] = data['img_file'].apply(lambda x: re.sub(r'\./imgs/', '', str(x).strip()))
Now, let's check, do we have any null values?
In [6]:
for col in data.columns:
    print((col, sum(data[col].isnull())))
For now, only the orig_article column has nulls, as we had not collected those consistently.
In [7]:
data.describe()
Out[7]:
Already we can see some interesting trends here. Out of 129399 records, only 18022 of the headlines are unique, while 43315 of the links are unique and 23866 of the image files are unique (though some of that gap is certainly down to issues downloading images).
So it already seems that the same headline or image may be reused for different destination articles.
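As a rough check on that idea (just a sketch, not something we ran as a cell), we could count how many distinct final links each headline points to:

# For each headline, count how many distinct destination links it points to;
# headlines pointing to more than one link suggest headline reuse across articles
links_per_headline = data.groupby('headline')['final_link'].nunique()
print(links_per_headline.sort_values(ascending=False).head(10))
print((links_per_headline > 1).sum(), "headlines point to more than one link")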
Also, because we want to inspect the hosts the articles and images are coming from, let's parse those out of the data.
In [8]:
data['img_host'] = data['img'].apply(lambda x: urlparse(x).netloc)
In [9]:
data['link_host'] = data['final_link'].apply(lambda x: urlparse(x).netloc)
Next, let's classify each site with a very relaxed set of tags based on perceived political bias. I might be a little off on some; I referenced https://www.allsides.com/ where possible, but that was not entirely helpful in all cases. Otherwise, I went with my own sense of where a site falls on the political spectrum (e.g., left, right, or center). There is also a tag for tabloids: primarily sites that probably don't have an editorial perspective so much as a desire to publish whatever gets the most traffic.
In [10]:
left = ['http://www.politico.com/magazine/', 'https://www.washingtonpost.com/', 'http://www.huffingtonpost.com/', 'http://gothamist.com/news', 'http://www.metro.us/news', 'http://www.politico.com/politics', 'http://www.nydailynews.com/news', 'http://www.thedailybeast.com/']
right = ['http://www.breitbart.com', 'http://www.rt.com', 'https://nypost.com/news/', 'http://www.infowars.com/', 'https://www.therebel.media/news', 'http://observer.com/latest/']
center = ['http://www.ibtimes.com/', 'http://www.businessinsider.com/', 'http://thehill.com']
tabloid = ['http://tmz.com', 'http://www.dailymail.co.uk/', 'https://downtrend.com/', 'http://reductress.com/', 'http://preventionpulse.com/', 'http://elitedaily.com/', 'http://worldstarhiphop.com/videos/']
In [11]:
def get_classification(source):
    # Returns None for any source we haven't tagged
    if source in left:
        return 'left'
    if source in right:
        return 'right'
    if source in center:
        return 'center'
    if source in tabloid:
        return 'tabloid'
In [12]:
data['source_class'] = data['source'].apply(get_classification)
In [13]:
data.head()
Out[13]:
Now let's remove duplicates based on a subset of the columns using pandas' drop_duplicates for DataFrames.
In [14]:
# Note: keep=False drops every copy of a duplicated row, rather than keeping one of each
deduped = data.drop_duplicates(subset=['headline', 'link', 'img', 'provider', 'source', 'img_file', 'final_link'], keep=False)
In [15]:
deduped.describe()
Out[15]:
And let's just check on those null values again...
In [16]:
for col in deduped.columns:
    print((col, sum(deduped[col].isnull())))
Out of curiosity, as we're only left with 43630 records after deduping, let's take a look at the rate of success for our record collection.
In [17]:
(43630/129399)*100
Out[17]:
Crud. Only about a third of our harvested sample is worth examining further.
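As an aside, the same figure could be computed directly from the two frames rather than hard-coding the counts; a quick sketch:

# Retention rate computed from the frames themselves instead of hard-coded counts
retention_pct = len(deduped) / len(data) * 100
print("{:.1f}% of records survive deduplication".format(retention_pct))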
Let's get the top 10 headlines grouped by img.
In [18]:
deduped['headline'].groupby(deduped['img']).value_counts().nlargest(10)
Out[18]:
But hang on, let's just see what the top headlines are. There's certainly overlap, but it's not a one-to-one relationship between headlines and their images (or at least it may be the same image coming from a different URL).
In [19]:
deduped['headline'].value_counts().nlargest(10)
Out[19]:
Note: perhaps something we will want to look into is how many different headline/image permutations there are. I am particularly interested in the reuse of images across different headlines.
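A rough way to quantify that (a sketch, not run here) is to count distinct headlines per image URL, and distinct image URLs per downloaded file:

# How many distinct headlines does each image URL appear with?
headlines_per_img = deduped.groupby('img')['headline'].nunique()
print(headlines_per_img.sort_values(ascending=False).head(10))

# Does the same downloaded file show up under more than one image URL?
urls_per_file = deduped.groupby('img_file')['img'].nunique()
print((urls_per_file > 1).sum(), "image files appear under multiple URLs")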
And how are our sources distributed?
In [20]:
deduped['source'].value_counts().nlargest(25)
Out[20]:
TMZ is a bit over-represented here.
And what about by classification?
In [21]:
deduped['source_class'].value_counts()
Out[21]:
Looks like the over-representation of TMZ is pushing up the tabloid count a bit. Not terribly even between left, right, and center, either.
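One quick way to gauge that (a sketch, assuming the TMZ source string is the 'http://tmz.com' value from our tabloid list) is to recompute the classification counts with TMZ excluded:

# Classification counts with TMZ removed, to see how much it skews the tabloid bucket
deduped[deduped['source'] != 'http://tmz.com']['source_class'].value_counts()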
Let's take a look at the sources again, broken down by both provider and our classification.
In [22]:
deduped.groupby(['provider', 'source_class'])['source'].value_counts()
Out[22]:
OK, so what are the most and least frequent images per classification?
In [23]:
IMG_MAX=5
In [24]:
topimgs_center = deduped['img'][deduped['source_class'].isin(['center'])].value_counts().nlargest(IMG_MAX).index.tolist()
In [25]:
bottomimgs_center = deduped['img'][deduped['source_class'].isin(['center'])].value_counts().nsmallest(IMG_MAX).index.tolist()
In [26]:
topimgs_left = deduped['img'][deduped['source_class'].isin(['left'])].value_counts().nlargest(IMG_MAX).index.tolist()
In [27]:
bottomimgs_left = deduped['img'][deduped['source_class'].isin(['left'])].value_counts().nsmallest(IMG_MAX).index.tolist()
In [28]:
topimgs_right = deduped['img'][deduped['source_class'].isin(['right'])].value_counts().nlargest(IMG_MAX).index.tolist()
In [29]:
bottomimgs_right = deduped['img'][deduped['source_class'].isin(['right'])].value_counts().nsmallest(IMG_MAX).index.tolist()
In [30]:
topimgs_tabloid = deduped['img'][deduped['source_class'].isin(['tabloid'])].value_counts().nlargest(IMG_MAX).index.tolist()
In [31]:
bottomimgs_tabloid = deduped['img'][deduped['source_class'].isin(['tabloid'])].value_counts().nsmallest(IMG_MAX).index.tolist()
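As an aside, those eight cells could be collapsed into a single loop over the classifications; a sketch, assuming the same deduped frame and IMG_MAX:

# Build the top/bottom image lists for every classification in one pass
top_bottom_imgs = {}
for cls in ['center', 'left', 'right', 'tabloid']:
    counts = deduped['img'][deduped['source_class'] == cls].value_counts()
    top_bottom_imgs[cls] = (counts.nlargest(IMG_MAX).index.tolist(),
                            counts.nsmallest(IMG_MAX).index.tolist())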
In [32]:
for i in topimgs_center:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [33]:
for i in bottomimgs_center:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [34]:
for i in topimgs_left:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [35]:
for i in bottomimgs_left:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [36]:
for i in topimgs_right:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [37]:
for i in bottomimgs_right:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [38]:
for i in topimgs_tabloid:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
In [39]:
for i in bottomimgs_tabloid:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))
Yawn! I have to admit this isn't as interesting as I thought it might be.
Next, let's explore trends over time. First we'll want to make a version of the DataFrame that is indexed by date.
In [40]:
deduped_date_idx = deduped.copy(deep=False)
In [41]:
deduped_date_idx['date'] = pd.to_datetime(deduped_date_idx.date)
In [42]:
deduped_date_idx.set_index('date',inplace=True)
Let's see what dates we're working with:
In [43]:
"Start: {} - End: {}".format(deduped_date_idx.index.min(), deduped_date_idx.index.max())
Out[43]:
Let's examine the distribution of the classifications over time.
In [44]:
deduped_date_idx['2017-03-01':'2017-07-07'].groupby('source_class').resample('M').size().plot(kind='bar')
Out[44]:
In [45]:
plt.show()
I think what we're mostly seeing here is that our scraper was most active during the month of June.
Let's see the same distribution for provider.
In [46]:
deduped_date_idx['2017-03-01':'2017-07-07'].groupby(['provider']).resample('M').size().plot(kind='bar')
Out[46]:
In [47]:
plt.show()
Same story here: our results are biased towards June.
What if we check all the results whose headlines mention certain people?
In [48]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Trump')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Trump' By Month and Classification", kind='bar', color="pink")
Out[48]:
In [49]:
plt.show()
In [50]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Clinton')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Clinton' By Month and Classification", kind='bar', color="gray")
Out[50]:
In [51]:
plt.show()
In [52]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Hillary')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Hillary' By Month and Classification" ,kind='bar', color="gray")
Out[52]:
In [53]:
plt.show()
In [54]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Obama')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Obama' By Month and Classification", kind='bar')
Out[54]:
In [55]:
plt.show()
Again, we're mostly seeing a trend in our data collection rather than in the content itself. There is an interesting pattern in that 'Trump' appears in far more tabloid headlines than we might expect. 'Obama' appears a lot in headlines on right-classified sites, but again this is for June, so it might just be an artifact of increased data collection. Finally, we see far more results for 'Hillary' than for 'Clinton', and most of those are on tabloid sites in April.
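One way to control for the uneven collection (a sketch, not run here) would be to divide each month's mention counts by the total number of records collected that month, so collection volume doesn't masquerade as a trend:

# Share of each month's records whose headline mentions 'Trump'
monthly_totals = deduped_date_idx['2017-03-01':'2017-07-07'].resample('M').size()
trump_monthly = (deduped_date_idx[deduped_date_idx['headline'].str.contains('Trump')]
                 ['2017-03-01':'2017-07-07']).resample('M').size()
(trump_monthly / monthly_totals).plot(kind='bar', title="Share of Headlines Mentioning 'Trump' by Month")
plt.show()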
And let's check out some bucketed headline trends: the most and least frequent headlines, both overall and for the various classifications.
In [56]:
(deduped_date_idx['2017-03-27':'2017-07-07'])['headline'].value_counts().nlargest(15)
Out[56]:
In [57]:
(deduped_date_idx['2017-03-27':'2017-07-07'])['headline'].value_counts().nsmallest(15)
Out[57]:
In [58]:
deduped['headline'][deduped['source_class'].isin(['center'])].value_counts().nlargest(25)
Out[58]:
In [59]:
deduped['headline'][deduped['source_class'].isin(['center'])].value_counts().nsmallest(25)
Out[59]:
In [60]:
deduped['headline'][deduped['source_class'].isin(['left'])].value_counts().nlargest(25)
Out[60]:
In [61]:
deduped['headline'][deduped['source_class'].isin(['left'])].value_counts().nsmallest(25)
Out[61]:
In [62]:
deduped['headline'][deduped['source_class'].isin(['right'])].value_counts().nlargest(25)
Out[62]:
In [63]:
deduped['headline'][deduped['source_class'].isin(['right'])].value_counts().nsmallest(25)
Out[63]:
In [64]:
deduped['headline'][deduped['source_class'].isin(['tabloid'])].value_counts().nlargest(25)
Out[64]:
In [65]:
deduped['headline'][deduped['source_class'].isin(['tabloid'])].value_counts().nsmallest(25)
Out[65]:
Next, we wanted to see if any headlines had more than one image. Let's check a few.
In [66]:
def imgs_from_headlines(headline):
    """
    A function to spit out all the different images used for a headline, assuming there's no more than 50/headline
    """
    all_images = deduped['img'][deduped['headline'].isin([headline])].value_counts().nlargest(50).index.tolist()
    for i in all_images:
        displaystring = '<img src="{}" width="200"/>'.format(i)
        display(HTML(displaystring))
In [67]:
imgs_from_headlines("Trump Voters Shocked After Watching This Leaked Video")
In [68]:
imgs_from_headlines("What Tiger Woods' Ex-Wife Looks Like Now Left Us With No Words")
In [69]:
imgs_from_headlines("Nicole Kidman's Yacht Is Far From You'd Expect")
In [70]:
imgs_from_headlines("He Never Mentions His Son, Here's Why")
In [71]:
imgs_from_headlines("Do This Tonight to Make Fungus Disappear by Morning (Try Today)")
Well, that was edifying.
In [72]:
timestamp = datetime.now().strftime('%Y-%m-%d-%H_%M')
In [73]:
datefile = '../data/out/{}_native_ad_data_deduped.csv'.format(timestamp)
In [74]:
deduped.to_csv(datefile, index=False)
Finally, let's generate a JSON file where each item is an individual image, and for each image we list all of its original sources, dates, headlines, classifications, and final locations.
In [75]:
img_json_data = {}
for index, row in deduped.iterrows():
    img_json_data[row['img_file']] = {'url': row['img'],
                                      'dates': [],
                                      'sources': [],
                                      'providers': [],
                                      'classifications': [],
                                      'headlines': [],
                                      'locations': [],
                                      }
In [76]:
print(len(img_json_data.keys()))
In [77]:
for index, row in deduped.iterrows():
    record = img_json_data[row['img_file']]
    if row['date'] not in record['dates']:
        record['dates'].append(row['date'])
    if row['headline'] not in record['headlines']:
        record['headlines'].append(row['headline'])
    if row['provider'] not in record['providers']:
        record['providers'].append(row['provider'])
    if row['source_class'] not in record['classifications']:
        record['classifications'].append(row['source_class'])
    if row['source'] not in record['sources']:
        record['sources'].append(row['source'])
    if row['final_link'] not in record['locations']:
        record['locations'].append(row['final_link'])
In [78]:
for i in list(img_json_data.keys())[0:5]:
    print(img_json_data[i])
In [79]:
hl_json_data = {}
for index, row in deduped.iterrows():
    hl_json_data[row['headline']] = {'img_urls': [],
                                     'dates': [],
                                     'sources': [],
                                     'providers': [],
                                     'classifications': [],
                                     'imgs': [],
                                     'locations': [],
                                     }
In [80]:
print(len(hl_json_data.keys()))
In [81]:
for index, row in deduped.iterrows():
    record = hl_json_data[row['headline']]
    if row['img'] not in record['img_urls']:
        record['img_urls'].append(row['img'])
    if row['date'] not in record['dates']:
        record['dates'].append(row['date'])
    if row['img_file'] not in record['imgs']:
        record['imgs'].append(row['img_file'])
    if row['provider'] not in record['providers']:
        record['providers'].append(row['provider'])
    if row['source_class'] not in record['classifications']:
        record['classifications'].append(row['source_class'])
    if row['source'] not in record['sources']:
        record['sources'].append(row['source'])
    if row['final_link'] not in record['locations']:
        record['locations'].append(row['final_link'])
In [82]:
for i in list(hl_json_data.keys())[0:5]:
    print(i, " = ", hl_json_data[i])
In [83]:
def to_json_file(json_data, prefix):
    filename = "../data/out/{}_grouped_data.json".format(prefix)
    with open(filename, 'w') as outfile:
        json.dump(json_data, outfile, indent=4)
In [84]:
to_json_file(img_json_data, "images")
In [85]:
to_json_file(hl_json_data, "headlines")
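As a quick sanity check (a sketch), either file can be loaded back with json.load and inspected:

# Load the images file back and confirm the number of records written
with open("../data/out/images_grouped_data.json") as infile:
    check = json.load(infile)
print(len(check), "image records written")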