Performing Clean-up and Analysis on Native Ad Data Scraped "From Around the Web"


In [1]:
import pandas as pd
from datetime import datetime
import dateutil
import matplotlib.pyplot as plt
from IPython.display import display, HTML
import re
from urllib.parse import urlparse
import json

Data Load and Cleaning


In [2]:
data = pd.read_csv('../data/in/native_ad_data.csv')

In [3]:
data.head()


Out[3]:
_id headline link img provider source img_file date final_link orig_article
0 ObjectId(58d90ce706e10d04f7e1b3d8) 20 Cool Moments From Joe Biden’s Time In Office http://scribol.com/a/news-and-politics/ways-jo... https://console.brax-cdn.com/creatives/98c6400... taboola http://tmz.com ./imgs/876aa5e83f6fb81a81908db3c02fdcc00d44400... 2017-03-27T12:59:09.279Z http://scribol.com/a/news-and-politics/ways-jo... NaN
1 ObjectId(58d90ce706e10d04f7e1b3d9) Troubled News Anchor Does The Unthinkable On Air http://www.trend-chaser.com/entertainment/the-... https://console.brax-cdn.com/creatives/b86bbc0... taboola http://tmz.com ./imgs/bab1037467f1385cd865c48029db808b03a151d... 2017-03-27T12:59:09.819Z http://www.trend-chaser.com/entertainment/the-... NaN
2 ObjectId(58d90ce706e10d04f7e1b3da) It's Almost Hard To Fathom What He look's Like... http://www.journalistate.com/popular/big-holly... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com ./imgs/feeb5be5a9758fcca8cef21b6fb842ccc839476... 2017-03-27T12:59:10.750Z http://www.journalistate.com/popular/big-holly... NaN
3 ObjectId(58d90ce706e10d04f7e1b3db) Troubled News Anchor Does The Unthinkable On Air http://www.trend-chaser.com/entertainment/the-... https://console.brax-cdn.com/creatives/b86bbc0... taboola http://tmz.com ./imgs/bab1037467f1385cd865c48029db808b03a151d... 2017-03-27T12:59:11.430Z http://www.trend-chaser.com/entertainment/the-... NaN
4 ObjectId(58d90ce706e10d04f7e1b3dc) Try NOT Gasp When You See Who Queen Latifah Is... http://zcretuzft.iflmylife.com/entertainment/o... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com ./imgs/d75401b962746864063b51f164633ffeb93931d... 2017-03-27T12:59:11.510Z http://www.iflmylife.com/entertainment/other-h... NaN

As a side note, the headlines from zergnet all contain stray newlines, and they appear to have the provider name concatenated onto the headline. So let's clean those up.


In [4]:
data['headline'] = data['headline'].apply(lambda x: re.sub(r'(?<=[a-z])\.?([A-Z](.*))', '', x.strip()))
data.head()


Out[4]:
_id headline link img provider source img_file date final_link orig_article
0 ObjectId(58d90ce706e10d04f7e1b3d8) 20 Cool Moments From Joe Biden’s Time In Office http://scribol.com/a/news-and-politics/ways-jo... https://console.brax-cdn.com/creatives/98c6400... taboola http://tmz.com ./imgs/876aa5e83f6fb81a81908db3c02fdcc00d44400... 2017-03-27T12:59:09.279Z http://scribol.com/a/news-and-politics/ways-jo... NaN
1 ObjectId(58d90ce706e10d04f7e1b3d9) Troubled News Anchor Does The Unthinkable On Air http://www.trend-chaser.com/entertainment/the-... https://console.brax-cdn.com/creatives/b86bbc0... taboola http://tmz.com ./imgs/bab1037467f1385cd865c48029db808b03a151d... 2017-03-27T12:59:09.819Z http://www.trend-chaser.com/entertainment/the-... NaN
2 ObjectId(58d90ce706e10d04f7e1b3da) It's Almost Hard To Fathom What He look's Like... http://www.journalistate.com/popular/big-holly... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com ./imgs/feeb5be5a9758fcca8cef21b6fb842ccc839476... 2017-03-27T12:59:10.750Z http://www.journalistate.com/popular/big-holly... NaN
3 ObjectId(58d90ce706e10d04f7e1b3db) Troubled News Anchor Does The Unthinkable On Air http://www.trend-chaser.com/entertainment/the-... https://console.brax-cdn.com/creatives/b86bbc0... taboola http://tmz.com ./imgs/bab1037467f1385cd865c48029db808b03a151d... 2017-03-27T12:59:11.430Z http://www.trend-chaser.com/entertainment/the-... NaN
4 ObjectId(58d90ce706e10d04f7e1b3dc) Try NOT Gasp When You See Who Queen Latifah Is... http://zcretuzft.iflmylife.com/entertainment/o... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com ./imgs/d75401b962746864063b51f164633ffeb93931d... 2017-03-27T12:59:11.510Z http://www.iflmylife.com/entertainment/other-h... NaN

OK, that's better.
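
To make the effect of that substitution concrete, here is a minimal sketch on two made-up headlines (neither is taken from the dataset). It uses the same pattern, which cuts the string at the first lowercase letter that is followed directly by an uppercase letter. That removes a glued-on provider name, but it would also truncate any headline containing a word such as "iPhone", so the cleaned column is worth spot-checking.

```python
import re

# Same pattern as in the cell above, written as a raw string.
pattern = re.compile(r'(?<=[a-z])\.?([A-Z](.*))')

# Hypothetical headlines, made up for illustration only.
glued = "\n  Stars who vanished from HollywoodNickiSwift.com  \n"
camel = "New iPhone Leak"

cleaned_glued = pattern.sub('', glued.strip())  # provider suffix is removed
cleaned_camel = pattern.sub('', camel)          # also cut at the "iP" boundary

print(repr(cleaned_glued))
print(repr(cleaned_camel))
```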

The img_file column values also have ./imgs/ prepended to each file name. Let's get rid of those:


In [5]:
data['img_file'] = data['img_file'].apply(lambda x: re.sub(r'\./imgs/', '', str(x).strip()))

Now, let's check, do we have any null values?


In [6]:
for col in data.columns:
    print((col, sum(data[col].isnull())))


('_id', 0)
('headline', 0)
('link', 0)
('img', 0)
('provider', 0)
('source', 0)
('img_file', 0)
('date', 0)
('final_link', 0)
('orig_article', 59776)

For now, only the orig_article column has nulls, as we had not collected those consistently.
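
The same per-column null count can also be obtained without a loop. A minimal sketch on a toy DataFrame (the values are made up; only the column names mirror the real data):

```python
import pandas as pd

# Toy frame standing in for the real data; values are hypothetical.
toy = pd.DataFrame({
    'headline': ['a', 'b', 'c'],
    'orig_article': ['http://example.com/1', None, None],
})

# isnull() gives a boolean frame; summing counts the True values per column.
null_counts = toy.isnull().sum()
print(null_counts)
```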


In [7]:
data.describe()


Out[7]:
_id headline link img provider source img_file date final_link orig_article
count 129399 129399 129399 129399 129399 129399 129399 129399 129399 69623
unique 129399 18022 43315 23843 4 24 23866 129396 36713 6670
top ObjectId(593394dc9e1e2a636c179290) Here’s Why Guys Are Obsessed With This Underwear… https://grizly.com/lifestyle/guy-turned-backya... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com db07ff3401037653d665822c5a78617464fe4ef8.jpg 2017-05-30T04:49:40.273Z https://grizly.com/lifestyle/guy-turned-backya... http://www.tmz.com/2017/06/02/kathy-griffin-co...
freq 1 996 588 621 59474 24167 621 2 588 167

Already we can see some interesting trends here. Out of 129399 records, only 18022 of the headlines are unique, but 43315 of the links are unique and 23866 of the image files are unique (slightly more than the 23843 unique image URLs, presumably because there were issues with downloading images).

So it already seems that some content links reuse the same headline or image for different destination articles.

Also, because we want to inspect the hosts that the articles and images are coming from, let's parse those out of the data.

Data Preparation


In [8]:
data['img_host'] = data['img'].apply(lambda x: urlparse(x).netloc)

In [9]:
data['link_host'] = data['final_link'].apply(lambda x: urlparse(x).netloc)
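
As a quick illustration of what urlparse(...).netloc returns, here is a small sketch. The hosts appear in the data above, but the paths are hypothetical. Note that netloc keeps any "www." prefix, so www.example.com and example.com would count as different hosts.

```python
from urllib.parse import urlparse

# Hosts taken from the sample rows; the paths are made up for illustration.
img_url = 'https://console.brax-cdn.com/creatives/example.png'
link_url = 'http://www.trend-chaser.com/entertainment/example'

img_host = urlparse(img_url).netloc
link_host = urlparse(link_url).netloc

print(img_host)
print(link_host)
```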

Next, let's classify each site with a very relaxed set of tags based on perceived political bias. I might be a little off on some. I referenced https://www.allsides.com/ where possible, but that was not helpful in all cases; otherwise, I just went with my own sense of where a site falls on the political spectrum (e.g., left, right, or center). There is also a tag for tabloids: primarily sites that probably don't have an editorial perspective so much as a desire to publish whatever gets the most traffic.


In [10]:
left = ['http://www.politico.com/magazine/', 'https://www.washingtonpost.com/', 'http://www.huffingtonpost.com/', 'http://gothamist.com/news', 'http://www.metro.us/news', 'http://www.politico.com/politics', 'http://www.nydailynews.com/news', 'http://www.thedailybeast.com/']
right = ['http://www.breitbart.com', 'http://www.rt.com', 'https://nypost.com/news/', 'http://www.infowars.com/', 'https://www.therebel.media/news', 'http://observer.com/latest/']
center = ['http://www.ibtimes.com/', 'http://www.businessinsider.com/', 'http://thehill.com']
tabloid = ['http://tmz.com', 'http://www.dailymail.co.uk/', 'https://downtrend.com/', 'http://reductress.com/', 'http://preventionpulse.com/', 'http://elitedaily.com/', 'http://worldstarhiphop.com/videos/']

In [11]:
def get_classification(source):
    if source in left:
        return 'left'
    if source in right:
        return 'right'
    if source in center:
        return 'center'
    if source in tabloid:
        return 'tabloid'

In [12]:
data['source_class'] = data['source'].apply(get_classification)
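
An alternative to the chain of if statements is a single lookup dictionary used with Series.map. The sketch below uses abbreviated versions of the lists above plus one made-up source; the names classes and source_to_class are hypothetical. Sources that are not in any list come out as NaN, which makes unclassified rows easy to spot.

```python
import pandas as pd

# Abbreviated versions of the lists above, for illustration only.
classes = {
    'left': ['https://www.washingtonpost.com/'],
    'right': ['http://www.breitbart.com'],
    'center': ['http://thehill.com'],
    'tabloid': ['http://tmz.com'],
}

# Invert to a flat source -> label mapping.
source_to_class = {src: label for label, srcs in classes.items() for src in srcs}

toy = pd.DataFrame({'source': ['http://tmz.com', 'http://thehill.com',
                               'http://unknown.example/']})  # last one is made up
toy['source_class'] = toy['source'].map(source_to_class)
print(toy)
```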

In [13]:
data.head()


Out[13]:
_id headline link img provider source img_file date final_link orig_article img_host link_host source_class
0 ObjectId(58d90ce706e10d04f7e1b3d8) 20 Cool Moments From Joe Biden’s Time In Office http://scribol.com/a/news-and-politics/ways-jo... https://console.brax-cdn.com/creatives/98c6400... taboola http://tmz.com 876aa5e83f6fb81a81908db3c02fdcc00d444000.png 2017-03-27T12:59:09.279Z http://scribol.com/a/news-and-politics/ways-jo... NaN console.brax-cdn.com scribol.com tabloid
1 ObjectId(58d90ce706e10d04f7e1b3d9) Troubled News Anchor Does The Unthinkable On Air http://www.trend-chaser.com/entertainment/the-... https://console.brax-cdn.com/creatives/b86bbc0... taboola http://tmz.com bab1037467f1385cd865c48029db808b03a151d2.png 2017-03-27T12:59:09.819Z http://www.trend-chaser.com/entertainment/the-... NaN console.brax-cdn.com www.trend-chaser.com tabloid
2 ObjectId(58d90ce706e10d04f7e1b3da) It's Almost Hard To Fathom What He look's Like... http://www.journalistate.com/popular/big-holly... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com feeb5be5a9758fcca8cef21b6fb842ccc8394766.jpg 2017-03-27T12:59:10.750Z http://www.journalistate.com/popular/big-holly... NaN cdn.taboolasyndication.com www.journalistate.com tabloid
3 ObjectId(58d90ce706e10d04f7e1b3db) Troubled News Anchor Does The Unthinkable On Air http://www.trend-chaser.com/entertainment/the-... https://console.brax-cdn.com/creatives/b86bbc0... taboola http://tmz.com bab1037467f1385cd865c48029db808b03a151d2.png 2017-03-27T12:59:11.430Z http://www.trend-chaser.com/entertainment/the-... NaN console.brax-cdn.com www.trend-chaser.com tabloid
4 ObjectId(58d90ce706e10d04f7e1b3dc) Try NOT Gasp When You See Who Queen Latifah Is... http://zcretuzft.iflmylife.com/entertainment/o... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com d75401b962746864063b51f164633ffeb93931d3.jpg 2017-03-27T12:59:11.510Z http://www.iflmylife.com/entertainment/other-h... NaN cdn.taboolasyndication.com www.iflmylife.com tabloid

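Now let's remove duplicates based on a subset of the columns using pandas' drop_duplicates for DataFrames. Note that keep=False drops every row that has a duplicate, including the first occurrence, so only records that appear exactly once survive.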


In [14]:
deduped = data.drop_duplicates(subset=['headline', 'link', 'img', 'provider', 'source', 'img_file', 'final_link'], keep=False)
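
To see how much the keep argument matters, here is a minimal sketch on a toy frame (made-up values). With keep=False both copies of a duplicated row disappear, whereas keep='first' retains one of them.

```python
import pandas as pd

# Toy data: the first two rows are duplicates of each other.
toy = pd.DataFrame({
    'headline': ['A', 'A', 'B'],
    'link': ['x', 'x', 'y'],
})

drop_all = toy.drop_duplicates(subset=['headline', 'link'], keep=False)
keep_first = toy.drop_duplicates(subset=['headline', 'link'], keep='first')

print(len(drop_all), len(keep_first))
```

If the goal were one row per distinct ad rather than only ads seen exactly once, keep='first' would be the option to consider; the notebook's counts below are based on keep=False.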

In [15]:
deduped.describe()


Out[15]:
_id headline link img provider source img_file date final_link orig_article img_host link_host source_class
count 43630 43630 43630 43630 43630 43630 43630 43630 43630 25177 43630 43630 43630
unique 43630 15219 35541 19311 4 24 19314 43629 30873 5195 568 2196 4
top ObjectId(59533a5706e10d0343aee04f) Nicole Kidman's Yacht Is Far From You'd Expect http://topictracker.online/?utm_campaign=us-tb... http://cdn.taboolasyndication.com/libtrc/stati... taboola http://tmz.com f18167ca58fee4ae691a28ecd39b0c1afe2689e4.jpg 2017-05-30T04:49:40.273Z http://www.zergnet.com/news/694817/kim-kardash... http://elitedaily.com/women/elite-daily-wants-... images.outbrain.com www.zergnet.com tabloid
freq 1 376 110 368 13431 5070 368 2 126 51 12259 7257 16005

And let's just check on those null values again...


In [16]:
for col in deduped.columns:
    print((col, sum(deduped[col].isnull())))


('_id', 0)
('headline', 0)
('link', 0)
('img', 0)
('provider', 0)
('source', 0)
('img_file', 0)
('date', 0)
('final_link', 0)
('orig_article', 18453)
('img_host', 0)
('link_host', 0)
('source_class', 0)

Out of curiosity, as we're only left with 43630 records after deduping, let's take a look at the rate of success for our record collection.


In [17]:
(43630/129399)*100


Out[17]:
33.71741667246269

Crud, doing a harvest yields results where only about 34% of our sample is worth examining further.

Data Exploration

Let's get the top 10 headlines grouped by img:


In [18]:
deduped['headline'].groupby(deduped['img']).value_counts().nlargest(10)


Out[18]:
img                                                                                                         headline                                                                                                 
http://cdn.taboolasyndication.com/libtrc/static/thumbnails/21a99ebd78f2af61aeeec2074e0376c0.jpg             Nicole Kidman's Yacht Is Far From You'd Expect                                                               368
https://revcontent-p0.s3.amazonaws.com/content/images/1495720487.jpg                                        Triple Your Accuracy With This Weird Shooting Technique Used By Seal Snipers                                 238
http://cdn.taboolasyndication.com/libtrc/static/thumbnails/0dba2430aca9e98e05160cfd6e6d3171.jpg             Here Is How You Upgrade To Business Class                                                                    227
http://cdn.taboolasyndication.com/libtrc/static/thumbnails/2e967b6db0813815a899401b4746a749.jpg             Stairlifts are disrupting the multi-billion dollar retirement home industry - keeping seniors independent    197
http://cdn.taboolasyndication.com/libtrc/static/thumbnails/6b232005189e48716587f79b33347846.jpg             Tiger Woods' Yacht Is Far From You'd Expect                                                                  171
https://revcontent-p0.s3.amazonaws.com/content/images/cf94a60e6dd053bb9a83231322545e99.jpg                  28 Pictures That Show How Crazy Woodstock 1969 Was                                                           139
https://revcontent-p0.s3.amazonaws.com/p0/assets/content_images/emb/7152612145c7f9231d1e2229a5c7fce4-0.png  We Can Guess Your Education Level with Only 10 Questions                                                     132
http://img2.zergnet.com/694817_300.jpg                                                                      Kim Kardashian and North West Turn Heads On The Red Carpet                                                   125
http://cdn.taboolasyndication.com/libtrc/static/thumbnails/e70c96286da170d65cbf3fc4c9a3e400.jpg             Best Senior Living Communities Of 2017! View Pricing Here & Compare                                          121
https://revcontent-p0.s3.amazonaws.com/content/images/1497897658.jpg                                        Trump Voters Shocked After Watching This Leaked Video                                                        120
Name: headline, dtype: int64

But hang on. Let's just see what the top headlines are. There's certainly overlap, but it's not a one-to-one relationship between headlines and their images (or at least maybe it's the same image, but coming from a different URL).


In [19]:
deduped['headline'].value_counts().nlargest(10)


Out[19]:
Nicole Kidman's Yacht Is Far From You'd Expect                                                               376
Triple Your Accuracy With This Weird Shooting Technique Used By Seal Snipers                                 260
Forget Social Security if you Own a Home (Do This)                                                           231
Here Is How You Upgrade To Business Class                                                                    227
Stairlifts are disrupting the multi-billion dollar retirement home industry - keeping seniors independent    200
Tiger Woods' Yacht Is Far From You'd Expect                                                                  181
Watch Obama's Face at 0:33. This Leaked Video Will Destroy Obama's Legacy                                    169
New Jersey Landlines Get Replaced (But Not With Cell Phones)                                                 161
Best Senior Living Communities Of 2017! View Pricing Here & Compare                                          144
28 Pictures That Show How Crazy Woodstock 1969 Was                                                           139
Name: headline, dtype: int64

Note: perhaps something we will want to look into is how many different headline/image combinations there are. I am particularly interested in the reuse of images across different headlines.
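
One possible way to approach that question, sketched on a toy frame with made-up values: count the distinct (img, headline) pairs, then count how many different headlines each image appears with.

```python
import pandas as pd

# Toy data: image i1 is used with two different headlines.
toy = pd.DataFrame({
    'img': ['i1', 'i1', 'i1', 'i2'],
    'headline': ['H1', 'H2', 'H1', 'H3'],
})

# Number of distinct headline/image combinations.
n_pairs = len(toy.drop_duplicates(subset=['img', 'headline']))

# Number of distinct headlines per image, most reused first.
headlines_per_img = toy.groupby('img')['headline'].nunique().sort_values(ascending=False)
reused = headlines_per_img[headlines_per_img > 1]

print(n_pairs)
print(reused)
```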

And how are our sources distributed?


In [20]:
deduped['source'].value_counts().nlargest(25)


Out[20]:
http://tmz.com                        5070
http://elitedaily.com/                4873
http://www.politico.com/magazine/     3151
https://www.washingtonpost.com/       2961
http://www.infowars.com/              2561
http://www.thedailybeast.com/         2455
http://www.breitbart.com              2443
https://downtrend.com/                2421
http://www.ibtimes.com/               2323
http://thehill.com                    2001
http://www.businessinsider.com/       1984
http://www.rt.com                     1819
http://www.politico.com/politics      1708
http://worldstarhiphop.com/videos/    1292
http://www.dailymail.co.uk/           1159
http://reductress.com/                1082
https://nypost.com/news/               979
http://www.nydailynews.com/news        864
http://www.huffingtonpost.com/         814
https://www.therebel.media/news        756
http://observer.com/latest/            696
http://preventionpulse.com/            108
http://gothamist.com/news               74
http://www.metro.us/news                36
Name: source, dtype: int64

TMZ is a bit over-represented here.

And what about by classification?


In [21]:
deduped['source_class'].value_counts()


Out[21]:
tabloid    16005
left       12063
right       9254
center      6308
Name: source_class, dtype: int64

Looks like the over-representation of TMZ is pushing on Tabloids a bit. Not terribly even between left, right, and center, either.

Let's take a look at the sources again, broken down by both provider and our classification.


In [22]:
deduped.groupby(['provider', 'source_class'])['source'].value_counts()


Out[22]:
provider    source_class  source                            
outbrain    center        http://thehill.com                    2001
            left          http://www.politico.com/magazine/     3151
                          https://www.washingtonpost.com/       2961
                          http://www.thedailybeast.com/         2455
            right         https://nypost.com/news/               979
                          http://observer.com/latest/            696
revcontent  center        http://www.ibtimes.com/               2323
            left          http://www.metro.us/news                36
            right         http://www.infowars.com/              2561
            tabloid       https://downtrend.com/                2421
                          http://worldstarhiphop.com/videos/    1292
                          http://preventionpulse.com/            108
taboola     center        http://www.businessinsider.com/       1984
            left          http://www.politico.com/politics      1708
                          http://www.nydailynews.com/news        864
                          http://www.huffingtonpost.com/         814
                          http://gothamist.com/news               74
            right         http://www.breitbart.com              2443
                          http://www.rt.com                     1819
                          https://www.therebel.media/news        756
            tabloid       http://www.dailymail.co.uk/           1159
                          http://reductress.com/                1082
                          http://elitedaily.com/                 718
                          http://tmz.com                          10
zergnet     tabloid       http://tmz.com                        5060
                          http://elitedaily.com/                4155
Name: source, dtype: int64

OK, so what are the most and least frequent images per classification?


In [23]:
IMG_MAX=5

In [24]:
topimgs_center = deduped['img'][deduped['source_class'].isin(['center'])].value_counts().nlargest(IMG_MAX).index.tolist()

In [25]:
bottomimgs_center = deduped['img'][deduped['source_class'].isin(['center'])].value_counts().nsmallest(IMG_MAX).index.tolist()

In [26]:
topimgs_left = deduped['img'][deduped['source_class'].isin(['left'])].value_counts().nlargest(IMG_MAX).index.tolist()

In [27]:
bottomimgs_left = deduped['img'][deduped['source_class'].isin(['left'])].value_counts().nsmallest(IMG_MAX).index.tolist()

In [28]:
topimgs_right = deduped['img'][deduped['source_class'].isin(['right'])].value_counts().nlargest(IMG_MAX).index.tolist()

In [29]:
bottomimgs_right = deduped['img'][deduped['source_class'].isin(['right'])].value_counts().nsmallest(IMG_MAX).index.tolist()

In [30]:
topimgs_tabloid = deduped['img'][deduped['source_class'].isin(['tabloid'])].value_counts().nlargest(IMG_MAX).index.tolist()

In [31]:
bottomimgs_tabloid = deduped['img'][deduped['source_class'].isin(['tabloid'])].value_counts().nsmallest(IMG_MAX).index.tolist()
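
The eight cells above repeat the same expression for each classification. A minimal sketch of the same idea as one helper, shown on a toy frame (the function name top_and_bottom_images and the data are hypothetical):

```python
import pandas as pd

IMG_MAX = 2  # smaller than the notebook's 5, to suit the toy data

def top_and_bottom_images(frame, n=IMG_MAX):
    """Return the n most and n least frequent img values per source_class."""
    result = {}
    for label, group in frame.groupby('source_class'):
        counts = group['img'].value_counts()
        result[label] = {
            'top': counts.nlargest(n).index.tolist(),
            'bottom': counts.nsmallest(n).index.tolist(),
        }
    return result

# Toy data: in 'left', image a appears 3 times, b twice, c once.
toy = pd.DataFrame({
    'source_class': ['left'] * 6 + ['right'],
    'img': ['a', 'a', 'a', 'b', 'b', 'c', 'd'],
})

imgs = top_and_bottom_images(toy)
print(imgs)
```

Keep in mind that with real data many images occur exactly once, so the "bottom" list is an arbitrary pick among ties.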

In [32]:
for i in topimgs_center:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [33]:
for i in bottomimgs_center:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [34]:
for i in topimgs_left:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [35]:
for i in bottomimgs_left:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [36]:
for i in topimgs_right:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [37]:
for i in bottomimgs_right:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [38]:
for i in topimgs_tabloid:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))



In [39]:
for i in bottomimgs_tabloid:
    displaystring = '<img src="{}" width="200"/>'.format(i)
    display(HTML(displaystring))


Yawn! I have to admit this isn't as interesting as I thought it might be.

Explore over time

Next, let's explore trends over time. First we'll want to make a version of the DataFrame that is indexed by date.


In [40]:
deduped_date_idx = deduped.copy()  # deep copy, so changes here never touch deduped

In [41]:
deduped_date_idx['date'] = pd.to_datetime(deduped_date_idx.date)

In [42]:
deduped_date_idx.set_index('date',inplace=True)
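
For reference, here is a small self-contained sketch of the same steps on toy rows (the timestamps follow the format in the data; the headlines are made up). It makes two assumptions that the cells above do not: it drops the timezone after parsing so plain date strings can be used for slicing, and it sorts the index, which label-based date slicing relies on.

```python
import pandas as pd

# Toy rows, deliberately out of chronological order.
toy = pd.DataFrame({
    'date': ['2017-07-09T14:31:09.853Z',
             '2017-03-27T12:59:09.279Z',
             '2017-06-15T08:00:00.000Z'],
    'headline': ['c', 'a', 'b'],
})

# Parse as UTC, then drop the timezone information.
toy['date'] = pd.to_datetime(toy['date'], utc=True).dt.tz_localize(None)

by_date = toy.set_index('date').sort_index()

# The end label is inclusive and covers the whole of 2017-07-07.
window = by_date.loc['2017-03-01':'2017-07-07']
print(window)
```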

Let's see what dates we're working with:


In [43]:
"Start: {}  -  End: {}".format(deduped_date_idx.index.min(), deduped_date_idx.index.max())


Out[43]:
'Start: 2017-03-27 12:59:09.279000  -  End: 2017-07-09 14:31:09.853000'

Let's examine the distribution of the classifications over time


In [44]:
deduped_date_idx['2017-03-01':'2017-07-07'].groupby('source_class').resample('M').size().plot(kind='bar')


Out[44]:
<matplotlib.axes._subplots.AxesSubplot at 0x1057edfd0>

In [45]:
plt.show()


I think what we're mostly seeing here is that our scraper was most active during the month of June.
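
To read the monthly counts as numbers rather than off a bar chart, one option is a class-by-month table. This is a sketch on toy data using a different technique from the resample call above: it groups by a year-month string derived from the index. The column and index names mirror the notebook; the rows are made up.

```python
import pandas as pd

# Toy frame indexed by date, standing in for deduped_date_idx.
toy = pd.DataFrame(
    {'source_class': ['left', 'left', 'right', 'left']},
    index=pd.to_datetime(['2017-05-30', '2017-06-02', '2017-06-10', '2017-06-20']),
)
toy.index.name = 'date'

# Year-month label for every row, e.g. '2017-06'.
months = toy.index.strftime('%Y-%m')

# Rows: classification, columns: month, values: record counts.
monthly = toy.groupby(['source_class', months]).size().unstack(fill_value=0)
print(monthly)
```

Calling monthly.plot(kind='bar') on the result would give grouped bars per classification, if a chart is still wanted.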

Let's see the same distribution for provider.


In [46]:
deduped_date_idx['2017-03-01':'2017-07-07'].groupby(['provider']).resample('M').size().plot(kind='bar')


Out[46]:
<matplotlib.axes._subplots.AxesSubplot at 0x108de2198>

In [47]:
plt.show()


Same, we're seeing that our results are biased towards June.

What about if we check all results mentioning certain people?


In [48]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Trump')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Trump' By Month and Classification", kind='bar', color="pink")


Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x107150b70>

In [49]:
plt.show()



In [50]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Clinton')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Clinton' By Month and Classification", kind='bar', color="gray")


Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x107434160>

In [51]:
plt.show()



In [52]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Hillary')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Hillary' By Month and Classification" ,kind='bar', color="gray")


Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x10806da20>

In [53]:
plt.show()



In [54]:
(deduped_date_idx[deduped_date_idx['headline'].str.contains('Obama')]['2017-03-01':'2017-07-07']).groupby('source_class').resample('M').size().plot(title="Headlines Containing 'Obama' By Month and Classification", kind='bar')


Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x109a01128>

In [55]:
plt.show()


Again, we're mostly seeing a trend around our data collection. One interesting pattern is that Trump headlines are appearing on far more Tabloid sites than we might expect. Obama is appearing a lot on Right-classified sites, but again this is for June, so it might just be an artifact of increased data collection. Finally, we see way more results for "Hillary" than we do for "Clinton", and most of those are on Tabloid sites in April.
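
The four name plots can also be summarized in a single table of mention counts per classification. A minimal sketch on made-up headlines (the names list and the mentions table are hypothetical, and regex=False is used so each name is matched as a literal substring):

```python
import pandas as pd

# Toy headlines, made up for illustration only.
toy = pd.DataFrame({
    'headline': ['Trump Voters Shocked', 'Hillary Speaks Out',
                 'Clinton Aide Resigns', 'Yacht Tour'],
    'source_class': ['right', 'tabloid', 'left', 'tabloid'],
})

names = ['Trump', 'Clinton', 'Hillary', 'Obama']

# For each name: flag matching headlines, then count the flags per class.
mentions = pd.DataFrame({
    name: toy['headline'].str.contains(name, regex=False)
                         .groupby(toy['source_class']).sum()
    for name in names
})
print(mentions)
```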

And let's check out some bucketed headline trends, both largest and smallest overall and for the various classifications.


In [56]:
(deduped_date_idx['2017-03-27':'2017-07-07'])['headline'].value_counts().nlargest(15)


Out[56]:
Nicole Kidman's Yacht Is Far From You'd Expect                                                               348
Triple Your Accuracy With This Weird Shooting Technique Used By Seal Snipers                                 230
Here Is How You Upgrade To Business Class                                                                    227
Forget Social Security if you Own a Home (Do This)                                                           224
Stairlifts are disrupting the multi-billion dollar retirement home industry - keeping seniors independent    194
Tiger Woods' Yacht Is Far From You'd Expect                                                                  179
Watch Obama's Face at 0:33. This Leaked Video Will Destroy Obama's Legacy                                    164
New Jersey Landlines Get Replaced (But Not With Cell Phones)                                                 161
Best Senior Living Communities Of 2017! View Pricing Here & Compare                                          139
We Can Guess Your Education Level with Only 10 Questions                                                     123
Kim Kardashian and North West Turn Heads On The Red Carpet                                                   120
28 Pictures That Show How Crazy Woodstock 1969 Was                                                           119
Trump Voters Shocked After Watching This Leaked Video                                                        115
10 Surprising Things Guys Find Unattractive                                                                  106
Goldman Sachs & World Bank Confirm: Us Dollar Will Be Worthless in 100 Days                                  103
Name: headline, dtype: int64

In [57]:
(deduped_date_idx['2017-03-27':'2017-07-07'])['headline'].value_counts().nsmallest(15)


Out[57]:
23 Surprising Things ‘Brady Bunch’ Producers Hid From Fans                              1
Is Taking A Career Break The New Norm?                                                  1
20 Things 'M*A*S*H' Producers Hid From Fans                                             1
Why David Caruso Got Dumped By Hollywood                                                1
Only 1 In 50 Americans Can Name These Iconic Women. Can You?                            1
Learn the Story Behind this Famous POTUS Picture                                        1
Beyonce's Most Iconic Beauty Moments of All Time                                        1
20 Wealthy Celebs Refuse to Help Their Poor Family                                      1
What 770,000 Tubes of Saliva Reveal…                                                    1
Meet The Worlds Most Powerful Leaders                                                   1
New Jersey Homeowners Use New Incentives to…                                            1
How Home Chef Can Save You $$ on Groceries                                              1
Dems, GOP brace for nail-biter in Georgia                                               1
Pippa Stuns In Romantic Floral Gown at Friend’s Wedding                                 1
Hiker Vanished In The Appalachian Trail, 2 Years Later Police Discover What Happened    1
Name: headline, dtype: int64

In [58]:
deduped['headline'][deduped['source_class'].isin(['center'])].value_counts().nlargest(25)


Out[58]:
27 Stars Who Died And Not a Word Was Said                                           91
21 Celebrities Who Died And Not a Word Was Said                                     90
Men, Eliminate Your ED (Do This Once Daily)                                         90
Remember Hurley? What He Looks Like Today Is Unreal                                 75
She Never Mentions Her Other Daughter, Here's Why                                   74
Here Is How You Upgrade To Business Class                                           70
Celebs Who Died And No One Said A Word                                              68
How to Fix Cracked Feet                                                             67
Forget Social Security if you Own a Home (Do This)                                  63
He Never Mentions His Daughter - Here Is Why                                        52
We Can Guess Your Education Level with Just 10 Questions                            46
#1 Tinnitus "Trick" to Stop the Ringing (Doctors Are Speechless)                    46
20 Final Photos Taken Before Tragedy Struck                                         45
New Jersey Landlines Get Replaced (But Not With Cell Phones)                        42
14 Times Lotto Winner:Do This Every Time You Buy A Lotto Ticket (Win 1/12 Times)    40
31 Stars Who Died And Not a Word Was Said                                           39
Barron Trump's Leaked IQ Shocks the Nation!                                         36
This Simple Skin Fix May Surprise You                                               35
He Was a Huge Star, but when He Passed Away Nobody Said Anything                    35
She Died & No One Said Anything                                                     32
93% Of Lotto Winners Do This 1 Easy Trick Before Buying Lotto Tickets (Try This)    31
17 Actors Who Are Gay - No. 8 Will Shock Women                                      31
3 Signs You May Have A Fatty Liver [Watch]                                          31
Men, Try This Tonight to Fix Your ED!                                               30
How This App Can Teach You Spanish in Just 3 Weeks                                  29
Name: headline, dtype: int64

In [59]:
deduped['headline'][deduped['source_class'].isin(['center'])].value_counts().nsmallest(25)


Out[59]:
Man Fulfils His Dying Fathers Crazy Wish                                   1
Don’t be the last of your friends in debt                                  1
"The sheets were baby-soft right out of the box"                           1
10 States That Would Get "Most Educated" in the Yearbook                   1
FBI employees wear ‘Comey is my homey’…                                    1
Unlimited 1.5% Cash Back Plus No Interest For 15 months Makes…             1
29 Colleges with the Biggest Decrease in Applications                      1
Huckabee Sanders: 'Republicans are going…                                  1
Congress has the ability to build a better air…                            1
Iowa GOP chairman calls Republican senator 'an…                            1
Watch Israel TV News Online                                                1
The Most Luxurious Sheets You Didn't Know You Needed                       1
Dems see surge of new candidates                                           1
10 Big Lebowski Quotes That Will Help You Parent, Man                      1
Brooklinen Sheets Are The Best: Here's Why                                 1
Bill Gates Lives In A House That Goes Beyond Human Imagination - Photos    1
25 Celebrities Who Mastered the Real Estate Game Better Than Anyone        1
10 Best Harley Davidson Motorcycles of All Time                            1
25 Pairs Of Shoes Road-Tested By Fashion Buyers                            1
Kelley Blue Book Names The 5 Best Compact SUVs of the…                     1
Learn how to be food allergy smart                                         1
9 Apple Cider Vinegar Uses Men Love                                        1
Sally Yates: I found out about travel ban by…                              1
7,000,000 Are Going Crazy Over These Furniture Discounts                   1
Graham: Trump 'doesn't collude with his own…                               1
Name: headline, dtype: int64

In [60]:
deduped['headline'][deduped['source_class'].isin(['left'])].value_counts().nlargest(25)


Out[60]:
Stairlifts are disrupting the multi-billion dollar retirement home industry - keeping seniors independent    173
Forget Social Security if you Own a Home (Do This)                                                           168
Here Is How You Upgrade To Business Class                                                                    157
Best Senior Living Communities Of 2017! View Pricing Here & Compare                                          109
Forget Social Security if you Own a Home (Do…                                                                 83
Thinking About Installing Solar Panels? Read This First                                                       65
Eddie Murphy's House Is Far From What You'd Expect                                                            63
The Most Common Cancer Symptoms People Ignore                                                                 55
Forget Social Security if you Own a Home…                                                                     54
Common Cancer Symptoms That Should Never Go Unchecked                                                         50
Veterans Hit the Jackpot in 2017                                                                              38
If You Own A Home You Must Claim Your $4,240…                                                                 38
(4) Major Heart Attack Red Flags                                                                              34
How to 'Fix' Crepey Skin                                                                                      34
Why Doctors In The Know No Longer Prescribe Blood Pressure Meds                                               32
See Inside the Luxurious Senior Apartments in Clifton                                                         31
Stunning New Luxury Sedans Now Available!                                                                     30
If You Own A Home You Must Claim Your $4,240 Before Time Runs Out!                                            28
Break In The D.B. Cooper Case                                                                                 27
Could This Be The #1 Trick to Reverse Hearing Loss (Do This Tonight)                                          25
9 of 10 Senior Homes are Miserable. Here's the Top Ones in Each Category.                                     25
The Surprising Guest That Johnny Carson Couldn't Stand                                                        24
Don't Forget To Do This Every Time You Turn On Your PC...                                                     24
Tiger Woods' Yacht Is Far From You'd Expect                                                                   23
See Inside the Luxurious Senior Apartments in New York                                                        22
Name: headline, dtype: int64

In [61]:
deduped['headline'][deduped['source_class'].isin(['left'])].value_counts().nsmallest(25)


Out[61]:
Hawking Reveals Shocking Prediction That Could Change Humanity – Daily                                     1
Tractor Supply Reveals the Truth About Chickens                                                            1
Nicole Richie And Joel Madden Finally Reveal Their Gorgeous Home                                           1
Get Monday's Best Friends & Family Deals on Ladies' Lingerie                                               1
The Definitive WWII Planes Quiz: Can You Ace It?                                                           1
F 22 Raptor Does Things Scientists Can't Figure…                                                           1
Scrutiny of Jared Kushner's Russia…                                                                        1
We Tested Nutrisystem: Here's What Happened                                                                1
Missing Disney Worker Disappears. Cops Uncover Truth                                                       1
7 Reasons Seniors Should Stop Going Brick and Mortar for Eyeglasses                                        1
Get a Free Pillow With   50% Off Set of Beautyrest Plush From Mattress Firm  The Purchase of a Mattress    1
Quiz: Can You Identify These American Icons?                                                               1
Take the Ultimate Civil War Quiz and Find Out How Much You Know About Your Country's Past                  1
How Much Do You Know About the…                                                                            1
Bruce Willis Still Regrets Giving Up The Role Of His Life                                                  1
THIS is What It’s Like to Sleep on the…                                                                    1
This Doctor's Surprising "Cracked Feet Fix" Is Going Viral                                                 1
Best Humongous Domestic Cats                                                                               1
Nina Hartley was a Superstar in the 80s, But Where She Ended Up…                                           1
Backyard Mower Pull Goes Wrong ( See What Happens Next )                                                   1
Quicken Loans Urges Homeowners To Switch To A 15 Year Fixed                                                1
Can You Identify These 50 Presidents By…                                                                   1
9 Things It's Ok To Hide From Your Significant Other                                                       1
One-Hit Wonders of the 80s: Can You Score Better than Average?                                             1
How 2 Boston Grads Are Disrupting a $19 Billion…                                                           1
Name: headline, dtype: int64

In [62]:
deduped['headline'][deduped['source_class'].isin(['right'])].value_counts().nlargest(25)


Out[62]:
Nicole Kidman's Yacht Is Far From You'd Expect                                      365
Tiger Woods' Yacht Is Far From You'd Expect                                         140
Watch Obama's Face at 0:33. This Leaked Video Will Destroy Obama's Legacy           138
We Can Guess Your Education Level with Only 10 Questions                            132
Triple Your Accuracy With This Weird Shooting Technique Used By Seal Snipers        123
Born Before 1969? You Could Get an Extra $2,194 Monthly with This                   105
Goldman Sachs & World Bank Confirm: Us Dollar Will Be Worthless in 100 Days         103
Search For The Best New Pickup Truck                                                 92
Hemp Company Releases Legal CBD Oil Across All 50 States                             83
This Is The Shopping Site Amazon Doesn't Want You To Know About                      75
He Never Mentions His Son, Here's Why                                                73
Exclusive: Massive US Invasion of Syria Has Already Begun " Alex Jones' Infowars     71
LIVE: Russian Leader Calls For Retaliation Strikes Against US " Alex Jones' Info     69
The One Thing All Cheaters Have in Common                                            66
Disturbing Video Evidence Proves Obama Should Have Never Been President.             60
New Jersey Landlines Get Replaced (But Not With Cell Phones)                         60
Ever Googled Yourself? Do a "Deep Search" Instead!                                   58
Top (5) Medical Alerts Best & Worst Medical Alert Systems.                           53
The US Citizenship Test Question That Stumps All Americans                           50
Malia Obama's New York Apartment Is Disgusting                                       43
Why Metformin Makes You Sick (WATCH)                                                 40
9 of 10 Senior Homes are Miserable. Here's the Top Ones in Each Category.            38
New York Landlines Get Replaced (But Not with Cell Phones)                           35
Diabetes Breakthrough That Was Silenced by Drug Companies (Try It Tonight)           35
Best Senior Living Communities Of 2017! View Pricing Here & Compare                  33
Name: headline, dtype: int64

In [63]:
deduped['headline'][deduped['source_class'].isin(['right'])].value_counts().nsmallest(25)


Out[63]:
Scary Common Signs of Pancreatic Cancer                                            1
Most Fearless Warriors That Existed Throughout History                             1
Is Networking And Partnering The Hardest Part Of Your Job?                         1
12 Must-Do Experiences in Las Vegas                                                1
Learn to Identify the Black Widow, Brown Recluse and Aggressive House Spider       1
10 Dangerous Secrets About Vitamins and Supplements                                1
Steve Mnuchin in 60 seconds                                                        1
Photos Captured During World War II Reveal...                                      1
21 Life Hacks To Create Healthy and Happy Lifestyle                                1
40 Futuristic Warships You Had No Idea Existed                                     1
This Place Is So Forbidden Most Didn't Even Know It Exists                         1
Man Discovers A Huge Secret In His Own Yard                                        1
17 Animals Who Were Totally Caught In The Act                                      1
Few People Can Name All of These Countries. Can You?                               1
Nicknames Quiz: How Many Presidents Do You Know?                                   1
Really? These Are 21 Foods That Are Actually Good For You                          1
Who Doesn't Love a Good Food Truck? Watch as Flavors are Turned into Chips         1
Best Shot Ever? (Watch What Happens)                                               1
The Most Dangerous Species On This Planet Today                                    1
Exciting New Pasta Recipes From Giovanni Rana                                      1
Behind the Ancestry Commercial: 'Livie From All Nations' Uncovers Her…             1
When This Captain Collapsed, The Way Soldiers Treated Her Left Millions Stunned    1
Try Not To Gasp When You Find Out Who His Partner Is                               1
11 Awesome (Kid-Friendly) Foodie Experiences in California                         1
Man Finds Baby Elephant, But When He Finds His Mother                              1
Name: headline, dtype: int64

In [64]:
deduped['headline'][deduped['source_class'].isin(['tabloid'])].value_counts().nlargest(25)


Out[64]:
28 Pictures That Show How Crazy Woodstock 1969 Was                              139
Triple Your Accuracy With This Weird Shooting Technique Used By Seal Snipers    137
Kim Kardashian and North West Turn Heads On The Red Carpet                      126
Trump Voters Shocked After Watching This Leaked Video                           125
10 Surprising Things Guys Find Unattractive                                     114
9 Hair Mistakes That Make You Look Older                                        100
What Tiger Woods' Ex-Wife Looks Like Now Left Us With No Words                  100
10 Features That Attract Men The Most                                            97
After Losing 220lbs Rebel Wilson Is Gorgeous Now!                                96
10 Tricks To Always Look Good In Pictures                                        95
Anthony Bourdain Relieved to No Longer Pretend About Marriage                    87
New Pics Show Malia Obama Locking Arms With Gorgeous Guy                         86
Here's What New Dental Implants Should Cost                                      78
Stars Who Haven't Figured Out They Aren't Famous Anymore                         73
6 Clothing Items Every Short Lady Should Own                                     70
Janet Jackson Shows Off Weight Loss at Divorce Court                             57
Have You Seen These Top Senior Apartments in Clifton                             56
93% of Americans Won't See What's in This 1944 German Photo [video]              55
This is What Tiger Woods' Ex is Up to These Days                                 54
What Men Find Attractive in Different Parts of the World                         52
Amal Clooney's Stunning Pregnancy Style                                          52
Why You Should Never Wash Your Face In The Shower                                51
Famous People Who Destroyed Their Careers in a Matter of Seconds                 50
How Dr. Oz Disappointed Us With His Double Life                                  50
Why Hannah From '13 Reasons Why' Looks So Familiar                               49
Name: headline, dtype: int64

In [65]:
deduped['headline'][deduped['source_class'].isin(['tabloid'])].value_counts().nsmallest(25)


Out[65]:
Kelly Osbourne Gives Update on Her Parents' Relationship                                                                1
Find All The  Accessories You'll Need To Own At Bal Harbour Shops                                                       1
Zendaya Lost it After Rihanna Shouted Out Her Met Gala Look                                                             1
44 Never Before Seen Photos of Famous People                                                                            1
Tom Cruise's $59 Million Mansion Will Take Your Breath Away                                                             1
Remember Her? What Rachel Ray Looks Like Now Will Shock You                                                             1
27 Sports Photos Taken at Just the Right Time                                                                           1
In 1920 Two Feral Girls Were Found Alone In The Jungle And Raised by Wolves. Their Story Will Leave You In Disbelief    1
Top 10 Most Beautiful Lakes In The World                                                                                1
11 Gorgeous Nail Looks to Try This Spring                                                                               1
Meet the Cutting-Edge New Acura RLX - Build & Price Yours Today                                                         1
Lady Gaga Says Rihanna Was Best-Dressed at This Year's Met Gala                                                         1
Health Conscious? Get the Best Personal Blender on the Market Here                                                      1
Why You Recognize Stick From 'Daredevil'                                                                                1
Subway Rider Allegedly Picks Wrong Person to Rub Against                                                                1
16 Silver Foxes We'd Totally Date                                                                                       1
For Years This Daughter Left Babies At Her Parents’ Doorsteps… 24 Years Later, They Finally Find Out Why                1
Inventions We Can't Live Without Today                                                                                  1
These Husband's Reactions To Their Wife's Ultrasounds Are Too Precious                                                  1
What Veruca Salt from 'Willy Wonka' Looks Like Now Is Insane                                                            1
Top Results For Payday Loans                                                                                            1
7 Steps to H elp You Build a Safer Workplace Culture                                                                    1
Everyone Who Saw Her Wondered About Her Ethnicity. A DNA Test Finally Solved the Mystery.                               1
10 Questions To Ask Yourself Before You Dye Your Hair                                                                   1
A Man Rescued This Feral Dog From the LA River. Its Response Was Startling                                              1
Name: headline, dtype: int64
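
The per-class cells above repeat one pattern: filter on `source_class`, count headlines, then take the largest and smallest counts. A minimal sketch of the same idea as a single helper follows. The function name `headline_counts_by_class` and the toy frame are hypothetical, introduced only for illustration; the real `deduped` frame would be passed in instead. Note that headlines appearing once all tie at a count of 1, so the `nsmallest` listings are best read as a sample of one-off headlines rather than a ranking.

```python
import pandas as pd

def headline_counts_by_class(df, n=25):
    """Return {source_class: (most_common, least_common)} headline counts."""
    result = {}
    for source_class, group in df.groupby('source_class'):
        counts = group['headline'].value_counts()
        # Same calls as the cells above, applied to one class at a time
        result[source_class] = (counts.nlargest(n), counts.nsmallest(n))
    return result

# Toy stand-in for `deduped` (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    'headline': ['A', 'A', 'B', 'C', 'C', 'C'],
    'source_class': ['left', 'left', 'left', 'right', 'right', 'right'],
})

summary = headline_counts_by_class(toy, n=1)
top_left, bottom_left = summary['left']
top_right, bottom_right = summary['right']
print(top_left)
print(bottom_left)
```

With the notebook's data this would be called as `headline_counts_by_class(deduped)`, assuming `deduped` still carries the `headline` and `source_class` columns used above.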

Next, we wanted to see whether any headline ran with more than one image. Let's check a few.


In [66]:
def imgs_from_headlines(headline):
    """
    Display every distinct image used for a headline, most frequent first, assuming there are no more than 50 per headline
    """
    # Distinct image URLs for this headline, ordered by how often each appears
    all_images = deduped['img'][deduped['headline'].isin([headline])].value_counts().nlargest(50).index.tolist()
    for i in all_images:
        # Quote the src value so URLs containing '=' or spaces still form a valid tag
        displaystring = '<img src="{}" width="200"/>'.format(i)
        display(HTML(displaystring))

In [67]:
imgs_from_headlines("Trump Voters Shocked After Watching This Leaked Video")



In [68]:
imgs_from_headlines("What Tiger Woods' Ex-Wife Looks Like Now Left Us With No Words")



In [69]:
imgs_from_headlines("Nicole Kidman's Yacht Is Far From You'd Expect")



In [70]:
imgs_from_headlines("He Never Mentions His Son, Here's Why")



In [71]:
imgs_from_headlines("Do This Tonight to Make Fungus Disappear by Morning (Try Today)")


Well, that was edifying.
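
The spot checks above look at a handful of headlines by hand. To answer the question for every headline at once, one option is to count distinct images per headline. This is a minimal sketch on a toy frame with hypothetical rows; it assumes only the `headline` and `img` columns that `deduped` already has.

```python
import pandas as pd

# Toy stand-in for `deduped` (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    'headline': ['H1', 'H1', 'H1', 'H2', 'H2'],
    'img': ['a.jpg', 'b.jpg', 'a.jpg', 'c.jpg', 'c.jpg'],
})

# Number of distinct image URLs seen with each headline
imgs_per_headline = (
    toy.groupby('headline')['img']
       .nunique()
       .sort_values(ascending=False)
)

# Keep only the headlines that ran with more than one image
multi_image = imgs_per_headline[imgs_per_headline > 1]
print(multi_image)
```

Swapping `toy` for `deduped` should give the full list of multi-image headlines, which could then be fed to `imgs_from_headlines` one at a time.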

Export the data


In [72]:
timestamp = datetime.now().strftime('%Y-%m-%d-%H_%M')

In [73]:
datefile = '../data/out/{}_native_ad_data_deduped.csv'.format(timestamp)

In [74]:
deduped.to_csv(datefile, index=False)
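
The export above assumes that `../data/out/` already exists. A hedged sketch of the same timestamped-filename idea that also creates the directory when it is missing is shown below. The helper name `timestamped_csv_path` is hypothetical, and a temporary directory stands in for the real output folder so the example can run anywhere.

```python
from datetime import datetime
from pathlib import Path
import tempfile

import pandas as pd

def timestamped_csv_path(out_dir, stem, now=None):
    """Build '<out_dir>/<timestamp>_<stem>.csv', creating out_dir if needed."""
    now = now or datetime.now()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / '{}_{}.csv'.format(now.strftime('%Y-%m-%d-%H_%M'), stem)

with tempfile.TemporaryDirectory() as tmp:
    # Fixed timestamp so the resulting name is predictable in this example
    path = timestamped_csv_path(Path(tmp) / 'out', 'native_ad_data_deduped',
                                now=datetime(2017, 7, 5, 4, 6))
    pd.DataFrame({'headline': ['A']}).to_csv(path, index=False)
    written_name = path.name
    round_trip = pd.read_csv(path)

print(written_name)
```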

Finally, let's generate a JSON file in which each item is an individual image, listing all the original sources, providers, dates, headlines, classifications, and final locations recorded for that image.


In [75]:
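img_json_data = {}
# Create one record per image file, keyed by its local filename; the lists are filled in a later pass
for index, row in deduped.iterrows():
    img_json_data[row['img_file']] = {'url': row['img'],
                                      'dates': [],
                                      'sources': [],
                                      'providers': [],
                                      'classifications': [],
                                      'headlines': [],
                                      'locations': [],
                                      }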

In [76]:
print(len(img_json_data.keys()))


19314

In [77]:
for index, row in deduped.iterrows():
    record = img_json_data[row['img_file']]
    if row['date'] not in record['dates']:  
        record['dates'].append(row['date'])
    if row['headline'] not in record['headlines']:
        record['headlines'].append(row['headline'])
    if row['provider'] not in record['providers']:
        record['providers'].append(row['provider'])
    if row['source_class'] not in record['classifications']:
        record['classifications'].append(row['source_class'])
    if row['source'] not in record['sources']:
        record['sources'].append(row['source'])
    if row['final_link'] not in record['locations']:    
        record['locations'].append(row['final_link'])
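
The two loops above (one to create empty records, one to fill them) can also be written as a single pass. The sketch below is one way to do that, not the notebook's method: `LIST_FIELDS`, `empty_record`, and `build_img_index` are hypothetical names, and the toy rows are made up. One behavioural difference to be aware of: this version keeps the first `img` URL seen for each file, whereas the loops above end up with the last one.

```python
import pandas as pd

# Output key -> source column, mirroring the record layout used above
LIST_FIELDS = {
    'dates': 'date',
    'sources': 'source',
    'providers': 'provider',
    'classifications': 'source_class',
    'headlines': 'headline',
    'locations': 'final_link',
}

def empty_record(url):
    """Record with the image URL and one empty list per field."""
    record = {'url': url}
    record.update({key: [] for key in LIST_FIELDS})
    return record

def build_img_index(df):
    """Collect the distinct values of each field per image file, in first-seen order."""
    index = {}
    for row in df.to_dict('records'):
        if row['img_file'] not in index:
            index[row['img_file']] = empty_record(row['img'])
        record = index[row['img_file']]
        for key, column in LIST_FIELDS.items():
            if row[column] not in record[key]:
                record[key].append(row[column])
    return index

# Toy stand-in for `deduped` (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    'img_file': ['x.png', 'x.png', 'y.png'],
    'img': ['http://example.com/x.png', 'http://example.com/x.png', 'http://example.com/y.png'],
    'date': ['2017-03-27', '2017-03-28', '2017-03-27'],
    'source': ['http://tmz.com', 'http://tmz.com', 'http://tmz.com'],
    'provider': ['taboola', 'taboola', 'revcontent'],
    'source_class': ['tabloid', 'tabloid', 'tabloid'],
    'headline': ['H1', 'H1', 'H2'],
    'final_link': ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'],
})

img_index = build_img_index(toy)
print(img_index['x.png'])
```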

In [78]:
for i in list(img_json_data.keys())[0:5]:
    print(img_json_data[i])


{'url': 'https://console.brax-cdn.com/creatives/98c6400e-f2fc-4c28-8e00-6c45914e36d5/TB15_1b309a68a23702cb95e743cea5d60029.600x500.png', 'dates': ['2017-03-27T12:59:09.279Z'], 'sources': ['http://tmz.com'], 'providers': ['taboola'], 'classifications': ['tabloid'], 'headlines': ['20 Cool Moments From Joe Biden’s Time In Office'], 'locations': ['http://scribol.com/a/news-and-politics/ways-joe-biden-made-vice-presidency-cool-again-americas-uncle/?utm_source=Taboola&utm_medium=CPC&utm_campaign=Joe_Biden_Cool_VP_US_Desktop&utm_content=tmz']}
{'url': 'http://cdn.taboolasyndication.com/libtrc/static/thumbnails/b13e719e4aff1daf7284c9bdb61e65a1.png', 'dates': ['2017-03-27T12:59:13.038Z'], 'sources': ['http://tmz.com'], 'providers': ['taboola'], 'classifications': ['tabloid'], 'headlines': ["25 Pics Donald Trump Doesn't Want You To See"], 'locations': ['http://detonate.com/pictures-that-trump-would-rather-keep-secret/?utm_source=8b4&utm_campaign=8b4_US_desktop_Trump_12_54f7_20160725_mm_3407&utm_term=tmz&utm_medium=cpc']}
{'url': 'https://revcontent-p0.s3.amazonaws.com/content/images/1490017108.jpg', 'dates': ['2017-03-27T12:59:15.114Z', '2017-03-27T12:59:16.920Z', '2017-03-28T05:08:24.588Z', '2017-03-28T05:08:25.939Z'], 'sources': ['http://worldstarhiphop.com/videos/'], 'providers': ['revcontent'], 'classifications': ['tabloid'], 'headlines': ['Do This Tonight to Make Fungus Disappear by Morning (Try Today)'], 'locations': ['http://japanesetoenailfunguscode.com/?aff_id=41345&subid=3c2i08hndnwt', 'http://japanesetoenailfunguscode.com/?aff_id=41345&subid=3clq0j7vog97', 'http://japanesetoenailfunguscode.com/?aff_id=41345&subid=355sv150j7ov', 'http://japanesetoenailfunguscode.com/?aff_id=41345&subid=3ke04pkkc34k']}
{'url': 'https://revcontent-p0.s3.amazonaws.com/content/images/1489682572.jpg', 'dates': ['2017-03-27T12:59:15.237Z', '2017-03-27T12:59:17.051Z'], 'sources': ['http://worldstarhiphop.com/videos/'], 'providers': ['revcontent'], 'classifications': ['tabloid'], 'headlines': ["Here's What New Dental Implants Should Cost You - View Pricing & Dentist Info"], 'locations': ['http://gaindentalfixdeals.com/?affid=1016&s1=10949&s2=1801868%7C239864%7Cworldstarhiphop.com', 'http://getdentaltoothfixdeals.com/?affid=1016&s1=10949&s2=1801868%7C239864%7Cworldstarhiphop.com']}
{'url': 'https://revcontent-p0.s3.amazonaws.com/content/images/1486415171.jpg', 'dates': ['2017-03-27T12:59:15.614Z', '2017-03-27T12:59:17.480Z', '2017-04-29T04:22:47.966Z', '2017-05-15T10:55:38.925Z', '2017-05-15T10:55:39.281Z', '2017-05-15T10:55:39.952Z', '2017-05-15T10:55:40.263Z'], 'sources': ['http://worldstarhiphop.com/videos/'], 'providers': ['revcontent'], 'classifications': ['tabloid'], 'headlines': ["Michael Jordan Has Pretty Much Given Up on His Son, Here's Why"], 'locations': ['http://trends.revcontent.com/click.php?d=vJdwplKu0pUY0G8mZgW7%2BfkWm%2F8rSKsQQkXQbHQYBgW3pRycVsQRgTsyi3%2FtsV6I4lap%2BjX9h1%2BEbLcUlqTQMVSNfHQQkbUicfWHb7dw91dD0inXxnglXt10FAQWZjo1Larx5KRm92nP9arlHHZz%2BdyE9Tn5guObB8L%2FJp0Dt57DtRF%2Bfok8%2BfLSpJtcjLFjO1r35UKJAuSO4bmpYbB109TRS1lWZHUtRsO0N6DTib1O4c7Cn7iEWlC7iWej6AASi16lKmBEyLqQYrzxjEwaTbWZuqglYDO6XYRqG%2FyyuQ%2BPUcik74RbX%2BuOIkungdlYncPD0dXrvhTRETfRTPb8yoZBMt7o2VPDp0qHQXHsUiJlZGHe3MWaSTXEQuYxs2U1nLhyS8NlxIo3TAJi41W2ko8JSm2oMSb48e1EVcCuKL9Ep5cB4IcwcEyc6tJvwJRH8GMfuSYVHLMatxlsKgRlvo9snwlOIEY95fOZrXoO9B84ebMGPeUfFAuDmiK8mklUF%2FsMXmkh7sPSD7uuZyVvRRPTVC%2FfaPjZuIhk9o%2FsETyg9v7kvxHi%2FUZ%2FcN%2F6ujf%2BD5QZ61baXRElk8xvSFTDpqLW4lBTNTaJ19kKloFzPuO6dHqkLOPactBW06HeOQp8%2B5rt4xTIg5%2Bkc9ndm8mTmfpkP1hP3TeEa%2F%2FjxYV1CkXG69pOW5Mbp3b%2F03%2Fhp78P1bp1%2BtlhCORKjyGSvWfl2YTMg4bnUQddHrgw9BX0diQ4ZnuQwB6Lt7oUADBf7mhl', 
'http://trends.revcontent.com/click.php?d=Bh%2F%2BvKNq0Yge75HL7eZKGEBZFXmDwUSro%2FYm%2B7dIn8ge57dHmw6JOzhz5BLk%2BlyWfl5DnJgcNB9tgTW4AabntGxJSZXpRSux0HYbxrK6BXX%2FhS6B%2BJ8RTkH9bSrvgb5THbidoOPo7HaY7Oak7vZdnRbRWJnm56YHPwQ%2Fm8dE2gm00rz1qdsEqrWHzVL3zTWuCp9imlBcXnMgCVuC2NbQl2pUrPghg3%2B8S%2BvGZzh71UUl51V8et0Ch6OemcQ1xs4%2FIKymJddu2XWF6yfqRQWXBGqUXUBXCjB9AB4J0DQfcwsMtrJX7lBPFQ1zq4ngC5MplmT5jt8GXDKx%2BP2sfPRnrc5NwLqKp93wXhKM24nR%2FUQE2b0iv7ojNe06yS7bGq4yRQmymVpombe9CMCkr3YIiGAPGIvMRmIJiDGLDCTFluqk39jiyYfJKB2guhWvPAqe7Yy%2Fo9r0fZxPsERR0DN0GtLqaRyIrR6GPavWquPWv4%2F1TJJxcbDpBcwy8A4TWYvatIjMYQUoEO24L6pd9nCEVKB3cw8BmxX0PFD4bHfCrh3CbyZmR41R9jJV6y%2FuOK3cRIig2s6Vtt8WtLVtvrXi1gm5WstuEXeaYT9z8vzk5OWn7JXFr%2BdAXjarDidQh9oi6pCeJtKgeqqmshOPA3fEF64cDkj1DsRksu8pJ5mazI9Gf5w9VykJnA87JVbg8LJFjSrqE9wOtMWyUkENOxrCNeHVU8s78LdMLMPjRZE8%2FuU5KSm85e28fW2%2FXTFVRzDT', 'http://trends.revcontent.com/click.php?d=R%2FMznJavHHk6hqB%2FW8DTG%2Fqoa5q7ipZx8O4qu6LSZUqEstCK4gNm%2BdUpwvX1yhqnrmTgZRSrbRf3ranJwsVstbshJvAOU6SKqJgqUwEJ4Y0RxhDO5Z%2Baxr5J%2B9SdPcipDFAsjIFjhCHkHyZL4QeKtuT4AYFsLkFHX1HKhHnhjCWJD0TZBpolzPVxaJ4mFhc0iYvISyHbeKMUyGM8i3iJc3pxaSJOVqmBFl%2BzJwHGnSF6UoXQy1HeHPsEa0PQ192X5bwb63CC6YAQIIKawp0HL%2FanYK7RlopuQ2RxuExb7W%2BWjYdxlUQfAU2Bf%2FFk9jhk58VCCoaGydZvxPFUuvgcudDAp3ALTYRQFNW8YqJlgTkDKxuXC%2FRyU0vHU%2Bv%2BxBFxLkKb6x%2F2V352WKOXGeIvXBd1NHKHQ%2BleUOLuTOcHVQ%2BWZCx6NLcd4jhh69Oj%2F4TLSMsSVgw0imAXXW1ismZc3Q0LodL3NO5%2BMh26EXbBjkakhxhjKEwBy5Z2mh9uAI6vZrgCf6cdjBUmkedDZ%2FwN%2FIWEsTgcSb0Lf3N1Yu%2FPL4ztOjaPoK85iCN0tmt5CNN%2FHu%2BeajrsrsEhNw7Th1qKmBWyK9T4AwC%2FoM4rYN5eyjLEiyqsk404oGVxPH6PzXRhoD%2BpTZZpFtBM5m0vpVGY%2FeloXNgGpiiXMZ1%2BqgJ0ZI0RteyS9c%2FcEBgkVhuQI5EaA1%2Bg%2BH518ZotG%2BHtzUt9E2ZLmziLlbRxlEdl1dYKIB0fyn%2Blu20YZCNJayCfzgc5XgOKYLTXZPhLkGBgvb0HL9zJ9c7xs62XPFXVIjeaV3U%3D', 
'http://trends.revcontent.com/click.php?d=zT7csNf77I2ykPB6J5zpnBow7C0%2FgdQO%2FFJZHtHFxUROTLZAwaNhs5xukII%2FsS%2BkQTiX6YBklaN8GYePzEF%2Bytyh1TAp%2FJaz5S%2BBRq92tyA2GXS4iKVxOlPfGeQ6xw1w8IA07RXrr8AxL9ePw2WfP2DEiAzCdFmZU%2BdyZlkx9UoF44Tr0%2B25zrnwDBugZY%2FE2I8SjVi1E%2BxDOKV%2BiFMNjyj7GSkb7xB7ZB9t9JM46MOraK1AL4IKnVZJh8GATF6jot9x4%2FDvnjbGN84j3BfK5ymQC3q0%2BoQVF0Tcr2F7w%2FCZNGbhNERPVbWz3W0XG9cM0C1X6QB0Nx2YmTKOlg0v6OuILJxlB1YPUP59%2FoPcsHjcS0SHMunMXqE%2B1ZTOQItbOkBru%2Fe2oLm1Wknzc0%2BcELsFgKcd0PZ26zrzK59vTfDy1mku3sbmZCv7nHIwga3lX3igUapt%2Bfp65NCFhyODBCRLjxVtMRcc25N5lt9yYZj10Cv5SPY3CcowmR46ExaVFbWuWwVVGD%2FrG1TUXETKiEJXWGmY%2FtXVF8r%2FQrLz3RnpVbJEK7BYsgH7jDlosuoUX6bzBVyNGDwd8vLPlpRdI52qMW9ZfPVNEQkOYc97b4gE4zmG4PNnaVgKnE4uUBFnfxrFRpUgRl7JGnrKFMpHGJCoHgSr5ByfvRw6udPyDDP00RNHzqmwYnWS1WQaISUfUE5cZ39j4RdqFG4ulrp42y%2BQfMJAtRJw%2B7YHRFU%2Fr3PPMusmJnuOKRh1qxNjn%2F2I0ePh00sekak73PJH8QHLzLP%2FXvalFM3XKsNW%2F6N8tk0%3D', 'http://trends.revcontent.com/click.php?d=zEhzqSroRD193mvTp8qDeQ78%2BC5d%2Bt95UYzMige2O7ICjvDgETAoQDoJynnLbc66YwSU9DtjcNOK4uDi3kog3qt09Q%2BDTQ%2BaLqIIOTYub2cjlwOSvk6DoPdxZyoqvqvgOI7Ebt4tM5WDK%2B7ASYWHmW1d19eyB68MjkCm%2BFfRJrL9TVpOsuUNdB68%2F62vNHvTPvzrS9gSTfRytI55EDCXFgqLiYWUl2PQGB50sWyNotDCJmMe%2BXF2zVtqU6z%2Fkmli%2BThB7Yoe1RRxwLwDevhg0Z47vQWgHeDO%2FXYAjnrRikauG9apugaS%2Bg4SiEn9DSXwQMbz6krHAPuFcor2E%2BkENd2VXCrR1%2FaGWHJCsLHXfK5PYrIV09Ay2cURGIbahPjmDLNN%2B7vwssPjOEhibj9g8pOpDCQe1Cg1aTcmEafs9oBH%2FzGIRk31eJA%2FTYVWlTxFL336%2FyMK6sbTeug1Ek04Vbr05XNmQhRhd3L884xd7OiJTFsXhXIjPt7RgH0s5YQad%2Fbn%2F6kztu1A09wZUXAaI2k1uZ0nceFi844mTe68e528EMUSUsvnx7yacN0U4XA58%2BjG3EsGm507%2Fonjda1jzG%2BCF3tfQ%2BmtQq%2BNhVXwEQ9XGwZElfzAKiie2l7lBbxonGmb8w4WITs%2FRmuxqwCXxpg0A43XGRDi1KfhJqEvMkqOWIQGUqmOqbqL5bMCT%2FgoYnKxH96FqlNkAbMODgpA7TgQCNxkeLeqj7pYui5NgNP9Yyc%2FVYnB8YOgG1BtLtdqcAeS9u2oAzmHGjpwg%2BCW%2FxNMV8C8ZDrZdbZe3BIlZiE%3D', 
'http://trends.revcontent.com/click.php?d=yhgcZoUtLD2nk6vdesk71n%2F1XgC0cbYmAOE9WH9Ui1FKFhvAy5wMaM2lOv8bjt8vlg8tBas7BGZfuoW%2BYYJ%2Bd8ome7RyySnzohklQ2ZxS%2BHLu0J%2B6tB95wRKyu%2BuV9%2F1I7U6vi8VbaFy8KKWGmQ4SmNXtRwAznmv%2Bh1B4Xgy6oKTWnpOt1LhQ85ukw7ckzfTKQXuxC%2FaUUDfhc6MRIcNmOQfYTURKv%2FYDV46B12qxma28MY1O2CdopxidA6llOM%2FyWDLHiftvjlTkRhJu1lATqdhSDU2NWK%2Fs7Wz933mgAreM5DWRyLUFyJm6Hg2ZS1s2BQCsLcDisx5ffKXdCbejydeIk6YkUut3kj%2BUfLL8GJ6PwBjNb3JTYVRZ%2BBzdgpdwZXWRMOSM5JK%2FcDryya5GpJHYIbU6HUXs6tbyEP6gMxfXveJvvHQ553PQg5CI8b07BBnb6ebs6DfWTkjCJXztaB3rfFuEyR2stgcHDKayoMuEOcxHEP1WNXOycgtTAuKqsFooJj%2BItYELkUGPAXzpBTwnSCmIV%2B06x02OsmzyepKLT3mvB4vm6w6RYb%2BTYlCY6CwCisCW3f0U%2BbnbQ%2BkfpwIpV6IudYfo%2F5zbCWj1eLqJXnnGdD67Q%2FL4CLjpcHiL1OudHGgOtGlToq82ovH8zBIZAJ8qFHdXUiYRnzCBDqL7qRXSDWMEIW%2FJScgqOqJ%2FLS18EYFMZJwYVCFoI03pw%2BhLS73cg5jFJQ%2BAlYGpo7AMDFP8dvkxJ0sXp93bDyaTrIphOLmeBr1EmpwE8nTN94JST2mghn0Ze1MCYXF%2Bzw%3D', 'http://trends.revcontent.com/click.php?d=7J0uNuYGBo92nwrA0hFHhWhYZEIHeRFos5DOeIZZNLepawY1CcdFLyUeD2LZmX5kx3lQ3iYyT4sqCjAReCNPdYTOCO1omwDWoSSltVzfX5S0%2FSXj%2F5SmuU%2Fv1H3aIy8Sbt%2BW9jVyQToUOfnxOKdQ92%2FreSiycG2PLCZXZOhCt%2FCMLXqkoOl8x3z3D4BNifXuL5UAsTMHKHbvgt1S3Bu66h55oQLGVCAv9nYoSBms%2FQZdo5HlMVO7VQl9IDfcnUsPz0E%2FG7zN7xs5Y6xb9k%2FAj4qOrc2YJj1w907pjSBfMhct0hX471jpEAyIfF7MMVLj%2FVSHmNBWcrqHFaE6TCCVkKFowkglACXJCpEkzVwaOf%2FsmyVvwKa0FY9D7qbjwWI4k6IwV8OSxDBmXSdVemzjHuFH5waovG%2ByHeiYPR656BNDCTW80CiQQXag1Rp36hb8hG%2FDwP34T46st4wQw73%2F6%2B40Of7OWg7W%2FrTWbt0pyuL5FRbWMYhbJOyCCPeJ6kjpoBiEbQLnUx7%2BObAXmVX1IwKvrqYzQnzulqXPycLjRo%2FyfzqmzvspFSnbNAIfy8br6PGPbBUVZDdLCAnstgPwY0SW85cIbaYL8cTApQeLKz9yNc6TdmrHpHrXJIEcuyaPQZm%2BxyGPuUcxFi4HQyxelv6C3WpYmhsOyqsO%2F9MxuBiU9lvOww5CkapbNKoGm2GEKE82JhuUHujCmgwlfSzw8%2BDn9csKkm6k7zyE7vGL3qAQcl2ZhJs%2FM1fw%2B%2BscKICf2Kwqz8lwmzkfAFQr2SHBPFJheZan9a4HvT4P%2BC%2FIBZQ%3D']}

In [79]:
hl_json_data = {}
# Create one record per headline; the lists are filled in a later pass
for index, row in deduped.iterrows():
    hl_json_data[row['headline']] = {'img_urls': [],
                                     'dates': [],
                                     'sources': [],
                                     'providers': [],
                                     'classifications': [],
                                     'imgs': [],
                                     'locations': [],
                                     }

In [80]:
print(len(hl_json_data.keys()))


15219

In [81]:
for index, row in deduped.iterrows():
    record = hl_json_data[row['headline']]
    if row['img'] not in record['img_urls']:
        record['img_urls'].append(row['img'])
    if row['date'] not in record['dates']:  
        record['dates'].append(row['date'])
    if row['img_file'] not in record['imgs']:
        record['imgs'].append(row['img_file'])
    if row['provider'] not in record['providers']:
        record['providers'].append(row['provider'])
    if row['source_class'] not in record['classifications']:
        record['classifications'].append(row['source_class'])
    if row['source'] not in record['sources']:
        record['sources'].append(row['source'])
    if row['final_link'] not in record['locations']:    
        record['locations'].append(row['final_link'])
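
When dictionaries like these are serialized, missing values deserve a check: pandas represents them as float NaN, and `json.dumps` writes NaN as the bare token `NaN` by default, which strict JSON parsers reject. The sketch below shows one way to guard against that. The helper `nan_to_none` and the toy record are hypothetical, and whether any of the columns used here actually contain missing values in this dataset is an assumption.

```python
import json
import math

def nan_to_none(value):
    """Recursively replace float NaN with None so the output is strict JSON."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nan_to_none(item) for item in value]
    return value

# Hypothetical record shaped like an entry of hl_json_data
toy_hl = {
    'Some Headline': {
        'img_urls': ['http://example.com/a.jpg'],
        'locations': ['http://example.com/landing', float('nan')],
    }
}

# allow_nan=False makes json raise instead of silently emitting NaN
text = json.dumps(nan_to_none(toy_hl), indent=2, allow_nan=False)
print(text)
```

The same call with `json.dump` and an open file handle would write the result to disk; the output path is left out here because it would be a guess.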

In [82]:
for i in list(hl_json_data.keys())[0:5]:
    print(i, " = ", hl_json_data[i])


20 Cool Moments From Joe Biden’s Time In Office  =  {'img_urls': ['https://console.brax-cdn.com/creatives/98c6400e-f2fc-4c28-8e00-6c45914e36d5/TB15_1b309a68a23702cb95e743cea5d60029.600x500.png'], 'dates': ['2017-03-27T12:59:09.279Z'], 'sources': ['http://tmz.com'], 'providers': ['taboola'], 'classifications': ['tabloid'], 'imgs': ['876aa5e83f6fb81a81908db3c02fdcc00d444000.png'], 'locations': ['http://scribol.com/a/news-and-politics/ways-joe-biden-made-vice-presidency-cool-again-americas-uncle/?utm_source=Taboola&utm_medium=CPC&utm_campaign=Joe_Biden_Cool_VP_US_Desktop&utm_content=tmz']}
25 Pics Donald Trump Doesn't Want You To See  =  {'img_urls': ['http://cdn.taboolasyndication.com/libtrc/static/thumbnails/b13e719e4aff1daf7284c9bdb61e65a1.png'], 'dates': ['2017-03-27T12:59:13.038Z'], 'sources': ['http://tmz.com'], 'providers': ['taboola'], 'classifications': ['tabloid'], 'imgs': ['d3a3f2f50c84529c08bb8314ae3aa66280f0cbc7.png'], 'locations': ['http://detonate.com/pictures-that-trump-would-rather-keep-secret/?utm_source=8b4&utm_campaign=8b4_US_desktop_Trump_12_54f7_20160725_mm_3407&utm_term=tmz&utm_medium=cpc']}
Do This Tonight to Make Fungus Disappear by Morning (Try Today)  =  {'img_urls': ['https://revcontent-p0.s3.amazonaws.com/content/images/1490017108.jpg', 'https://revcontent-p0.s3.amazonaws.com/content/images/1491743806.jpg', 'https://revcontent-p0.s3.amazonaws.com/content/images/1491743305.jpg'], 'dates': ['2017-03-27T12:59:15.114Z', '2017-03-27T12:59:16.920Z', '2017-03-28T05:08:24.588Z', '2017-03-28T05:08:25.939Z', '2017-04-11T12:47:25.298Z', '2017-07-04T14:03:38.203Z', '2017-07-04T14:03:38.680Z', '2017-07-04T14:03:39.098Z', '2017-07-04T14:03:39.535Z', '2017-07-04T14:03:40.034Z', '2017-07-05T04:05:58.389Z', '2017-07-05T04:05:58.903Z', '2017-07-05T04:05:59.421Z', '2017-07-05T04:05:59.882Z', '2017-07-05T04:06:00.396Z'], 'sources': ['http://worldstarhiphop.com/videos/', 'http://www.ibtimes.com/'], 'providers': ['revcontent'], 'classifications': ['tabloid', 'center'], 'imgs': ['e2bb63d58e09bae569a90f64de24c93a2d008e34.jpg', '0c628921539854b59c97995851be9ef8d1bdb696.jpg', '356b93a452abc42620956b0b72a29f25f15c33fa.jpg'], 'locations': ['http://japanesetoenailfunguscode.com/?aff_id=41345&subid=3c2i08hndnwt', 'http://japanesetoenailfunguscode.com/?aff_id=41345&subid=3clq0j7vog97', 'http://japanesetoenailfunguscode.com/?aff_id=41345&subid=355sv150j7ov', 'http://japanesetoenailfunguscode.com/?aff_id=41345&subid=3ke04pkkc34k', 'http://topadvice.website/301-ETW.SL-me/?voluumdata=eyJvZmZlciI6Imh0dHA6XC9cL3ZvbHV1bWhpdHMuY29tXC9nby5waHA/cz0yMTU4Nzc4Njc2IiwibmV4dFBhZ2UiOiIifQ==&s=2158778676', 
'http://trends.revcontent.com/click.php?d=x7%2FQosVJ78PAIitL8tCajPcUh8Rc2SRaAbUDSPneni0gnlbfCkaGGR7azcwtsJMvQfzEaEFUx%2F47V04v6u4RrI%2FDZg%2F3za%2Be7PueAZNxh%2FzCDBhjyD%2BBVdaduiM23r1E1hal5ZF6DcOPNsTwVyGkweg1F%2BXaSc4rYEHIxIDkRfAfIM1Aj06JiILpLrSPbXepp1hq7iTyUr%2FsxUWRWtrQJiv54gcDT0RhlCZD04vJUcLmgmGyj1ZnMt7bLPsDgI1lvISdknrb5mbgrtzKkuW2kyAt%2BY3fWmZv%2BxhgkXE%2FPe97El%2FD2kpgL9pYspce8prOno9MBSa9vAIqByBeB2oS6lnUVCPONyGlWBmnJj4KCXF6G2f4kiBbFgMqLblQeE96%2BAGn2%2F%2FQRtoFCy662Q9dbMaOe4grCdp7J%2FUAUl76Ebt1WhLlH1kA9ZLPZdqJ9%2BVij40tv3QyP%2BkfoQIQg2f0gWGja0VGfkqarpSi%2FwHO8atdFlBgKGMdeCjqVMez8Xg1HA7LcG0sC2cTWJXhp2gsM%2BINOybrdANJN03fNI%2FLbaVH2A8xiLGDNCvl7nqC8o9P8v6Q5KUTwhkAXeHCtG58Jlu6ODd5fvrqgkD0iEPr3MG9fyRmeImL08ON8YTdqI0riTrMd51pyhpfc3DXW%2BxhyR0I7S7ZTBADb%2BnxNnt%2FsFdCpnuKOuF%2BZdNI2ldDICAatHqKjmIa1NyfEgDpGLTdQN9p%2B477CM0HPDiDC14OX9KGPoaQcBAWYM6DcM7lzegi6h1PPLT%2Bc4q893WEkPxyCC5fhkvcLK3DF2PL7YcUFzM%3D', 'http://trends.revcontent.com/click.php?d=oFL3JtTnzEUYufrfmBU8wXAK3sGRTM%2BJPH6qCbDE%2Fgi8NRt78nZ87fZK0WNXQNQsbt08Mi2xhW%2Bbnkno6CQTFWkkHKUyRBoZ8qXFNVlwQVx%2FAFQAOmfERiPZ7MYB2SMs6fGLZLLnNWjBtz%2FFGl54xKWLvs0lXiBySPZn2wgtHDwJGaSVwbOzLowxri26nWEAKqeq%2F29sQkZv0XWgXr9vPLdjK4Q6z0xmuVJn3ZZxT90GVKPOdkZJLvfEpSPU0QhFXVxpghh3SxZwSmEvRyt4hop3JD0uQD7shQCmk0INiySX%2FOn%2FZBOSeH9%2B6MTem5pnS%2FaxT36nUFuEu7zNlfWuXXB75%2FTpHwzi2OG9%2F7qC9s4CWARIk7ZWHHZhQ%2BtR0t%2BNTDOOf5ZIzvGFjKSDQdQAP5IcMeabf%2FgxF8ERq3X3uemqgkPed4Bn6oghLavkhKH7fK7%2FJyMBK3xLVJTsX9iwwigL42VLb9bIgq6MgkCgrJWoqxQjN7U7NVCXPEgXrdpaunv093r7dXC8Fp%2FUfEKVqpCMZBj%2BMbJW3oeMrV%2Fg9864OjOC4OvAZJ9u3JQnWEYqHr7fzd8vXK9wZzUnqtQArGrKiDGAmLIl15OOmpOFLd%2BBga4AX7su3DDNch94AawRXGVNmxqVo%2FqCmkjoT28cSv%2BS%2BpEb1cER19u%2FuFNFEty4mhS96pfYFqH9jVBpg4VNQrIrdCB8sV3WF5OX%2BzQF%2BCM536FZennU1WeNAJ4hXz7mv4j3Ep0A9f%2FBVU30YlWYuGokqvnhKHMhpD%2Fzr66%2B0nBPpiQEx7z8hsvKM8u9S2w%3D', 
'http://trends.revcontent.com/click.php?d=Ud%2FHgIQC%2BjI0GSC0TKZxLbEw%2B7e1R59OFqlidFC8Y55k3pCPLh8Y8MRGsZNn%2BAueZ1%2F6ifYpBlnwNdlKqLykY%2BWmqGsIRkwFUNe71EY6mT7bKYtHX8Ttdx0YkWDJMsgwEjjVV21FjmOTfZZXNe4pVxNsOCrFK8P8N5IXOQKoxVj3qT8keOWT9F98786adtf4g0oQGwJYzYD4kD67g0eDYneXA1kaKWUkjh3pkiJYbbCxJatkJ0MXA6H1yu2G%2BO%2FRSnDy6Bg0DtkKOkR0HFf45GgLndVzZrzItPx%2FnBeoo85HSsxvTy%2BDIzicUpirDt1kLhAahXZbBjIH92UfeTFxmwY4Q37b7cm6P0KS%2F7MPqr1S7glBk8pGbYicnHGjDcq3%2BdNPG8pWK%2Fu0wM0xuCNBYiM6NysfKV487x5GEEP2MM6KJO2MNitdQN%2BRx3V0gy%2Fpx2RvlZJfIYh0pT%2FHs5B0jatOrxkr7iRPaeAbqkwSua1aL7RCJXMZcW9zcvDkLYVNZ%2FTndUFmTxGoM8L%2BbsuUW45gcqQ4BylpXw3CkF2WY0ds1iGAs8wjhYM9IQ48sT%2FES2yyokdYmuZdrlIoOYPJWpYRygr%2FiKQBrhKPSWdbvSFR1qb8AuZCujBcUBv3i2729vk9sX0FH9yJZXX1SsAbA%2FnlsMfmagFTIOfcIvjZvtSG71WxvJidWkvvR7NqN%2BaNw6Ui%2FR65DLkO85hXiC6DOrwCakPs25j6yq%2FW%2B%2FCiZ6UgeE7MrNh9tBYMXeLn6czRE520XhfHyASO%2FEYfLoIF%2FI4VA2JWsZLIxsVX9J8%2BkKQ%3D', 'http://trends.revcontent.com/click.php?d=b5qdGdmj08D2mk6gieqw4cAQTs27tbxdOw9g6WIEcuURoaMGYrI2GtO1b64XLKSD9MD1VSiZMSDLFo%2BYXWkeS%2FQBI930SB8Ich%2BAl6mpFpVCu0jWHp1Siobex2DP%2B%2F6llXROjwHHyab6MMnvyf2ZvW7oikuN149vgEAKAuZIAXaSZdRYYfXNRj9GM6Q%2FGa9l19TuZgdmPV679DAQ2b9AcaR17EZPykkEnYD16WWVrhjYxY3TFDf%2BP1GFXMxpOTWdCKHuOWn7dpckvO44SKDP%2FJ4MSMjQkTRYx0BYrWknmWJfMTr7FlEenrmerNFT2m4XHLlu5m3Wf1mE44kjVcjv%2B3aRUyJbWiq3m1ixiyATrNjDMXIVjWIURUJwuwyHRZshpAk33ouRcn3xyIXWUbLYQT%2BKBDJtmS9GMHKrUrA8Rn9RzWFpFQ804Rbkx5Yq%2FlkfS%2FEClerMlPMZCccprVnXoU16WGnpc1mhp8ZQTKCMVgJSyRt2%2FgB6I2%2F%2BG4I6Mkont0Ao3zuNf4ttBnYrBSLIhZxarqWoekEKXlT%2BMpxYFTw%2BNRl1Ow3ZFoq2uPF0SXLEsSkgsexu1zEzO3uyRmaJPmxB43eRXYG%2BehstgjSHSJdF%2FEJ0qiMiV8o%2BojHDA9WHKSdqVOfG5qR6bIsBFrG%2FFAoh0c3lKT9RDrUMLjDHIhIrzH9YLTZOFn%2BaM5SJekVZxik7okArlG%2B03qEqgZO7gtOyHHYoF5WGoJRiUR6WodntUIdZ64%2BIyOOxwu9qGxboTrT0bUxCoiuG5B%2FVY96ANg%3D%3D', 
'http://trends.revcontent.com/click.php?d=At17qHdUIXeTti4sK7tv%2FmrbyXYwjQKd7RE3FvW9CrkA63gC%2BsL9nYEaWF4aVF0FvADDkZAVv08eVer%2BYWQzP%2BzF66QM2BQhICZfC2kRiE7nmDgqd7AGP9Hw%2BeePRzqzXxKyGFTAIuijV8JrD3QtViCc79v43GFjHJ3LYMl%2F8FApF7OK%2B%2BWGkDjgMXMPqea%2Bqk7jQGuqqMg39TCpjN0zxRLMuziPz0Rg4oUJin8H8a%2FRQv2bhGy0iPJJXJ0kAjDGOsITL88xDO1rYblPb9rzSF9N7nLpjZlLuz0PPMsC4q8RxiGQ5JKhxIHBaU9qGuKkWRpC%2F4mXuXX8st6Ap99IXXwsTPgXugF%2BIh8lQ%2BOQDblYX01MeMD2GvCOS6t7a9WaIXBTM3dhzv2UOBEGiN40sdAmfUP7s0cg21iecRvMnq8nqbFBKrBFcKDrQFPene8M9BYa7OPG8Twd9Ht7pripWfbykxu%2BhNC2mViZcFP%2FhQz10Kkrq5lkvwyfEdzuHPILo4MJocjtKvH1C1HMlEA6TQJAJwQyO8K3KILSKVqQAQIgqkh7zaCRo75rzeo9qY7ArW%2FavZBbM08GzCj%2B%2Fc6zhsvaCjvlwNAvwNkKBPO4UAQSwY3TVvSzahqk9ZMkciGnQLRlDsWcbanLDsqqbG2EV4R%2Bl5c%2FGRf7E0SobESTvn6vTm49KP%2F6CcbwbY9ijYRDjJrVkDuTHRp9A8DqO4i1aG9MSUKXDxK%2FOy%2FZJlfAQ50tdz7oCKUEjLLMHGtRUF1RQPN1GvpPN07CDdoGnFaVYw%3D%3D', 'http://trends.revcontent.com/click.php?d=H7p9jvqg8Y0rPriUusTzzIkG7hn23JZEoKBAiTFioIIXdxfCWPvLrzfSdQ1WeuHPvSwyoQUxAsR4dc5PejgiGnoZU0du3Xoy1lR5E7jcy7EJYbVMZeWx90%2B%2B6o2SkG9SZcFmqRIKVExlGtaa5G7OXpMBequ%2FYY0NYllRPQ%2FtV5%2ByY5aKE1cRCbUr4%2Bn5jQEoYOKZfeGcCgaDwojxwZ7MW6V7vYe0tKlHSd1ABG57tilNjwSho3W2fLBbVR9oLuRFKrlxSGTeGSF7lXe%2BUHtKZDtAyWTCBUW0foFKzde9Wv4ejPaRMUU7DgAio%2Bzey0YTj%2B%2BIc4GEw5aTFU9u8UHxcVOyDFqItlGGrpMo9a9qZAOxSPW%2FnfDmJ70yaBesd%2FeUiL2KBjyrpUbkqtAd0oVpwmkzu73SojqCeuo3gTJZtRnunE4cXNVEPmk9wiuSt3miUOAIXwtR5%2BJeuUeJxoO1upv1K0M%2FC%2F4ERdpV5%2FKYb1u%2BdCabqYbgqL1Ey%2F6ouMsaT8NM9UTM9oJgnorIzQSHUDVeao8n4FTid3xm8derPFI8nRsMgArbzSbOfo1x2D2x27V6HoD4h8O4HzcV%2BWejsD%2FzOFBYjW%2FVFJU9x5EU5%2Fzm0wT6ckNT3mH6eOZrClgUFd5AneySSSD%2Bau29NIuarb91xgvVylFBtRdSqYl7Jl5pN0BtTOtoCwTBnQmo50DWsLFReYwObo8p%2BmMYsrQLEBhcvEkqSVXFs9uu9hnHzCFAKO2AZO61tmScf64eYR5T97Fg16ilyjVQOpa4gehd3P%2BZBY9Axph%2BTc2lxh1oBAc%3D', 
'http://trends.revcontent.com/click.php?d=k93fr5s6%2B9BTtZsxe%2FGhrSj5bBYkmji%2FO%2FkcXN6qyNwROVcuMZHXA8UQrIS7GeDmVtKvpbQNujTBJzsWdz%2BO01VNxa%2FRVF2SXsncjMjHvNKElb5KNo1Zg2gf71aXqU3l68%2FFqCnJIRWNFnXQRg94igBgm%2B2lJqzrsErPUH%2BB4G5aYWTpencZqKwB0yuuHfwBw7hNrOI0VB6%2BCaYORLokQbZE7Fkw%2BarOQz94DaLMF5%2BO%2FAIKKVuEJIvNukoYwyX1ZylZLM8C%2FsBJrjTHaLuX5AnsWcLGFiAG2g4XiKaowm%2F2HbF00xoaL36p7FsWY0qitqcMG34E7LBUWV%2BLr6GPvoNgglItEjH0InL6c%2FKcBReQtyovQGovRS8nPprtVMu197LTvuNspjA5CCRC9qlsdeDgZ1qC18Ymi8wniAKzBmBNsys475IDkKRImvYq59f3fwJKO3u1V2MLhOIC%2BfKoMIttOqzt2N6LUX04SbEW6nehdMDKUwsj6aqF2l8QjrRWGE8yedY1jVQn1J9t3STYM1zkd4wGBO3NML2kzAmKdoK7rfzIubd0RcachqaWOuIcgnL6rdNHBJE%2FyAJQRWODshZoLwi%2F0j0f3ywqgD%2BtPiNBtcpUSyEGRNQrgsh7j6bQ9Nnib3bqC2bsncg9%2F9DXIGuWkpER2%2BAEzRjWrNKrpXkGQPeT3SL70zzFlRRj2LKmTptU0KpX9Ade3OgOnTxM%2B0bHYocBhgcumHr2xHM9aIvJC0myH68fwda9E5dTXooyZPwc2E0kyc%2FxKMxzcjeoSTFbRocgsm7bwPzIFVq71XA%3D', 'http://trends.revcontent.com/click.php?d=D7zVhol0pjaqu0QuKSDNf4tzwdp%2FAgTXGJLJieK2e39fGbQv8NxMNDHmRLR1c0WlUwJ1S293lwKpnVRjOs66UFs9VTJasbuGHThLm1CKKe7WEnIMHwkQspXlXZgRtrXTxaISRnYnZASyJWSSvwfA5dzyWgOlrNK%2BTB1a95fpOKCsvBEZ4zC2UPVyP%2FoIXgqs6Sx5%2BCLjmWJR0TDFFQ85M9DL49tPjcAMWdj6MCCi1MmoM4XCTQjauySRkETmTBkM8ep12ZQdEfKilq7qxwsOHiVHAXcjnJwKkqNedn0K2mxT1mfhwNXjJhNFrUsqRNy7A41oesEIeVIrv9rPQ4Tp3jOMqtZN81KYjnxKOaj58jQrOn9J68KzWrynOz1lGDljI%2FqOU%2Fp8hpBHLs8zVe6Vw48DFQz8e2kN3jefoMLts4GDK19PqRHrt5HPmfHQiRaJboNsj%2F9%2B1l0XgZZXG192RPcmpTuSkI0nlPsCpDUhIfENp%2F63JO7%2FUGP8V4eG84i2MeLJzNIyZioG%2FsRmilRWQrwf%2F7KOb2iL04TdhkKkQI647VIAy12guVlxJNPWkINQYfdliaAgEO8%2FMmQub8x8dacFJmrOPaczsJqKMb7GIBNhTkaqvSi7vaOJ7XsXmc%2FFyIljE1d5BiuO1JDg%2B1%2BD%2BiSJhjZnlCC6FJ81HYmTvbykhP2uU3EJKx%2B5MrCa%2B5ZdXyBGOHkI6H9ANjTxVA1XYy0rXeq4%2FyxJbr2OWfv5JjXMOs4zZQtsYZgJx4NglqnCptaU5ToR1xATnHrcNjktfwxxr831AIX7Ae%2Ba8hL4e7c%3D', 
'http://trends.revcontent.com/click.php?d=llkLgynpMSuoHPGQYCk5IeOyAOSTv7I1PwhbKHT42YJ5D2N1E8UUtiL4iOlZJ6db%2B97lsGuZrdSG4LXEF2vRPBMbDtk88%2FlLMmcs2cP2nxRmCoR31Xh%2F4hlCbEp%2B8pkQl1HfrPDKg3jZnS%2BeB1xogko%2Be3HOu43zaSq5IFQguexGkl686qBGFPQTPCO%2B03RE0kDZQaPgz%2ByUALkmEe9odQZt7AkKUsI%2Fjpa7bl%2BltvAoXcngsrWdbM4kZZz%2FoTQAgoP6mXxla11MuPIQHxK1L0Xbmvyv0FraRUOpSidH2xTAKcKO45UWCw9Pv9eMZ8pHl9y0U8v7B55kaZ3kQTYjpDAhl2ZZ5yZJKARUKwPoW6R9tlskBXeHQ6hRMS9x2DWUlTUC2%2FnRmrInRSJ5ShpX9e1JgLqkSXX2Ma34ZH5V5VOKxuYs0%2FKpSZ5FhmwT762%2B%2Foi0eTXBUnfBMvvJ8MW%2FIa%2Fjo0sWM4nMZONImcRB9WfU2fzVddocxhDeDFGT75GpDubzu7RnXXRs2SKV9QfssoXI2BacrcfVyhujMGiX3A3yiHkPQ96eBeN6S0xaSPEF9Nsqg9vMPGfcQ%2B98IiNGxweJuTWK8FCMeXvznCQQzeZ2d5XHHxtLXFKJrMm7o9w6VtuIAZLjM%2BA9SOO4Bz6e8PygcMkbY%2F2tp3JKFKMYznLQum%2BuNjT3YwMWyiTDinnyuXgMPQ%2BkXM46r4cGCJ4fq87WzSIoUgSGKs%2F360oLV9UyuBNeuWgmuCugPQGT92tRHxntw3bnmVDMPChnVCl%2B9gOv%2BDS7roxOXVjrY7woWjE%3D', 'http://trends.revcontent.com/click.php?d=VpyJ0dMuxKyTm4JrCI9q7xrUyZt6E7QHaLSPji6QkIE7FHCh9GSglnKawc7mzVhouPTQMS3sBprsZiVWlC%2Bk3%2BKkQUUKVDwIWYqTt5QSVRJkoleZWvQniF%2FiTkvjz9d9fsqLyrinoJdmapt8w3eAP%2BinP70yOec60KjE0rf8DjzgtMKowU36vdAESiwXM3%2B49lAkX30aHvJrle3qDo5mA2Z7Rgovjckqlq8fqy%2FXkHfHM7O0RCO00tBtLLEgYKpXeIMJPDpbQcVElaDH0YdcSNoIZYYbXPuhsa3Z%2FxTDOa4hYvY5HZjOUs0ziSHIEAgR7tjcRLeLVwZ5XKPF0B%2F7R2DYQPp9vCoUWKLWAn0Cf22CtBNljHAggGRfStQSApKw5G0QGRgEXh6Jo9KPGc%2F%2FOXvOq9wUhF5QqKRut9PyuT62oBOP4BEXtqACKG1bsjK2tqoVN%2BOawBP3W5RP1OzHHVrItquNeAU%2B%2BbyS9oh2Tc5W7fDTBykFACynvZYVT8qhnJegpWI1NxLDn%2BxFEPnC%2F1njKsijDxwZiN66smEVRBV9Qwe8Acq5GhxhzF7%2BWCNZGdFx7i3U%2BdxmSYHcpNT6%2F8NgqPkZqlsxQ9daa%2FMWBVvxUjr7spIgqaIFrYwSi0H1T7LqoHkfhKzgJRbKYFzQjg%2Bcbgaof5ZxQMReiFhMvQtfhGYJietI3Jhf6iuAB0nEEVEFaz78P%2B59heyhGMthJFxzK%2BcFXXINq8ORAVs5jaJyZq%2BZS3XQ1CcEpstJ6fQvBLrZephQ4iPaBf%2FTOJiAQ4ZlslqjILYC4FAieHC3hsU%3D']}
Here's What New Dental Implants Should Cost You - View Pricing & Dentist Info  =  {'img_urls': ['https://revcontent-p0.s3.amazonaws.com/content/images/1489682572.jpg'], 'dates': ['2017-03-27T12:59:15.237Z', '2017-03-27T12:59:17.051Z'], 'sources': ['http://worldstarhiphop.com/videos/'], 'providers': ['revcontent'], 'classifications': ['tabloid'], 'imgs': ['f70f91d2ebf37e35480fe4f689477406adf9243e.jpg'], 'locations': ['http://gaindentalfixdeals.com/?affid=1016&s1=10949&s2=1801868%7C239864%7Cworldstarhiphop.com', 'http://getdentaltoothfixdeals.com/?affid=1016&s1=10949&s2=1801868%7C239864%7Cworldstarhiphop.com']}
Michael Jordan Has Pretty Much Given Up on His Son, Here's Why  =  {'img_urls': ['https://revcontent-p0.s3.amazonaws.com/content/images/1486415171.jpg'], 'dates': ['2017-03-27T12:59:15.614Z', '2017-03-27T12:59:17.480Z', '2017-04-29T04:22:47.966Z', '2017-05-15T10:55:38.925Z', '2017-05-15T10:55:39.281Z', '2017-05-15T10:55:39.952Z', '2017-05-15T10:55:40.263Z'], 'sources': ['http://worldstarhiphop.com/videos/'], 'providers': ['revcontent'], 'classifications': ['tabloid'], 'imgs': ['ab914b86682795c6d6624707b22b06f88f0e551a.jpg'], 'locations': ['http://trends.revcontent.com/click.php?d=vJdwplKu0pUY0G8mZgW7%2BfkWm%2F8rSKsQQkXQbHQYBgW3pRycVsQRgTsyi3%2FtsV6I4lap%2BjX9h1%2BEbLcUlqTQMVSNfHQQkbUicfWHb7dw91dD0inXxnglXt10FAQWZjo1Larx5KRm92nP9arlHHZz%2BdyE9Tn5guObB8L%2FJp0Dt57DtRF%2Bfok8%2BfLSpJtcjLFjO1r35UKJAuSO4bmpYbB109TRS1lWZHUtRsO0N6DTib1O4c7Cn7iEWlC7iWej6AASi16lKmBEyLqQYrzxjEwaTbWZuqglYDO6XYRqG%2FyyuQ%2BPUcik74RbX%2BuOIkungdlYncPD0dXrvhTRETfRTPb8yoZBMt7o2VPDp0qHQXHsUiJlZGHe3MWaSTXEQuYxs2U1nLhyS8NlxIo3TAJi41W2ko8JSm2oMSb48e1EVcCuKL9Ep5cB4IcwcEyc6tJvwJRH8GMfuSYVHLMatxlsKgRlvo9snwlOIEY95fOZrXoO9B84ebMGPeUfFAuDmiK8mklUF%2FsMXmkh7sPSD7uuZyVvRRPTVC%2FfaPjZuIhk9o%2FsETyg9v7kvxHi%2FUZ%2FcN%2F6ujf%2BD5QZ61baXRElk8xvSFTDpqLW4lBTNTaJ19kKloFzPuO6dHqkLOPactBW06HeOQp8%2B5rt4xTIg5%2Bkc9ndm8mTmfpkP1hP3TeEa%2F%2FjxYV1CkXG69pOW5Mbp3b%2F03%2Fhp78P1bp1%2BtlhCORKjyGSvWfl2YTMg4bnUQddHrgw9BX0diQ4ZnuQwB6Lt7oUADBf7mhl', 
'http://trends.revcontent.com/click.php?d=Bh%2F%2BvKNq0Yge75HL7eZKGEBZFXmDwUSro%2FYm%2B7dIn8ge57dHmw6JOzhz5BLk%2BlyWfl5DnJgcNB9tgTW4AabntGxJSZXpRSux0HYbxrK6BXX%2FhS6B%2BJ8RTkH9bSrvgb5THbidoOPo7HaY7Oak7vZdnRbRWJnm56YHPwQ%2Fm8dE2gm00rz1qdsEqrWHzVL3zTWuCp9imlBcXnMgCVuC2NbQl2pUrPghg3%2B8S%2BvGZzh71UUl51V8et0Ch6OemcQ1xs4%2FIKymJddu2XWF6yfqRQWXBGqUXUBXCjB9AB4J0DQfcwsMtrJX7lBPFQ1zq4ngC5MplmT5jt8GXDKx%2BP2sfPRnrc5NwLqKp93wXhKM24nR%2FUQE2b0iv7ojNe06yS7bGq4yRQmymVpombe9CMCkr3YIiGAPGIvMRmIJiDGLDCTFluqk39jiyYfJKB2guhWvPAqe7Yy%2Fo9r0fZxPsERR0DN0GtLqaRyIrR6GPavWquPWv4%2F1TJJxcbDpBcwy8A4TWYvatIjMYQUoEO24L6pd9nCEVKB3cw8BmxX0PFD4bHfCrh3CbyZmR41R9jJV6y%2FuOK3cRIig2s6Vtt8WtLVtvrXi1gm5WstuEXeaYT9z8vzk5OWn7JXFr%2BdAXjarDidQh9oi6pCeJtKgeqqmshOPA3fEF64cDkj1DsRksu8pJ5mazI9Gf5w9VykJnA87JVbg8LJFjSrqE9wOtMWyUkENOxrCNeHVU8s78LdMLMPjRZE8%2FuU5KSm85e28fW2%2FXTFVRzDT', 'http://trends.revcontent.com/click.php?d=R%2FMznJavHHk6hqB%2FW8DTG%2Fqoa5q7ipZx8O4qu6LSZUqEstCK4gNm%2BdUpwvX1yhqnrmTgZRSrbRf3ranJwsVstbshJvAOU6SKqJgqUwEJ4Y0RxhDO5Z%2Baxr5J%2B9SdPcipDFAsjIFjhCHkHyZL4QeKtuT4AYFsLkFHX1HKhHnhjCWJD0TZBpolzPVxaJ4mFhc0iYvISyHbeKMUyGM8i3iJc3pxaSJOVqmBFl%2BzJwHGnSF6UoXQy1HeHPsEa0PQ192X5bwb63CC6YAQIIKawp0HL%2FanYK7RlopuQ2RxuExb7W%2BWjYdxlUQfAU2Bf%2FFk9jhk58VCCoaGydZvxPFUuvgcudDAp3ALTYRQFNW8YqJlgTkDKxuXC%2FRyU0vHU%2Bv%2BxBFxLkKb6x%2F2V352WKOXGeIvXBd1NHKHQ%2BleUOLuTOcHVQ%2BWZCx6NLcd4jhh69Oj%2F4TLSMsSVgw0imAXXW1ismZc3Q0LodL3NO5%2BMh26EXbBjkakhxhjKEwBy5Z2mh9uAI6vZrgCf6cdjBUmkedDZ%2FwN%2FIWEsTgcSb0Lf3N1Yu%2FPL4ztOjaPoK85iCN0tmt5CNN%2FHu%2BeajrsrsEhNw7Th1qKmBWyK9T4AwC%2FoM4rYN5eyjLEiyqsk404oGVxPH6PzXRhoD%2BpTZZpFtBM5m0vpVGY%2FeloXNgGpiiXMZ1%2BqgJ0ZI0RteyS9c%2FcEBgkVhuQI5EaA1%2Bg%2BH518ZotG%2BHtzUt9E2ZLmziLlbRxlEdl1dYKIB0fyn%2Blu20YZCNJayCfzgc5XgOKYLTXZPhLkGBgvb0HL9zJ9c7xs62XPFXVIjeaV3U%3D', 
'http://trends.revcontent.com/click.php?d=zT7csNf77I2ykPB6J5zpnBow7C0%2FgdQO%2FFJZHtHFxUROTLZAwaNhs5xukII%2FsS%2BkQTiX6YBklaN8GYePzEF%2Bytyh1TAp%2FJaz5S%2BBRq92tyA2GXS4iKVxOlPfGeQ6xw1w8IA07RXrr8AxL9ePw2WfP2DEiAzCdFmZU%2BdyZlkx9UoF44Tr0%2B25zrnwDBugZY%2FE2I8SjVi1E%2BxDOKV%2BiFMNjyj7GSkb7xB7ZB9t9JM46MOraK1AL4IKnVZJh8GATF6jot9x4%2FDvnjbGN84j3BfK5ymQC3q0%2BoQVF0Tcr2F7w%2FCZNGbhNERPVbWz3W0XG9cM0C1X6QB0Nx2YmTKOlg0v6OuILJxlB1YPUP59%2FoPcsHjcS0SHMunMXqE%2B1ZTOQItbOkBru%2Fe2oLm1Wknzc0%2BcELsFgKcd0PZ26zrzK59vTfDy1mku3sbmZCv7nHIwga3lX3igUapt%2Bfp65NCFhyODBCRLjxVtMRcc25N5lt9yYZj10Cv5SPY3CcowmR46ExaVFbWuWwVVGD%2FrG1TUXETKiEJXWGmY%2FtXVF8r%2FQrLz3RnpVbJEK7BYsgH7jDlosuoUX6bzBVyNGDwd8vLPlpRdI52qMW9ZfPVNEQkOYc97b4gE4zmG4PNnaVgKnE4uUBFnfxrFRpUgRl7JGnrKFMpHGJCoHgSr5ByfvRw6udPyDDP00RNHzqmwYnWS1WQaISUfUE5cZ39j4RdqFG4ulrp42y%2BQfMJAtRJw%2B7YHRFU%2Fr3PPMusmJnuOKRh1qxNjn%2F2I0ePh00sekak73PJH8QHLzLP%2FXvalFM3XKsNW%2F6N8tk0%3D', 'http://trends.revcontent.com/click.php?d=zEhzqSroRD193mvTp8qDeQ78%2BC5d%2Bt95UYzMige2O7ICjvDgETAoQDoJynnLbc66YwSU9DtjcNOK4uDi3kog3qt09Q%2BDTQ%2BaLqIIOTYub2cjlwOSvk6DoPdxZyoqvqvgOI7Ebt4tM5WDK%2B7ASYWHmW1d19eyB68MjkCm%2BFfRJrL9TVpOsuUNdB68%2F62vNHvTPvzrS9gSTfRytI55EDCXFgqLiYWUl2PQGB50sWyNotDCJmMe%2BXF2zVtqU6z%2Fkmli%2BThB7Yoe1RRxwLwDevhg0Z47vQWgHeDO%2FXYAjnrRikauG9apugaS%2Bg4SiEn9DSXwQMbz6krHAPuFcor2E%2BkENd2VXCrR1%2FaGWHJCsLHXfK5PYrIV09Ay2cURGIbahPjmDLNN%2B7vwssPjOEhibj9g8pOpDCQe1Cg1aTcmEafs9oBH%2FzGIRk31eJA%2FTYVWlTxFL336%2FyMK6sbTeug1Ek04Vbr05XNmQhRhd3L884xd7OiJTFsXhXIjPt7RgH0s5YQad%2Fbn%2F6kztu1A09wZUXAaI2k1uZ0nceFi844mTe68e528EMUSUsvnx7yacN0U4XA58%2BjG3EsGm507%2Fonjda1jzG%2BCF3tfQ%2BmtQq%2BNhVXwEQ9XGwZElfzAKiie2l7lBbxonGmb8w4WITs%2FRmuxqwCXxpg0A43XGRDi1KfhJqEvMkqOWIQGUqmOqbqL5bMCT%2FgoYnKxH96FqlNkAbMODgpA7TgQCNxkeLeqj7pYui5NgNP9Yyc%2FVYnB8YOgG1BtLtdqcAeS9u2oAzmHGjpwg%2BCW%2FxNMV8C8ZDrZdbZe3BIlZiE%3D', 
'http://trends.revcontent.com/click.php?d=yhgcZoUtLD2nk6vdesk71n%2F1XgC0cbYmAOE9WH9Ui1FKFhvAy5wMaM2lOv8bjt8vlg8tBas7BGZfuoW%2BYYJ%2Bd8ome7RyySnzohklQ2ZxS%2BHLu0J%2B6tB95wRKyu%2BuV9%2F1I7U6vi8VbaFy8KKWGmQ4SmNXtRwAznmv%2Bh1B4Xgy6oKTWnpOt1LhQ85ukw7ckzfTKQXuxC%2FaUUDfhc6MRIcNmOQfYTURKv%2FYDV46B12qxma28MY1O2CdopxidA6llOM%2FyWDLHiftvjlTkRhJu1lATqdhSDU2NWK%2Fs7Wz933mgAreM5DWRyLUFyJm6Hg2ZS1s2BQCsLcDisx5ffKXdCbejydeIk6YkUut3kj%2BUfLL8GJ6PwBjNb3JTYVRZ%2BBzdgpdwZXWRMOSM5JK%2FcDryya5GpJHYIbU6HUXs6tbyEP6gMxfXveJvvHQ553PQg5CI8b07BBnb6ebs6DfWTkjCJXztaB3rfFuEyR2stgcHDKayoMuEOcxHEP1WNXOycgtTAuKqsFooJj%2BItYELkUGPAXzpBTwnSCmIV%2B06x02OsmzyepKLT3mvB4vm6w6RYb%2BTYlCY6CwCisCW3f0U%2BbnbQ%2BkfpwIpV6IudYfo%2F5zbCWj1eLqJXnnGdD67Q%2FL4CLjpcHiL1OudHGgOtGlToq82ovH8zBIZAJ8qFHdXUiYRnzCBDqL7qRXSDWMEIW%2FJScgqOqJ%2FLS18EYFMZJwYVCFoI03pw%2BhLS73cg5jFJQ%2BAlYGpo7AMDFP8dvkxJ0sXp93bDyaTrIphOLmeBr1EmpwE8nTN94JST2mghn0Ze1MCYXF%2Bzw%3D', 'http://trends.revcontent.com/click.php?d=7J0uNuYGBo92nwrA0hFHhWhYZEIHeRFos5DOeIZZNLepawY1CcdFLyUeD2LZmX5kx3lQ3iYyT4sqCjAReCNPdYTOCO1omwDWoSSltVzfX5S0%2FSXj%2F5SmuU%2Fv1H3aIy8Sbt%2BW9jVyQToUOfnxOKdQ92%2FreSiycG2PLCZXZOhCt%2FCMLXqkoOl8x3z3D4BNifXuL5UAsTMHKHbvgt1S3Bu66h55oQLGVCAv9nYoSBms%2FQZdo5HlMVO7VQl9IDfcnUsPz0E%2FG7zN7xs5Y6xb9k%2FAj4qOrc2YJj1w907pjSBfMhct0hX471jpEAyIfF7MMVLj%2FVSHmNBWcrqHFaE6TCCVkKFowkglACXJCpEkzVwaOf%2FsmyVvwKa0FY9D7qbjwWI4k6IwV8OSxDBmXSdVemzjHuFH5waovG%2ByHeiYPR656BNDCTW80CiQQXag1Rp36hb8hG%2FDwP34T46st4wQw73%2F6%2B40Of7OWg7W%2FrTWbt0pyuL5FRbWMYhbJOyCCPeJ6kjpoBiEbQLnUx7%2BObAXmVX1IwKvrqYzQnzulqXPycLjRo%2FyfzqmzvspFSnbNAIfy8br6PGPbBUVZDdLCAnstgPwY0SW85cIbaYL8cTApQeLKz9yNc6TdmrHpHrXJIEcuyaPQZm%2BxyGPuUcxFi4HQyxelv6C3WpYmhsOyqsO%2F9MxuBiU9lvOww5CkapbNKoGm2GEKE82JhuUHujCmgwlfSzw8%2BDn9csKkm6k7zyE7vGL3qAQcl2ZhJs%2FM1fw%2B%2BscKICf2Kwqz8lwmzkfAFQr2SHBPFJheZan9a4HvT4P%2BC%2FIBZQ%3D']}

In [83]:
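def to_json_file(json_data, prefix):
    """Write json_data to ../data/out/<prefix>_grouped_data.json, indented for readability."""
    # The ../data/out/ directory must already exist; open() will not create it.
    filename = "../data/out/{}_grouped_data.json".format(prefix)
    with open(filename, 'w') as outfile:
        json.dump(json_data, outfile, indent=4)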

In [84]:
to_json_file(img_json_data, "images")

In [85]:
to_json_file(hl_json_data, "headlines")
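
A quick sanity check after exporting is to read a file back and compare it with the in-memory data. The following is only a minimal sketch under stated assumptions: `write_grouped_json` and `read_grouped_json` are hypothetical helpers that are not part of this notebook, the output directory is passed in as an argument and points at a temporary folder, and `sample` is a made-up record in the same shape as the grouped headline data printed above. The comparison holds only when every value is JSON-serializable (lists and strings, not sets or tuples).

```python
import json
import os
import tempfile


def write_grouped_json(json_data, prefix, out_dir):
    """Hypothetical variant of to_json_file that takes the output directory as an argument."""
    os.makedirs(out_dir, exist_ok=True)  # create the directory if it is missing
    filename = os.path.join(out_dir, "{}_grouped_data.json".format(prefix))
    with open(filename, "w") as outfile:
        json.dump(json_data, outfile, indent=4)
    return filename


def read_grouped_json(filename):
    """Load a previously written grouped-data file back into a dict."""
    with open(filename) as infile:
        return json.load(infile)


# Made-up sample record, shaped like one entry of the grouped headline data
sample = {
    "Example headline": {
        "dates": ["2017-03-27T12:59:13.038Z"],
        "sources": ["http://tmz.com"],
        "providers": ["taboola"],
    }
}

# Write to a temporary directory so the sketch does not touch ../data/out/
with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_grouped_json(sample, "headlines", tmp_dir)
    restored = read_grouped_json(path)

# True when the data survived the write/read cycle unchanged
roundtrip_ok = restored == sample
```

To apply the same check to the real export, one would load `../data/out/headlines_grouped_data.json` with `json.load` and compare the result with `hl_json_data`.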