Grade: 8 / 8

All API's: http://developer.nytimes.com/ Article search API: http://developer.nytimes.com/article_search_v2.json Best-seller API: http://developer.nytimes.com/books_api.json#/Documentation Test/build queries: http://developer.nytimes.com/

Tip: Remember to include your API key in all requests! And their interactive web thing is pretty bad. You'll need to register for the API key.


In [2]:
import requests

1) What books topped the Hardcover Fiction NYT best-sellers list on Mother's Day in 2009 and 2010? How about Father's Day?


In [72]:
dates = ['2009-05-10', '2010-05-09', '2009-06-21', '2010-06-20']
for date in dates:
    response = requests.get('https://api.nytimes.com/svc/books/v3/lists//.json?list-name=hardcover-fiction&published-date=' + date + '&api-key=1a25289d587a49b7ba8128badd7088a2')
    data = response.json()
    print('On', date, 'this was the hardcover fiction NYT best-sellers list:')
    for item in data['results']:
        for book in item['book_details']:
            print(book['title'])
    print('')


On 2009-05-10 this was the hardcover fiction NYT best-sellers list:
FIRST FAMILY
TEA TIME FOR THE TRADITIONALLY BUILT
LOITERING WITH INTENT
JUST TAKE MY HEART
THE PERFECT POISON
THE HOST
LOOK AGAIN
DEADLOCK
LONG LOST
TURN COAT
THE ASSOCIATE
HANDLE WITH CARE
THE HELP
THE GUERNSEY LITERARY AND POTATO PEEL PIE SOCIETY
FATALLY FLAKY
ARTHAS
A RELIABLE WIFE
BORDERLINE
ONE SECOND AFTER
BONEMAN'S DAUGHTERS

On 2010-05-09 this was the hardcover fiction NYT best-sellers list:
DELIVER US FROM EVIL
THE HELP
THE DOUBLE COMFORT SAFARI CLUB
THIS BODY OF DEATH
LUCID INTERVALS
THE SHADOW OF YOUR SMILE
BURNING LAMP
EVERY LAST ONE
EIGHT DAYS TO LIVE
CHANGES
CAUGHT
HOUSE RULES
MATTERHORN
THE WALK
DECEPTION
BEATRICE AND VIRGIL
WRECKED
SILVER BORNE
ABRAHAM LINCOLN: VAMPIRE HUNTER
A RIVER IN THE SKY

On 2009-06-21 this was the hardcover fiction NYT best-sellers list:
SKIN TRADE
MEDUSA
THE SCARECROW
SHANGHAI GIRLS
MATTERS OF THE HEART
GONE TOMORROW
DEAD AND GONE
THE 8TH CONFESSION
THE STRAIN
WICKED PREY
THE HOST
FIRST FAMILY
CEMETERY DANCE
UNDEAD AND UNWELCOME
THE HELP
PYGMY
MY FATHER'S TEARS AND OTHER STORIES
ROAD DOGS
THE STORY SISTERS
HEARTLESS

On 2010-06-20 this was the hardcover fiction NYT best-sellers list:
THE GIRL WHO KICKED THE HORNET’S NEST
BULLET
THE SPY
THE HELP
DEAD IN THE FAMILY
61 HOURS
THE BURNING WIRE
STORM PREY
THE BOURNE OBJECTIVE
INNOCENT
HEART OF THE MATTER
THE 9TH JUDGMENT
BLOCKADE BILLY
ALLIES
THE RULE OF NINE
FEVER DREAM
DELIVER US FROM EVIL
MATTERHORN
THE PARTICULAR SADNESS OF LEMON CAKE
DANGEROUS

2) What are all the different book categories the NYT ranked in June 6, 2009? How about June 6, 2015?


In [90]:
cat_dates = ['2009-06-06', '2015-06-06']
for date in cat_dates:
    cat_response = requests.get('https://api.nytimes.com/svc/books/v3/lists/names.json?published-date=' + date + '&api-key=1a25289d587a49b7ba8128badd7088a2')
    cat_data = cat_response.json()
    print('On', date + ', these were the different book categories the NYT ranked:')
    categories = []
    for result in cat_data['results']:
        categories.append(result['list_name'])
    print(', '.join(set(categories)))
    print('')


On 2009-06-06, these were the different book categories the NYT ranked:
E-Book Fiction, Hardcover Advice, Business Books, Race and Civil Rights, Family, Celebrities, Young Adult, Paperback Advice, Mass Market Paperback, Childrens Middle Grade E-Book, Travel, Crime and Punishment, Childrens Middle Grade, Young Adult E-Book, Trade Fiction Paperback, Animals, Indigenous Americans, Education, Health, Relationships, Combined Print Nonfiction, Hardcover Fiction, Food and Fitness, Hardcover Business Books, Hardcover Graphic Books, Combined Print and E-Book Fiction, E-Book Nonfiction, Combined Print and E-Book Nonfiction, Culture, Science, Young Adult Hardcover, Paperback Graphic Books, Humor, Series Books, Young Adult Paperback, Fashion Manners and Customs, Picture Books, Espionage, Hardcover Nonfiction, Religion Spirituality and Faith, Paperback Nonfiction, Advice How-To and Miscellaneous, Chapter Books, Manga, Combined Print Fiction, Childrens Middle Grade Hardcover, Paperback Business Books, Sports, Expeditions Disasters and Adventures, Hardcover Political Books, Paperback Books, Childrens Middle Grade Paperback, Games and Activities

On 2015-06-06, these were the different book categories the NYT ranked:
E-Book Fiction, Hardcover Advice, Business Books, Race and Civil Rights, Family, Celebrities, Young Adult, Paperback Advice, Mass Market Paperback, Childrens Middle Grade E-Book, Travel, Crime and Punishment, Childrens Middle Grade, Young Adult E-Book, Trade Fiction Paperback, Animals, Indigenous Americans, Education, Health, Relationships, Combined Print Nonfiction, Hardcover Fiction, Food and Fitness, Hardcover Business Books, Hardcover Graphic Books, Combined Print and E-Book Fiction, E-Book Nonfiction, Combined Print and E-Book Nonfiction, Culture, Science, Young Adult Hardcover, Paperback Graphic Books, Humor, Series Books, Young Adult Paperback, Fashion Manners and Customs, Picture Books, Espionage, Hardcover Nonfiction, Religion Spirituality and Faith, Paperback Nonfiction, Advice How-To and Miscellaneous, Chapter Books, Manga, Combined Print Fiction, Childrens Middle Grade Hardcover, Paperback Business Books, Sports, Expeditions Disasters and Adventures, Hardcover Political Books, Paperback Books, Childrens Middle Grade Paperback, Games and Activities

3) Muammar Gaddafi's name can be transliterated many many ways. His last name is often a source of a million and one versions - Gadafi, Gaddafi, Kadafi, and Qaddafi to name a few. How many times has the New York Times referred to him by each of those names?

Tip: Add "Libya" to your search to make sure (-ish) you're talking about the right guy.


In [195]:
gaddafis = ['Gadafi', 'Gaddafi', 'Kadafi', 'Qaddafi']

for gaddafi in gaddafis:
    g_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=' + gaddafi + '+libya&api-key=1a25289d587a49b7ba8128badd7088a2')
    g_data = g_response.json()
    print('There are', g_data['response']['meta']['hits'], 'instances of the spelling', gaddafi + '.')


There are 0 instances of the spelling Gadafi.
There are 1025 instances of the spelling Gaddafi.
There are 4 instances of the spelling Kadafi.
There are 5687 instances of the spelling Qaddafi.

In [1]:
# TA-COMMENT: As per usual, your commented code is excellent! I love how you're thinking through what might work.

In [205]:
# #HELP try 1.
# #Doesn't show next pages.
# gaddafis = ['Gadafi', 'Gaddafi', 'Kadafi', 'Qaddafi']

# for gaddafi in gaddafis:
#     g_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=' + gaddafi + '+libya&page=0&api-key=1a25289d587a49b7ba8128badd7088a2')
#     g_data = g_response.json()
#     print('There are', len(g_data['response']['docs']), 'instances of the spelling', gaddafi)

In [206]:
# #HELP try 2. What I want to do next is 
# #if the number of articles != 10 , stop
# #else, add 1 to the page number

# #Tell it to loop until the end result is not 10

# #but right now it keeps crashing
# #Maybe try by powers of 2.

# import time, sys
# pages = range(400)
# total_articles = 0
# for page in pages:
#     g_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=gaddafi+libya&page=' + str(page) + '&api-key=1a25289d587a49b7ba8128badd7088a2')
#     g_data = g_response.json()
#     articles_on_pg = len(g_data['response']['docs'])
#     total_articles = total_articles + articles_on_pg
#     print(total_articles)
#     time.sleep(0.6)

In [207]:
#HELP try 3. Trying by powers of 2.

#OMG does 'hits' means the number of articles with this text?? If so, where could I find that in the README??

# numbers = range(10)
# pages = []

# for number in numbers:
#     pages.append(2 ** number)

# #temp
# print(pages)

# import time, sys
# total_articles = 0
# for page in pages:
#     g_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=gaddafi+libya&page=' + str(page) + '&api-key=1a25289d587a49b7ba8128badd7088a2')
#     g_data = g_response.json()
#     articles_on_pg = len(g_data['response']['docs'])
#     #temp
#     meta_on_pg = g_data['response']['meta']

#     print(page, articles_on_pg, meta_on_pg)
#     time.sleep(1)

In [208]:
# #HELP (troubleshooting the page number that returns a keyerror)
# #By trial and error, it seems like "101" breaks it. 100 is fine.

# g_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=gadafi+libya&page=101&api-key=1a25289d587a49b7ba8128badd7088a2')
# g_data = g_response.json()
# articles_on_pg = len(g_data['response']['docs'])
# print(articles_on_pg)

4) What's the title of the first story to mention the word 'hipster' in 1995? What's the first paragraph?


In [161]:
hip_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=hipster&begin_date=19950101&sort=oldest&api-key=1a25289d587a49b7ba8128badd7088a2')
hip_data = hip_response.json()
first_hipster = hip_data['response']['docs'][0]
print('The first hipster article of 1995 was titled', first_hipster['headline']['main'] + '.\nCheck it out:\n' + first_hipster['lead_paragraph'])


The first hipster article of 1995 was titled SOUND.
Check it out:
Portable record players with built-in speakers, from the 1960's, are the latest points on hipster score cards. In some cases, they are the only way to listen to many of the old LP or 45-r.p.m. recordings still around but not released on cassette or CD. Usually available in white plastic or metal, they can be found in flea markets and secondhand stores. One style has the arm cast in the shape of a cobra. (Don Hogan Charles/The New York Times)

5) How many times was gay marriage mentioned in the NYT between 1950-1959, 1960-1969, 1970-1978, 1980-1989, 1990-2099, 2000-2009, and 2010-present?

Tip: You'll want to put quotes around the search term so it isn't just looking for "gay" and "marriage" in the same article.

Tip: Write code to find the number of mentions between Jan 1, 1950 and Dec 31, 1959.


In [204]:
decade_range = range(5)
date_attributes = []
for decade in decade_range:
    date_attributes.append('begin_date=' + str(1950 + decade*10) +'0101&end_date=' + str(1959 + decade*10) + '1231')
date_attributes.append('begin_date=20100101')

for date in date_attributes:
    gm_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q="gay+marriage"&' + date + '&api-key=1a25289d587a49b7ba8128badd7088a2')
    gm_data = gm_response.json()
    hits = gm_data['response']['meta']['hits']
    print(hits)


0
0
0
3
137
4748

6) What section talks about motorcycles the most?

Tip: You'll be using facets


In [9]:
#I searched for motorcyle or motorcycles

# for motorcyles:
# {'count': 10, 'term': 'New York and Region'}
# {'count': 10, 'term': 'New York and Region'}
# {'count': 7, 'term': 'World'}
# {'count': 6, 'term': 'Arts'}
# {'count': 6, 'term': 'Business'}
# {'count': 5, 'term': 'U.S.'}
# for motorcycle:
# {'count': 24, 'term': 'Sports'}
# {'count': 24, 'term': 'Sports'}
# {'count': 20, 'term': 'New York and Region'}
# {'count': 16, 'term': 'U.S.'}
# {'count': 14, 'term': 'Arts'}
# {'count': 8, 'term': 'Business'} 

moto_response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?q=motorcyle+OR+motorcyles&facet_field=section_name&api-key=1a25289d587a49b7ba8128badd7088a2')
moto_data = moto_response.json()
# #temp. Answer: dict
# print(type(moto_data))
# #temp. Answer: ['status', 'copyright', 'response']
# print(moto_data.keys())
# #temp. Answer: dict
# print(type(moto_data['response']))
# #temp. Answer: ['docs', 'meta', 'facets']
# print(moto_data['response'].keys())
# #temp. Answer: dict
# print(type(moto_data['response']['facets']))
# #temp. Answer: 'section_name'
# print(moto_data['response']['facets'].keys())
# #temp. Answer: dict
# print(type(moto_data['response']['facets']['section_name']))
# #temp. Answer:'terms'
# print(moto_data['response']['facets']['section_name'].keys())
# #temp. Answer: list
# print(type(moto_data['response']['facets']['section_name']['terms']))
# #temp. It's a list of dictionaries, with a count and a section name for each one.
# print(moto_data['response']['facets']['section_name']['terms'][0])

sections = moto_data['response']['facets']['section_name']['terms']

the_most = 0
for section in sections:
    if section['count'] > the_most:
        the_most = section['count']
        the_most_name = section['term']
print(the_most_name, 'talks about motorcycles the most, with', the_most, 'articles.')

# #Q: WHY DO SO FEW ARTICLES MENTION MOTORCYCLES? 
# #A: MAYBE BECAUSE MANY ARTICLES AREN'T IN SECTIONS?
# #temp. Answer: {'hits': 312, 'offset': 0, 'time': 24}
# print(moto_data['response']['meta'])

# #temp. Answer: ['document_type', 'blog', 'multimedia', 'pub_date', 
# #'news_desk', 'keywords', 'byline', '_id', 'headline', 'snippet', 
# #'source', 'lead_paragraph', 'web_url', 'print_page', 'slideshow_credits', 
# #'abstract', 'section_name', 'word_count', 'subsection_name', 'type_of_material']
# print(moto_data['response']['docs'][0].keys())
# #temp. Answer: Sports
# #print(moto_data['response']['docs'][0]['section_name'])
# #temp.
# # Sports
# # Sports
# # Sports
# # None
# # Multimedia/Photos
# # Multimedia/Photos
# # Multimedia/Photos
# # New York and Region
# # None
# # New York and Region
# # New York and Region
# for article in moto_data['response']['docs']:
#     print(article['section_name'])
# #temp. 10. There are only 10 because only 10 show up in search results.
# print(len(moto_data['response']['docs']))


[{'term': 'New York and Region', 'count': 24}, {'term': 'Sports', 'count': 22}, {'term': 'Arts', 'count': 20}, {'term': 'U.S.', 'count': 20}, {'term': 'World', 'count': 15}]
New York and Region talks about motorcycles the most, with 24 articles.

7) How many of the last 20 movies reviewed by the NYT were Critics' Picks? How about the last 40? The last 60?

Tip: You really don't want to do this 3 separate times (1-20, 21-40 and 41-60) and add them together. What if, perhaps, you were able to figure out how to combine two lists? Then you could have a 1-20 list, a 1-40 list, and a 1-60 list, and then just run similar code for each of them.


In [286]:
offsets = range(3)
picks_by_group = []

for offset in offsets:
    picks_response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?offset=' + str(offset * 20) + '&api-key=1a25289d587a49b7ba8128badd7088a2')
    picks_data = picks_response.json()
    
    results = picks_data['results']
    picks = 0

    for result in results:
        if result['critics_pick'] == 1:
            picks = picks + 1
    picks_by_group.append(picks)
    print('In the most recent', offset * 20, 'to', offset * 20 + 20, 'movies, the critics liked', picks, 'movies.')
    print('In the past', (offset + 1) * 20, 'reviews, the critics liked', sum(picks_by_group), 'movies.')
    print('')


In the most recent 0 to 20 movies, the critics liked 10 movies.
In the past 20 reviews, the critics liked 10 movies.

In the most recent 20 to 40 movies, the critics liked 4 movies.
In the past 40 reviews, the critics liked 14 movies.

In the most recent 40 to 60 movies, the critics liked 10 movies.
In the past 60 reviews, the critics liked 24 movies.


In [ ]:
# #temp. Answer: ['has_more', 'status', 'results', 'copyright', 'num_results']
# print(picks_data.keys())
# #temp. 20
# #not what we're looking for
# print(picks_data['num_results'])
# #temp. Answer: list
# print(type(picks_data['results']))
# #temp.
# print(picks_data['results'][0])
# #temp. Answer: ['display_title', 'headline', 'mpaa_rating', 'critics_pick', 
# #'publication_date', 'link', 'summary_short', 'byline', 'opening_date', 'multimedia', 'date_updated']
# print(picks_data['results'][0].keys())

8) Out of the last 40 movie reviews from the NYT, which critic has written the most reviews?


In [287]:
offsets = range(2)
bylines = []

for offset in offsets:
    picks_response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?offset=' + str(offset * 20) + '&api-key=1a25289d587a49b7ba8128badd7088a2')
    picks_data = picks_response.json()
    
    for result in picks_data['results']:
        bylines.append(result['byline'])

print(bylines)


['STEPHEN HOLDEN', 'MANOHLA DARGIS', 'STEPHEN HOLDEN', 'A. O. SCOTT', 'STEPHEN HOLDEN', 'NEIL GENZLINGER', 'BEN KENIGSBERG', 'GLENN KENNY', 'NEIL GENZLINGER', 'HELEN T. VERONGOS', 'BEN KENIGSBERG', 'GLENN KENNY', 'JEANNETTE CATSOULIS', 'GLENN KENNY', 'ANDY WEBSTER', 'A. O. SCOTT', 'HELEN T. VERONGOS', 'JEANNETTE CATSOULIS', 'ANDY WEBSTER', 'GLENN KENNY', 'KEN JAWOROWSKI', 'ANDY WEBSTER', 'A. O. SCOTT', 'NICOLAS RAPOLD', 'STEPHEN HOLDEN', 'GLENN KENNY', 'STEPHEN HOLDEN', 'A. O. SCOTT', 'STEPHEN HOLDEN', 'A. O. SCOTT', 'KEN JAWOROWSKI', 'STEPHEN HOLDEN', 'GLENN KENNY', 'BEN KENIGSBERG', 'GLENN KENNY', 'ANDY WEBSTER', 'HELEN T. VERONGOS', 'NEIL GENZLINGER', 'JEANNETTE CATSOULIS', 'KEN JAWOROWSKI']

In [316]:
# I tried Counter, but there were two most common results, and it only gave me one.
# from collections import Counter
# print(collections.Counter(bylines))
# print(Counter(bylines).most_common(1))


Counter({'GLENN KENNY': 7, 'STEPHEN HOLDEN': 7, 'A. O. SCOTT': 5, 'ANDY WEBSTER': 4, 'KEN JAWOROWSKI': 3, 'JEANNETTE CATSOULIS': 3, 'NEIL GENZLINGER': 3, 'HELEN T. VERONGOS': 3, 'BEN KENIGSBERG': 3, 'MANOHLA DARGIS': 1, 'NICOLAS RAPOLD': 1})

In [326]:
sorted_bylines = (sorted(bylines))
numbers = range(40)
most_bylines = 0

for number in numbers:
    if most_bylines < sorted_bylines.count(sorted_bylines[number]):
        most_bylines = sorted_bylines.count(sorted_bylines[number])

for number in numbers:          
    if most_bylines == sorted_bylines.count(sorted_bylines[number]) and sorted_bylines[number] != sorted_bylines[number - 1]:
        print(sorted_bylines[number], sorted_bylines.count(sorted_bylines[number]))


GLENN KENNY 7
STEPHEN HOLDEN 7