Graded = 7/8

NYT

All APIs: http://developer.nytimes.com/
Article search API: http://developer.nytimes.com/article_search_v2.json
Best-seller API: http://developer.nytimes.com/books_api.json#/Documentation
Test/build queries: http://developer.nytimes.com/

Tip: Remember to include your API key in all requests! You'll need to register to get a key. Their interactive query-building tool is pretty bad, so expect to test queries in code.
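One way to make sure the key is never left out is to build every URL through a single helper. The following is a minimal sketch, not part of the assignment: `build_url` is a hypothetical helper name and `MY_KEY` is a placeholder, while `api-key` is the parameter name the requests in this notebook use. It relies only on the standard library, and it also takes care of encoding spaces in search terms.

```python
from urllib.parse import urlencode

def build_url(base, api_key, **params):
    """Return base plus a URL-encoded query string that always carries the key."""
    query = {'api-key': api_key}
    query.update(params)
    return base + '?' + urlencode(query)

# 'MY_KEY' is a placeholder, not a real key
url = build_url('https://api.nytimes.com/svc/search/v2/articlesearch.json',
                'MY_KEY', q='Gaddafi Libya')
print(url)
```

The resulting string can be passed straight to `requests.get`.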


In [2]:
import config
import requests

#imports key from config file
nyt_articles_api = config.nyt_articles_api
nyt_books_api = config.nyt_books_api
nyt_movie_api = config.nyt_movie_api 


response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api)
data = response.json()
# print(data)

1) What books topped the Hardcover Fiction NYT best-sellers list on Mother's Day in 2009 and 2010? How about Father's Day?


In [61]:
# Mother's Day fell on 2009-05-10 and 2010-05-09
# Father's Day fell on 2009-06-21 and 2010-06-20
dates = ['2009-05-10', '2010-05-09', '2009-06-21', '2010-06-20']

for date in dates:
    response = requests.get('https://api.nytimes.com/svc/books/v3/lists//.json?api-key=' + nyt_books_api + "&list-name=hardcover-fiction&published-date=" + date)
    bestseller_data = response.json()

    # The first entry in 'results' is taken to be the top-ranked book for that date
    results = bestseller_data['results'][0]

    print("The best selling book on", date, "was", results['book_details'][0]['title'])


The best selling book on 2009-05-10 was FIRST FAMILY
The best selling book on 2010-05-09 was DELIVER US FROM EVIL
The best selling book on 2009-06-21 was SKIN TRADE
The best selling book on 2010-06-20 was THE GIRL WHO KICKED THE HORNET’S NEST
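The four dates used above can be double-checked in code rather than looked up by hand. This is a small sketch under the usual US convention that Mother's Day is the second Sunday of May and Father's Day is the third Sunday of June; `nth_sunday` is a hypothetical helper name.

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday of the given month."""
    first = date(year, month, 1)
    # weekday(): Monday is 0 ... Sunday is 6
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))

for year in (2009, 2010):
    print(year, "Mother's Day:", nth_sunday(year, 5, 2).isoformat())
    print(year, "Father's Day:", nth_sunday(year, 6, 3).isoformat())
```

Calling `.isoformat()` gives the YYYY-MM-DD strings that the `published-date` parameter is given in the cell above.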

2) What are all the different book categories the NYT ranked in June 6, 2009? How about June 6, 2015?


In [85]:
response = requests.get('https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=' + nyt_books_api)
bestseller_ldata = response.json()

# Each entry in 'results' describes one best-seller list, including the first
# and last dates on which it was published. Dates in YYYY-MM-DD form compare
# correctly as plain strings.

print("On June, 6th, 2009 the NYT published the following bestsellers lists:")
for book_list in bestseller_ldata['results']:
    if book_list['oldest_published_date'] <= '2009-06-06' and book_list['newest_published_date'] >= '2009-06-06':
        print(book_list['display_name'])

print("\nOn June, 6th, 2015 the NYT published the following bestsellers lists:")
for book_list in bestseller_ldata['results']:
    if book_list['oldest_published_date'] <= '2015-06-06' and book_list['newest_published_date'] >= '2015-06-06':
        print(book_list['display_name'])


On June, 6th, 2009 the NYT published the following bestsellers lists:
Hardcover Fiction
Hardcover Nonfiction
Paperback Trade Fiction
Paperback Mass-Market Fiction
Paperback Nonfiction
Hardcover Advice & Misc.
Paperback Advice & Misc.
Children’s Chapter Books
Children’s Paperback Books
Children’s Picture Books
Children’s Series
Hardcover Graphic Books
Paperback Graphic Books
Manga

On June, 6th, 2015 the NYT published the following bestsellers lists:
Combined Print & E-Book Fiction
Combined Print & E-Book Nonfiction
Hardcover Fiction
Hardcover Nonfiction
Paperback Trade Fiction
Paperback Mass-Market Fiction
Paperback Nonfiction
E-Book Fiction
E-Book Nonfiction
Advice, How-To & Miscellaneous
Children’s Middle Grade
Children’s Picture Books
Children’s Series
Young Adult
Hardcover Graphic Books
Paperback Graphic Books
Manga
Animals
Business
Celebrities
Crime and Punishment
Culture
Education
Espionage
Expeditions
Fashion, Manners and Customs
Food and Diet
Games and Activities
Health
Humor
Indigenous Americans
Love and Relationships
Parenthood and Family
Politics and American History
Race and Civil Rights
Religion, Spirituality and Faith
Science
Sports and Fitness
Travel

3) Muammar Gaddafi's name can be transliterated many many ways. His last name is often a source of a million and one versions - Gadafi, Gaddafi, Kadafi, and Qaddafi to name a few. How many times has the New York Times referred to him by each of those names?

Tip: Add "Libya" to your search to make sure (-ish) you're talking about the right guy.


In [239]:
ppl = ['Gaddafi','Gadafi', 'Kadafi','Qaddafi']

for person in ppl:
    # fq returns far more results than q; still need to work out the difference between hits and actual mentions
    # response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&fq=' + person + ' Libya')
    response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q=' + person + ' Libya')
    muammar_data = response.json()
    print("Muammar was referred to as ", person, muammar_data['response']['meta']['hits'], "times in the New York Times.")
# print(muammar_data)


# print(muammar_data['response']['docs'])

# print(muammar_data['response']['docs'][0]['keywords'])
# print(muammar_data['response']['docs'][0])

# keywords = []
# ppl = ['Gaddafi','Gadafi', 'Kadafi','Qaddafi']
#for article in muammar_data['response']['docs']:
 #   for keyword in article['keywords']:
      #  print(keyword['value'])
  #      for person in ppl:
            #print(x)
   #         if person in keyword:
    #            print("print", keyword['value'], "was found")
     #           print(keyword['value'])
      #          keywords.append(keyword['value'])
        
#from collections import Counter 
#counts = Counter(keywords)  
#print(counts)


Muammar was referred to as  Gaddafi 1027 times in the New York Times.
Muammar was referred to as  Gadafi 0 times in the New York Times.
Muammar was referred to as  Kadafi 4 times in the New York Times.
Muammar was referred to as  Qaddafi 5687 times in the New York Times.
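The `hits` figure appears to count matching articles rather than individual mentions. The commented-out code in the cell above starts down a different route: tallying the spellings found in each article's keywords. A minimal sketch of that idea follows, assuming each document has a `keywords` list of dictionaries with a `value` field, as that code does. `count_spellings` is a hypothetical name and the sample documents are made up.

```python
from collections import Counter

def count_spellings(docs, spellings):
    """Count how many keyword values contain each spelling (case-insensitive)."""
    counts = Counter()
    for article in docs:
        for keyword in article.get('keywords', []):
            for spelling in spellings:
                if spelling.lower() in keyword['value'].lower():
                    counts[spelling] += 1
    return counts

# Made-up documents for illustration only
sample_docs = [
    {'keywords': [{'value': 'Qaddafi, Muammar el-'}, {'value': 'Libya'}]},
    {'keywords': [{'value': 'Libya'}]},
    {'keywords': [{'value': 'Qaddafi, Muammar el-'}]},
]

counts = count_spellings(sample_docs, ['Gaddafi', 'Gadafi', 'Kadafi', 'Qaddafi'])
print(counts)
```

Note that a single response only holds one page of documents, so this would count keywords for that page only, not for every hit.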

In [189]:
#for article in muammar_data['response']['docs']:
 #   print(article["keywords"])

In [190]:
# len(muammar_data['response']['docs'])

4) What's the title of the first story to mention the word 'hipster' in 1995? What's the first paragraph?


In [224]:
response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q=hipster&begin_date=19950101&end_date=19951231&sort=oldest')
hipster_data = response.json()
# print(hipster_data['response']['docs'])

hippie = hipster_data['response']['docs']

print("The first story to mention the word 'hipster' in 1995 was titled", hippie[0]['headline']['kicker'] + "; " + hippie[0]['headline']['main'])


The first story to mention the word 'hipster' in 1995 was titled SURFACING; SOUND
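The question also asks for the first paragraph, which the cell above does not print. A hedged sketch of how both could be pulled from one document: it assumes the first paragraph is stored in a `lead_paragraph` field, and it tolerates a missing kicker. `describe_story` is a hypothetical name, and the paragraph text in the sample is a placeholder, not the real article text.

```python
def describe_story(doc):
    """Return (title, first paragraph) for one article document."""
    headline = doc.get('headline', {})
    parts = [headline.get('kicker'), headline.get('main')]
    title = "; ".join(part for part in parts if part)
    # 'lead_paragraph' is an assumed field name for the first paragraph
    return title, doc.get('lead_paragraph')

# Headline values come from the output above; the paragraph is a placeholder
sample_doc = {
    'headline': {'kicker': 'SURFACING', 'main': 'SOUND'},
    'lead_paragraph': '(first paragraph text would appear here)',
}

title, first_paragraph = describe_story(sample_doc)
print("Title:", title)
print("First paragraph:", first_paragraph)
```

With the real data this would be called as `describe_story(hippie[0])`.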

5) How many times was gay marriage mentioned in the NYT between 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-1999, 2000-2009, and 2010-present?

Tip: You'll want to put quotes around the search term so it isn't just looking for "gay" and "marriage" in the same article.

Tip: Write code to find the number of mentions between Jan 1, 1950 and Dec 31, 1959.


In [294]:
#Ta-Stephan: Because you added to the start and end date early, the 1950s weren't counted.

start_date = 19500101  
end_date = 19591231

for n in [1,2,3,4,5,6]:
    if (n <= 5):
        start_date = start_date + 100000
        end_date = end_date + 100000
    else:
        start_date = start_date + 100000
        end_date = 20160609
    
    response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q="\"gay marriage\""&begin_date=' + str(start_date) + '&end_date=' + str(end_date) + '&sort=oldest')
    gay_marriage_data = response.json()  
    gay_marriage_hits = gay_marriage_data['response']['meta']['hits']

    
    start_str = str(start_date)
    start_str = start_str[:4]
    
    end_str = str(end_date)
    end_str = end_str[:4]
    print("There were", gay_marriage_hits, "mentions of gay marriage between", start_str, "and", end_str)


There were 485 mentions of gay marriage between 1960 and 1969
There were 450 mentions of gay marriage between 1970 and 1979
There were 368 mentions of gay marriage between 1980 and 1989
There were 1102 mentions of gay marriage between 1990 and 1999
There were 4858 mentions of gay marriage between 2000 and 2009
There were 8610 mentions of gay marriage between 2010 and 2016
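The loop above adds ten years to both dates before making its first request, which is why the output starts at 1960. One way around that is to build the list of date ranges up front and then loop over it. A minimal sketch, with `decade_ranges` as a hypothetical helper name and the same final end date the cell uses:

```python
def decade_ranges(first_year, last_start_year, final_end_date):
    """Return (begin_date, end_date) integer pairs, one per decade."""
    ranges = []
    for year in range(first_year, last_start_year + 1, 10):
        begin = year * 10000 + 101              # January 1 of the decade's first year
        if year == last_start_year:
            end = final_end_date                # the last range runs to "present"
        else:
            end = (year + 9) * 10000 + 1231     # December 31 of the decade's last year
        ranges.append((begin, end))
    return ranges

ranges = decade_ranges(1950, 2010, 20160609)
for begin, end in ranges:
    print(begin, end)
```

Each pair can then be dropped into the `begin_date` and `end_date` parameters, so the 1950s request is made like any other.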

6) What section talks about motorcycles the most?

Tip: You'll be using facets


In [298]:
response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q=motorcycle&facet_field=section_name')

moto_data = response.json()
# print(moto_data['response']['facets']['section_name']['terms'])

# documentation found re: facets in NYT API
# https://data-gov.tw.rpi.edu/wiki/How_to_use_New_York_Times_Article_Search_API

moto_sections = moto_data['response']['facets']['section_name']['terms']

moto_count = 0
most_motos = ""

for section in moto_sections:
    if section['count'] > moto_count:
        moto_count = section['count']
        most_motos = section['term']
        
print("The section of the New York Times that mentions motorcycles the most is the", most_motos, "section which mentions motorcycles", moto_count, "times.")


The section of the New York Times that mentions motorcycles the most is the World section which mentions motorcycles 1738 times.
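The loop that tracks the running maximum can also be written with the built-in `max` and a `key` function. A short sketch, assuming the facet terms are dictionaries with `term` and `count` fields as in the cell above. `busiest_section` is a hypothetical name, and apart from the World figure taken from the output, the sample entries are made up.

```python
def busiest_section(terms):
    """Return (term, count) for the facet entry with the highest count."""
    top = max(terms, key=lambda section: section['count'])
    return top['term'], top['count']

# Only the 'World' entry reflects the output above; the others are made up
sample_terms = [
    {'term': 'Section A', 'count': 12},
    {'term': 'World', 'count': 1738},
    {'term': 'Section B', 'count': 305},
]

section, count = busiest_section(sample_terms)
print(section, count)
```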

7) How many of the last 20 movies reviewed by the NYT were Critics' Picks? How about the last 40? The last 60?

Tip: You really don't want to do this 3 separate times (1-20, 21-40 and 41-60) and add them together. What if, perhaps, you were able to figure out how to combine two lists? Then you could have a 1-20 list, a 1-40 list, and a 1-60 list, and then just run similar code for each of them.


In [46]:
criticPickCount = 0
for offset in [0, 1, 2, 3]:
    # Each request returns 20 reviews, so the offsets step by 20
    offset = offset * 20

    response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?api-key=' + nyt_movie_api + '&offset=' + str(offset))
    movie_data = response.json()

    # Keep a running total so each pass reports the count for all reviews seen so far
    for movie in movie_data['results']:
        if movie['critics_pick'] == 1:
            criticPickCount = criticPickCount + 1

    print("There were", criticPickCount, "Critics' Picks in the last", offset + 20, "movies that were reviewed.")


There were 6 Critics' Picks in the last 20 movies that were reviewed.
There were 15 Critics' Picks in the last 40 movies that were reviewed.
There were 20 Critics' Picks in the last 60 movies that were reviewed.
There were 30 Critics' Picks in the last 80 movies that were reviewed.
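The tip suggests combining lists instead of keeping a running total. A sketch of that approach: the pages are joined with `+` and then sliced to the first 20, 40, and 60 reviews. `picks_in_first` is a hypothetical name and the pages are made-up data with the same `critics_pick` field the cell above reads.

```python
def picks_in_first(reviews, n):
    """Count Critics' Picks among the first n reviews."""
    return sum(1 for review in reviews[:n] if review['critics_pick'] == 1)

# Made-up pages of 20 reviews each, for illustration only
page_1 = [{'critics_pick': 1 if i % 4 == 0 else 0} for i in range(20)]
page_2 = [{'critics_pick': 1 if i % 5 == 0 else 0} for i in range(20)]
page_3 = [{'critics_pick': 0} for i in range(20)]

# Lists are combined with +, giving one list of 60 reviews
all_reviews = page_1 + page_2 + page_3

for n in (20, 40, 60):
    print("Critics' Picks in the last", n, "reviews:", picks_in_first(all_reviews, n))
```

With real data, each page would be the `movie_data['results']` list from one request.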

8) Out of the last 40 movie reviews from the NYT, which critic has written the most reviews?


In [34]:
for offset in [0,1,2]:
    offset = offset * 20
    # print(offset)

    response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?api-key=' + nyt_movie_api + '&offset=' + str(offset))

    
    movie_data = response.json()

    # print(movie_data)
    criticPickCount = 0
    authors = []
    
   # print(movie_data['results'])

#the critics name is stored in the byline
    for movie in movie_data['results']:
        authors.append(movie['byline'])
       # print(movie['byline'])
            

                    
from collections import Counter 
counts = Counter(authors)  
# print(counts)
print(Counter(authors).most_common(1) , 'has written the most reviews out of the last 40 NYT reviews.')


[('STEPHEN HOLDEN', 4)] has written the most reviews out of the last 40 NYT reviews.
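In the cell above, `authors` is reset to an empty list on every pass and three pages are requested, so only the final page of 20 reviews ends up being counted. A minimal sketch of accumulating bylines across pages instead: the list is created once, before the loop. `top_critic` is a hypothetical name and the bylines are made up.

```python
from collections import Counter

def top_critic(pages):
    """Return (byline, count) for the most frequent byline across all pages."""
    authors = []                      # created once, before the loop
    for page in pages:
        for movie in page:
            authors.append(movie['byline'])
    return Counter(authors).most_common(1)[0]

# Made-up pages for illustration only
sample_pages = [
    [{'byline': 'CRITIC A'}, {'byline': 'CRITIC B'}, {'byline': 'CRITIC A'}],
    [{'byline': 'CRITIC B'}, {'byline': 'CRITIC B'}],
]

name, count = top_critic(sample_pages)
print(name, "has written the most reviews:", count)
```

For the last 40 reviews, `pages` would hold the `results` lists from the requests at offsets 0 and 20 only.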

In [ ]: