In [23]:
import urllib2
from collections import namedtuple
import datetime
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np
import networkx as nx
import itertools
import pickle
from nltk import pos_tag  # needed for the verb-detection step below
import time
from collections import Counter
import operator
New York Social Diary provides a fascinating lens onto New York's socially well-to-do. The data forms a natural social graph for New York's social elite. As shown in this report of a recent holiday party, almost all the photos have annotated captions labeling their subjects. We can think of these captions as implicitly defining a social graph: there is a connection between two individuals if they appear in a picture together.
In this project, I investigate these connections between the NYC elite.
There are two steps -- gathering the data and analyzing it.
(1) To gather the data, I grab all the relevant photo captions, save them, and then parse them to retrieve the relevant information.
(2) To analyze the data, I consider the problem in terms of a network or a graph. Any time a pair of people appears in a photo together, that counts as a link. This gives an (undirected) multigraph with no self-loops, which has an obvious analog as an undirected weighted graph: the weight of an edge is the number of photos in which the pair appears together. A minimal sketch of this construction follows.
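To make the modeling concrete, here is a minimal sketch (using a made-up caption, not the scraped data) of how one caption's list of names turns into weighted edges; the full construction over all captions appears later in the notebook.
import itertools
from collections import Counter
# A single hypothetical caption, already parsed into a list of names
caption = ['Alice Smith', 'Bob Jones', 'Carol Lee']
# Every unordered pair of people in the caption counts as one link
pairs = list(itertools.combinations(sorted(caption), 2))
print(pairs)  # [('Alice Smith', 'Bob Jones'), ('Alice Smith', 'Carol Lee'), ('Bob Jones', 'Carol Lee')]
# Tallying the pairs over many captions gives the edge weights of the undirected weighted graph
weights = Counter(pairs)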
The first step is to gather the data. I want photos from parties before December 1st, 2014. This link contains a list of (party) pages. For each party, I find the url, and grab all the photocaptions.
(1) As you can see, the URL structure is consistent for each party: the base URL, followed by the year, followed by the party name, with dashes in place of spaces.
(2) I use Python's datetime.strptime function to parse the dates; a small example of the two date formats involved follows.
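As a quick illustration (the party date string here is made up, but it follows the format used on the listing pages), the cutoff date and the listing dates are parsed like this:
import datetime
cutoff = datetime.datetime.strptime("14/12/01", '%y/%m/%d')  # December 1st, 2014
party_date = datetime.datetime.strptime("Friday, November 28, 2014", '%A, %B %d, %Y')  # hypothetical listing date
print(party_date < cutoff)  # True -- this party would be kept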
In [1]:
max_date = "14/12/01"
max_pages = 25 #actually 24 but just in case
url_base = "http://www.newyorksocialdiary.com/"
url_page_call = "party-pictures?page="
cutofftime=datetime.datetime.strptime(max_date, '%y/%m/%d')
PicBasic = namedtuple('PicBasic', 'url, dateinfo')
def span_info(span):
    # Extract the party-page link and its posting date from one listing entry
    urldata = span.select('span.field-content > a')
    datedata = span.select('span.views-field-created > span.field-content')
    if len(urldata) != 1 or len(datedata) != 1:
        print("Uh oh! We did something wrong")
        return None
    return PicBasic(
        url=urldata[0]['href'],
        dateinfo=datetime.datetime.strptime(datedata[0].text, '%A, %B %d, %Y')
    )
urladdons = []
for i in range(max_pages):
    pageno = i + 1
    url = url_base + url_page_call + str(pageno)
    raw_page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(raw_page)
    t2spans = soup.select('div.views-row')
    span_links = [span_info(span) for span in t2spans]
    url_links = [datapt.url for datapt in span_links
                 if datapt is not None and datapt.dateinfo < cutofftime]
    urladdons.extend(url_links)  # collect the add-on links for qualifying parties
print(len(urladdons))  # number of party pages
print(urladdons[0])  # add-on url for the first (last chronologically) party we identified
all_pic_captions = []
max_parties = len(urladdons)
def has_class_and_face(tag):
return not tag.has_attr('color') and tag.has_attr('face')
for j in range(max_parties):
    soup = None
    for attempt in range(3):  # sometimes the webpage is not responsive, so retry a few times
        try:
            soup = BeautifulSoup(urllib2.urlopen(url_base + urladdons[j]))
            break
        except Exception:
            pass
    if soup is None:  # all attempts failed; skip this party
        continue
    for a in soup.find_all(class_="photocaption"):
        try:
            names_with_white = str(a.get_text())
            names = names_with_white.lstrip()
            all_pic_captions.append(names)
        except Exception:  # non-ASCII captions can make str() fail; skip them
            pass
In [14]:
#TAKE OUT THE PHOTOGRAPHER
print(len(all_pic_captions))
all_pic_captions = [caption for caption in all_pic_captions if not re.search(r'^Photographs by ',caption)]
print(len(all_pic_captions))
In [17]:
###### SAVE AS PICKLE DATAFRAME FILE ###############
####################################################
print(all_pic_captions[0])
df=pd.DataFrame(all_pic_captions, columns=['all_pic_captions'])
df.to_pickle('captions2.pickle')
print(len(all_pic_captions))
Now comes the parsing part.
Some captions are not useful: they contain long narrative text that explains the event. We have to find heuristic rules to separate captions that are a list of names from those that are not. A few heuristics include:
(1) I keep only captions under a subjective character-length cutoff, and I drop captions that contain verbs, since those tend to be narrative sentences rather than lists of names.
(2) I separate the captions based on various forms of punctuation.
(3) This site is pretty formal and likes to say things like "Mayor Michael Bloomberg" after his election but "Michael Bloomberg" before his election. There are many titles, such as Mayor, CEO, etc., that need to be filtered out; a small illustration of stripping them follows.
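As a minimal sketch of the title-stripping step (using a made-up caption and only a small subset of the full title list defined below), the titles are joined into a regex alternation and substituted away:
import re
titles = ['Mayor ', 'Dr. ', 'CEO ']  # a small subset of the full hwords list used below
pattern = '|'.join(re.escape(t) for t in titles)  # escape so the '.' in 'Dr. ' is treated literally
caption = "Mayor Michael Bloomberg and Dr. Jane Doe"  # hypothetical caption
print(re.sub(pattern, '', caption))  # Michael Bloomberg and Jane Doe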
In [4]:
############### OPEN SAVED PICKLE FILE #################
###########RUN FROM HERE IF DOING PREVIOUS ANALYSIS#####
df=pd.io.pickle.read_pickle('captions2.pickle')
allcaptions=df['all_pic_captions']
###USE ONLY CAPTIONS UNDER SOME SUBJECTIVE CHARACTER LENGTH
subjective_cutoff = 250
smallcaps=[caption for caption in allcaptions if len(caption)<subjective_cutoff]
len(smallcaps)
Out[4]:
In [4]:
####IDENTIFY VERBS ##########
dfiltered=pd.DataFrame(smallcaps, columns=['smallcaps'])
capwords = [[re.sub(r'[^\w\-\s]','',word) for word in document.split()]
for document in smallcaps]
def extractverbcaps(words):
    #flags captions containing a verb outside the stop-list -- a sign of narrative text rather than a list of names
    twords=pos_tag(words)
    vtags = ['VB','VBD','VBG','VBN','VBP','VBZ']
    stopvwords=['van','left','right','honoree','de','host','dressed']
    verbpresent=0
    for word in twords:
        if word[1] in vtags:
            if not word[0].istitle(): #ignore title-cased words, which are likely names
                if word[0] not in stopvwords:
                    verbpresent=1
    return verbpresent
verbpresent=[extractverbcaps(caption) for caption in capwords]
dfiltered['verbpresent']=verbpresent
dfiltered['tokenized']=capwords
dfiltered.to_pickle('filteredcaptions.pickle') #saving to pickle file
filteredcaps=dfiltered[dfiltered['verbpresent'] == 0]['smallcaps']
#####GETTING RID OF HONORIFICS, ETC###
filteredcaps2= [re.sub(r'[(][a-zA-Z]+[)]','', caps) for caps in filteredcaps] #getting rid of everything inside brackets
hwords1=['Mr. ','Guest',' M.D.','PhD','Ph.D.',' Jr.',' Sr.','Mrs. ','Miss ','Doctor ','Dr. ','Dr ','Chair ','CEO ','the Honorable ','Mayor ','Prince ','Baroness ', 'Princess ', 'Honorees ', 'Honoree',' MD']
hwordsp=['Museum President ','Chief Curator ','Frick Director ','Police Commissioner ','Music Director ','Frick Trustee ','Historic Hudson Valley Trustee ', 'Museum President ','Public Theater Artistic Director ','Public Theater Executive Director ','Executive Director ','Cooper Union President ','The Hon. ','Dancing Chair ','Director Emerita ']
hwords2=['Hon. ','Lord ','Senator ','Deputy ','Director ','Dean ','Actor ','Actress ',' Esq.', 'Gov ','Governor ','Father ','Congresswoman ','Congressman ', 'Countess ','Awardee ','Chairman ','Commissioner ','Lady ','Ambassador ','President ','CEO ']
hwords=hwordsp+hwords1+hwords2
hwords = '|'.join(re.escape(word) for word in set(hwords)) #escape so the '.' in titles like 'Dr. ' is matched literally
filteredcaps2= [re.sub(r'^\s+|\s+$','', caps) for caps in filteredcaps2]
filteredcaps2= [re.sub(hwords,'', caps) for caps in filteredcaps2]
In [15]:
##########REPLACING COUPLES###########
#On investigation, we find that there are a lot of couple names -- e.g., "Mary and John Drew".
#To parse these, we need them in a "Mary Drew and John Drew" format.
newnames=[]
countno=0
capstring="([A-Z][a-z]+)\s+and\s+([A-Z][a-z]+)\s+([A-Z][a-z]+)" #matches couple names such as "Kelly and Tom Monro"
begstring="^%s" % capstring #string if it appears in the beginning
andstring="\\s+and\\s+%s" % capstring
withstring="\\s+with\\s+%s" % capstring
otherstring="\\s+[a-z]+\\s+%s" % capstring
def findingpairs(xlistno):
    #converts regex match tuples like ('Kelly', 'Tom', 'Monro') into "Kelly Monro and Tom Monro"
    namestr=[]
for names in xlistno:
nstr = names[0] + " " + names[2] + " and " + names[1] + " " + names[2]
namestr.append(nstr)
return(', '.join(namestr))
for xnames in filteredcaps2:
xlistno2=re.search(otherstring,xnames)
xlistno=re.search(begstring,xnames)
if xlistno2:
xno= re.findall(capstring,xnames)
if len(xno)>1:
xn=findingpairs(xno)
newnames.append(xn)
else:
newstring=xlistno2.group(1)+ " " + xlistno2.group(3) + " and " + xlistno2.group(2) + " " + xlistno2.group(3)
newnames.append(re.sub(capstring, newstring, xnames))
elif xlistno:
xno= re.findall(capstring,xnames)
if len(xno)>1:
xn=findingpairs(xno)
newnames.append(xn)
else:
newstring=xlistno.group(1)+ " " + xlistno.group(3) + " and " + xlistno.group(2) + " " + xlistno.group(3)
newnames.append(re.sub(capstring, newstring, xnames))
else:
newnames.append(xnames)
print(len(newnames))
print("\n WITHOUT REPLACING COUPLES \n")
print(filteredcaps2[30:50])
print("\n REPLACING COUPLES \n")
print(newnames[30:50])
In [16]:
## FURTHER PARSING TO GET IN LIST OF NAMES FORMAT ##
newnames2 = [re.split(r',\s+and\s+|,\s+with\s+|;\s|\s+and\s+|\s+amd\s+|,\s|\s+with\s+',mylistentries) for mylistentries in newnames]
nameslist = [[word for word in caps if word !='']
for caps in newnames2]
nameslist=[[re.sub(r'\s+$|^\s+|\s+\n|\n\s+|\n','', caps) for caps in names]
for names in nameslist]
nameslist=[names for names in nameslist if len(names)>1]
nameslist=[[caps for caps in names if names[0].istitle()]
for names in nameslist]
stopwords=['friend','her daughter','President','CEO','Hospital for Special Surgery', 'a friend','NYU','son','sons','wife','dean','daughters','friends','guest','Guest','children','daughter','his wife','squires','guests','family','left','right','presents','welcomes','honoree','host']
nameslist = [[names for names in nameinds if names not in stopwords]
for nameinds in nameslist]
print(nameslist[30:50])
(1) A simple question we can ask is 'Who is the most popular?' The easiest way to answer this is to look at how many connections everyone has -- returning the top 100 people and their weighted degree.
(2) A similar way to determine popularity is to look at PageRank. PageRank is essentially the stationary distribution of the Markov chain implied by the social graph; a toy example follows this list.
(3) Another interesting question is which pairs of people tend to co-occur. We might even be able to use this analysis to detect instances of affairs and infidelities!
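As a toy illustration of the PageRank idea (a minimal sketch on a made-up three-person graph, not the project data), networkx computes the stationary distribution of a random walk over the weighted graph:
import networkx as nx
toy = nx.Graph()
toy.add_weighted_edges_from([('Alice', 'Bob', 3),  # appeared together in 3 photos
                             ('Alice', 'Carol', 1),
                             ('Bob', 'Carol', 1)])
print(nx.pagerank(toy, alpha=0.85))
# Alice and Bob get the highest (and equal) scores because their shared edge carries the most weight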
In [24]:
## FIRST WE ENTER THE DATA INTO GRAPH FORMAT (i.e., containing edges and nodes)
def joinlists(listname):
#function for joining list of lists (i.e. from x=([['a','b'],['c']]) to x=['a','b','c'])
#want only unique values for each document, so:
uniquelitems = [set(listitems) for listitems in listname]
newlist=list(itertools.chain.from_iterable(uniquelitems))
return newlist
tot_allwords=joinlists(nameslist)
uniquenames=list(set(tot_allwords))
edgelists = [sorted(captions) for captions in nameslist]
edgelists = [itertools.combinations(captions,2) for captions in edgelists]
edgelists = [list(captions) for captions in edgelists]
edgelists = sum(edgelists, [])
uniquedges=list(set(edgelists))
xedges=Counter(edgelists)
edgecounts=[xedges[namestr] for namestr in uniquedges]
wedges=[0]*len(edgecounts)
for i in range(len(edgecounts)):
wedges[i]=(uniquedges[i][0],uniquedges[i][1],edgecounts[i])
G=nx.Graph()
G.add_nodes_from(uniquenames)
G.add_weighted_edges_from(wedges)
In [30]:
## 1: DETERMINING MOST POPULAR NAMES THAT APPEAR IN THE PHOTO CAPTIONS
deg_names = G.degree(weight='weight')
deg_sort = sorted(deg_names.items(), key=operator.itemgetter(1), reverse=True)
deg_half = [(d[0], d[1]/2) for d in deg_sort]  # halved weighted degrees (not used below)
print("MOST POPULAR PEOPLE IN THE NYC SOCIAL SCENE \n")
for x in range(100):
    print(deg_sort[x])
In [21]:
## 2: DETERMINING TOP 100 MOST INFLUENTIAL PEOPLE IN THE NYC SOCIAL SCENE
pagerankout=[]
pgpop=nx.pagerank(G, alpha=0.85, personalization=None, max_iter=100)
highest = sorted(pgpop, key=pgpop.get, reverse=True)
for eachname in highest[0:100]:
x=(eachname,pgpop[eachname])
pagerankout.append(x)
#saving to pickle file
output = open('../../miniprojects/questions/pagerank2.pickle','wb')  # binary mode for pickle
pickle.dump(pagerankout,output)
output.close()
#sorted_by_pagerank=pickle.load(open('../../miniprojects/questions/pagerank2.pickle'))
print("MOST INFLUENTIAL PEOPLE IN THE NYC SOCIAL SCENE: \n")
#for sorted_people in sorted_by_pagerank:
for sorted_people in pagerankout:
print(sorted_people)
In [29]:
## 3: DETERMINING TOP CONNECTIONS
edge_list = G.edges(data=True)
edge_sort = sorted(edge_list, key=lambda e: e[2]['weight'], reverse=True)  # sort by edge weight
edge_mod = [((a[0], a[1]), a[2]['weight']) for a in edge_sort]
print("People that appear together most frequently in pictures: \n")
for x in range(100):
print(edge_mod[x])