Foodnet - Spanish cuisine analysis

Author: Marc Cadús García

In this notebook I intend to apply several graph analytics techniques to a graph representing Spanish cuisine in order to extract new insights. The expectation is that graph algorithms can surface patterns that help to better understand Spanish culinary culture. To do so, I am going to use the Python library networkX. I have scraped nearly 3000 Spanish recipes from cookpad.com. These recipes and the scraping code are available in this repository.

Data exploration and transformation


In [93]:
# Imports
import networkx as nx
import pandas as pd
from itertools import combinations
import matplotlib.pyplot as plt
from matplotlib import pylab
import sys
import operator
from operator import itemgetter
from scipy import integrate

In [2]:
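# Exploring data
# The multi-character separator is interpreted as a regular expression,
# which requires the Python parsing engine
recipes_df = pd.read_csv('../data/clean_spanish_recipes.csv', sep='","', engine='python')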
print recipes_df.keys()
print "\n"
print recipes_df.head()

In [3]:
# Transforming data
#recipes_df["ingredients"].apply(encode("latin-1"))
recipes_df["ingredients"] = recipes_df["ingredients"].str.split("', '")
print type(recipes_df["ingredients"][0])

Graph building


In [4]:
def build_graph(nodes, graph):
    # Add one recipe to the graph. Edges are all unordered pairs (combinations)
    # of its ingredients; the weight counts the recipes sharing that pair
    edges = combinations(nodes, 2)
    graph.add_nodes_from(nodes)
    weighted_edges = list()
    for edge in edges:
        if graph.has_edge(edge[0],edge[1]):
            weighted_edges.append((edge[0],edge[1],graph[edge[0]][edge[1]]['weight']+1))
        else:
            weighted_edges.append((edge[0],edge[1],1))
    graph.add_weighted_edges_from(weighted_edges)
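
The weighting used by build_graph can be checked by hand on a few recipes. The following is a minimal sketch that needs only the standard library; the three toy recipes are invented for illustration and are not taken from the scraped data. Unlike build_graph, it removes repeated ingredients within a recipe before pairing them, which is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy recipes, not part of the scraped dataset
toy_recipes = [
    ["ajo", "aceite", "sal"],
    ["ajo", "aceite", "tomate"],
    ["aceite", "sal"],
]

# Count in how many recipes each unordered pair of ingredients appears
pair_weights = Counter()
for ingredients in toy_recipes:
    # sorted(set(...)) drops duplicates and gives every pair a canonical
    # order, so (a, b) and (b, a) are counted as the same undirected edge
    for pair in combinations(sorted(set(ingredients)), 2):
        pair_weights[pair] += 1

# The heaviest pairs correspond to the heaviest edges of the graph
heaviest = pair_weights.most_common(2)
```

Each key of pair_weights plays the role of an edge and each count the role of its 'weight' attribute.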

In [5]:
def save_graph(graph,file_name):
    # Initialize the figure
    plt.figure(num=None, figsize=(120, 120), dpi=60)
    plt.axis('off')
    fig = plt.figure(1)
    pos = nx.spring_layout(graph)
    
    d = nx.degree(graph)
    
    nx.draw_networkx_nodes(graph,pos, nodelist=d.keys(), node_size=[v * 10 for v in d.values()])
    nx.draw_networkx_edges(graph,pos)
    nx.draw_networkx_labels(graph,pos)

    cut = 1.00
    xmax = cut * max(xx for xx, yy in pos.values())
    ymax = cut * max(yy for xx, yy in pos.values())
    plt.xlim(0, xmax)
    plt.ylim(0, ymax)

    plt.savefig(file_name,bbox_inches="tight")
    pylab.close()
    del fig

In [6]:
# Generating graph
recipes_graph  = nx.Graph()
recipes_graph.clear()
for val in recipes_df["ingredients"]:
    build_graph(val,recipes_graph)

Graph analytics


In [7]:
# Number of nodes and edges
print "Total num of nodes: "+str(len(recipes_graph.nodes()))
print "Total num of edges: "+str(len(recipes_graph.edges()))

In [55]:
# Top 20 highest-degree nodes
degrees = sorted(recipes_graph.degree_iter(),key=itemgetter(1),reverse=True)
high_degree_nodes = list()
for node in degrees[:20]:
    high_degree_nodes.append(node[0])
    print node

In [54]:
# Top 20 eigenvector centrality
eigenvector_centrality = nx.eigenvector_centrality(recipes_graph)
eigenvector_centrality_sorted = sorted(eigenvector_centrality.items(), key=itemgetter(1), reverse=True)
for node in eigenvector_centrality_sorted[:20]:
    print node
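
To make the idea behind eigenvector centrality concrete, here is a minimal power-iteration sketch in plain Python. It is not the networkX implementation: the function name eigenvector_scores, its parameters, and the toy graph are all hypothetical. The sketch iterates with A + I instead of A, a choice made here so that the iteration also converges on bipartite graphs; on a connected graph this does not change the ranking.

```python
import math

def eigenvector_scores(adj, max_iter=200, tol=1e-10):
    """Power iteration on (A + I); adj maps node -> {neighbor: weight}."""
    scores = {node: 1.0 for node in adj}
    for _ in range(max_iter):
        # Each node keeps its own score and adds its neighbors' weighted scores
        updated = {
            node: scores[node] + sum(w * scores[nb] for nb, w in adj[node].items())
            for node in adj
        }
        # Rescale to unit Euclidean length so the values stay bounded
        norm = math.sqrt(sum(v * v for v in updated.values()))
        updated = {node: v / norm for node, v in updated.items()}
        converged = max(abs(updated[n] - scores[n]) for n in adj) < tol
        scores = updated
        if converged:
            break
    return scores

# Hypothetical toy graph: a triangle plus one ingredient attached to "aceite"
toy_adj = {
    "aceite": {"ajo": 1, "sal": 1, "pimenton": 1},
    "ajo": {"aceite": 1, "sal": 1},
    "sal": {"aceite": 1, "ajo": 1},
    "pimenton": {"aceite": 1},
}

scores = eigenvector_scores(toy_adj)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph "aceite" ranks first because it is connected to every other node, while "pimenton" ranks last because its only neighbor is "aceite".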

In [60]:
# Top 20 pagerank centrality
pagerank_centrality = nx.pagerank(recipes_graph)
pagerank_centrality_sorted = sorted(pagerank_centrality.items(), key=itemgetter(1), reverse=True)
for node in pagerank_centrality_sorted[:20]:
    print node

In [86]:
# Connected components
connected_component = list(nx.connected_component_subgraphs(recipes_graph))
print "There are "+str(len(connected_component))+" connected components"
for component in connected_component:
    print "- Component of "+str(len(component))+ " nodes"
    if (len(component)==1):
        print "\t- Ingredient: "+str(component.nodes())
# Take the largest component explicitly instead of relying on the list order
main_component = max(connected_component, key=len)
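
What a connected component is can be shown without networkX. The sketch below finds components with a depth-first traversal over a plain adjacency dictionary; the helper name find_components and the toy ingredients are hypothetical. Selecting the largest set by size is one way to pick a main component.

```python
def find_components(adj):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(nb for nb in adj[node] if nb not in component)
        seen |= component
        components.append(component)
    return components

# Hypothetical toy graph with three components, one of them an isolated node
toy_adj = {
    "ajo": ["aceite"],
    "aceite": ["ajo", "sal"],
    "sal": ["aceite"],
    "canela": ["azucar"],
    "azucar": ["canela"],
    "wasabi": [],
}

components = find_components(toy_adj)
largest = max(components, key=len)
```

An isolated node such as "wasabi" forms a component of size one, which is the case the cell above prints separately.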

In [88]:
# Graph diameter
print "Nodes having minimum eccentricity\n"+str(nx.center(main_component))
print "Nodes having maximum eccentricity\n"+str(nx.periphery(main_component))
print "Minimum eccentricity "+str(nx.radius(main_component))
print "Maximum eccentricity "+str(nx.diameter(main_component))
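
The four quantities above all derive from eccentricity, the largest shortest-path distance from a node to any other node. The following sketch computes them with a breadth-first search on a toy path graph. It counts hops and ignores edge weights, which is an assumption of this sketch; the helper name bfs_distances and the toy graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical toy path graph: a - b - c - d
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# Eccentricity of a node: distance to the node farthest away from it
eccentricity = {n: max(bfs_distances(toy_adj, n).values()) for n in toy_adj}

radius = min(eccentricity.values())    # minimum eccentricity
diameter = max(eccentricity.values())  # maximum eccentricity
center = sorted(n for n in toy_adj if eccentricity[n] == radius)
periphery = sorted(n for n in toy_adj if eccentricity[n] == diameter)
```

On this path the two inner nodes form the center and the two end nodes form the periphery. The approach only makes sense on a connected graph, which is why the analysis above is restricted to main_component.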

In [90]:
# Minimum node cut
print "Nodes to be removed to disconnect the graph: "+str(nx.minimum_node_cut(main_component))

Visualizations


In [91]:
# To avoid encoding problems (Python 2 only)
reload(sys)  
sys.setdefaultencoding('utf8')

In [99]:
# Original graph
save_graph(main_component,"original_graph.jpg")

In [100]:
def extract_backbone(g, alpha):
    backbone_graph = nx.Graph()
    for node in g:
        k_n = len(g[node])
        if k_n > 1:
            sum_w = sum( g[node][neighbor]['weight'] for neighbor in g[node] )
            for neighbor in g[node]:
                edgeWeight = g[node][neighbor]['weight']
                pij = float(edgeWeight)/sum_w
                if (1-pij)**(k_n-1) < alpha: # equation 2
                    backbone_graph.add_edge( node,neighbor, weight = edgeWeight)
    return backbone_graph
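
The test inside extract_backbone can be worked through on a single node. For a node with k neighbors, an edge carrying a fraction p of the node's total weight is kept when (1 - p)**(k - 1) falls below alpha. The sketch below evaluates that expression for a toy node; the function name edge_significance and the weights are hypothetical.

```python
def edge_significance(weights, target):
    """Return (1 - p)**(k - 1) for one edge of a single node.

    weights: dict mapping neighbor -> edge weight for that node
    target:  the neighbor whose edge is being tested
    """
    k = len(weights)
    total = float(sum(weights.values()))
    p = weights[target] / total
    return (1 - p) ** (k - 1)

# Hypothetical node with one dominant edge and three weak ones
toy_weights = {"aceite": 97, "sal": 1, "ajo": 1, "tomate": 1}
alpha = 0.01

strong = edge_significance(toy_weights, "aceite")  # (1 - 0.97)**3
weak = edge_significance(toy_weights, "sal")       # (1 - 0.01)**3

keep_strong = strong < alpha
keep_weak = weak < alpha
```

With alpha = 0.01 only the dominant edge passes from this node's point of view. Because extract_backbone loops over every node, an edge ends up in the backbone if it passes the test from at least one of its two endpoints.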

In [98]:
save_graph(extract_backbone(main_component,0.01),"backbone_graph.jpg")

In [ ]:
# Visualizing the highest-degree nodes
k = recipes_graph.subgraph(high_degree_nodes)
save_graph(k,"high_degree_subgraph.jpg")
