Advanced Data Analysis using Python

Matthew McKay

In this notebook we demonstrate a few tools from the Python ecosystem that enable research that can be difficult to carry out with traditional, fit-for-purpose tools such as Stata.

A full programming language offers a high degree of flexibility, and the Python ecosystem provides a vast toolkit for staying productive.

Table of Contents

  1. The Product Space Network (Hidalgo, 2007)
  2. Quick introduction to Networks and Graphs
  3. Replicate Product Space Proximity Measure
    • Compute Revealed Comparative Advantage and $M_{cp}$ matrices [Tools: Pandas] (786 Products, 200+ Countries, and 53 Years)
    • Compute Proximity Matrices ($\phi_{pp'}$) and make this code run fast [Tools: Pandas, Numpy, Numba, Dask]
    • (Extension) Building Networks and Plotting Product Space Network Diagrams (albeit not as fancy) [Tools: NetworkX]

Atlas of Economic Complexity: Product Space Map

Some Initial Observations

Oil (3330) has a large share of world exports but is not strongly co-exported (i.e. connected in the network) with any other product (other than LNG).

Machinery, electronics, and garments are all sectors with a high degree of co-export potential with related products, and they form part of the densely connected core of the network.

Developing economies typically occupy products in the weakly connected periphery of the network, and new products tend to emerge close to existing products in the network (a result established using the Product Space network). Middle-income countries manage to diffuse into the densely connected core of the product space.


Network Analysis

Interest in networks is growing within economics: recent publications build network-type features into their models, or use network analysis to uncover structural features of data that might otherwise go unexplored.

What is a Network (Graph)?

Many people first encounter network analysis through Social Network Analysis (SNA).

A graph is a way of specifying relationships among a collection of items.

Graphs consist of a collection of nodes (or vertices) that are joined together by edges.


In [1]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
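# Matplotlib 3.6+ renamed the 'seaborn' style to 'seaborn-v0_8'
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')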

In [2]:
g = nx.Graph()

In [3]:
g.add_nodes_from(["A", "B", "C", "D"])

In [4]:
nx.draw_networkx(g)
plt.show()



In [5]:
g.add_edge("A", "B")   #Add Edge between Nodes A and B
g.add_edge("A", "C")   #Add Edge between Nodes A and C
g.add_edge("A", "D")   #Add Edge between Nodes A and D

In [6]:
nx.draw_networkx(g)
plt.show()


You can use network metrics to learn more about the structure. What is the most central node?


In [7]:
nx.degree_centrality(g)


Out[7]:
{'A': 1.0,
 'B': 0.3333333333333333,
 'C': 0.3333333333333333,
 'D': 0.3333333333333333}
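`nx.degree_centrality` normalizes each node's degree by the largest possible degree, $n - 1$. As a minimal pure-Python sketch (a hypothetical `degree_centrality` helper, not NetworkX's own implementation), the values above can be reproduced from the edge list alone:

```python
from collections import Counter

def degree_centrality(nodes, edges):
    """Degree of each node divided by n - 1, the largest possible degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    return {node: degree[node] / (n - 1) for node in nodes}

nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D")]
print(degree_centrality(nodes, edges))
```

Adding the C-D edge raises the degrees of C and D to 2, so their centrality becomes 2/3, matching Out[10].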

In [8]:
g.add_edge("C","D")

In [9]:
nx.draw_networkx(g)
plt.show()



In [10]:
nx.degree_centrality(g)


Out[10]:
{'A': 1.0,
 'B': 0.3333333333333333,
 'C': 0.6666666666666666,
 'D': 0.6666666666666666}

What can we learn from Networks?

Social Network Example: Karate Club (Zachary, 1977)

One early example of Social Network Analysis is Zachary (1977), who used network analysis to explain factional dynamics and fission in small groups. A friendship network was used to understand how a karate club eventually split following an initial conflict between two members.

  • Nodes: Individuals
  • Edges: Connections were added between two individuals if they were consistently observed to interact outside the normal activities of the club.

We can learn things by considering the structure of these networks

The structure of these relationships can be exploited to uncover new insights into the data:

  1. Communities (through Clustering)
  2. Identification of main actors in Social Networks (Centrality Metrics)
  3. Identifying indirect relationships through shortest / longest paths
  4. Diffusion characteristics on temporal networks (such as disease transmission modeling)
  5. ... + many other applications across many different sciences

One visualization (Cao, 2013) shows how algorithmic analysis can reveal meaningful structure that clearly identifies the roles played by certain individuals, using only simple relational information on friendship between pairs of individuals.


Replicating the Product Space Network using International Trade Data (Hidalgo, 2007)

Let's focus on an application of network analysis to international trade data, replicating some of the results in Hidalgo (2007) and later in The Atlas of Economic Complexity and The Observatory of Economic Complexity.

The Hidalgo (2007) paper is used as a motivating example to demonstrate various tools that are available in the Python ecosystem.

In this setting we look at a characterisation of international trade data by considering:

  • Nodes: Products
  • Edges: the likelihood of two products being co-exported

Assumption: if products are highly co-exported across countries, this reveals that they are more likely to share similar factors of production (or capabilities) required to produce them. For example, shirts and pants require a similar set of skills, so they tend to be co-exported, while shirts and cars are much more dissimilar.

This relational information between products can be represented by edge weights.

A high value means the two products have a high likelihood of being co-exported.


In [11]:
g = nx.Graph()
g.add_edge("P1", "P2")
pos = nx.spring_layout(g)
nx.draw_networkx(g, pos=pos, node_size=600, font_size=14)
nx.draw_networkx_edge_labels(g, pos=pos, edge_labels={("P1", "P2") : "Coexport Probability: Min{P(P1 | P2), P(P2 | P1)}"}, font_size=14)
plt.show()


Let's work with a Toy Example with 8 products


In [12]:
adj = pd.read_csv("./data/simple_productspace.csv", names=["P1", "P2", "weight"])
adj


Out[12]:
P1 P2 weight
0 Shirts Pants 0.900
1 Shirts Cars 0.050
2 Shirts Cows 0.010
3 Shirts Sugar 0.010
4 Pants Cars 0.050
5 Oil Sugar 0.010
6 Oil Cars 0.005
7 Oil Aircraft 0.005
8 Cars Aircraft 0.500
9 Cars Aircraft 0.500
10 Wheat Sugar 0.400
11 Wheat Cows 0.200
12 Cows Sugar 0.100
13 Cows Cars 0.001
14 Cows Pants 0.060

In [13]:
g = nx.read_weighted_edgelist("./data/simple_productspace.csv", delimiter=",")

In [14]:
g.nodes()


Out[14]:
['Shirts', 'Pants', 'Cars', 'Cows', 'Sugar', 'Oil', 'Aircraft', 'Wheat']

In [15]:
#Visualize the Network
pos = nx.spring_layout(g)
weights = [g[u][v]['weight']*10 for u,v in g.edges()]
nx.draw_networkx(g, node_size=1200, pos=pos, width=weights)
plt.show()


Scale Up to Full set of Products (SITC R2 L4)

We now want to compute the edge weights to explore the full product space network derived from product level international trade data by computing the proximity matrix:

$$ \phi_{ij} = \min \{ P(RCA_i \geq 1 \mid RCA_j \geq 1), P(RCA_j \geq 1 \mid RCA_i \geq 1) \} $$

Proximity: A high proximity value suggests any two products are exported by a similar set of countries.
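To make the definition concrete, here is a minimal pure-Python sketch that computes $\phi_{ij}$ directly from the two conditional probabilities. The binary matrix is hypothetical toy data (rows are countries, columns are products, 1 means $RCA \geq 1$), and the helper names are illustrative only:

```python
# Hypothetical toy data: rows are countries, columns are products; 1 means RCA >= 1
M = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]

def conditional_probability(M, i, j):
    """P(RCA_i >= 1 | RCA_j >= 1): share of product j's exporters that also export product i."""
    exporters_j = [row for row in M if row[j] == 1]
    # Assumes at least one country exports product j with RCA >= 1
    return sum(row[i] for row in exporters_j) / len(exporters_j)

def proximity(M, i, j):
    """Minimum of the two conditional probabilities."""
    return min(conditional_probability(M, i, j), conditional_probability(M, j, i))

print(proximity(M, 0, 1), proximity(M, 0, 2), proximity(M, 1, 2))
```

Products 0 and 1 share both of product 1's exporters, so $P(0 \mid 1) = 1$, but only two of product 0's three exporters, so $P(1 \mid 0) = 2/3$; taking the minimum gives $\phi = 2/3$.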

The tasks involve:

  1. Compute Revealed Comparative Advantage and $M_{cp}$ matrices [Tools: Pandas]
  2. Compute Proximity Matrices ($\phi_{pp'}$) and make this code run fast [Tools: Pandas, Numpy, Numba, Dask]
  3. Building Networks and Plotting Product Space Network Diagrams (albeit not as fancy) [Tools: NetworkX]

Computing Proximity


In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from numba import jit
import networkx as nx
from bokeh.io import output_notebook

In [17]:
#-Load Jupyter Extensions-#
%matplotlib inline
output_notebook()



Data

International trade data is mostly available in the SITC and HS product classification systems.

In this notebook we will focus on SITC revision 2 Level 4 data with 786 defined products.

Classification | Level | Products
SITC           | 4     | 786
HS             | 6     | 5016

Note:

We use SITC data in this seminar, but code performance becomes even more important when working with fully disaggregated HS data: HS 6 has roughly 6 times as many products as SITC 4, and therefore roughly 40 times as many product pairs.


In [18]:
fl = "./data/year_origin_sitc_rev2.csv"
data = pd.read_csv(fl, converters={'sitc':str})   #Import SITC codes as strings to preserve formatting

In [19]:
data.head()


Out[19]:
year origin sitc export
0 1962 AFG 0230 4000.0
1 1962 AFG 0250 66000.0
2 1962 AFG 0540 74000.0
3 1962 AFG 0545 17000.0
4 1962 AFG 0548 33000.0

Question 1: What years are available in this dataset?

Hint: There is a method named unique(), so select the column of years and then call .unique() on it.


In [ ]:

data['year'].unique()

Question 2: How many non-zero trade flow values are in this dataset?


In [ ]:

data.shape[0]

Question 3: What countries are available in this dataset?


In [ ]:

data['origin'].unique()

In [20]:
data[(data['year'] == 2000)&(data['origin']=="AUS")].head()


Out[20]:
year origin sitc export
2213361 2000 AUS 0011 260132101.0
2213362 2000 AUS 0012 167611315.0
2213363 2000 AUS 0013 280098.0
2213364 2000 AUS 0014 548603.0
2213365 2000 AUS 0015 134571371.0

In [21]:
data[data['origin'] == 'AUS'].set_index(["year","sitc"])["export"].unstack(level="year").head()


Out[21]:
year 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 ... 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
sitc
0010 NaN NaN NaN 1000.0 NaN NaN NaN 1000.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
0011 1286000.0 675000.0 1177000.0 1700000.0 748000.0 548000.0 586000.0 747000.0 331000.0 443000.0 ... 283381176.0 284544151.0 365401097.0 571444736.0 586230117.0 793934945.0 650001457.0 627274883.0 728998394.0 1.114734e+09
0012 2600000.0 3619000.0 5593000.0 3560000.0 4725000.0 5078000.0 5578000.0 6828000.0 11517000.0 13321000.0 ... 211748259.0 215548475.0 216467212.0 307677017.0 296862273.0 319842757.0 347639153.0 296233596.0 174194459.0 2.209887e+08
0013 26000.0 52000.0 81000.0 69000.0 60000.0 32000.0 99000.0 108000.0 108000.0 189000.0 ... 260620.0 309274.0 549802.0 153458.0 37764.0 224888.0 NaN NaN 345256.0 2.302240e+05
0014 375000.0 398000.0 445000.0 187000.0 83000.0 73000.0 30000.0 83000.0 263000.0 258000.0 ... 592790.0 2842029.0 3599939.0 4708910.0 6542608.0 5466152.0 5372783.0 4237839.0 1165708.0 3.056300e+05

5 rows × 53 columns


Computing Revealed Comparative Advantage

The literature uses the standard Balassa definition of Revealed Comparative Advantage:

$$ \large RCA_{cpt} = \frac{\frac{E_{cpt}}{E_{ct}}}{\frac{E_{pt}}{E_t}} $$

where,

  1. $E_{cpt}$ are exports from country $c$ in product $p$ at time $t$
  2. $E_{ct}$ are total country $c$ exports at time $t$
  3. $E_{pt}$ are total product $p$ exports at time $t$
  4. $E_{t}$ are total world exports at time $t$

Reference: Balassa, B. (1965), Trade Liberalisation and Revealed Comparative Advantage, The Manchester School, 33, 99-123.
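As a worked example of the formula, here is a minimal NumPy sketch with two countries and two products. The export values are hypothetical; broadcasting lets each component be computed as a whole array rather than row by row:

```python
import numpy as np

# Hypothetical exports for one year: rows are countries c, columns are products p
E = np.array([
    [10.0, 90.0],
    [40.0, 60.0],
])

E_ct = E.sum(axis=1, keepdims=True)   # total exports of each country, shape (c, 1)
E_pt = E.sum(axis=0)                  # total exports of each product, shape (p,)
E_t = E.sum()                         # total world exports (a scalar)

# Country share of each product divided by world share of each product
rca = (E / E_ct) / (E_pt / E_t)
print(rca.round(2))
```

Country 0 has $RCA = 1.2$ in product 1: the product makes up 90% of its exports, compared with 75% of world exports (150 of 200).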

To compute RCA we need to aggregate the data at different levels to obtain each component of the fraction defined above.

Let's break the equation down to figure out what needs to be computed:

$$ \large E_{ct} = \sum_{p}{E_{cpt}} $$

In [22]:
cntry_export = data[["year", "origin", "export"]].groupby(by=["year", "origin"]).sum()
cntry_export.head(n=2)


Out[22]:
export
year origin
1962 AFG 86135000.0
AGO 119458000.0

This gives us a pandas.DataFrame indexed by a MultiIndex. This can be very useful, but we want these totals available on every row of the original data table (one row per product exported by each country at time $t$). We could either:

  1. merge the aggregated data back into the original data DataFrame, or
  2. use transform to request an object with the same shape as the original data DataFrame.
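Both options can be compared on a small hypothetical dataset. This pandas sketch computes $E_{ct}$ with a merge (option 1) and with `transform` (option 2); the country codes `AAA` and `BBB` and the export values are placeholders:

```python
import pandas as pd

# Hypothetical long-format trade data for a single year
data = pd.DataFrame({
    "year":   [2000, 2000, 2000, 2000],
    "origin": ["AAA", "AAA", "BBB", "BBB"],
    "sitc":   ["0011", "0012", "0011", "0012"],
    "export": [10.0, 90.0, 40.0, 160.0],
})

# Option 1: aggregate to one row per (year, origin), then merge the totals back
cntry_export = (
    data.groupby(["year", "origin"], as_index=False)["export"]
    .sum()
    .rename(columns={"export": "cntry_export"})
)
merged = data.merge(cntry_export, on=["year", "origin"], how="left")

# Option 2: transform returns a Series aligned with the original rows
data["cntry_export"] = data.groupby(["year", "origin"])["export"].transform("sum")

print(merged["cntry_export"].tolist(), data["cntry_export"].tolist())
```

Both give the same column; `transform` avoids building the intermediate table and the join.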

In [23]:
data["cntry_export"] = data.groupby(["year", "origin"])["export"].transform("sum")   #E_ct
data["prod_export"] = data.groupby(["year", "sitc"])["export"].transform("sum")      #E_pt
data["world_export"] = data.groupby("year")["export"].transform("sum")              #E_t

Now that the components of the equation have been computed, we can calculate $RCA$ directly from the original fraction.


In [24]:
data["rca"] = (data["export"] / data["cntry_export"]) / (data["prod_export"] / data["world_export"])

In [25]:
data.head()


Out[25]:
year origin sitc export cntry_export prod_export world_export rca
0 1962 AFG 0230 4000.0 86135000.0 438581000.0 1.428420e+11 0.015125
1 1962 AFG 0250 66000.0 86135000.0 261448000.0 1.428420e+11 0.418634
2 1962 AFG 0540 74000.0 86135000.0 48924000.0 1.428420e+11 2.508338
3 1962 AFG 0545 17000.0 86135000.0 349188000.0 1.428420e+11 0.080736
4 1962 AFG 0548 33000.0 86135000.0 85126000.0 1.428420e+11 0.642877

Computing $M_{cp}$ Matrix: Who Exports What Products and When?

$RCA \ge 1$ indicates that country $c$ has a revealed comparative advantage in product $p$ at time $t$.

Therefore we can define the matrix $M_{cp}$:

$$ M_{cp} = \begin{cases} 1 \text{ if }RCA \ge 1\\ 0 \text{ if }RCA \lt 1 \end{cases} $$

We can first construct $RCA$ matrices and then compute $M_{cp}$ using a conditional map


In [26]:
#-Generate Yearly RCA Matrices (countries x products) and store them in a Dictionary-#
rca = {}
for year in data.year.unique():
    yr = data[data.year == year].set_index(['origin', 'sitc']).unstack('sitc')['rca']
    rca[year] = yr

In [27]:
rca[2000].head()


Out[27]:
sitc 0011 0012 0013 0014 0015 0111 0112 0113 0114 0115 ... 8994 8996 8997 8998 8999 9310 9410 9510 9610 9710
origin
ABW NaN NaN NaN NaN NaN 0.000189 NaN 0.008223 NaN NaN ... NaN 0.009006 0.000910 0.006053 0.001468 1.264113 0.104766 NaN NaN 0.76756
AFG NaN NaN NaN NaN NaN 0.176481 0.066896 NaN NaN NaN ... NaN 0.067773 0.782384 NaN NaN 0.066374 NaN NaN NaN NaN
AGO NaN NaN 0.000012 NaN NaN 0.001756 NaN NaN NaN NaN ... NaN NaN NaN 0.000137 NaN 0.026715 NaN NaN 0.005944 NaN
AIA NaN NaN NaN NaN NaN 15.716737 6.270237 NaN NaN NaN ... NaN NaN NaN NaN 0.649915 6.906314 NaN NaN NaN NaN
ALB NaN NaN NaN NaN NaN NaN NaN NaN 0.006836 NaN ... 0.304894 0.014938 0.146457 0.093441 NaN 0.357895 25.006762 NaN 3.426721 NaN

5 rows × 775 columns

Question 6: How can we use rca to compute the mcp matrix?


In [28]:
#-Generate Yearly Binary Mcp Matrices-#
mcp = {}
for year in rca.keys():
    mcp[year] = (rca[year].fillna(0.0) >= 1.0).astype(float)   #1.0 where RCA >= 1, else 0.0


Question: What is the key assumption implied by the above code?


In [29]:
mcp[2000].head()


Out[29]:
sitc 0011 0012 0013 0014 0015 0111 0112 0113 0114 0115 ... 8994 8996 8997 8998 8999 9310 9410 9510 9610 9710
origin
ABW 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
AFG 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AGO 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AIA 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
ALB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0

5 rows × 775 columns

Question 7: What products did Australia ("AUS") export with RCA in 1998?


In [ ]:

products = mcp[1998].loc['AUS']
products[products == 1.0]

Computing Proximity Matrix $\phi_{ij}$

Proximity: A high proximity value suggests any two products are exported by a similar set of countries.

$$ \phi_{ij} = \min \{ P(RCA_i \geq 1 \mid RCA_j \geq 1), P(RCA_j \geq 1 \mid RCA_i \geq 1) \} $$

The minimum conditional probability of coexport can be computed:

$$ \phi_{ij} = \frac{\sum_c M_{cp_i} M_{cp_j}}{\max \{k_{p_i}, k_{p_j}\}} $$

where,

  1. $k_{p_i}$ is the ubiquity of product $i$ (i.e. the number of countries that export product $i$ with $RCA \geq 1$)
  2. $k_{p_j}$ is the ubiquity of product $j$ (i.e. the number of countries that export product $j$ with $RCA \geq 1$)
  3. $M_{cp_i}$ is the column vector in the $M_{cp}$ matrix for product $i$
  4. $M_{cp_j}$ is the column vector in the $M_{cp}$ matrix for product $j$
  5. $\sum_c M_{cp_i} M_{cp_j}$ is the number of countries that export both product $i$ and product $j$
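The step from the conditional-probability definition to this ratio is short. Writing $n_{ij} = \sum_c M_{cp_i} M_{cp_j}$ for the number of countries that export both products, each conditional probability is a share of the exporters of one product:

$$ P(RCA_i \geq 1 \mid RCA_j \geq 1) = \frac{n_{ij}}{k_{p_j}}, \qquad P(RCA_j \geq 1 \mid RCA_i \geq 1) = \frac{n_{ij}}{k_{p_i}} $$

$$ \phi_{ij} = \min \left\{ \frac{n_{ij}}{k_{p_j}}, \frac{n_{ij}}{k_{p_i}} \right\} = \frac{n_{ij}}{\max \{k_{p_i}, k_{p_j}\}} $$

Since $n_{ij} \geq 0$, the smaller of the two fractions is the one with the larger denominator.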

Computing the full $\phi_{ij}$ matrix therefore requires an operation on every pairwise combination of column vectors (roughly 300,000 distinct pairs for 786 products), which is computationally intensive.
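Because every numerator $n_{ij}$ is an inner product of two columns of $M_{cp}$, all pairs can also be computed at once with a single matrix product. This is a sketch of an alternative to the loops below, not the approach used in this notebook; it assumes $M_{cp}$ is held as a dense 0/1 NumPy array, and `proximity_matrix_vectorized` is a hypothetical name:

```python
import numpy as np

def proximity_matrix_vectorized(M):
    """Proximity phi_ij = n_ij / max(k_i, k_j) for all product pairs at once.

    M is a binary (countries x products) array with 1 where RCA >= 1.
    """
    M = np.asarray(M, dtype=float)
    both = M.T @ M                                   # n_ij: countries exporting both i and j
    ubiquity = M.sum(axis=0)                         # k_p: countries exporting each product
    denom = np.maximum.outer(ubiquity, ubiquity)     # max{k_i, k_j} for every pair
    with np.errstate(divide="ignore", invalid="ignore"):
        return both / denom                          # 0/0 gives nan, as in the loop versions

# Hypothetical toy Mcp: 4 countries x 3 products
M = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])
phi = proximity_matrix_vectorized(M)
print(phi.round(3))
```

For a yearly DataFrame such as `mcp[2000]`, the array could be obtained with `.to_numpy()` and wrapped back into a DataFrame using the sitc columns as labels. The result is a dense $p \times p$ float array, about 5 MB for 786 products but about 200 MB for 5016 HS products, and its speed has not been benchmarked here.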

Step 1: Compute Proximity Matrix using Pandas


In [30]:
def proximity_matrix_pandas(mcp, fillna=True):
    products = sorted(list(mcp.columns))
    sum_products = mcp.sum(axis=0)
    proximity = pd.DataFrame(index=products, columns=products, dtype=float)
    for i, product1 in enumerate(products):
        for j, product2 in enumerate(products):
            if j > i:  #Symmetric Matrix Condition
                continue
            numerator = (mcp[product1] * mcp[product2]).sum()
            denominator = max(sum_products[product1], sum_products[product2])
            if denominator == 0:
                cond_prob =  np.nan
            else:
                cond_prob = numerator / denominator
            proximity.at[product1, product2] = cond_prob   #set_value was removed in pandas 1.0
            proximity.at[product2, product1] = cond_prob
    if fillna:
        proximity = proximity.fillna(0.0)
    return proximity

In [31]:
%%time
prox_2000 = proximity_matrix_pandas(mcp[2000])


CPU times: user 44.9 s, sys: 528 ms, total: 45.4 s
Wall time: 45.1 s

Check the Data (simple stats and visualizations)

Hidalgo (2007) suggests that 32% of proximity values are below 0.1 and 65% are below 0.2.
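`prox_2000.unstack().describe()` summarizes all 775 x 775 = 600,625 entries, so every pair is counted twice and the diagonal is included. To compare against shares stated per product pair, one option is to keep only the upper triangle. A minimal NumPy sketch with a hypothetical `share_below` helper and a toy 3-product matrix:

```python
import numpy as np

def share_below(phi, threshold):
    """Share of distinct product pairs (i < j) whose proximity is below threshold."""
    phi = np.asarray(phi, dtype=float)
    i, j = np.triu_indices(phi.shape[0], k=1)   # upper triangle, diagonal excluded
    values = phi[i, j]
    values = values[~np.isnan(values)]          # ignore undefined proximities
    return (values < threshold).mean()

# Hypothetical 3-product proximity matrix
phi = np.array([
    [1.00, 0.05, 0.30],
    [0.05, 1.00, 0.15],
    [0.30, 0.15, 1.00],
])
print(share_below(phi, 0.1), share_below(phi, 0.2))
```

Applied to the real data, `share_below(prox_2000.to_numpy(), 0.1)` would give the share of distinct pairs below 0.1.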


In [32]:
prox_2000.unstack().describe()


Out[32]:
count    600625.000000
mean          0.159894
std           0.106652
min           0.000000
25%           0.081081
50%           0.142857
75%           0.222222
max           1.000000
dtype: float64

In [33]:
prox_2000.unstack().hist()


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x1211ae6d8>

But Wait - Problem!

At roughly 45 seconds per year, this takes a long time to compute for a single cross-section, and computing all 53 years serially would take around 40 minutes. This makes working with the data in an agile way problematic. While this was easy to implement, it isn't very fast!

Let's profile this code to understand where most of the time is spent.

To run the profiler you will need to install line_profiler, for example by running:

conda install line_profiler

In [34]:
# !conda install line_profiler

In [35]:
import line_profiler
%load_ext line_profiler

In [36]:
# %lprun -f proximity_matrix_pandas proximity_matrix_pandas(mcp[2000])

Step 2: Consider other Python Tools (NumPy)

Most numerical computing in Python is best done with NumPy.

The code looks very similar; the main difference is that the operations act on plain NumPy arrays rather than pandas objects.


In [37]:
def proximity_matrix_numpy(mcp, fillna=False):
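    products = sorted(list(mcp.columns))
    mcp = mcp[products]                       #Align columns with the sorted product order
    num_products = len(products)
    proximity = np.empty((num_products, num_products))
    col_sums = mcp.sum().to_numpy()           #Ubiquity of each product
    data = mcp.T.to_numpy()                   #Rows are products, columns are countries (p x c array)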
    for index1 in range(0,num_products):
        for index2 in range(0,num_products):
            if index2 > index1:
                continue
            numerator = (data[index1] * data[index2]).sum()
            denominator = max(col_sums[index1], col_sums[index2])
            if denominator == 0.0:
                cond_prob = np.nan
            else:
                cond_prob = numerator / denominator
            proximity[index1][index2] = cond_prob
            proximity[index2][index1] = cond_prob
    # Return DataFrame Representation #
    proximity = pd.DataFrame(proximity, index=products, columns=products)
    proximity.index.name = 'productcode1'
    proximity.columns.name = 'productcode2'
    if fillna:
        proximity = proximity.fillna(0.0)
    return proximity

In [38]:
%%time
prox_2000_numpy = proximity_matrix_numpy(mcp[2000])


CPU times: user 1.39 s, sys: 3.76 ms, total: 1.39 s
Wall time: 1.39 s

In [39]:
prox_2000.equals(prox_2000_numpy)


Out[39]:
True

Step 3: Just in Time Compilation (Numba)

Numba is a package that accelerates Python code using just-in-time (JIT) compilation: it compiles Python functions that operate on NumPy arrays into optimized machine code via LLVM, so they run much closer to the raw machine level.

nopython=True requires Numba to compile the function without falling back to Python objects; if it cannot do so, it raises an error.

Numba supports a large subset of the NumPy API; the supported features are listed in the Numba documentation.


In [40]:
@jit(nopython=True)
def coexport_probability(data, num_products, col_sums):
    proximity = np.empty((num_products, num_products))
    for index1 in range(0,num_products):
        for index2 in range(0,num_products):
            if index2 > index1:
                continue
            numerator = (data[index1] * data[index2]).sum()
            denominator = max(col_sums[index1], col_sums[index2])
            if denominator == 0.0:
                cond_prob = np.nan
            else:
                cond_prob = numerator / denominator
            proximity[index1][index2] = cond_prob
            proximity[index2][index1] = cond_prob
    return proximity

def proximity_matrix_numba(mcp, fillna=False):
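    products = sorted(list(mcp.columns))
    mcp = mcp[products]                       #Align columns with the sorted product order
    num_products = len(products)
    col_sums = mcp.sum().to_numpy()
    data = mcp.T.to_numpy()                   #p x c array: one row of country indicators per product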
    proximity = coexport_probability(data, num_products, col_sums)   #Call Jit Function
    # Return DataFrame Representation #
    proximity = pd.DataFrame(proximity, index=products, columns=products)
    proximity.index.name = 'productcode1'
    proximity.columns.name = 'productcode2'
    if fillna:
        proximity = proximity.fillna(0.0)
    return proximity

In [41]:
prox_2000_numba = proximity_matrix_numba(mcp[2000])

In [42]:
%%timeit
prox_2000_numba = proximity_matrix_numba(mcp[2000])


10 loops, best of 3: 131 ms per loop

In [43]:
prox_2000_numba.equals(prox_2000)


Out[43]:
True

Computing All Years


In [44]:
%%time
proximity = {}
for year in mcp.keys():
    proximity[year] = proximity_matrix_numba(mcp[year])


CPU times: user 6.36 s, sys: 61 ms, total: 6.42 s
Wall time: 6.42 s

In [45]:
proximity[2000].head()


Out[45]:
productcode2 0011 0012 0013 0014 0015 0111 0112 0113 0114 0115 ... 8994 8996 8997 8998 8999 9310 9410 9510 9610 9710
productcode1
0011 1.000000 0.448276 0.206897 0.206897 0.275862 0.37500 0.275862 0.344828 0.206897 0.241379 ... 0.068966 0.137931 0.162791 0.068966 0.034483 0.244898 0.178082 0.206897 0.137931 0.175
0012 0.448276 1.000000 0.178571 0.107143 0.321429 0.25000 0.392857 0.142857 0.178571 0.142857 ... 0.035714 0.178571 0.116279 0.071429 0.000000 0.163265 0.219178 0.071429 0.107143 0.125
0013 0.206897 0.178571 1.000000 0.227273 0.100000 0.12500 0.083333 0.500000 0.312500 0.153846 ... 0.153846 0.210526 0.093023 0.080000 0.142857 0.163265 0.095890 0.090909 0.071429 0.075
0014 0.206897 0.107143 0.227273 1.000000 0.181818 0.15625 0.125000 0.318182 0.363636 0.090909 ... 0.090909 0.181818 0.139535 0.120000 0.045455 0.142857 0.178082 0.409091 0.045455 0.200
0015 0.275862 0.321429 0.100000 0.181818 1.000000 0.28125 0.250000 0.150000 0.200000 0.250000 ... 0.050000 0.350000 0.139535 0.040000 0.050000 0.122449 0.150685 0.136364 0.150000 0.125

5 rows × 775 columns


Using Dask to Compute all Years in Parallel

NOTE: This won't work in the demo Docker environment.

Now that we have a fast single year computation, we can compute all cross-sections serially using a loop.

Alternatively, we can parallelize these operations using Dask: we delay each computation and then ask the Dask scheduler to coordinate them across the cores available to you. This is particularly useful when using HS data.

Note: This simple approach to parallelization has some overhead to coordinate the computations, so you won't get a full 4x speedup on a 4-core machine.


In [46]:
import dask
from distributed import Client
Client()


Out[46]:

Client

Cluster

  • Workers: 8
  • Cores: 8
  • Memory: 10.31 GB

In [47]:
#-Setup the Computations as a Collection of Tasks-#
collection = []
for year in sorted(mcp.keys()):
    collection.append((year, dask.delayed(proximity_matrix_numba)(mcp[year])))

In [48]:
%%time
#-Compute the Results-#
result = dask.compute(*collection)


CPU times: user 1.25 s, sys: 1.11 s, total: 2.35 s
Wall time: 5.11 s

In [49]:
#-Organise the list of returned tuples into a convenient dictionary-#
results = {}
for year, df in result:
    results[year] = df

In [50]:
results[2000].equals(prox_2000)


Out[50]:
True

In [51]:
results[2000].head()


Out[51]:
productcode2 0011 0012 0013 0014 0015 0111 0112 0113 0114 0115 ... 8994 8996 8997 8998 8999 9310 9410 9510 9610 9710
productcode1
0011 1.000000 0.448276 0.206897 0.206897 0.275862 0.37500 0.275862 0.344828 0.206897 0.241379 ... 0.068966 0.137931 0.162791 0.068966 0.034483 0.244898 0.178082 0.206897 0.137931 0.175
0012 0.448276 1.000000 0.178571 0.107143 0.321429 0.25000 0.392857 0.142857 0.178571 0.142857 ... 0.035714 0.178571 0.116279 0.071429 0.000000 0.163265 0.219178 0.071429 0.107143 0.125
0013 0.206897 0.178571 1.000000 0.227273 0.100000 0.12500 0.083333 0.500000 0.312500 0.153846 ... 0.153846 0.210526 0.093023 0.080000 0.142857 0.163265 0.095890 0.090909 0.071429 0.075
0014 0.206897 0.107143 0.227273 1.000000 0.181818 0.15625 0.125000 0.318182 0.363636 0.090909 ... 0.090909 0.181818 0.139535 0.120000 0.045455 0.142857 0.178082 0.409091 0.045455 0.200
0015 0.275862 0.321429 0.100000 0.181818 1.000000 0.28125 0.250000 0.150000 0.200000 0.250000 ... 0.050000 0.350000 0.139535 0.040000 0.050000 0.122449 0.150685 0.136364 0.150000 0.125

5 rows × 775 columns

Note: Dask does a lot more than this and is worth exploring for medium- to large-scale computations.


In [52]:
#-Save Results into a HDF5 File-#
fl = "data/sitcr2l4_proximity.h5"
with pd.HDFStore(fl, mode='w') as store:   #File is closed automatically
    for year in results.keys():
        store["Y{}".format(year)] = results[year]




Performance Comparison (SITC and HS Data)

For SITC Data: (786 Products, 229 Countries, 52 Years)

Function         | Time/Year        | Total Time   | Speedup
pandas           | 220 seconds      | ~177 minutes | -
pandas_symmetric | 104 seconds      | ~84 minutes  | BASE
numpy            | 2.5 seconds      | 120 seconds  | ~41x
numba            | 124 milliseconds | 6 seconds    | ~800x
numba + dask     | N/A              | 5 seconds    | -

For HS Data: (5016 Products, 222 Countries, 20 Years)

Function         | Time/Year         | Total Time       | Speedup
pandas           | 1 hour 25 minutes | -                | -
pandas_symmetric | 43 minutes        | -                | BASE
numpy            | 1 min 37 seconds  | -                | ~28x
numba            | 5 seconds         | 1 min 45 seconds | ~516x
numba + dask     | N/A               | 45 seconds       | -

These were run on the following machine:

Item      | Details
Processor | Xeon E5 @ 3.6 GHz
Cores     | 8
RAM       | 32 GB
Python    | Python 3.6

(Extension) Preparing Graph Data: Product Space Network

Here we will use NetworkX to construct our version of the Product Space using Python


In [54]:
prox = pd.read_hdf("data/sitcr2l4_proximity.h5", key="Y2000")

In [55]:
prox.head()


Out[55]:
productcode2 0011 0012 0013 0014 0015 0111 0112 0113 0114 0115 ... 8994 8996 8997 8998 8999 9310 9410 9510 9610 9710
productcode1
0011 1.000000 0.448276 0.206897 0.206897 0.275862 0.37500 0.275862 0.344828 0.206897 0.241379 ... 0.068966 0.137931 0.162791 0.068966 0.034483 0.244898 0.178082 0.206897 0.137931 0.175
0012 0.448276 1.000000 0.178571 0.107143 0.321429 0.25000 0.392857 0.142857 0.178571 0.142857 ... 0.035714 0.178571 0.116279 0.071429 0.000000 0.163265 0.219178 0.071429 0.107143 0.125
0013 0.206897 0.178571 1.000000 0.227273 0.100000 0.12500 0.083333 0.500000 0.312500 0.153846 ... 0.153846 0.210526 0.093023 0.080000 0.142857 0.163265 0.095890 0.090909 0.071429 0.075
0014 0.206897 0.107143 0.227273 1.000000 0.181818 0.15625 0.125000 0.318182 0.363636 0.090909 ... 0.090909 0.181818 0.139535 0.120000 0.045455 0.142857 0.178082 0.409091 0.045455 0.200
0015 0.275862 0.321429 0.100000 0.181818 1.000000 0.28125 0.250000 0.150000 0.200000 0.250000 ... 0.050000 0.350000 0.139535 0.040000 0.050000 0.122449 0.150685 0.136364 0.150000 0.125

5 rows × 775 columns

Use pandas to construct an edge list.


In [56]:
edge_list = prox.unstack()

In [57]:
#-Construct Sequence of node pairs as a pd.Series
edge_list.head()


Out[57]:
productcode2  productcode1
0011          0011            1.000000
              0012            0.448276
              0013            0.206897
              0014            0.206897
              0015            0.275862
dtype: float64

In [58]:
#-Remove Self Loops (pairs where productcode1 == productcode2)-#
edge_list = edge_list[edge_list.index.get_level_values(0) != edge_list.index.get_level_values(1)]
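Another way to drop self-loops, and the mirrored (j, i) duplicates at the same time, is to keep only the upper triangle of the proximity matrix. A minimal NumPy sketch with a hypothetical `upper_triangle_edges` helper and toy values; note that a genuine off-diagonal proximity of exactly 1.0 is kept, which a value-based filter such as `!= 1.0` would drop:

```python
import numpy as np

def upper_triangle_edges(phi, labels):
    """List (label_i, label_j, weight) for i < j, skipping self-loops and mirrored pairs."""
    phi = np.asarray(phi, dtype=float)
    i, j = np.triu_indices(len(labels), k=1)
    return [(labels[a], labels[b], float(phi[a, b])) for a, b in zip(i, j)]

# Hypothetical 3-product proximity matrix with one off-diagonal value of exactly 1.0
labels = ["0011", "0012", "0013"]
phi = np.array([
    [1.0, 0.4, 1.0],
    [0.4, 1.0, 0.2],
    [1.0, 0.2, 1.0],
])
edges = upper_triangle_edges(phi, labels)
print(edges)
```

Because each pair appears only once, the resulting list has $p(p-1)/2$ rows, matching the number of edges in an undirected graph over $p$ products.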

In [59]:
edge_list.head()


Out[59]:
productcode2  productcode1
0011          0012            0.448276
              0013            0.206897
              0014            0.206897
              0015            0.275862
              0111            0.375000
dtype: float64

We would like to construct a maximum spanning tree, but the version of NetworkX used here only provides minimum_spanning_tree, so we add an inv_weight attribute (1 - weight) for this computation. (Newer NetworkX releases also provide maximum_spanning_tree.)
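The `inv_weight` trick works because Kruskal's algorithm only depends on the order in which edges are visited: sorting by $1 - w$ ascending visits edges in the same order as sorting by $w$ descending. A minimal pure-Python Kruskal sketch (a hypothetical `spanning_tree` helper, not NetworkX's implementation) on the toy edge list from `simple_productspace.csv`, with the duplicated Cars/Aircraft row dropped, shows both routes give the same tree:

```python
def spanning_tree(edges, maximize=False):
    """Kruskal's algorithm: (u, v, w) edges of a minimum (or maximum) spanning tree/forest."""
    parent = {}

    def find(node):
        # Union-find root lookup with path halving
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # sorted() is stable (also with reverse=True), so tied weights keep their input order
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=maximize):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this edge does not create a cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

edges = [
    ("Shirts", "Pants", 0.900), ("Shirts", "Cars", 0.050), ("Shirts", "Cows", 0.010),
    ("Shirts", "Sugar", 0.010), ("Pants", "Cars", 0.050), ("Oil", "Sugar", 0.010),
    ("Oil", "Cars", 0.005), ("Oil", "Aircraft", 0.005), ("Cars", "Aircraft", 0.500),
    ("Wheat", "Sugar", 0.400), ("Wheat", "Cows", 0.200), ("Cows", "Sugar", 0.100),
    ("Cows", "Cars", 0.001), ("Cows", "Pants", 0.060),
]

max_tree = spanning_tree(edges, maximize=True)
inv_tree = spanning_tree([(u, v, 1 - w) for u, v, w in edges])   # minimum tree on inv_weight

same = {frozenset((u, v)) for u, v, _ in max_tree} == {frozenset((u, v)) for u, v, _ in inv_tree}
print(len(max_tree), same)
```

With 8 products the tree has 7 edges. Oil joins only through its strongest link, Oil-Sugar (0.01), mirroring the weakly connected periphery described earlier.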


In [60]:
#-Construct DataFrame-#
edge_list = edge_list.reset_index()
edge_list.columns = ["P1","P2","weight"]

In [61]:
edge_list["inv_weight"] = 1 - edge_list['weight']    #Useful when working with minimum spanning tree in networkx

In [62]:
edge_list.head()


Out[62]:
P1 P2 weight inv_weight
0 0011 0012 0.448276 0.551724
1 0011 0013 0.206897 0.793103
2 0011 0014 0.206897 0.793103
3 0011 0015 0.275862 0.724138
4 0011 0111 0.375000 0.625000

In [63]:
edge_list[["weight","inv_weight"]].hist();


Network Tools

We now want to construct a maximum spanning tree and then add all edges with a proximity of at least 0.5.


In [64]:
import networkx as nx

In [65]:
#-Construct the complete network-#
g = nx.from_pandas_edgelist(edge_list, source="P1", target="P2", edge_attr=["weight", "inv_weight"])
print("# of Nodes: {}".format(g.number_of_nodes()))
print("# of Edges: {}".format(g.number_of_edges()))


# of Nodes: 775
# of Edges: 299925

In [66]:
mst = nx.minimum_spanning_tree(g, weight='inv_weight')   #Minimum spanning tree on inv_weight = maximum spanning tree on weight
print("# of Nodes: {}".format(mst.number_of_nodes()))
print("# of Edges: {}".format(mst.number_of_edges()))


# of Nodes: 775
# of Edges: 774

In [67]:
mst["0011"]


Out[67]:
{'0012': {'inv_weight': 0.5517241379310345, 'weight': 0.4482758620689655},
 '0223': {'inv_weight': 0.5517241379310345, 'weight': 0.4482758620689655}}

In [68]:
#-Build Maximum Spanning Tree + Keep Edges >= 0.50-#
ps = nx.Graph()
#Add MST ('weight' attribute only)
for u, v, w in mst.edges(data=True):
    ps.add_edge(u, v, weight=w["weight"])
#Add Edges >= 0.50
for u, v, w in g.edges(data=True):
    if w['weight'] >= 0.50:
        ps.add_edge(u, v, weight=w["weight"])

In [69]:
print("# of Nodes: {}".format(ps.number_of_nodes()))
print("# of Edges: {}".format(ps.number_of_edges()))


# of Nodes: 775
# of Edges: 1547

Visualizations


In [70]:
ps_nodes = pd.read_csv("data/PS_SITC_nodes", sep="\t", converters={'sitc' : str},
                       names=["sitc", "community", "x", "y", "nodesize","leamer","pname","ncolor"])
ps_edges = pd.read_csv("data/PS_SITC_edges", sep="\t", converters={'sourceid' : str, 'targetid' : str},
                       names=["sourceid", "sourcex", "sourcey","targetid","targetx","targety", "width","color"])

In [71]:
ps_nodes.head()


Out[71]:
sitc community x y nodesize leamer pname ncolor
0 6932 999 4551.899658 2540.087158 48.780762 8 WIRE,TWISTED HOOP FOR FENCING OF IRON OR STEEL #9c9a87
1 7362 10 216.835098 5013.330811 65.180725 9 METAL FORMING MACHINE TOOLS #4037ab
2 7911 10 538.914902 5650.589111 53.997589 9 RAIL LOCOMOTIVES,ELECTRIC #4037ab
3 8946 10 696.394257 5316.897949 57.695251 7 NON-MILITARY ARMS AND AMMUNITION THEREFOR #4037ab
4 7264 10 57.284065 5879.528076 73.333267 9 PRINTING PRESSES #4037ab

In [72]:
ps_nodes.shape


Out[72]:
(774, 8)

In [73]:
def normalize(df, column):
    #Min-max scale df[column] into [0, 1]; the result is stored in a new '<column>_scaled' column
    max_value = df[column].max()
    min_value = df[column].min()
    df[column+"_scaled"] = (df[column] - min_value) / (max_value - min_value)
    return df
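`normalize` is min-max scaling, $(v - \min v)/(\max v - \min v)$, which maps the smallest coordinate to 0 and the largest to 1. It adds the new column to the DataFrame it receives, and it would divide by zero on a constant column. The sketch below is a non-mutating variant with a guard for that case. `normalize_copy` is a hypothetical name, and mapping a constant column to 0.0 is an assumed convention:

```python
import pandas as pd

def normalize_copy(df, column):
    """Return a copy of df with a min-max scaled '<column>_scaled' column in [0, 1].
    A constant column (max == min) is mapped to 0.0 instead of dividing by zero."""
    col = df[column]
    span = col.max() - col.min()
    out = df.copy()
    if span != 0:
        out[column + "_scaled"] = (col - col.min()) / span
    else:
        out[column + "_scaled"] = 0.0
    return out

demo = pd.DataFrame({"x": [0.0, 5.0, 10.0], "c": [3.0, 3.0, 3.0]})
print(normalize_copy(demo, "x")["x_scaled"].tolist())  # [0.0, 0.5, 1.0]
print(normalize_copy(demo, "c")["c_scaled"].tolist())  # [0.0, 0.0, 0.0]
print("x_scaled" in demo.columns)                      # False (input untouched)
```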

In [74]:
#Normalize the layout coordinates to the range [0, 1]
ps_nodes = normalize(ps_nodes, 'x')
ps_nodes = normalize(ps_nodes, 'y')

In [75]:
ps_nodes.head()


Out[75]:
  | sitc | community | x           | y           | nodesize  | leamer | pname                                          | ncolor  | x_scaled | y_scaled
0 | 6932 | 999       | 4551.899658 | 2540.087158 | 48.780762 | 8      | WIRE,TWISTED HOOP FOR FENCING OF IRON OR STEEL | #9c9a87 | 0.831500 | 0.136279
1 | 7362 | 10        | 216.835098  | 5013.330811 | 65.180725 | 9      | METAL FORMING MACHINE TOOLS                    | #4037ab | 0.335160 | 0.632872
2 | 7911 | 10        | 538.914902  | 5650.589111 | 53.997589 | 9      | RAIL LOCOMOTIVES,ELECTRIC                      | #4037ab | 0.372036 | 0.760825
3 | 8946 | 10        | 696.394257  | 5316.897949 | 57.695251 | 7      | NON-MILITARY ARMS AND AMMUNITION THEREFOR      | #4037ab | 0.390066 | 0.693824
4 | 7264 | 10        | 57.284065   | 5879.528076 | 73.333267 | 9      | PRINTING PRESSES                               | #4037ab | 0.316892 | 0.806793

In [76]:
import numpy as np
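#-Obtain Dictionary of Coordinates (sitc code -> scaled [x, y])-#
coord = {}
xy = ps_nodes[["x_scaled","y_scaled"]].values
for idx, productcode in enumerate(ps_nodes["sitc"]):
    coord[productcode] = xy[idx]
#-Add Missing Node-#
#'6784' is a node in the network but has no row in ps_nodes, so place it at the origin
coord['6784'] = np.array([0,0])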
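`ps_nodes` has 774 rows while `ps` has 775 nodes, which is why `'6784'` is added by hand above. Instead of hard-coding the code, the missing nodes can be found as the set difference between the graph's nodes and the layout's keys. The sketch below assumes the `x_scaled`/`y_scaled` columns created above. `layout_positions` and its `fill` argument are hypothetical:

```python
import numpy as np
import networkx as nx
import pandas as pd

def layout_positions(graph, nodes_df, code_col="sitc", fill=(0.0, 0.0)):
    """Hypothetical helper: map node -> [x, y] from the scaled layout columns and
    place any graph node without a layout row at `fill`."""
    xy = nodes_df[["x_scaled", "y_scaled"]].to_numpy()
    coord = dict(zip(nodes_df[code_col], xy))  # iterating a 2-D array yields rows
    missing = sorted(set(graph.nodes()) - set(coord))
    for node in missing:
        coord[node] = np.array(fill)
    return coord, missing

# Toy example: node "c" has no layout row
G = nx.Graph([("a", "b"), ("b", "c")])
nodes_df = pd.DataFrame({"sitc": ["a", "b"], "x_scaled": [0.1, 0.9], "y_scaled": [0.2, 0.8]})
coord, missing = layout_positions(G, nodes_df)
print(missing)     # ['c']
print(coord["c"])  # [0. 0.]
```

If `'6784'` is the only product without a layout row, `layout_positions(ps, ps_nodes)` should report `['6784']`.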

In [77]:
coord


Out[77]:
{'0011': array([ 0.63635789,  0.29305675]),
 '0012': array([ 0.89501582,  0.00749056]),
 '0013': array([ 0.49858259,  0.76763095]),
 '0014': array([ 0.5520262 ,  0.40525301]),
 '0015': array([ 0.74470858,  0.71973192]),
 '0111': array([ 0.79874644,  0.84198768]),
 '0112': array([ 0.87884575,  0.01180109]),
 '0113': array([ 0.49590829,  0.67857286]),
 '0114': array([ 0.48507251,  0.76832806]),
 '0115': array([ 0.76640784,  0.76481862]),
 '0116': array([ 0.81179179,  0.81664421]),
 '0118': array([ 0.64906381,  0.19757255]),
 '0121': array([ 0.10098441,  0.53733217]),
 '0129': array([ 0.68994095,  0.30165181]),
 '0141': array([ 0.52012477,  0.78594745]),
 '0142': array([ 0.68846539,  0.38211879]),
 '0149': array([ 0.66141205,  0.31634175]),
 '0223': array([ 0.65156137,  0.37792959]),
 '0224': array([ 0.68550201,  0.40946486]),
 '0230': array([ 0.71059287,  0.6434541 ]),
 '0240': array([ 0.70160323,  0.42749238]),
 '0251': array([ 0.71759312,  0.38431479]),
 '0252': array([ 0.4006868 ,  0.32952457]),
 '0341': array([ 0.94629419,  0.19430359]),
 '0342': array([ 0.91927217,  0.20276235]),
 '0343': array([ 0.97770012,  0.14887625]),
 '0344': array([ 0.93688035,  0.14636667]),
 '0350': array([ 0.93374579,  0.17769192]),
 '0360': array([ 0.93429738,  0.21478526]),
 '0371': array([ 0.91780697,  0.16919306]),
 '0372': array([ 1.        ,  0.14001901]),
 '0411': array([ 0.65493384,  0.8134331 ]),
 '0412': array([ 0.64433633,  0.75996328]),
 '0421': array([ 0.90644388,  0.93082028]),
 '0422': array([ 0.88236159,  0.92425238]),
 '0430': array([ 0.62964211,  0.74067222]),
 '0440': array([ 0.6439859 ,  0.82573238]),
 '0451': array([ 0.72568476,  0.85958402]),
 '0452': array([ 0.52897962,  0.84234945]),
 '0459': array([ 0.85069646,  0.96400206]),
 '0460': array([ 0.76932035,  0.19184465]),
 '0470': array([ 0.75047206,  0.19027125]),
 '0481': array([ 0.67451731,  0.24032582]),
 '0482': array([ 0.53932578,  0.41319688]),
 '0483': array([ 0.917674 ,  0.8337203]),
 '0484': array([ 0.74867081,  0.42008408]),
 '0488': array([ 0.68121005,  0.32931457]),
 '0541': array([ 0.86236723,  0.11724096]),
 '0542': array([ 0.88106794,  0.16425865]),
 '0544': array([ 0.84585708,  0.18388889]),
 '0545': array([ 0.82882269,  0.27524436]),
 '0546': array([ 0.84539276,  0.21776117]),
 '0548': array([ 0.95186634,  0.39403734]),
 '0561': array([ 0.86091215,  0.17606901]),
 '0564': array([ 0.80157051,  0.27752462]),
 '0565': array([ 0.80736858,  0.33735847]),
 '0571': array([ 0.84980292,  0.05240932]),
 '0572': array([ 0.85395723,  0.07730166]),
 '0573': array([ 0.92352244,  0.10519153]),
 '0574': array([ 0.87084448,  0.06825976]),
 '0575': array([ 0.87665644,  0.09216018]),
 '0576': array([ 0.88937099,  0.0854536 ]),
 '0577': array([ 0.93917723,  0.35682213]),
 '0579': array([ 0.82376707,  0.3387675 ]),
 '0582': array([ 0.86347977,  0.79212344]),
 '0583': array([ 0.78106859,  0.36023599]),
 '0585': array([ 0.78217058,  0.4039315 ]),
 '0586': array([ 0.79101679,  0.32925432]),
 '0589': array([ 0.80365997,  0.31380688]),
 '0611': array([ 0.89877183,  0.14383523]),
 '0612': array([ 0.75952269,  0.17560915]),
 '0615': array([ 0.75901759,  0.14102924]),
 '0616': array([ 0.7945135 ,  0.06174391]),
 '0619': array([ 0.38018554,  0.32189533]),
 '0620': array([ 0.76761188,  0.41850043]),
 '0711': array([ 0.91184664,  0.13190565]),
 '0712': array([ 0.63294722,  0.17489395]),
 '0721': array([ 0.04181841,  0.04137577]),
 '0722': array([ 0.05211408,  0.02292486]),
 '0723': array([ 0.06483267,  0.07456639]),
 '0730': array([ 0.73716717,  0.45194727]),
 '0741': array([ 0.87066799,  0.9471269 ]),
 '0742': array([ 0.6723572 ,  0.91495096]),
 '0751': array([ 0.92768681,  0.12685502]),
 '0752': array([ 0.86969386,  0.29415243]),
 '0811': array([ 0.84528249,  1.        ]),
 '0812': array([ 0.92814406,  0.26480632]),
 '0813': array([ 0.65228856,  0.85849676]),
 '0814': array([ 0.95241466,  0.10117223]),
 '0819': array([ 0.67600939,  0.28060519]),
 '0913': array([ 0.52059946,  0.60956515]),
 '0914': array([ 0.7349096 ,  0.29270094]),
 '0980': array([ 0.74237336,  0.38104099]),
 '1110': array([ 0.74606296,  0.34412809]),
 '1121': array([ 0.8669751 ,  0.04067182]),
 '1122': array([ 0.68477878,  0.17416487]),
 '1123': array([ 0.69575134,  0.27443376]),
 '1124': array([ 0.69092375,  0.22269325]),
 '1211': array([ 0.95953241,  0.35811837]),
 '1212': array([ 0.98245759,  0.33012482]),
 '1213': array([ 0.98199081,  0.37120124]),
 '1221': array([ 0.59588984,  0.21548835]),
 '1222': array([ 0.69925874,  0.65791095]),
 '1223': array([ 0.63627746,  0.13192413]),
 '2111': array([ 0.78592494,  0.13730379]),
 '2112': array([ 0.78661895,  0.16876345]),
 '2114': array([ 0.82984682,  0.07978413]),
 '2116': array([ 0.8011108 ,  0.15361493]),
 '2117': array([ 0.80209171,  0.1806093 ]),
 '2119': array([ 0.79439504,  0.12168673]),
 '2120': array([ 0.54353332,  0.84488026]),
 '2221': array([ 0.85765862,  0.94001341]),
 '2222': array([ 0.6585641 ,  0.88853322]),
 '2223': array([ 0.86361739,  0.9184408 ]),
 '2224': array([ 0.63855844,  0.7928457 ]),
 '2225': array([ 0.82130265,  0.91107735]),
 '2226': array([ 0.6313761 ,  0.70555257]),
 '2231': array([ 0.03254904,  0.0518298 ]),
 '2232': array([ 0.03509435,  0.1086498 ]),
 '2234': array([ 0.2260525 ,  0.04864769]),
 '2235': array([ 0.89982842,  0.89565456]),
 '2238': array([ 0.89761814,  0.27345767]),
 '2239': array([ 0.64338789,  0.89526098]),
 '2320': array([ 0.08957042,  0.13976592]),
 '2331': array([ 0.18510404,  0.60265307]),
 '2332': array([ 0.65965969,  0.65254653]),
 '2440': array([ 0.92682841,  0.03693209]),
 '2450': array([ 0.83349767,  0.33944937]),
 '2460': array([ 0.67228629,  0.60652272]),
 '2471': array([ 0.70826501,  0.76196805]),
 '2472': array([ 0.71409037,  0.32612238]),
 '2479': array([ 0.72532445,  0.27162363]),
 '2481': array([ 0.70200867,  0.2377914 ]),
 '2482': array([ 0.69333768,  0.6876424 ]),
 '2483': array([ 0.72583599,  0.33236362]),
 '2511': array([ 0.60326761,  0.2717982 ]),
 '2512': array([ 0.26727696,  0.3284363 ]),
 '2516': array([ 0.28198083,  0.20645341]),
 '2517': array([ 0.28380044,  0.2854971 ]),
 '2518': array([ 0.30709595,  0.29114455]),
 '2519': array([ 0.26960684,  0.22832685]),
 '2613': array([ 0.95041755,  0.86696198]),
 '2614': array([ 0.95643846,  0.85600747]),
 '2631': array([ 0.83970555,  0.87558221]),
 '2632': array([ 0.86337747,  0.89885954]),
 '2633': array([ 0.8858473 ,  0.34237038]),
 '2634': array([ 0.83590764,  0.84100296]),
 '2640': array([ 0.88716846,  0.90016298]),
 '2651': array([ 0.77513431,  0.98050717]),
 '2652': array([ 0.57734971,  0.76283782]),
 '2654': array([ 0.89256758,  0.8735841 ]),
 '2655': array([ 0.07427784,  0.31152607]),
 '2659': array([ 0.04486092,  0.07931673]),
 '2665': array([ 0.76720212,  0.65406317]),
 '2666': array([ 0.87284019,  0.80768682]),
 '2667': array([ 0.8687509 ,  0.76724703]),
 '2671': array([ 0.16250802,  0.59394005]),
 '2672': array([ 0.83193274,  0.56803593]),
 '2681': array([ 0.82133468,  0.58656586]),
 '2682': array([ 0.8040943,  0.580486 ]),
 '2683': array([ 0.81835573,  0.64206305]),
 '2685': array([ 0.82385602,  0.61964779]),
 '2686': array([ 0.79764116,  0.59487207]),
 '2687': array([ 0.81130387,  0.62895415]),
 '2690': array([ 0.76825037,  0.31272338]),
 '2711': array([ 0.78379204,  0.31477512]),
 '2712': array([ 0.74363215,  0.9548006 ]),
 '2713': array([ 0.67388406,  0.82900501]),
 '2714': array([ 0.75322086,  0.91333095]),
 '2731': array([ 0.87287947,  0.14234056]),
 '2732': array([ 0.78951561,  0.29040494]),
 '2733': array([ 0.83057122,  0.97508158]),
 '2734': array([ 0.74239301,  0.66152177]),
 '2741': array([ 0.27829636,  0.09395584]),
 '2742': array([ 0.42074917,  0.81771088]),
 '2771': array([ 0.0281311 ,  0.16531763]),
 '2772': array([ 0.00843426,  0.18785427]),
 '2782': array([ 0.48017127,  0.27235467]),
 '2783': array([ 0.84413469,  0.13412067]),
 '2784': array([ 0.2786335 ,  0.12279635]),
 '2785': array([ 0.47610507,  0.24909347]),
 '2786': array([ 0.79159524,  0.90178481]),
 '2789': array([ 0.72492325,  0.21828666]),
 '2814': array([ 0.45309889,  0.17766834]),
 '2815': array([ 0.43892666,  0.21527658]),
 '2816': array([ 0.45250286,  0.25757046]),
 '2820': array([ 0.77408505,  0.23747915]),
 '2860': array([ 0.44023875,  0.12169914]),
 '2871': array([ 0.92439498,  0.06866619]),
 '2872': array([ 0.25038505,  0.11265106]),
 '2873': array([ 0.43882644,  0.18292515]),
 '2874': array([ 0.88864811,  0.03565057]),
 '2875': array([ 0.88159709,  0.05911271]),
 '2876': array([ 0.83201931,  0.65372713]),
 '2877': array([ 0.43953774,  0.1520278 ]),
 '2879': array([ 0.89455958,  0.10782383]),
 '2881': array([ 0.8630308 ,  0.00104601]),
 '2882': array([ 0.77729795,  0.27832595]),
 '2890': array([ 0.99922303,  0.21329083]),
 '2911': array([ 0.88504165,  0.24126752]),
 '2919': array([ 0.77917735,  0.03597681]),
 '2922': array([ 0.81284161,  0.96245573]),
 '2923': array([ 0.91000654,  0.53332889]),
 '2924': array([ 0.85546701,  0.30850957]),
 '2925': array([ 0.07335365,  0.        ]),
 '2926': array([ 0.08074328,  0.02176416]),
 '2927': array([ 0.08039992,  0.04975361]),
 '2929': array([ 0.93825216,  0.24504966]),
 '3221': array([ 0.36525415,  0.86900837]),
 '3222': array([ 0.34622862,  0.85539388]),
 '3223': array([ 0.34309396,  0.90723761]),
 '3224': array([ 0.55060698,  0.76872763]),
 '3231': array([ 0.35863778,  0.77780698]),
 '3232': array([ 0.35859708,  0.81352398]),
 '3330': array([ 0.79101103,  0.93930801]),
 '3345': array([ 0.24854934,  0.53092416]),
 '3351': array([ 0.5904266 ,  0.29555009]),
 '3352': array([ 0.50993242,  0.82164543]),
 '3353': array([ 0.46727892,  0.33350608]),
 '3354': array([ 0.54617512,  0.22026095]),
 '3413': array([ 0.5427787 ,  0.15929723]),
 '3414': array([ 0.24644195,  0.06940038]),
 '3415': array([ 0.00462231,  0.16217638]),
 '3510': array([ 0.7207862,  0.5978446]),
 '4111': array([ 0.9687307 ,  0.06569214]),
 '4113': array([ 0.78741144,  0.8114121 ]),
 '4232': array([ 0.65416533,  0.93610695]),
 '4233': array([ 0.83357449,  0.91714447]),
 '4234': array([ 0.09625087,  0.66562548]),
 '4235': array([ 0.89763832,  0.06786765]),
 '4236': array([ 0.62580473,  0.82483688]),
 '4239': array([ 0.62203945,  0.65065887]),
 '4241': array([ 0.14177874,  0.54877923]),
 '4242': array([ 0.07557083,  0.09729116]),
 '4243': array([ 0.05954498,  0.09497869]),
 '4244': array([ 0.0776234 ,  0.11891703]),
 '4245': array([ 0.19628943,  0.78602735]),
 '4249': array([ 0.83176592,  0.94717509]),
 '4311': array([ 0.10639469,  0.58077953]),
 '4312': array([ 0.73798905,  0.24154914]),
 '4313': array([ 0.43115874,  0.23462559]),
 '4314': array([ 0.06234906,  0.0097022 ]),
 '5111': array([ 0.1182396 ,  0.39758889]),
 '5112': array([ 0.13964068,  0.65654201]),
 '5113': array([ 0.2408855 ,  0.61994932]),
 '5114': array([ 0.50727379,  0.74187816]),
 '5121': array([ 0.67354621,  0.19453222]),
 '5122': array([ 0.2369537 ,  0.55618519]),
 '5123': array([ 0.21698937,  0.51091868]),
 '5137': array([ 0.19683758,  0.5789503 ]),
 '5138': array([ 0.14626244,  0.41431799]),
 '5139': array([ 0.48896044,  0.23895828]),
 '5145': array([ 0.25736227,  0.48709669]),
 '5146': array([ 0.17164977,  0.46116762]),
 '5147': array([ 0.27403819,  0.69152813]),
 '5148': array([ 0.13066138,  0.50622197]),
 '5154': array([ 0.21480677,  0.56121935]),
 '5155': array([ 0.17332244,  0.72248356]),
 '5156': array([ 0.19836384,  0.50859743]),
 '5157': array([ 0.15055788,  0.46377554]),
 '5161': array([ 0.15621654,  0.62436338]),
 '5162': array([ 0.22154389,  0.53296654]),
 '5163': array([ 0.26443174,  0.60476054]),
 '5169': array([ 0.30361748,  0.47660635]),
 '5221': array([ 0.46423497,  0.25323167]),
 '5222': array([ 0.66119708,  0.74469191]),
 '5223': array([ 0.11047742,  0.64475102]),
 '5224': array([ 0.41880427,  0.23688287]),
 '5225': array([ 0.6746201 ,  0.74894367]),
 '5231': array([ 0.6539533 ,  0.69733754]),
 '5232': array([ 0.64683007,  0.64638447]),
 '5233': array([ 0.46798502,  0.22021629]),
 '5239': array([ 0.47432143,  0.29511219]),
 '5241': array([ 0.25789256,  0.09773386]),
 '5249': array([ 0.43976042,  0.8306188 ]),
 '5311': array([ 0.2615462 ,  0.67774162]),
 '5312': array([ 0.24684077,  0.77994729]),
 '5322': array([ 0.14780281,  0.52193584]),
 '5323': array([ 0.16483818,  0.52566235]),
 '5331': array([ 0.15093628,  0.7437643 ]),
 '5332': array([ 0.28657723,  0.59255729]),
 '5334': array([ 0.61928322,  0.38998485]),
 '5335': array([ 0.62132804,  0.42463101]),
 '5411': array([ 0.23510525,  0.65566053]),
 '5413': array([ 0.49946297,  0.30598876]),
 '5414': array([ 0.37999854,  0.36168953]),
 '5415': array([ 0.2383942 ,  0.47588987]),
 '5416': array([ 0.28117032,  0.50736272]),
 '5417': array([ 0.51303938,  0.35335896]),
 '5419': array([ 0.39193688,  0.39493862]),
 '5513': array([ 0.89794748,  0.25117236]),
 '5514': array([ 0.19265012,  0.43933923]),
 '5530': array([ 0.51249065,  0.30261755]),
 '5541': array([ 0.76061005,  0.22435716]),
 '5542': array([ 0.66101476,  0.3592627 ]),
 '5543': array([ 0.60384159,  0.34035948]),
 '5621': array([ 0.66674643,  0.70111621]),
 '5622': array([ 0.66747715,  0.78889336]),
 '5623': array([ 0.73299791,  0.90919155]),
 '5629': array([ 0.96421058,  0.12554447]),
 '5721': array([ 0.71034517,  0.23708365]),
 '5722': array([ 0.70676156,  0.17141196]),
 '5723': array([ 0.10320986,  0.20534443]),
 '5821': array([ 0.60728432,  0.44374991]),
 '5822': array([ 0.47539385,  0.67607381]),
 '5823': array([ 0.13149249,  0.13068481]),
 '5824': array([ 0.2639243 ,  0.53004729]),
 '5825': array([ 0.47569113,  0.76236762]),
 '5826': array([ 0.16724095,  0.40149505]),
 '5827': array([ 0.22163546,  0.60160306]),
 '5829': array([ 0.17214926,  0.43555172]),
 '5831': array([ 0.5573469,  0.2643158]),
 '5832': array([ 0.5487989 ,  0.29336149]),
 '5833': array([ 0.14702528,  0.18044327]),
 '5834': array([ 0.54195332,  0.33145707]),
 '5835': array([ 0.20486581,  0.61774253]),
 '5836': array([ 0.1848795 ,  0.57368452]),
 '5837': array([ 0.61025138,  0.27755482]),
 '5838': array([ 0.33353248,  0.34209283]),
 '5839': array([ 0.18615722,  0.53841698]),
 '5841': array([ 0.31173794,  0.68176828]),
 '5842': array([ 0.11728222,  0.62398151]),
 '5843': array([ 0.19358744,  0.75290674]),
 '5849': array([ 0.55371141,  0.81029081]),
 '5851': array([ 0.32337169,  0.28502621]),
 '5852': array([ 0.37614729,  0.28343139]),
 '5911': array([ 0.63259489,  0.21667949]),
 '5912': array([ 0.60951457,  0.23396647]),
 '5913': array([ 0.62235698,  0.24476569]),
 '5914': array([ 0.62512728,  0.28033367]),
 '5921': array([ 0.62001031,  0.68929521]),
 '5922': array([ 0.51536335,  0.7194377 ]),
 '5981': array([ 0.32864126,  0.25585456]),
 '5982': array([ 0.18515119,  0.74272067]),
 '5983': array([ 0.23537272,  0.68705152]),
 '5989': array([ 0.2538109 ,  0.59635806]),
 '6112': array([ 0.11960962,  0.52932729]),
 '6113': array([ 0.80345508,  0.03655887]),
 '6114': array([ 0.80339286,  0.08908231]),
 '6115': array([ 0.8141217 ,  0.14192756]),
 '6116': array([ 0.80688516,  0.11900779]),
 '6118': array([ 0.79180416,  0.60836622]),
 '6121': array([ 0.4024542 ,  0.82458109]),
 '6122': array([ 0.94099771,  0.86642639]),
 '6123': array([ 0.84585476,  0.47133674]),
 '6129': array([ 0.7789375 ,  0.58126782]),
 '6130': array([ 0.73579098,  0.75559118]),
 '6210': array([ 0.52922059,  0.44971595]),
 '6251': array([ 0.5914618 ,  0.58573414]),
 '6252': array([ 0.58283612,  0.42871921]),
 '6253': array([ 0.13587762,  0.68083567]),
 '6254': array([ 0.25565177,  0.80323344]),
 '6259': array([ 0.58099404,  0.40789842]),
 '6281': array([ 0.14166451,  0.72954294]),
 '6282': array([ 0.46495837,  0.49206625]),
 '6289': array([ 0.52058877,  0.49442299]),
 '6330': array([ 0.90761362,  0.0461323 ]),
 '6341': array([ 0.70851853,  0.35765606]),
 '6342': array([ 0.71926057,  0.72956388]),
 '6343': array([ 0.69478525,  0.55933423]),
 '6344': array([ 0.75620804,  0.70145846]),
 '6349': array([ 0.7737313 ,  0.71876574]),
 '6351': array([ 0.71900676,  0.51996288]),
 '6352': array([ 0.55568805,  0.71636694]),
 '6353': array([ 0.72384296,  0.49437926]),
 '6354': array([ 0.88276861,  0.53119118]),
 '6359': array([ 0.7177487 ,  0.54539234]),
 '6411': array([ 0.28785093,  0.34077968]),
 '6412': array([ 0.28639342,  0.41149598]),
 '6413': array([ 0.31941734,  0.23644026]),
 '6415': array([ 0.57426156,  0.45762499]),
 '6416': array([ 0.66813101,  0.54237726]),
 '6417': array([ 0.74688477,  0.4757809 ]),
 '6418': array([ 0.40626467,  0.73183203]),
 '6419': array([ 0.56086123,  0.79400527]),
 '6421': array([ 0.759203  ,  0.37490863]),
 '6422': array([ 0.69600167,  0.50252404]),
 '6423': array([ 0.83273999,  0.53707692]),
 '6424': array([ 0.64261753,  0.40849622]),
 '6428': array([ 0.63694952,  0.45416822]),
 '6511': array([ 0.93739829,  0.83281852]),
 '6512': array([ 0.7783232,  0.5600287]),
 '6513': array([ 0.82931717,  0.78161074]),
 '6514': array([ 0.74963088,  0.58886941]),
 '6515': array([ 0.91429133,  0.80559366]),
 '6516': array([ 0.80853784,  0.71002683]),
 '6517': array([ 0.76129712,  0.57515536]),
 '6518': array([ 0.92577216,  0.83092071]),
 '6519': array([ 0.63740831,  0.65841502]),
 '6521': array([ 0.84509442,  0.8076354 ]),
 '6522': array([ 0.83932557,  0.75143913]),
 '6531': array([ 0.84404161,  0.700481  ]),
 '6532': array([ 0.86307119,  0.73691507]),
 '6534': array([ 0.85417137,  0.63932685]),
 '6535': array([ 0.86984207,  0.70915452]),
 '6536': array([ 0.90487109,  0.81415227]),
 '6538': array([ 0.88901898,  0.77088569]),
 '6539': array([ 0.85237916,  0.77012548]),
 '6541': array([ 0.95415405,  0.8184305 ]),
 '6542': array([ 0.5330556 ,  0.77175682]),
 '6543': array([ 0.53122327,  0.74144565]),
 '6544': array([ 0.52849655,  0.70497414]),
 '6545': array([ 0.87967458,  0.86734434]),
 '6546': array([ 0.5600304 ,  0.46628081]),
 '6549': array([ 0.88171563,  0.70925668]),
 '6551': array([ 0.90953419,  0.83660523]),
 '6552': array([ 0.90843738,  0.62318645]),
 '6553': array([ 0.91493282,  0.87159286]),
 '6560': array([ 0.93437582,  0.41551112]),
 '6571': array([ 0.54890691,  0.66432772]),
 '6572': array([ 0.45946214,  0.74604727]),
 '6573': array([ 0.5286919 ,  0.67733789]),
 '6574': array([ 0.87834487,  0.61230861]),
 '6575': array([ 0.9448   ,  0.3065047]),
 '6576': array([ 0.93743301,  0.79210089]),
 '6577': array([ 0.5808668 ,  0.48763773]),
 '6579': array([ 0.57520449,  0.57138092]),
 '6581': array([ 0.88311632,  0.2899568 ]),
 '6582': array([ 0.68526021,  0.75011074]),
 '6583': array([ 0.87852581,  0.68114058]),
 '6584': array([ 0.9045484,  0.460431 ]),
 '6589': array([ 0.90265332,  0.4952028 ]),
 '6591': array([ 0.57420275,  0.84683782]),
 '6592': array([ 0.90628262,  0.74264915]),
 '6593': array([ 0.92341927,  0.7569882 ]),
 '6594': array([ 0.97418221,  0.48608614]),
 '6595': array([ 0.22443524,  0.7288323 ]),
 '6596': array([ 0.94422664,  0.48019854]),
 '6597': array([ 0.02941287,  0.0867315 ]),
 '6611': array([ 0.73576825,  0.66829281]),
 '6612': array([ 0.85094465,  0.28015372]),
 '6613': array([ 0.87937297,  0.11517697]),
 '6618': array([ 0.69051189,  0.45352442]),
 '6623': array([ 0.54969634,  0.64317287]),
 '6624': array([ 0.79568112,  0.50407798]),
 '6631': array([ 0.35104102,  0.53937321]),
 '6632': array([ 0.46009821,  0.61922843]),
 '6633': array([ 0.69775114,  0.48135051]),
 '6635': array([ 0.57359612,  0.55533846]),
 '6637': array([ 0.25713825,  0.43115661]),
 '6638': array([ 0.50837813,  0.78294154]),
 '6639': array([ 0.36015203,  0.68455277]),
 '6641': array([ 0.21866973,  0.6290544 ]),
 '6642': array([ 0.10244763,  0.48788263]),
 '6643': array([ 0.08399068,  0.26835208]),
 '6644': array([ 0.51908231,  0.37779463]),
 '6645': array([ 0.60263127,  0.57977541]),
 '6646': array([ 0.35842476,  0.73846857]),
 '6647': array([ 0.58529454,  0.51034779]),
 '6648': array([ 0.23908675,  0.4142625 ]),
 '6649': array([ 0.64801598,  0.29747236]),
 '6651': array([ 0.7307748 ,  0.18933036]),
 '6652': array([ 0.56593882,  0.36097502]),
 '6658': array([ 0.38868749,  0.76038554]),
 '6664': array([ 0.56849225,  0.33617508]),
 '6665': array([ 0.57080554,  0.31563831]),
 '6666': array([ 0.90745479,  0.59015006]),
 '6671': array([ 0.10361032,  0.28247717]),
 '6672': array([ 0.04306214,  0.159214  ]),
 '6673': array([ 0.00830894,  0.13359332]),
 '6674': array([ 0.06017912,  0.16161407]),
 '6712': array([ 0.45366876,  0.21486859]),
 '6713': array([ 0.45144687,  0.29645154]),
 '6716': array([ 0.41680492,  0.35613347]),
 '6724': array([ 0.40594627,  0.30081004]),
 '6725': array([ 0.39563042,  0.21225847]),
 '6727': array([ 0.4542804 ,  0.34605115]),
 '6731': array([ 0.60169608,  0.79382943]),
 '6732': array([ 0.75155727,  0.25047991]),
 '6733': array([ 0.73846127,  0.16612411]),
 '6744': array([ 0.47313714,  0.39064951]),
 '6745': array([ 0.46076123,  0.40748129]),
 '6746': array([ 0.43455449,  0.37541035]),
 '6747': array([ 0.41617801,  0.30077396]),
 '6749': array([ 0.39073669,  0.2875204 ]),
 '6760': array([ 0.4434554 ,  0.76259154]),
 '6770': array([ 0.61232758,  0.56711779]),
 '6781': array([ 0.41030279,  0.76757546]),
 '6782': array([ 0.42799839,  0.67089774]),
 '6783': array([ 0.36763853,  0.21803602]),
 '6784': array([0, 0]),
 '6785': array([ 0.44994438,  0.4546245 ]),
 '6793': array([ 0.55204451,  0.68659617]),
 '6794': array([ 0.61487515,  0.53987106]),
 '6811': array([ 0.48341865,  0.17432698]),
 '6812': array([ 0.27982188,  0.62920602]),
 '6821': array([ 0.9121584 ,  0.08202509]),
 '6822': array([ 0.42419346,  0.28497293]),
 '6831': array([ 0.27449512,  0.14887924]),
 '6832': array([ 0.32862157,  0.4819488 ]),
 '6841': array([ 0.73153043,  0.79751672]),
 '6842': array([ 0.66065235,  0.45012372]),
 '6851': array([ 0.85044558,  0.14882738]),
 '6852': array([ 0.81178533,  0.19513238]),
 '6861': array([ 0.43569425,  0.29002013]),
 '6863': array([ 0.43248562,  0.26814243]),
 '6871': array([ 0.13860658,  0.43822707]),
 '6872': array([ 0.09787971,  0.68407521]),
 '6880': array([ 0.22974846,  0.63353836]),
 '6891': array([ 0.43467765,  0.8652452 ]),
 '6899': array([ 0.4151838 ,  0.20011889]),
 '6911': array([ 0.69763491,  0.52348493]),
 '6912': array([ 0.64686444,  0.53202113]),
 '6921': array([ 0.66764804,  0.49468137]),
 '6924': array([ 0.72459458,  0.41828609]),
 '6931': array([ 0.58991413,  0.46705626]),
 '6932': array([ 0.83150017,  0.13627893]),
 '6935': array([ 0.62988326,  0.54528298]),
 '6940': array([ 0.34435839,  0.52320267]),
 '6951': array([ 0.68879593,  0.70794715]),
 '6953': array([ 0.39997205,  0.70886368]),
 '6954': array([ 0.37189079,  0.49133355]),
 '6960': array([ 0.45229297,  0.79439096]),
 '6973': array([ 0.64498784,  0.48758861]),
 '6974': array([ 0.88282197,  0.73528138]),
 '6975': array([ 0.60103237,  0.66417792]),
 '6978': array([ 0.912547  ,  0.77794909]),
 '6991': array([ 0.56764178,  0.50370469]),
 '6992': array([ 0.44422482,  0.72707704]),
 '6993': array([ 0.5450297 ,  0.70142567]),
 '6994': array([ 0.45357289,  0.60297553]),
 '6996': array([ 0.62927048,  0.46866495]),
 '6997': array([ 0.55462023,  0.54193995]),
 '6998': array([ 0.59843194,  0.53369345]),
 '6999': array([ 0.44266166,  0.79759849]),
 '7111': array([ 0.78299206,  0.70418702]),
 '7112': array([ 0.80125788,  0.65897846]),
 '7119': array([ 0.78643533,  0.6235441 ]),
 '7126': array([ 0.3535368 ,  0.69776103]),
 '7129': array([ 0.41808184,  0.48431372]),
 '7131': array([ 0.10079132,  0.72521148]),
 '7132': array([ 0.39573747,  0.58906157]),
 '7133': array([ 0.26619172,  0.80217829]),
 '7138': array([ 0.14552812,  0.60598159]),
 '7139': array([ 0.44158378,  0.47817323]),
 '7144': array([ 0.06918631,  0.4474497 ]),
 '7148': array([ 0.09061127,  0.42891357]),
 '7149': array([ 0.13362591,  0.48105541]),
 '7161': array([ 0.36504107,  0.6012461 ]),
 '7162': array([ 0.55505369,  0.59432961]),
 '7163': array([ 0.36270462,  0.36025417]),
 '7169': array([ 0.54706216,  0.5079711 ]),
 '7187': array([ 0.17753982,  0.68656445]),
 '7188': array([ 0.48893716,  0.42472552]),
 '7211': array([ 0.58545152,  0.61810572]),
 '7212': array([ 0.50393949,  0.59676782]),
 '7213': array([ 0.43055259,  0.40568384]),
 '7219': array([ 0.40205993,  0.35339318]),
 '7223': array([ 0.34205465,  0.71383412]),
 '7224': array([ 0.36311234,  0.62755439]),
 '7233': array([ 0.23771603,  0.4414971 ]),
 '7234': array([ 0.37894045,  0.6081294 ]),
 '7239': array([ 0.51705402,  0.46546139]),
 '7243': array([ 0.4681649 ,  0.73995858]),
 '7244': array([ 0.29543643,  0.82116797]),
 '7245': array([ 0.28034235,  0.80174843]),
 '7246': array([ 0.36363926,  0.40811367]),
 '7247': array([ 0.57562465,  0.66388767]),
 '7248': array([ 0.7694627 ,  0.85134527]),
 '7251': array([ 0.31878809,  0.41024952]),
 '7252': array([ 0.33262925,  0.43282861]),
 '7259': array([ 0.32598237,  0.45326066]),
 '7263': array([ 0.30392473,  0.52763251]),
 '7264': array([ 0.31689188,  0.80679259]),
 '7267': array([ 0.32016046,  0.56574871]),
 '7268': array([ 0.27874871,  0.86410348]),
 '7269': array([ 0.32626899,  0.53011587]),
 '7271': array([ 0.61783637,  0.72647066]),
 '7272': array([ 0.3825399 ,  0.33982403]),
 '7281': array([ 0.35329976,  0.47956917]),
 '7283': array([ 0.59422616,  0.72004237]),
 '7284': array([ 0.28413942,  0.73099065]),
 '7361': array([ 0.31444576,  0.72506839]),
 '7362': array([ 0.33515958,  0.63287213]),
 '7367': array([ 0.29863928,  0.77262448]),
 '7368': array([ 0.35636586,  0.58416133]),
 '7369': array([ 0.39396456,  0.53584328]),
 '7371': array([ 0.34361129,  0.60922176]),
 '7372': array([ 0.40862659,  0.58969947]),
 '7373': array([ 0.37051417,  0.51554328]),
 '7411': array([ 0.30758091,  0.65328061]),
 '7412': array([ 0.35432753,  0.50740605]),
 '7413': array([ 0.36003401,  0.45541394]),
 '7414': array([ 0.64461802,  0.43001559]),
 '7415': array([ 0.22847831,  0.29343766]),
 '7416': array([ 0.37853714,  0.56547645]),
 '7421': array([ 0.2706857 ,  0.42055161]),
 '7422': array([ 0.35285934,  0.4132923 ]),
 '7423': array([ 0.44191851,  0.62878915]),
 '7428': array([ 0.32048991,  0.61346783]),
 '7429': array([ 0.34892629,  0.56237711]),
 '7431': array([ 0.49895019,  0.63958327]),
 '7432': array([ 0.40188343,  0.52612177]),
 '7434': array([ 0.42777724,  0.76238958]),
 '7435': array([ 0.32212875,  0.66867816]),
 '7436': array([ 0.48472533,  0.64492014]),
 '7439': array([ 0.40385219,  0.47697277]),
 '7441': array([ 0.55824511,  0.85718571]),
 '7442': array([ 0.37717431,  0.44692482]),
 '7449': array([ 0.42393041,  0.44029554]),
 '7451': array([ 0.39657537,  0.61962323]),
 '7452': array([ 0.31213793,  0.50805802]),
 '7491': array([ 0.39085064,  0.49279807]),
 '7492': array([ 0.44777004,  0.52780879]),
 '7493': array([ 0.44357512,  0.5786204 ]),
 '7499': array([ 0.40424483,  0.43523493]),
 '7511': array([ 0.21147281,  0.15472268]),
 '7512': array([ 0.23020612,  0.22725312]),
 '7518': array([ 0.17173592,  0.25252904]),
 '7521': array([ 0.22056914,  0.18477747]),
 '7522': array([ 0.1080467 ,  0.37179404]),
 '7523': array([ 0.15374747,  0.34544928]),
 '7525': array([ 0.17031443,  0.31034572]),
 '7528': array([ 0.21055285,  0.27111853]),
 '7591': array([ 0.2046156 ,  0.19754963]),
 '7599': array([ 0.20893098,  0.3238278 ]),
 '7611': array([ 0.21927849,  0.39787855]),
 '7612': array([ 0.13249222,  0.23645634]),
 '7621': array([ 0.21478579,  0.35991937]),
 '7622': array([ 0.0982712 ,  0.25571895]),
 '7628': array([ 0.07537459,  0.25307804]),
 '7631': array([ 0.10701435,  0.24058682]),
 '7638': array([ 0.13808367,  0.27939741]),
 '7641': array([ 0.19334456,  0.10392544]),
 '7642': array([ 0.0535842 ,  0.27261462]),
 '7643': array([ 0.15894842,  0.10935721]),
 '7648': array([ 0.19801184,  0.41721466]),
 '7649': array([ 0.18555615,  0.19227612]),
 '7711': array([ 0.79555354,  0.54173005]),
 '7712': array([ 0.19630172,  0.16102436]),
 '7721': array([ 0.46218656,  0.65429317]),
 '7722': array([ 0.15087716,  0.25272253]),
 '7723': array([ 0.17694919,  0.23485623]),
 '7731': array([ 0.77325681,  0.47774131]),
 '7732': array([ 0.41882484,  0.57850638]),
 '7741': array([ 0.10795824,  0.61134124]),
 '7742': array([ 0.25236234,  0.56680808]),
 '7751': array([ 0.23741773,  0.35617705]),
 '7752': array([ 0.65050003,  0.4753557 ]),
 '7753': array([ 0.24413683,  0.26112643]),
 '7754': array([ 0.01223659,  0.28807345]),
 '7757': array([ 0.02393682,  0.31112548]),
 '7758': array([ 0.45186625,  0.70401927]),
 '7761': array([ 0.20132396,  0.38818846]),
 '7762': array([ 0.21428958,  0.4404275 ]),
 '7763': array([ 0.19476213,  0.35896387]),
 '7764': array([ 0.17353918,  0.35881009]),
 '7768': array([ 0.18841445,  0.29642223]),
 '7781': array([ 0.16478119,  0.17786682]),
 '7782': array([ 0.22862391,  0.7638567 ]),
 '7783': array([ 0.4658243 ,  0.44753166]),
 '7784': array([ 0.38132673,  0.71146836]),
 '7788': array([ 0.195401  ,  0.24316837]),
 '7810': array([ 0.50751312,  0.54251133]),
 '7821': array([ 0.48202722,  0.59392936]),
 '7822': array([ 0.35902993,  0.37821753]),
 '7831': array([ 0.61000633,  0.7490222 ]),
 '7832': array([ 0.54714296,  0.73803464]),
 '7841': array([ 0.39849108,  0.67918399]),
 '7842': array([ 0.50519799,  0.61806944]),
 '7849': array([ 0.47644139,  0.51200698]),
 '7851': array([ 0.3821887 ,  0.78542543]),
 '7852': array([ 0.78078701,  0.66253986]),
 '7853': array([ 0.24682526,  0.86619129]),
 '7861': array([ 0.66451891,  0.39075549]),
 '7868': array([ 0.61149888,  0.59279421]),
 '7911': array([ 0.37203591,  0.76082481]),
 '7912': array([ 0.74056954,  0.85014031]),
 '7913': array([ 0.33702822,  0.68221044]),
 '7914': array([ 0.33372558,  0.74154188]),
 '7915': array([ 0.64866235,  0.73852416]),
 '7919': array([ 0.5219639 ,  0.56033219]),
 '7921': array([ 0.69845476,  0.17758417]),
 '7922': array([ 0.7113436 ,  0.85918171]),
 '7923': array([ 0.08588507,  0.56201818]),
 '7924': array([ 0.04434181,  0.45333998]),
 '7928': array([ 0.1218429 ,  0.55963179]),
 '7929': array([ 0.16271909,  0.55777584]),
 '7931': array([ 0.37016692,  0.30251858]),
 '7932': array([ 0.36137539,  0.27768864]),
 '7933': array([ 0.34746463,  0.23921304]),
 '7938': array([ 0.57225467,  0.15687839]),
 '8121': array([ 0.53300242,  0.5703839 ]),
 '8122': array([ 0.77428007,  0.52232614]),
 '8124': array([ 0.03725617,  0.29147652]),
 '8211': array([ 0.75448997,  0.53243006]),
 '8212': array([ 0.67234332,  0.66020945]),
 '8219': array([ 0.65457673,  0.57164582]),
 '8310': array([ 0.92751023,  0.80489694]),
 '8421': array([ 0.84309681,  0.33939413]),
 '8422': array([ 0.86852507,  0.33846755]),
 '8423': array([ 0.82728249,  0.38966156]),
 '8424': array([ 0.83563695,  0.45307968]),
 '8429': array([ 0.87837461,  0.47333504]),
 '8431': array([ 0.82758338,  0.43273366]),
 '8432': array([ 0.88029672,  0.43019572]),
 '8433': array([ 0.91289227,  0.35640919]),
 '8434': array([ 0.86936941,  0.44665087]),
 '8435': array([ 0.85375164,  0.43375587]),
 '8439': array([ 0.88826292,  0.39065529]),
 '8441': array([ 0.89702303,  0.44324032]),
 '8442': array([ 0.9217298 ,  0.39491965]),
 '8443': array([ 0.8656368 ,  0.51298674]),
 '8451': array([ 0.8623705 ,  0.39820355]),
 '8452': array([ 0.8597894 ,  0.36067589]),
 '8459': array([ 0.89969589,  0.33741519]),
 '8461': array([ 0.85197991,  0.45072645]),
 '8462': array([ 0.87602485,  0.32024115]),
 '8463': array([ 0.8322223,  0.3583221]),
 '8464': array([ 0.88188603,  0.4363494 ]),
 '8465': array([ 0.85903803,  0.32410864]),
 '8471': array([ 0.87051975,  0.56974247]),
 '8472': array([ 0.90139444,  0.4122819 ]),
 '8481': array([ 0.89930226,  0.56997233]),
 '8482': array([ 0.10229584,  0.16685863]),
 '8483': array([ 0.76231808,  0.79956909]),
 '8484': array([ 0.8839007 ,  0.58842387]),
 '8510': array([ 0.85693744,  0.47583752]),
 '8710': array([ 0.11135255,  0.3096711 ]),
 '8720': array([ 0.23663327,  0.50932656]),
 '8731': array([ 0.57970754,  0.71914089]),
 '8732': array([ 0.43515751,  0.69341623]),
 '8741': array([ 0.32200003,  0.35169012]),
 '8742': array([ 0.28681281,  0.56180632]),
 '8743': array([ 0.42274905,  0.53476033]),
 '8744': array([ 0.28829572,  0.53424067]),
 '8745': array([ 0.28280009,  0.66957508]),
 '8748': array([ 0.30160617,  0.57274343]),
 '8749': array([ 0.31739558,  0.59084811]),
 '8811': array([ 0.17630314,  0.09980692]),
 '8812': array([ 0.29917587,  0.69796137]),
 '8813': array([ 0.24521625,  0.73754679]),
 '8821': array([ 0.20848049,  0.58080659]),
 '8822': array([ 0.23073045,  0.57692612]),
 '8830': array([ 0.0511401 ,  0.41524665]),
 '8841': array([ 0.14322101,  0.31788579]),
 '8842': array([ 0.95635533,  0.83930074]),
 '8851': array([ 0.08340829,  0.29280246]),
 '8852': array([ 0.12617427,  0.1832428 ]),
 '8921': array([ 0.50263072,  0.23618703]),
 '8922': array([ 0.59775234,  0.43155301]),
 '8924': array([ 0.18241229,  0.39622961]),
 '8928': array([ 0.61512117,  0.4730495 ]),
 '8931': array([ 0.77247015,  0.3388703 ]),
 '8932': array([ 0.67961649,  0.5321406 ]),
 '8933': array([ 0.12394852,  0.24453451]),
 '8935': array([ 0.53127688,  0.5166906 ]),
 '8939': array([ 0.60569648,  0.50569481]),
 '8941': array([ 0.01561549,  0.335989  ]),
 '8942': array([ 0.0597842 ,  0.21303431]),
 '8946': array([ 0.39006641,  0.69382423]),
 '8947': array([ 0.08591551,  0.16389595]),
 '8951': array([ 0.39496986,  0.73307224]),
 '8952': array([ 0.26056641,  0.74298155]),
 '8959': array([ 0.13669079,  0.585711  ]),
 '8960': array([ 0.11225375,  0.46670194]),
 '8972': array([ 0.11373697,  0.14567796]),
 '8973': array([ 0.88718185,  0.79525253]),
 '8974': array([ 0.14439303,  0.69584547]),
 '8981': array([ 0.08232806,  0.34341559]),
 '8982': array([ 0.1147929 ,  0.33892643]),
 '8983': array([ 0.22617619,  0.25913205]),
 '8989': array([ 0.36358074,  0.33880172]),
 '8991': array([ 0.91715766,  0.66362781]),
 '8993': array([ 0.66076173,  0.16165711]),
 '8994': array([ 0.        ,  0.26769767]),
 '8996': array([ 0.22303989,  0.48619762]),
 '8997': array([ 0.92323727,  0.56249946]),
 '8998': array([ 0.89796377,  0.6392392 ]),
 '8999': array([ 0.01559968,  0.08715906]),
 '9310': array([ 0.57264804,  0.22410863]),
 '9410': array([ 0.80084307,  0.22545207]),
 '9510': array([ 0.48398887,  0.72090865]),
 '9610': array([ 0.31456093,  0.1802939 ]),
 '9710': array([ 0.96938373,  0.22468388])}

In [78]:
#-Check Entry-#
ps_nodes[ps_nodes.sitc == "0011"]


Out[78]:
sitc community x y nodesize leamer pname ncolor x_scaled y_scaled
491 0011 239 2847.516846 3320.90686 60.540527 5 ANIMALS OF THE BOVINE SPECIES,INCL.BUFFALOES,LIVE #9fb3bf 0.636358 0.293057
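The `coord` dictionary printed above maps each SITC code to an `(x_scaled, y_scaled)` pair, and the entry for `0011` matches the `x_scaled` / `y_scaled` columns of `ps_nodes`. The scaling step itself is not shown in this section. As a minimal sketch, assuming it is a min-max normalisation of the raw Atlas grid coordinates to the range [0, 1], a position dictionary of this shape could be built as follows. The helper names and the toy coordinates are hypothetical:

```python
# Hypothetical sketch: min-max scale raw layout coordinates into [0, 1]
# and build a {sitc: (x, y)} position dictionary shaped like `coord`.
# Assumption: x_scaled / y_scaled in the notebook come from this kind of scaling.

def min_max_scale(values):
    """Rescale numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def build_positions(rows):
    """rows: iterable of (sitc, x, y) tuples holding raw layout coordinates."""
    rows = list(rows)
    xs = min_max_scale([x for _, x, _ in rows])
    ys = min_max_scale([y for _, _, y in rows])
    return {sitc: (sx, sy) for (sitc, _, _), sx, sy in zip(rows, xs, ys)}


# Toy example with three made-up coordinates (not real Atlas grid points)
positions = build_positions([("0011", 50.0, 10.0), ("8421", 150.0, 30.0), ("9710", 100.0, 20.0)])
print(positions["8421"])  # (1.0, 1.0)
```

With pandas the same scaling of a column `s` is `(s - s.min()) / (s.max() - s.min())`.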

In [79]:
import matplotlib.pyplot as plt
#-Draw the full product space using the node positions stored in coord-#
fig = plt.figure(figsize=(50,20))
ax = fig.gca()
nx.draw_networkx(ps, ax=ax, pos=coord, node_size=750, width=1)
plt.savefig("productspace1.png")


Note: The figure is also saved locally (productspace1.png) so the network can be viewed in greater detail.


In [80]:
#-Let's See where Apparel (SITC Division 84) Nodes are Located-#
def choose_color(x):
    """Colour SITC Division 84 (apparel) nodes blue and all other nodes red."""
    if x[:2] == "84":
        return "b"
    else:
        return "r"

nodes = pd.DataFrame(sorted(coord.keys()), columns=["nodeid"])
nodes["color"] = nodes["nodeid"].apply(choose_color)
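`choose_color` highlights a single two-digit SITC division. If several divisions need to be compared in one plot, a lookup table keyed on the two-digit prefix avoids a growing chain of `if` statements. A minimal sketch; the division-to-colour palette below is an illustrative assumption, not one used in the notebook:

```python
# Hypothetical palette: two-digit SITC division -> matplotlib colour code
DIVISION_COLORS = {
    "84": "b",  # articles of apparel and clothing accessories
    "77": "g",  # electrical machinery, apparatus and appliances
}
DEFAULT_COLOR = "r"  # every division not listed in the palette


def division_color(sitc_code, palette=DIVISION_COLORS, default=DEFAULT_COLOR):
    """Return the colour for a 4-digit SITC code based on its 2-digit prefix."""
    return palette.get(sitc_code[:2], default)


codes = ["8421", "7781", "0011"]
print([division_color(c) for c in codes])  # ['b', 'g', 'r']
```

Applied to the notebook's table this would be `nodes["nodeid"].apply(division_color)`.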

In [81]:
nodes[nodes.color == 'b'].head()


Out[81]:
nodeid color
689 8421 b
690 8422 b
691 8423 b
692 8424 b
693 8429 b

In [82]:
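#-Get the Order of Nodes the Same as Network Node List-#
# node_color is matched to nodes by position, so align the colours with ps.nodes() order
order = pd.DataFrame(list(ps.nodes()), columns=["nodeid"]).reset_index()
order.columns = ["order", "nodeid"]
nodes = nodes.merge(order, how="inner", on="nodeid")
nodes = nodes.sort_values(by="order")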
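`nx.draw_networkx` pairs each entry of `node_color` with a node by position, following the order in which the graph yields its nodes, not by node label. An alternative to the merge-and-sort is to look up each node's colour while iterating in graph order. A minimal sketch in plain Python; the node list and colour lookup are hypothetical stand-ins for `ps.nodes()` and the `nodes` table:

```python
def colors_in_graph_order(graph_nodes, color_of, default="r"):
    """Return one colour per node, aligned with the iteration order of graph_nodes."""
    return [color_of.get(node, default) for node in graph_nodes]


# Hypothetical graph order (as ps.nodes() might yield it) and a colour lookup
graph_nodes = ["0011", "8422", "8421", "9710"]
color_of = {"8421": "b", "8422": "b", "0011": "r", "9710": "r"}
print(colors_in_graph_order(graph_nodes, color_of))  # ['r', 'b', 'b', 'r']
```

With the notebook's own objects the equivalent is `[choose_color(n) for n in ps.nodes()]`, which produces a colour list already in graph order.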

In [83]:
#-Apparel (SITC Division 84) nodes in blue, all other nodes in red-#
fig = plt.figure(figsize=(50,20))
ax = fig.gca()
nx.draw_networkx(ps, ax=ax, pos=coord, node_size=750, width=1, node_color=nodes.color.values)
plt.savefig("productspace2.png")



In [84]:
#-Export to GML for use in exploratory network tools such as Gephi or Cytoscape-#
nx.write_gml(ps, "product_space.gml")
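GML is a plain-text format, so the exported file can be inspected or post-processed without networkx. To give a sense of its structure, here is a minimal hand-rolled writer for an undirected graph with string labels. It is a simplified sketch, not a reproduction of `nx.write_gml` output, which also writes any node and edge attributes stored on the graph; the function name `to_gml` is hypothetical:

```python
def to_gml(nodes, edges):
    """Serialise an undirected graph to a minimal GML string.

    nodes: list of unique string labels (assumed not to contain double quotes)
    edges: list of (label, label) pairs referring to entries in nodes
    """
    # GML edges refer to integer node ids, so map each label to an id first
    index = {label: i for i, label in enumerate(nodes)}
    lines = ["graph [", "  directed 0"]
    for label, i in index.items():
        lines.append(f'  node [ id {i} label "{label}" ]')
    for u, v in edges:
        lines.append(f"  edge [ source {index[u]} target {index[v]} ]")
    lines.append("]")
    return "\n".join(lines)


print(to_gml(["8421", "8422"], [("8421", "8422")]))
# graph [
#   directed 0
#   node [ id 0 label "8421" ]
#   node [ id 1 label "8422" ]
#   edge [ source 0 target 1 ]
# ]
```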

References

[1] Zachary, W. (1977), "An Information Flow Model for Conflict and Fission in Small Groups", Journal of Anthropological Research, Vol. 33, No. 4 (Winter, 1977), pp. 452-473.

[2] Cao, X., Wang, X., Jin, D., Cao, Y. & He, D. (2013), "Identifying overlapping communities as well as hubs and outliers via nonnegative matrix factorization", Scientific Reports, Vol. 3, Article 2993.

[3] Hidalgo, C.A., Klinger, B., Barabási, A.-L. & Hausmann, R. (2007), "The Product Space Conditions the Development of Nations", Science, Vol. 317, pp. 482-487.

[4] Atlas of Complexity (http://atlas.cid.harvard.edu/)

[5] The Observatory of Economic Complexity (http://atlas.media.mit.edu/en/)

[6] Atlas of Complexity grid points for nodes, sourced from http://www.michelecoscia.com/?page_id=223

[7] Balassa, B. (1965), "Trade Liberalisation and Revealed Comparative Advantage", The Manchester School, Vol. 33, pp. 99-123.

