In this notebook we demonstrate a few of the Python ecosystem tools that enable research in areas that are difficult to tackle with traditional, fit-for-purpose tools such as Stata.
The agility of a full programming language environment allows for a high degree of flexibility, and the Python ecosystem provides a vast toolkit for staying productive.
Oil (3330) has a large world export share, but it is not strongly co-exported (i.e. connected in the network) with any products other than LNG.
Machinery, Electronics, and Garments are all sectors with a high degree of co-export potential with other related products, and they form part of a densely connected core of the network.
Developing economies typically occupy products in the weakly connected periphery of the network, and new products tend to emerge close to existing products in the network (a result established from analysis using the Product Space network). Middle-income countries manage to diffuse into the densely connected core of the product space.
Interest in studying networks is increasing within economics, with recent publications building network-type features into their models or using network analysis to uncover structural features of data that may otherwise go unexplored.
Many people who have interacted with tools from network analysis have done so via the idea of Social Network Analysis (SNA).
A graph is a way of specifying relationships among a collection of items.
It consists of a collection of nodes (or vertices) that are joined together by edges.
In [1]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn')
In [2]:
g = nx.Graph()
In [3]:
g.add_nodes_from(["A", "B", "C", "D"])
In [4]:
nx.draw_networkx(g)
plt.show()
In [5]:
g.add_edge("A", "B") #Add Edge between Nodes A and B
g.add_edge("A", "C") #Add Edge between Nodes A and C
g.add_edge("A", "D") #Add Edge between Nodes A and D
In [6]:
nx.draw_networkx(g)
plt.show()
In [7]:
nx.degree_centrality(g)
Out[7]:
In [8]:
g.add_edge("C","D")
In [9]:
nx.draw_networkx(g)
plt.show()
In [10]:
nx.degree_centrality(g)
Out[10]:
One early example of Social Network Analysis was conducted by Zachary (1977), who set out to use network analysis to explain factional dynamics and to understand fission in small groups. A network of friendships was used to identify how a karate club eventually split in two following an initial conflict between two of its members.
The structure of these relationships can be exploited to uncover new insights into the data:
One visualization (Cao, 2013) demonstrates how algorithmic analysis, based only on simple relational information about friendship between pairs of individuals, can reveal meaningful structure and clearly identify the roles played by certain individuals.
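As an illustrative aside (not part of the analysis that follows), networkx ships Zachary's karate club network as a built-in example graph, so the flavour of this result can be reproduced in a few lines: degree centrality alone already singles out the two individuals at the centre of the conflict (the instructor and the club administrator).
In [ ]:
import networkx as nx

#-Zachary's karate club ships with networkx as a built-in example graph-#
kc = nx.karate_club_graph()
print("# of Nodes: {}".format(kc.number_of_nodes()))   #34 club members
print("# of Edges: {}".format(kc.number_of_edges()))   #78 friendship ties

#-The two central figures in the conflict stand out on degree centrality alone-#
centrality = nx.degree_centrality(kc)
print(sorted(centrality, key=centrality.get, reverse=True)[:2])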
Let's focus on an application of network analysis to international trade data, replicating some of the results contained in the Hidalgo (2007) paper and later in The Atlas of Complexity and The Observatory of Economic Complexity.
The Hidalgo (2007) paper is used as a motivating example to demonstrate various tools that are available in the Python ecosystem.
In this setting we want to look at a characterisation of international trade data by considering the following:
Assumption: if products are highly co-exported across countries, then those products are revealed to be more likely to share the similar factors of production (or capabilities) required to produce them. For example, shirts and pants require a similar set of skills and therefore lend themselves to being co-exported, while shirts and cars are much more dissimilar.
This relational information between products can be represented by edge weights.
A high edge weight means two products have a high likelihood of being co-exported.
In [11]:
g = nx.Graph()
g.add_edge("P1", "P2")
pos = nx.spring_layout(g)
nx.draw_networkx(g, pos=pos, node_size=600, font_size=14)
nx.draw_networkx_edge_labels(g, pos=pos, edge_labels={("P1", "P2") : "Coexport Probability: Min{P(P1 | P2), P(P2 | P1)}"}, font_size=14)
plt.show()
In [12]:
adj = pd.read_csv("./data/simple_productspace.csv", names=["P1", "P2", "weight"])
adj
Out[12]:
In [13]:
g = nx.read_weighted_edgelist("./data/simple_productspace.csv", delimiter=",")
In [14]:
g.nodes()
Out[14]:
In [15]:
#Visualize the Network
pos = nx.spring_layout(g)
weights = [g[u][v]['weight']*10 for u,v in g.edges()]
nx.draw_networkx(g, node_size=1200, pos=pos, width=weights)
plt.show()
We now want to compute the edge weights to explore the full product space network derived from product level international trade data by computing the proximity matrix:
$$ \phi_{ij} = \min \left\{ P(RCA_i \geq 1 \mid RCA_j \geq 1),\; P(RCA_j \geq 1 \mid RCA_i \geq 1) \right\} $$

Proximity: a high proximity value suggests any two products are exported by a similar set of countries.
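As a worked illustration with made-up numbers: suppose 20 countries export shirts with $RCA \geq 1$, 16 countries export pants with $RCA \geq 1$, and 12 countries export both. Then $P(\text{shirts} \mid \text{pants}) = 12/16 = 0.75$ and $P(\text{pants} \mid \text{shirts}) = 12/20 = 0.60$, so $\phi = \min\{0.75, 0.60\} = 0.60$.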
The tasks involve:
1. computing Revealed Comparative Advantage (RCA) for each country-product pair,
2. converting the RCA values into binary $M_{cp}$ matrices,
3. computing the proximity matrix $\phi_{ij}$ from the yearly $M_{cp}$ matrices, and
4. building and visualizing the resulting product space network.
In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from numba import jit
import networkx as nx
from bokeh.io import output_notebook
In [17]:
#-Load Jupyter Extensions-#
%matplotlib inline
output_notebook()
International Trade Data is largely available in SITC and HS product classification systems.
In this notebook we will focus on SITC revision 2 Level 4 data with 786 defined products.
| Classification | Level | Products |
|---|---|---|
| SITC | 4 | 786 |
| HS | 6 | 5016 |
Note: we use SITC data in this seminar, but as you can see, code performance becomes even more important when working with fully disaggregated HS international trade data.
In [18]:
fl = "./data/year_origin_sitc_rev2.csv"
data = pd.read_csv(fl, converters={'sitc':str}) #Import SITC codes as strings to preserve formatting
In [19]:
data.head()
Out[19]:
In [ ]:
data['year'].unique()
In [ ]:
data.shape[0]
In [ ]:
data['origin'].unique()
In [20]:
data[(data['year'] == 2000)&(data['origin']=="AUS")].head()
Out[20]:
In [21]:
data[data['origin'] == 'AUS'].set_index(["year","sitc"])["export"].unstack(level="year").head()
Out[21]:
The literature uses the standard Balassa definition of Revealed Comparative Advantage:
$$ \large RCA_{cpt} = \frac{\frac{E_{cpt}}{E_{ct}}}{\frac{E_{pt}}{E_{t}}} $$

where $E_{cpt}$ is the value of exports of product $p$ by country $c$ at time $t$, $E_{ct}$ is the total exports of country $c$, $E_{pt}$ is total world exports of product $p$, and $E_{t}$ is total world exports.
Reference: Balassa, B. (1965), Trade Liberalisation and Revealed Comparative Advantage, The Manchester School, 33, 99-123.
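As a worked illustration with hypothetical figures: if a country exports 10 of a product out of 100 in total exports (in some common currency unit), while that product accounts for 500 out of 20,000 in world trade, then $RCA = \frac{10/100}{500/20{,}000} = \frac{0.10}{0.025} = 4$, a strong revealed comparative advantage.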
To compute RCA we need to aggregate the data at different levels to obtain each component of the fraction defined above.
Let's break the equation down to figure out what needs to be computed:
$$ \large E_{ct} = \sum_{p}{E_{cpt}} $$
In [22]:
cntry_export = data[["year", "origin", "export"]].groupby(by=["year", "origin"]).sum()
cntry_export.head(n=2)
Out[22]:
This gives us a pandas.DataFrame indexed by a MultiIndex object. This can be very useful, but we would like to bring these totals back into the original data table, with one value for each product exported at time t by each country. Starting from this new object we could either:
- merge the data back into the original data DataFrame (sketched below), or
- use transform to request an object that is the same shape as the original data DataFrame.
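A hedged sketch of the merge-based route (the intermediate names cntry_totals and data_merged are illustrative; the notebook itself uses transform below):
In [ ]:
#-Illustrative merge-based alternative to transform (not used in the rest of the notebook)-#
cntry_totals = (data[["year", "origin", "export"]]
                .groupby(["year", "origin"], as_index=False)
                .sum()
                .rename(columns={"export": "cntry_export"}))
data_merged = data.merge(cntry_totals, on=["year", "origin"], how="left")
data_merged.head()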
In [23]:
data["cntry_export"] = data[["year", "origin", "export"]].groupby(by=["year", "origin"]).transform(np.sum)
data["prod_export"] = data[["year", "sitc", "export"]].groupby(by=["year", "sitc"]).transform(np.sum)
data["world_export"] = data[["year", "export"]].groupby(by=["year"]).transform(np.sum)
Now that the components of the equation have been computed, we can simply calculate $RCA$ as expressed by the original fraction.
In [24]:
data["rca"] = (data["export"] / data["cntry_export"]) / (data["prod_export"] / data["world_export"])
In [25]:
data.head()
Out[25]:
$RCA \geq 1$ indicates that country $c$ has a revealed comparative advantage in product $p$ at time $t$.
Therefore we can define the matrix $M_{cp}$:
$$ M_{cp} = \begin{cases} 1 & \text{if } RCA \geq 1 \\ 0 & \text{if } RCA < 1 \end{cases} $$

We can first construct the $RCA$ matrices and then compute $M_{cp}$ using a conditional map.
In [26]:
#-Generate Yearly RCA Matrices and store them in a Dictionary-#
rca = {}
for year in data.year.unique():
    yr = data[data.year == year].set_index(['origin', 'sitc']).unstack('sitc')['rca']
    rca[year] = yr
In [27]:
rca[2000].head()
Out[27]:
In [28]:
#-Generate Yearly Binary Mcp Matrices-#
mcp = {}
for year in rca.keys():
    mcp[year] = rca[year].fillna(0.0).applymap(lambda x: 1 if x >= 1.0 else 0.0)
In [29]:
mcp[2000].head()
Out[29]:
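As an aside, the same binary matrix can be built without applymap using a vectorized comparison; the sketch below is assumed to be equivalent (up to dtype, since the applymap above mixes int and float values):
In [ ]:
#-Vectorized alternative for building Mcp (illustrative; dtype may differ from the applymap version)-#
mcp_fast = {year: (rca[year].fillna(0.0) >= 1.0).astype(float) for year in rca}
(mcp_fast[2000].values == mcp[2000].values).all()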
In [ ]:
products = mcp[1998].loc['AUS']
products[products == 1.0]
Proximity: A high proximity value suggests any two products are exported by a similar set of countries.
$$ \phi_{ij} = \min \left\{ P(RCA_i \geq 1 \mid RCA_j \geq 1),\; P(RCA_j \geq 1 \mid RCA_i \geq 1) \right\} $$

The minimum conditional probability of co-export can be computed as:

$$ \phi_{ij} = \frac{\sum_c M_{cp_i} \cdot M_{cp_j}}{\max \{ k_{p_i}, k_{p_j} \}} $$

where $k_{p_i} = \sum_c M_{cp_i}$ is the ubiquity of product $i$ (the number of countries that export it with $RCA \geq 1$).

The $\phi_{ij}$ matrix is therefore computed over all pairwise combinations of the column vectors of $M_{cp}$, which is computationally intensive.
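Before looking at the loop-based implementations below, it is worth noting (as an aside, and as an assumed-equivalent formulation) that the numerator is just the matrix product $M^{T}M$, so the whole proximity matrix can also be written in a handful of vectorized NumPy operations:
In [ ]:
#-Illustrative vectorized computation of the proximity matrix (assumed equivalent to the loop versions below)-#
def proximity_matrix_vectorized(mcp):
    M = mcp.values.astype(float)              #countries x products
    k = M.sum(axis=0)                         #product ubiquity k_p
    numerator = M.T.dot(M)                    #pairwise co-export counts
    denominator = np.maximum.outer(k, k)      #max{k_i, k_j}
    with np.errstate(divide='ignore', invalid='ignore'):
        phi = np.where(denominator > 0, numerator / denominator, np.nan)
    return pd.DataFrame(phi, index=mcp.columns, columns=mcp.columns)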
In [30]:
def proximity_matrix_pandas(mcp, fillna=True):
    products = sorted(list(mcp.columns))
    sum_products = mcp.sum(axis=0)
    proximity = pd.DataFrame(index=products, columns=products)
    for i, product1 in enumerate(products):
        for j, product2 in enumerate(products):
            if j > i:  #Symmetric Matrix Condition
                continue
            numerator = (mcp[product1] * mcp[product2]).sum()
            denominator = max(sum_products[product1], sum_products[product2])
            if denominator == 0:
                cond_prob = np.nan
            else:
                cond_prob = numerator / denominator
            proximity.set_value(index=product1, col=product2, value=cond_prob)
            proximity.set_value(index=product2, col=product1, value=cond_prob)
    if fillna:
        proximity = proximity.fillna(0.0)
    return proximity
In [31]:
%%time
prox_2000 = proximity_matrix_pandas(mcp[2000])
In [32]:
prox_2000.unstack().describe()
Out[32]:
In [33]:
prox_2000.unstack().hist()
Out[33]:
At ~1 minute per year this takes a reasonably long time to compute for a single cross-section, which makes working with this data in an agile way problematic: computing all 50 years would take around an hour. While this was easy to implement, it isn't very fast!
Let's profile this code to get an understanding of where we spend most of our time.
For this line to run you will need to install line_profiler by running:
conda install line_profiler
In [34]:
# !conda install line_profiler
In [35]:
import line_profiler
%load_ext line_profiler
In [36]:
# %lprun -f proximity_matrix_pandas proximity_matrix_pandas(mcp[2000])
In [37]:
def proximity_matrix_numpy(mcp, fillna=False):
    products = sorted(list(mcp.columns))
    num_products = len(products)
    proximity = np.empty((num_products, num_products))
    col_sums = mcp.sum().values
    data = mcp.T.as_matrix()  #This generates a p x c numpy array (one row per product)
    for index1 in range(0, num_products):
        for index2 in range(0, num_products):
            if index2 > index1:
                continue
            numerator = (data[index1] * data[index2]).sum()
            denominator = max(col_sums[index1], col_sums[index2])
            if denominator == 0.0:
                cond_prob = np.nan
            else:
                cond_prob = numerator / denominator
            proximity[index1][index2] = cond_prob
            proximity[index2][index1] = cond_prob
    # Return DataFrame Representation #
    proximity = pd.DataFrame(proximity, index=products, columns=products)
    proximity.index.name = 'productcode1'
    proximity.columns.name = 'productcode2'
    if fillna:
        proximity = proximity.fillna(0.0)
    return proximity
In [38]:
%%time
prox_2000_numpy = proximity_matrix_numpy(mcp[2000])
In [39]:
prox_2000.equals(prox_2000_numpy)
Out[39]:
Numba is a package you can use to accelerate your code using a technique called just-in-time (JIT) compilation. It converts your high-level Python code into low-level LLVM-compiled machine code so that it runs much closer to raw machine speed.
nopython=True ensures the JIT compiles the function without falling back on any Python objects. If it cannot achieve this, it will raise an error.
Numba now supports much of the NumPy API; the list of supported features can be checked in the Numba documentation.
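As a minimal, illustrative sketch of what nopython mode accepts: a function that only touches NumPy arrays compiles cleanly, whereas passing a pandas DataFrame directly would fail, which is why the helper below extracts a raw NumPy array before calling the JIT-compiled function.
In [ ]:
#-Minimal nopython-mode sketch (illustrative only)-#
from numba import jit
import numpy as np

@jit(nopython=True)
def row_sums(a):
    out = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        out[i] = a[i].sum()   #array slicing and reductions are supported in nopython mode
    return out

row_sums(np.random.rand(3, 4))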
In [40]:
@jit(nopython=True)
def coexport_probability(data, num_products, col_sums):
    proximity = np.empty((num_products, num_products))
    for index1 in range(0, num_products):
        for index2 in range(0, num_products):
            if index2 > index1:
                continue
            numerator = (data[index1] * data[index2]).sum()
            denominator = max(col_sums[index1], col_sums[index2])
            if denominator == 0.0:
                cond_prob = np.nan
            else:
                cond_prob = numerator / denominator
            proximity[index1][index2] = cond_prob
            proximity[index2][index1] = cond_prob
    return proximity

def proximity_matrix_numba(mcp, fillna=False):
    products = sorted(list(mcp.columns))
    num_products = len(products)
    col_sums = mcp.sum().values
    data = mcp.T.as_matrix()
    proximity = coexport_probability(data, num_products, col_sums)  #Call Jit Function
    # Return DataFrame Representation #
    proximity = pd.DataFrame(proximity, index=products, columns=products)
    proximity.index.name = 'productcode1'
    proximity.columns.name = 'productcode2'
    if fillna:
        proximity = proximity.fillna(0.0)
    return proximity
In [41]:
prox_2000_numba = proximity_matrix_numba(mcp[2000])
In [42]:
%%timeit
prox_2000_numba = proximity_matrix_numba(mcp[2000])
In [43]:
prox_2000_numba.equals(prox_2000)
Out[43]:
In [44]:
%%time
proximity = {}
for year in mcp.keys():
    proximity[year] = proximity_matrix_numba(mcp[year])
In [45]:
proximity[2000].head()
Out[45]:
Now that we have a fast single-year computation, we can compute all cross-sections serially using a loop, as above.
Alternatively, we can parallelize these operations using Dask to delay the computation and then ask the Dask scheduler to coordinate the work over the number of cores available to you. This is particularly useful when using HS data.
Note: this simple approach to parallelization does carry some overhead to coordinate the computations, so you won't get a full 4x speed-up on a 4-core machine.
In [46]:
import dask
from distributed import Client
Client()
Out[46]:
In [47]:
#-Setup the Computations as a Collection of Tasks-#
collection = []
for year in sorted(mcp.keys()):
    collection.append((year, dask.delayed(proximity_matrix_numba)(mcp[year])))
In [48]:
%%time
#-Compute the Results-#
result = dask.compute(*collection)
In [49]:
#-Organise the list of returned tuples into a convenient dictionary-#
results = {}
for year, df in result:
    results[year] = df
In [50]:
results[2000].equals(prox_2000)
Out[50]:
In [51]:
results[2000].head()
Out[51]:
Note: Dask does a lot more than this and is worth looking into for medium- to large-scale computations.
In [52]:
#-Save Results into a HDF5 File-#
fl = "data/sitcr2l4_proximity.h5"
store = pd.HDFStore(fl, mode='w')
for year in results.keys():
    store["Y{}".format(year)] = results[year]
store.close()
In [53]:
%%html
<style>
table {margin-left: 0 !important;}
</style>
For SITC Data: (786 Products, 229 Countries, 52 Years)
| Function | Time/Year | Total Time | Speedup |
|---|---|---|---|
| pandas | 220 seconds | ~177 minutes | - |
| pandas_symmetric | 104 seconds | ~84 minutes | BASE |
| numpy | 2.5 seconds | 120 seconds | ~41x |
| numba | 124 milliseconds | 6 seconds | ~800x |
| numba + dask | N/A | 5 seconds | - |
For HS Data: (5016 Products, 222 Countries, 20 Years)
| Function | Time/Year | Total Time | Speedup |
|---|---|---|---|
| pandas | 1 Hour 25 minutes | - | - |
| pandas_symmetric | 43 minutes | - | BASE |
| numpy | 1 min 37 seconds | - | ~28x |
| numba | 5 seconds | 1min 45 seconds | ~516x |
| numba + dask | N/A | 45 seconds | - |
These were run on the following machine:
| Item | Details |
|---|---|
| Processor | Xeon E5 @ 3.6GHz |
| Cores | 8 |
| RAM | 32 GB |
| Python | Python 3.6 |
In [54]:
prox = pd.read_hdf("data/sitcr2l4_proximity.h5", key="Y2000")
In [55]:
prox.head()
Out[55]:
In [56]:
edge_list = prox.unstack()
In [57]:
#-Construct Sequence of node pairs as a pd.Series-#
edge_list.head()
Out[57]:
In [58]:
#-Remove Self Loops (compare the two index levels rather than dropping every weight == 1.0)-#
edge_list = edge_list[edge_list.index.get_level_values(0) != edge_list.index.get_level_values(1)]
In [59]:
edge_list.head()
Out[59]:
In [60]:
#-Construct DataFrame-#
edge_list = edge_list.reset_index()
edge_list.columns = ["P1","P2","weight"]
In [61]:
edge_list["inv_weight"] = 1 - edge_list['weight'] #Useful when working with minimum spanning tree in networkx
In [62]:
edge_list.head()
Out[62]:
In [63]:
edge_list[["weight","inv_weight"]].hist();
In [64]:
import networkx as nx
In [65]:
#-Construct the complete network-#
g = nx.from_pandas_dataframe(edge_list, source="P1", target="P2", edge_attr=["weight", "inv_weight"])
print("# of Nodes: {}".format(g.number_of_nodes()))
print("# of Edges: {}".format(g.number_of_edges()))
In [66]:
mst = nx.minimum_spanning_tree(g, weight='inv_weight') #Minimum spanning tree on inv_weight = maximum spanning tree on proximity
print("# of Nodes: {}".format(mst.number_of_nodes()))
print("# of Edges: {}".format(mst.number_of_edges()))
In [67]:
mst["0011"]
Out[67]:
In [68]:
#-Build Maximum Spanning Tree + Keep Edges >= 0.50-#
ps = nx.Graph()
#Add MST ('weight' attribute only)
for u, v, w in mst.edges_iter(data=True):
    ps.add_edge(u, v, attr_dict={'weight': w["weight"]})
#Add Edges >= 0.50
for u, v, w in g.edges_iter(data=True):
    if w['weight'] >= 0.50:
        ps.add_edge(u, v, attr_dict={'weight': w["weight"]})
In [69]:
print("# of Nodes: {}".format(ps.number_of_nodes()))
print("# of Edges: {}".format(ps.number_of_edges()))
In [70]:
ps_nodes = pd.read_csv("data/PS_SITC_nodes", sep="\t", converters={'sitc' : str},
names=["sitc", "community", "x", "y", "nodesize","leamer","pname","ncolor"])
ps_edges = pd.read_csv("data/PS_SITC_edges", sep="\t", converters={'sourceid' : str, 'targetid' : str},
names=["sourceid", "sourcex", "sourcey","targetid","targetx","targety", "width","color"])
In [71]:
ps_nodes.head()
Out[71]:
In [72]:
ps_nodes.shape
Out[72]:
In [73]:
def normalize(df, column):
    max_value = df[column].max()
    min_value = df[column].min()
    df[column + "_scaled"] = (df[column] - min_value) / (max_value - min_value)
    return df
In [74]:
#Preprocess Coordinates to be Normalized between 0,1
ps_nodes = normalize(ps_nodes, 'x')
ps_nodes = normalize(ps_nodes, 'y')
In [75]:
ps_nodes.head()
Out[75]:
In [76]:
import numpy as np
#-Obtain Dictionary of Coordinates-#
coord = {}
xy = ps_nodes[["x_scaled","y_scaled"]].values
for idx, productcode in enumerate(ps_nodes["sitc"]):
    coord[productcode] = xy[idx]
#-Add Missing Nodes-#
coord['6784'] = np.array([0,0])
In [77]:
coord
Out[77]:
In [78]:
#-Check Entry-#
ps_nodes[ps_nodes.sitc == "0011"]
Out[78]:
In [79]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(50,20))
ax = fig.gca()
nx.draw_networkx(ps, ax=ax, pos=coord, node_size=750, width=1)
plt.savefig("productspace1.png")
Note: the file is saved locally so that the network can be viewed in greater detail.
In [80]:
#-Let's See where Apparel Chapter 84 Nodes are Located-#
def choose_color(x):
    if x[:2] == "84":
        return "b"
    else:
        return "r"
nodes = pd.DataFrame(sorted(list(coord.keys())), columns=["nodeid"])
nodes["color"] = nodes["nodeid"].apply(lambda x: choose_color(x))
In [81]:
nodes[nodes.color == 'b'].head()
Out[81]:
In [82]:
#-Get the Order of Nodes the Same as the Network Node List-#
order = pd.DataFrame(ps.nodes()).reset_index()
order.columns = ["order", "nodeid"]
nodes = nodes.merge(order, how="inner", on="nodeid")
nodes = nodes.sort_values(by="order")
In [83]:
fig = plt.figure(figsize=(50,20))
ax = fig.gca()
nx.draw_networkx(ps, ax=ax, pos=coord, node_size=750, width=1, node_color=nodes.color.values)
plt.savefig("productspace2.png")
In [84]:
#Can Output to use with Gephi / Cytoscape (Exploratory Network Tools)
nx.write_gml(ps, "product_space.gml")
[1] Zachary, W. (1977), "An Information Flow Model for Conflict and Fission in Small Groups", Journal of Anthropological Research, Vol. 33, No. 4 (Winter, 1977), pp. 452-473
[2] Cao, X., Wang X., Jin D., Cao Y. & He, D. (2013), "Identifying overlapping communities as well as hubs and outliers via nonnegative matrix factorization", Scientific Reports, Vol 3, Issue 2993
[3] Hidalgo, C.A., Klinger, B., Barabasi, A.-L., Hausmann, R. (2007), "The Product Space Conditions the Development of Nations", Science, Vol 317, pp 482-487
[4] Atlas of Complexity (http://atlas.cid.harvard.edu/)
[5] The Observatory of Economic Complexity (http://atlas.media.mit.edu/en/)
[6] Atlas of Complexity Grid Points for Nodes sourced from http://www.michelecoscia.com/?page_id=223
[7] Balassa, B. (1965), "Trade Liberalisation and Revealed Comparative Advantage", The Manchester School, 33, 99-123.
In [ ]: