In this section we'll cover some common network analysis techniques. This doesn't cover everything NetworkX is capable of, but it should get you started exploring the rest of the package.

First we need to import some other packages.

In [ ]:
```
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
```

This is a little Jupyter magic (literally) to make sure plots show up in the notebook.

In [ ]:
```
%matplotlib inline
```

Let's start by generating a Barabási–Albert random graph and examining its degree sequence.

In [ ]:
```
BA = nx.barabasi_albert_graph(10000, 1)  # 10,000 nodes; each new node attaches 1 edge
```

In [ ]:
```
degrees = BA.degree()  # dict mapping node -> degree
degree_sequence = sorted(degrees.values(), reverse=True)
```

We could do this in one line if we wanted...

In [ ]:
```
degree_sequence = sorted(BA.degree().values(), reverse=True)
```

In [ ]:
```
# loglog tells matplotlib to use log scales.
# The x values, range(1, BA.order()+1), are the ranks,
# and degree_sequence provides the y values.
# The string 'k.' means use black (k) dots (.)
plt.loglog(range(1, BA.order()+1), degree_sequence, 'k.')
plt.xlabel('Rank')
plt.ylabel('Degree')
plt.ylim(1, max(degree_sequence)+1)
plt.xlim(.9, 10001)
```

Matplotlib is a powerful tool; more information on using it can be found in the matplotlib documentation.

In the original paper where the Barabási–Albert model was introduced, it was claimed to provide a good explanation for the Autonomous Systems Network. Let's build a network with a similar degree structure to a recent snapshot of the Autonomous Systems Network. The data was retrieved from UCLA's Internet Research Lab's Topology Data.

First, read in the network; it is in the data folder, labeled `AS2013Sep.pickle`.

In [ ]:
```
AS = nx.read_gpickle('./data/AS2013Sep.pickle')
```

Let's find the number of nodes and edges in the network, and its average degree.

In [ ]:
```
AS.order(), AS.size(), (2.0*AS.size())/AS.order()  # nodes, edges, average degree = 2E/N
```

In [ ]:
```
BA = nx.barabasi_albert_graph(AS.order(), 8)  # same number of nodes; each new node attaches 8 edges
```

Find the degree sequence of each, and use the code below to plot each. Is this a good model?

In [ ]:
```
BA_deg_seq = sorted(BA.degree().values(), reverse=True)
AS_deg_seq = sorted(AS.degree().values(), reverse=True)
```

In [ ]:
```
plt.loglog(BA_deg_seq, 'b.', label='BA Model')
plt.loglog(AS_deg_seq, 'r.', label='AS Data')
plt.xlabel('Rank')
plt.ylabel('Degree')
plt.legend()
```

It is often claimed that networks have power-law degree distributions. That is, the probability of degree $k$ is proportional to

$$\Pr[k] \sim \frac{1}{k^\alpha}$$

where $\alpha$ is some constant. Often this claim is based on linear regressions of degree/rank plots. However, the appropriate way to fit power laws is with maximum-likelihood techniques; see [1] for more info.

[1] Clauset, Aaron, Cosma Rohilla Shalizi, and Mark EJ Newman. "Power-law distributions in empirical data." *SIAM review* 51.4 (2009): 661-703.
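To illustrate, here is a minimal sketch of the discrete maximum-likelihood estimator from [1], $\hat\alpha \approx 1 + n\left[\sum_i \ln\frac{k_i}{k_{\min}-1/2}\right]^{-1}$, applied to the BA degree sequence from above. The helper `mle_alpha` is a name made up here; for serious work use a dedicated package rather than this sketch.

In [ ]:
```
def mle_alpha(degree_sequence, k_min=1):
    # Approximate discrete power-law MLE from Clauset, Shalizi & Newman (2009).
    ks = np.array([k for k in degree_sequence if k >= k_min], dtype=float)
    return 1.0 + len(ks) / np.sum(np.log(ks / (k_min - 0.5)))

# For a BA graph the theoretical exponent is alpha = 3.
mle_alpha(degree_sequence)
```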

Identifying important nodes is a common task in complex network analysis. Degree is a simple measure of centrality, but there are many others. Let's explore a few on some real data on terrorists in Colonial America. I wish I could claim I came up with this, but I didn't; all credit goes to

[1] http://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/

[2] Fischer, David Hackett. *Historians' Fallacies: Toward a Logic of Historical Thought*. London: Routledge & Kegan Paul, 1971.

The data file contains a graph with two types of nodes, 'Organization' and 'Person'. Organizations are different groups that met in colonial America and supported independence from England. People are the individuals who attended those meetings. First, let's read the file.

In [ ]:
```
G = nx.read_gpickle('./data/Revolutionaries.gpickle')
```

Let's check out the edges.

In [ ]:
```
G.edges()
```

In [ ]:
```
orgs = [n for n in G.nodes() if G.node[n]['type'] == 'Organization']
people = [n for n in G.nodes() if G.node[n]['type'] == 'Person']
```

Now we can use that handy bipartite projection function.

In [ ]:
```
O = nx.bipartite.weighted_projected_graph(G, orgs)    # connections between organizations based on shared attendees
R = nx.bipartite.weighted_projected_graph(G, people)  # connections between people based on co-meeting attendance
```

Note that the projected graphs have *weighted* edges: an edge's weight counts the number of common neighbors its endpoints share in the bipartite graph.
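For example, we can peek at the heaviest edges in `R`, the pairs of people who attended the most meetings together (a quick inspection, not part of the original walkthrough):

In [ ]:
```
# Edges sorted by weight, heaviest first.
sorted(R.edges(data=True), key=lambda e: e[2]['weight'], reverse=True)[:5]
```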

In [ ]:
```
deg = R.degree(weight='weight')  # weighted degree: total weight of each node's edges
sorted(deg.items(),              # look at all the degrees as (node, degree) tuples
       key=lambda i: i[1],       # sort the list by the second item in the tuple
       reverse=True)[:5]         # highest degree first, and only give the first 5
```

Another important measure of centrality is betweenness centrality.
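For reference, the betweenness centrality of a node $v$ measures the fraction of shortest paths that pass through it:

$$c_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those that pass through $v$.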

In [ ]:
```
btw = nx.betweenness_centrality(R, weight='weight')  # note: NetworkX treats edge weights as distances here
sorted(btw.items(),
       key=lambda i: i[1],
       reverse=True)[:5]
```

Now try plotting the degree sequence of `R`, but instead of `plt.loglog`, use `plt.semilogy`.

In [ ]:
```

```
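One possible solution, using the weighted degree sequence of `R` (plain degree would also be reasonable):

In [ ]:
```
R_deg_seq = sorted(R.degree(weight='weight').values(), reverse=True)
plt.semilogy(R_deg_seq, 'k.')  # log scale on the y axis only
plt.xlabel('Rank')
plt.ylabel('Weighted Degree')
```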

Do a similar analysis with the network of organizations `O`: which organization is most central to the revolution?

First calculate the degree and betweenness for each of the nodes

In [ ]:
```
deg = O.degree()
btw = nx.betweenness_centrality(O)
```

In [ ]:
```
size = [G.degree(org)**2 for org in O.nodes()]             # node size scales with number of attendees
width = [d['weight'] for (u, v, d) in O.edges(data=True)]  # edge width scales with shared members
plt.figure(figsize=(14, 10))  # we'll make it bigger
nx.draw(O, width=width,
        with_labels=True,
        edge_color='r',
        node_color='b',
        node_size=size,
        font_size=24)
```


Some random graph models experience phase transitions like other physical phenomena. For example, the Erdős–Rényi graph that we already explored experiences a phase transition in the size of the giant connected component when the average degree of the model crosses a certain threshold. We are going to use NetworkX to explore that threshold.

Recall that an Erdős–Rényi random graph is one where each pair of nodes is connected by an edge with probability $p$. The ER model has expected number of edges $\mathbb{E}[|E|]={n \choose 2}p$. With a little math on the degree distribution, we can find that the average degree is $k=(n-1)p \approx np$, so $p=\frac{k}{n}$.
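As a quick sanity check (an aside, not in the original walkthrough), we can verify these formulas empirically:

In [ ]:
```
n, k = 1000, 5.0
p = k / n
G = nx.gnp_random_graph(n, p)
print(G.size(), n*(n-1)/2.0 * p)  # observed vs. expected number of edges
print(2.0*G.size()/n, k)          # observed vs. expected average degree
```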

The giant component is defined as the largest connected component in the graph. Let's explore how the size of the giant component changes for a graph of size 100 as we increase the average degree.

In [ ]:
```
n = 100
ks = np.arange(.1, 3, .02)
GCC_size = []
for k in ks:
    G = nx.gnp_random_graph(n, k/n)
    GC = sorted(nx.connected_component_subgraphs(G), key=lambda C: len(C), reverse=True)[0]
    GCC_size.append(float(len(GC))/n)
```

In [ ]:
```
plt.plot(ks, GCC_size, marker='.', lw=0)
plt.xlabel('Average Degree')
plt.ylabel('Relative size of Giant Component')
```

A single draw per value of $k$ is noisy, so let's average the giant component size over many random graphs.

In [ ]:
```
N = 50                     # number of times to create a graph
n = 100                    # graph size
ks = np.arange(.1, 3, .1)  # a bunch of average degrees, from .1 to 3 in steps of .1
GCC_size = []              # list to store the size of the giant component
for k in ks:
    GCs = []
    for _ in range(N):
        G = nx.gnp_random_graph(n, k/n)  # generate the graph
        GC = sorted(nx.connected_component_subgraphs(G), key=lambda C: len(C), reverse=True)[0]  # grab the largest component
        GCs.append(float(len(GC))/n)     # measure its size
    GCC_size.append(np.mean(GCs))        # take the average and append it to the list
```

In [ ]:
```
plt.plot(ks, GCC_size, marker='.', lw=0)
plt.xlabel('Average Degree')
plt.ylabel('Average Relative size of Giant Component')
```

Now let's test it for various values of $n$

In [ ]:
```
ks = np.arange(.1, 3, .1)
N = 50
for n in [10, 50, 100, 250, 500]:
    print(n)
    GCC_size = []
    for k in ks:
        GCs = []
        for _ in range(N):
            G = nx.gnp_random_graph(n, k/n)
            GC = sorted(nx.connected_component_subgraphs(G), key=lambda C: len(C), reverse=True)[0]
            GCs.append(float(len(GC))/n)
        GCC_size.append(np.mean(GCs))
    plt.plot(ks, GCC_size, marker='.', label=str(n), lw=0)
plt.legend(loc='lower right')
plt.xlabel('Average Degree')
plt.ylabel('Relative size of Giant Component')
```

Notice how the transition gets sharper as $n$ gets larger.

Next, let's explore community detection.

In [ ]:
```
import community
```

We'll start with a classic algorithm due to Girvan and Newman. It progressively removes the highest-betweenness edges from a graph until the graph becomes disconnected, then repeats the process on each piece, breaking the graph into successively smaller groups. Let's test it on a generated graph that we know has two communities.

A connected caveman graph consists of a number of complete graphs (cliques) connected in a ring; here we'll use 2 cliques of 10 nodes each.
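To make the idea concrete, here is a minimal sketch of a single Girvan–Newman split on a small caveman graph, written in plain NetworkX (an illustration only, not the implementation inside the `community` module used below):

In [ ]:
```
def girvan_newman_split(G):
    # Remove highest-betweenness edges until the graph falls into two pieces.
    H = G.copy()
    while nx.number_connected_components(H) == 1:
        btw = nx.edge_betweenness_centrality(H)
        u, v = max(btw, key=btw.get)  # the edge carrying the most shortest paths
        H.remove_edge(u, v)
    return sorted(nx.connected_components(H), key=len, reverse=True)

girvan_newman_split(nx.connected_caveman_graph(2, 5))  # two 5-cliques split cleanly in two
```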

In [ ]:
```
G = nx.connected_caveman_graph(2, 10)  # 2 cliques of 10 nodes each
```

In [ ]:
```
group1 = range(10)      # nodes in the first clique
group2 = range(10, 20)  # nodes in the second clique
```

Let's see whether Girvan–Newman can partition the graph correctly.

In [ ]:
```
comm = community.girvan_newman(G)
```

`comm` will have the breakdown of the graph at each stage of the algorithm, so the first item in the list is the graph broken into two parts.

In [ ]:
```
comm[0]
```

Now you try with the karate club graph

In [ ]:
```
KC = nx.karate_club_graph()
```

In [ ]:
```
comm = community.girvan_newman(KC)
```

Each node has an attribute `club`, which is the group affiliation and is either `Mr. Hi` or `Officer`.

In [ ]:
```
KC.nodes(data=True)
```

Build a list of the members of each faction using the `club` attribute:

In [ ]:
```
mrHi = [n for n, d in KC.nodes(data=True) if d['club'] == 'Mr. Hi']
officer = [n for n, d in KC.nodes(data=True) if d['club'] == 'Officer']
```

Compare them to the divisions found by the algorithm.

In [ ]:
```
sorted(comm[0][0]), sorted(mrHi)
```

In [ ]:
```
sorted(comm[0][1]), sorted(officer)
```

In [ ]:
```
# Color nodes by the community the algorithm found.
colors = []
for n in KC.nodes():
    if n in comm[0][0]:
        colors.append('r')
    else:
        colors.append('b')
# Outline nodes thickly when the detected community disagrees with the actual club.
lwidths = []
for n in KC.nodes():
    if n in comm[0][0] and n in mrHi:
        lwidths.append(0.5)
    elif n in comm[0][1] and n in officer:
        lwidths.append(0.5)
    else:
        lwidths.append(2)
nx.draw(KC, node_color=colors, linewidths=lwidths)
```
