### DATA 620: HW3

##### Daina Bouquin

Dataset:
The file astro-ph.gml contains the collaboration network of scientists posting preprints on the astrophysics archive at www.arxiv.org, 1995-1999, as compiled by M. Newman.

``````

In [27]:

import networkx as nx
import matplotlib.pyplot as plt # for plotting

``````
``````

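In [4]:

# Read the network from the GML file (assumes astro-ph.gml is in the working directory)
H = nx.read_gml('astro-ph.gml')
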
``````
``````

In [7]:

H # verify that the data has been read in

``````
``````

Out[7]:

<networkx.classes.graph.Graph at 0x105c36690>

``````
``````

In [12]:

print("Nodes: %d" % H.number_of_nodes())
print("Edges: %d" % H.number_of_edges())

``````
``````

Nodes: 16706
Edges: 121251

``````
``````

In [17]:

%matplotlib inline
# Draw the full network
nx.draw(H) # Well, that isn't very useful: far too many nodes and edges to read

``````
``````

In [121]:

# Create a small subgraph (first 20 nodes) just for a graphing demo
Hsub = H.subgraph(list(H.nodes())[:20])
nx.draw_random(Hsub)

``````
``````

In [122]:

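# Degree (valency) of each node: the number of edges incident to it.
# Degree centrality is this count divided by n - 1, i.e. the fraction of
# other nodes that a node v is connected to: nx.degree_centrality(H)
nx.degree(H)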

``````
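To make the difference between raw degree and degree centrality concrete, here is a minimal library-free sketch. The adjacency dictionary `toy` and the helper `degree_centrality` are hypothetical illustrations, not part of the notebook; the helper mirrors the usual definition (degree divided by n - 1), and returning 0.0 for a graph with fewer than two nodes is simply a convention chosen here.

```python
def degree_centrality(adjacency):
    """Return degree / (n - 1) for every node of an undirected simple graph.

    `adjacency` maps each node to the set of its neighbors.
    """
    n = len(adjacency)
    if n <= 1:
        # Convention chosen for this sketch: no other nodes to connect to.
        return {node: 0.0 for node in adjacency}
    scale = 1.0 / (n - 1)
    return {node: len(neighbors) * scale
            for node, neighbors in adjacency.items()}


# Hypothetical toy graph: a triangle a-b-c plus a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

degrees = {node: len(neighbors) for node, neighbors in toy.items()}
centrality = degree_centrality(toy)

print(degrees)     # raw neighbor counts
print(centrality)  # the same counts normalized by n - 1 = 3
```

Node `c` touches all three other nodes, so its centrality is 1.0, while `d` has a single neighbor and scores 1/3. On the real network the raw degree counts co-authors, and the normalized value makes nodes comparable across networks of different sizes.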
``````

In [87]:

# The diameter is undefined for a disconnected graph, so check connectivity first
nx.is_connected(H)

``````
``````

Out[87]:

False

``````
``````

In [99]:

# Build a subgraph for each connected component, largest first
Gcc = [H.subgraph(c).copy()
       for c in sorted(nx.connected_components(H), key=len, reverse=True)]

``````
``````

In [102]:

# Number of nodes in each connected component;
# the largest (giant) component is the most important
[len(g) for g in Gcc]

``````
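For readers who want to see what splitting a graph into connected components involves, the following is a minimal sketch using breadth-first search on a plain adjacency dictionary. The function `connected_components` and the graph `toy` are hypothetical stand-ins written for illustration; they are not NetworkX's implementation.

```python
from collections import deque


def connected_components(adjacency):
    """Yield each connected component of an undirected graph as a set of nodes."""
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        yield component


# Hypothetical toy graph: a triangle, a single edge, and an isolated node.
toy = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},
    4: {5}, 5: {4},
    6: set(),
}

# Sort explicitly by size so the largest component always comes first.
components = sorted(connected_components(toy), key=len, reverse=True)
sizes = [len(c) for c in components]
print(sizes)  # [3, 2, 1]
```

Sorting with `key=len` avoids relying on whatever order the components happen to be discovered in.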
``````

In [117]:

# Show all of the connected components, largest first
sorted(nx.connected_components(H), key=len, reverse=True)

``````
``````

Out[117]:

``````
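Because the diameter is infinite (or undefined) when some pairs of nodes cannot reach each other, one common workaround is to measure the largest connected component on its own. The sketch below shows that idea with breadth-first search from every node. All names here (`bfs_distances`, `largest_component`, `diameter_and_average_path_length`, `toy`) are hypothetical, and restricting the measurement to the largest component is an assumption of this example, not necessarily how any particular tool handles a disconnected graph.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def largest_component(adjacency):
    """Return the node set of the largest connected component."""
    best, seen = set(), set()
    for node in adjacency:
        if node not in seen:
            component = set(bfs_distances(adjacency, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def diameter_and_average_path_length(adjacency):
    """Return (diameter, mean shortest-path length) of a connected graph."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("need at least two nodes")
    diameter, total = 0, 0
    for source in adjacency:
        dist = bfs_distances(adjacency, source)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        diameter = max(diameter, max(dist.values()))  # eccentricity of source
        total += sum(dist.values())
    # Average over all ordered pairs of distinct nodes.
    return diameter, total / float(n * (n - 1))


# Hypothetical toy graph: a path 1-2-3-4 plus a separate edge 5-6.
toy = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}

nodes = largest_component(toy)
giant = {u: toy[u] & nodes for u in nodes}  # induced subgraph on that component
diam, avg = diameter_and_average_path_length(giant)
print(diam, avg)
```

On the toy graph the largest component is the path 1-2-3-4, whose diameter is 3. Running one search per node is fine for small graphs but would likely be slow in pure Python on a network with 16,706 nodes, which is one reason to reach for a dedicated tool.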

### Analysis with Gephi:

Using Gephi, I was able to calculate the diameter of the full graph (the maximum eccentricity of any vertex, i.e. the greatest shortest-path distance between any pair of vertices) despite the disconnection issues, along with the average path length of the network: