Network Mining

What is a network?

Difference between network mining and other data mining?

  • In data mining, we only look at attributes of each instance
  • In network mining, we also consider links/relationship between instances

Types of questions in network mining

  • Identify the most popular user
  • Identify th emost influential user
  • Find users who are likely to ..
  • Recommending who should be linked to whom?
  • Detect different communities
  • Which user has unusual characteristics?

Network analysis tasks

  • Construction/extraction of a network
  • Characterize the network (average path-length, clustering-coefficient, ...)
  • Node classification (collective classification)
  • Node anomaly detection
  • Link prediction
  • Influence maximization
  • Community finding

Network Extraction/Construction

  • What are the nodes, attributes, and links?
    • For example, in a social network, the nodes are users, the links indicate friendship/followers/supporters of nodes. Links represent some sort of interactions between users.
    • For a publication database, node could be authors, or the papers, ...
    • FourSquare data: data contain users, venues, spatio-temporal data
    • Brain connectivity: data includes time-series of voxels intensity

Network Characterization

  • Network metrics
    • Node
      • Degree
      • Closeness
      • Betweenness
      • Clustering coefficient
    • Node pair
      • Graph distance
      • Common neighbors
      • Jaccard's coef
      • Adameic/adar
      • Pref. attachment
      • Katz
      • Hitting time
      • Rooted pageRank
      • simRank
      • Bibliographic metric
    • Network
      • Characteristic path-length
      • Clustering coefficient
      • Min-cut

Small-world network:

  • High clustering coefficient
  • Low average path-length

Power-law distribution

  • Degree of nodes have power-law distribution $P(d=k) \approx \frac{1}{k^\alpha}$

Random graph model

  • Binomial degree distribution

Preferential attachement model

  • a new node being added to the network, has probabity of creating the links to existing nodes proportional to their degrees

Node classification

  • Nodes that are linked together are more likely to be in the same class

Label Propagation

  • $f^t = (1-\alpha)y + \alpha S f^{t-1}$
    • $S=D^{-1/2}WD^{1/2}$
  • Metric-based approcach
  • Classification approach

Finding Influential Nodes

  • Hub: a node with many outgoing link to other influential nodes
    • obtained by computing the eigenvectors of $AA^T$
  • Authority: a node with many incoming links from hub nodes
    • obtained by computing the eigenvectors of $A^TA$

PageRank algorithm

  • Random-walk
  • Teleportation to allow visiting nodes with no path
    • step1: initialize
    • step2:
Example: Zachary karate club

Community Detection

  • Partition the graph using the links
$$ A_{ij} = \left\{\begin{array}{ll}1 && \text{if there is an edge} (i,j)\\ 0 && \text{otherwise}\end{array} \right. $$

In [ ]: