Note | Applied Social Network Analysis in Python

date
Apr 22, 2021
slug
social-network-analysis-python
status
Published
summary
A network is a representation of connections among a set of nodes and edges.
tags
Academic
Sociology
Reading
Communication
Data Analysis
type
Post
Contents

I. Why Study Networks and Basics of NetworkX

Definition

  1. Networks (of Graph): a representation of connections among a set of
      • Nodes: items
      • Edges: connections
      • G = nx.Graph()
  1. Edge Direction:
      • Symmetric relationships
      • Asymmetric relationships
      • G.add_edge('A', 'B')
  1. Weighted networks: a network where edges are assigned a (typically numerical) weight
      • G.add_edge('A', 'B', weight = 6)
  1. Signed networks: a networks where edges are assined positive of negative sign
      • G.add_edge('A', 'B', sigh = '+')
  1. Other edge attributes
      • G.add_edge('A', 'B', relation = 'friend')
  1. Multigraph: a network where multiple edges can connect the same nodes (parallel edges nx.MultiGraph
    1. G.add_edge('A', 'B', relation = 'friend')
      G.add_edge('A', 'B', relation = 'neighbor')

Node and Edge Attributes

  1. Adding
    1. G = nx Grapho
      G.add edge('A', 'B', weight=6, relation='family')
      G.add node('A', role='trader')
  1. Accessing
    1. G.nodes(data=True)  #list of all nodes with attributes
      G.node['A']['role'] #role of node A

Bipartite graphs

  1. Bipartite Graph: a graph whose nodes can be split into two sets L and R and every edge connects an node in L with a node in R
    1. from networkx algorithms import bipartite
      B = nx.Grapho #No separate class for bipartite graphs
      B.add_nodes_from(['A', 'B', 'C', 'D', 'E'], bipartite=0) #label one set of nodes 0
  1. Check if bipartite graph: bipartite.is_bipartite(B)
  1. Check if bipartite nodes:
    1. X = set([1, 2, 3, 4]
      bipartite.is_bipartite_node(B, X)
  1. Getting each set of a bipartite graph: bipartite.sets(B)
  1. L-bipartite graph projection: Network of nodes in group L, where a pair of nodes is connected if they have a common neighbor in R in the bipartite graph.
    1. X = set(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'])
      P = bipartite.projected_graph(B, X)
  1. L-bipartite weighted graph projection: An L-bipartite graph projection with Weights on the edges that are proportional to the number of common neighbors between the nodes bipartite.weighted_projected_graph(B, X)

II. Network Connectivity

Clustering coefficient

  1. Triadic closure: The tendency for people who share connections in a social network to become connected
  1. Clustering coefficient: measures the degree to which nodes in a network tend to "cluster" or form triangles
  1. Local clustering coefficient of a node: Fraction of pairs of the nodes friends that are friends with each other nx.clustering(G, F)
  1. Global clustering coefficient
      • Average: nx.average_clustering(G)
      • Transitivity: Ratio of number of triangles and number of open triads in a network: nx.transitivity(G)
        • notion image

Distance measures

  1. Distance:
  1. Paths: a sequence of nodes connected by an edge
  1. Path length: number of steps it contains from beginning to end
  1. Distance between two nodes: the length of the shortest path between them
    1. nx.shortest_path(G, 'A', 'H')
      nx.shortest_path_length(G, 'A', 'H')
  1. Breadth-first search: a systematic and efficient procedure for computing distances from a node to all other nodes in a large network by"discovering nodes in layers nx.bfs_tree(G, 'A')
    1. notion image
  1. Eccentricity of a node N is the largest distance between N and all other nodes nx.eccentricity(G)
  1. Characterizing distances in a network:
      • Average distance: between every pair of nodes
      • Diameter: maximum distance between any pair of nodes
      • Radius of a graph is the minimum eccentricity nx.radius(G)
  1. Identifying central and peripheral nodes
      • The periphery of a graph is the set of nodes that have eccentricity equal to the diameter.
      • The center of a graph is the set of nodes that have eccentricity equal to the radius

Connected components

Undirected graphs

  1. An undirected graph is connected if, for every pair nodes, there is a path between them nx.is_connected(G)
  1. Connected components is a subset of nodes such as:
      • Every node in the subset has a path to every other node
      • No other node has a path to any node in the subset
      nx.number_connected_components(G)
      sorted(nx.number_connected_components(G))
      nx.number_connected_components(G, 'M')

Directed graphs

  1. Connectivity in Directed Graphs
      • A directed graph is strongly connected if for every pair nodes u and v, there is a directed path from u to v and a directed path from y to u nx.is_strongly_connected(G)
      • A directed graph is weakly connected if replacing all directed edges with undirected edges produces a connected undirected graph nx.is_weakly_connecte(G)
  1. Strongly connected component is a subset of nodes such as:
      • Every node in the subset has a directed path to every other node
      • No other node has a directed path to and from every node in the subset

Network robustness

  1. Network robustness: the ability of a network to maintain its general structural properties when it faces failures or attacks
      • Type of attacks: removal of nodes or edges
      • Structural properties: connectivity
        • Node connectivity: Minimum number of nodes needed to disconnect a graph or pair of nodes nx.node_connectivity(G) / nx.minimum_node_cut(G)
        • Edge connectivity: Minimum number of edges needed to disconnect a graph or pair of nodes nx.edge_connectivity(G) / nx.minimum_edge_cut(G)
  1. Robust networks have large minimum node and edge cuts

III. Influence Measures and Network Centralization

 

Degree and Closeness Centrality

  1. Network Centrality: measures identify the most important nodes in a network
  1. Measures
      • Degree centrality
        • Assumption: important nodes have many connections
        • Number of neighbors
          • Undirected: degree nx.degree_centrality(G)
          • Directed: in/out-degree
            • nx.in_degree_centrality(G)
            • nx.out_degree_centrality(G)
      • Closeness centrality
        • Assumption: important nodes are close to other nodes
          • nx.closeness_centrality(G)
        • Disconnected Nodes: Normalized Closeness Centrality
          • nx.closeness_centrality(G, normalized=True)

      Betweenness centrality

      • Assumption: important nodes connect other nodes
      • When computing betweenness centrality, we only consider nodes such that there is at least one path between them
      • Normalized Betweenness Centrality: betwenness centrality values will be larger in graphs with many nodes. To control for this, we divide centrality values by the number of pairs of nodes in the graph (excluding v) nx.betweeness_centrality(G, normalized=Treu,endpoints=False)
        • undirected:
        • directed:
      • Approximation: computationally: betweenness_centrality(G, normalized=True, endpoints=False,k=10)
      • Subsets: betweenness_centrality_subset(G, normalized=True)
      • Edge: nx.edge_betweenness_centrality(G, normalized=True)
      • Load centrality
      • Page Rank
      • Katz centrality
      • Percolation centrality

Basic PageRank

  1. Start with
  1. Perform the Basic PageRank Update Rule:
      • Each node gives an equal share of its current PageRank to all the nodes it links to
      • The new PangeRank of each node is the sum of all the PageRank it received from other nods
      • For most networks, convergence

Scaled PageRank

  1. Problem of Basic PageRank: "stuck" ←The PageRank of a node at step k is the probability that a random walker lands on the node after taking k steps
  1. Solution: damping parameter nx.pagerank(G, alpha=0.8)
      • with probability , random walk
      • with probability , randomly choose a node to walk

Hubs and Authorities 暂不要

  1. root
  1. base
  1. HITS Algorithm
      • authoreity update rule
      • hub update rule

Centrality Examples

notion image

IV. Network Evolution

Preferential Attachment Model

Degree Distribution

  1. The degree of a node in an undirected graph is the number of neighbors it has
  1. The degree distribution of a graph is the probability distribution of the degrees over the entire network

In-degree Distribution

  1. The in-degree of a node in a directed graph is the number of in-links it has
  1. The in-degree distribution, Pin (k), of this network has the following values:

Degree distribution in Real Networks

Power law: Networks with power law distribution have many nodes with small degree and a few nodes with very large degree
notion image

Preferential Attachment Model

notion image

Small World Networks

Milgram Small World Experiment

notion image

Small World of Instant Message

notion image

Small World of Facebook

notion image

Clustering Coefficient

  • Facebook2011: High Average CC (decreases with degree)
  • Microsoft IM: Average CC of 0.13
  • IMDB actor network: Average CC 0.78

Path Length and Clustering

Social networks tento have high CC and small average path length

Small World Model

notion image
notion image
 
notion image

Summary

  • Real social networks appear to have small shortest paths between nodes and high clustering coefficient
  • The preferential attachment model produces networks with small shortest paths but very small clustering coefficient
  • The small world model starts with a ring lattice with nodes connected to k nearest neighbors (high local clustering), and it rewires edges with probability p
  • For small values of p, small world networks have small average shortest path and high clustering coefficient, matching what we observe in real networks

Link Prediction

Basic

  1. Measure I: Common Neighbors
  1. Measure II: Jaccard coeffient
  1. Measure III: Resource Allocation
  1. Measure IV: Adamic-Adar Index
  1. Measure V: Pre. Attachment

Community Information Based

  1. Measure VI: Community Common Neighbors
      • Community Structure:
        • Assume the nodes in this network belong to different communities(sets of nodes)
        • Pairs of nodes who belong to the same community and have many common neighbors in their community are likely to form an edge
  1. Measure VII: Community Resource Allocation

RSS | Reynard © 2021 - 2023

Powered byVercel