HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Social network evolution

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
socialnetworkevolution

Problem

I am writing a piece of code which models the evolution of a social network. The idea is that each person is assigned to a node and relationships between people (edges on the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.

Using this simple model you can say that a triad of three people is either "balanced" or "unbalanced" depending on whether the product of the edges of the triad is positive or negative.

Finally, what I am trying to do is implement an ising type model. In other words, random edges are flipped and the new relationship is kept if the new network has more balanced triangles (a lower energy) than the network before the flip, if that is not the case then the new relationship is only kept with a certain probability.

I have written the following code, however the dataset I have contains ~120k triads, and as a result it will take 4 days to run!

Could anyone offer any tips on how I might optimise the code?

```
#Importing required librarys

try:
import matplotlib.pyplot as plt
except:
raise

import networkx as nx
import csv
import random
import math

def prod(iterable):
p= 1
for n in iterable:
p *= n
return p

def Sum(iterable):
p= 0
for n in iterable:
p += n[3]
return p

def CalcTriads(n):
firstgen=G.neighbors(n)
Edges=[]
Triads=[]

for i in firstgen:
Edges.append(G.edges(i))

for i in xrange(len(Edges)):
for j in range(len(Edges[i])):# For node n go through the list of edges (j) for the neighboring nodes (i)
if set([Edges[i][j][1]]).issubset(firstgen):# If the second node on the edge is also a neighbor of n (its in firstgen) then keep the edge.
t=[n,Edges[i][j][0],Edges[i][j][1]]
t.sort()
Triads.append(t)# Add found nodes to Triads.

new_Triads = []# Delete duplicate triads.
for elem in Triads:
if elem not in new_Triads:

Solution

#Importing required librarys


I can see that. Don't comment on what's obvious from the syntax

try:
    import matplotlib.pyplot as plt
except:
    raise


Catching an exception only to rethrow it? That's a waste

import networkx as nx
import csv
import random
import math

def prod(iterable):
    p= 1
    for n in iterable:
        p *= n
    return p


You may want to look at numpy, it has function like this written in efficient C.

def Sum(iterable):
    p= 0


p? why p?

for n in iterable:
        p += n[3]
    return p


The python style guide recommends lowercase_with_underscores for global function names. The name also suggests its a generic summing function which its not. No indication is given for what kind of data it expects. A docstring should at least explain that.

def CalcTriads(n):  
    firstgen=G.neighbors(n)


Don't use global variables. Instead pass G in as a parameter. Also, I recommend against calling it G.

Edges=[]
    Triads=[]


Python style guide recommemnds lowercase_with_underscores for local variables names

for i in firstgen:
        Edges.append(G.edges(i))


I recommend a list comprehnsion

Edges = [G.edges(i) for i in firstgen]

    for i in xrange(len(Edges)):


Don't iterate over the integers, iterate over the edges. Use for edge in Edges:

for j in range(len(Edges[i])):# For node n go through the list of edges (j) for the neighboring nodes (i)


I recommend putting the comment on a seperate line from the if. Otherwise it tends to wrap off the side of the screen and that's annoying. Also, iterate over the container, not the indexes. Use something like for source, dest in edges:

if set([Edges[i][j][1]]).issubset(firstgen):# If the second node on the edge is also a neighbor of n (its in firstgen) then keep the edge.


Create a list with a single element, putting that in a set, simply to call issubet is silly. Use Edges[i][j][1] in firstgen.

t=[n,Edges[i][j][0],Edges[i][j][1]]
                t.sort()
                Triads.append(t)# Add found nodes to Triads.

    new_Triads = []# Delete duplicate triads.
    for elem in Triads:
        if elem not in new_Triads:
            new_Triads.append(elem)
    Triads = new_Triads


That is an inefficient way of doing that. Instead, make Triads a set and put tuples rather then lists in it. That'll be faster and easier to read.

for i in xrange(len(Triads)):# Go through list of all Triads finding the weights of their edges using G[node1][node2]. Multiply the three weights and append value to each triad.


This code would be way cleaner if you used for triad in Triads:

a=G[Triads[i][0]][Triads[i][1]].values()
            b=G[Triads[i][1]][Triads[i][2]].values()
            c=G[Triads[i][2]][Triads[i][0]].values()
            Q=prod(a+b+c)
            Triads[i].append(Q)

    return Triads

###### Import sorted edge data ######       
li=[]


Bad name, gives no idea what it intends

with open('Sorted Data.csv', 'rU') as f:


I think you are supposed to read csv files in a binary mode.

reader = csv.reader(f)
    for row in reader:
        li.append([float(row[0]),float(row[1]),float(row[2])])


Use for source, dest, value in row: li.append(float(source), float(dest), float(value))

Do you really want floats? Are the source and destination really floats or are they ints?

G=nx.Graph()
G.add_weighted_edges_from(li)

for i in xrange(800000):


Don't execute loop at the module level. Code written inside a function will run faster.
e = random.choice(li) # Choose random edge

TriNei=[]
    a=CalcTriads(e[0]) # Find triads of first node in the chosen edge


Don't use variable names like a, a one letter variable name is terrible waste of information capacity.

for i in xrange(0,len(a)):


Don't use xrange to iterate over containers.

if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
            TriNei.append(a[i])         
    preH=-Sum(TriNei) # Save the "energy" of all the triads of which the edge is a member

    e[2]=-1*e[2]# Flip the weight of the random edge and create a new graph with the flipped edge   
    G.clear()
    G.add_weighted_edges_from(li)


You should be able to find and change the edge weight directly. Reconstructing the whole graph is going to be kinda slow.

TriNei=[]
    a=CalcTriads(e[0])  
    for i in xrange(0,len(a)):
        if set([e[1]]).issubset(a[i]):
            TriNei.append(a[i])
    postH=-Sum(TriNei)# Calculate the post flip "energy".


Seeing the same piece of code in two place is a big hint. Write a function!

```
if postH<preH:# If the post flip energy is lower then the pre flip energy keep the change
continue

elif random.random() < 0.92: # If the post flip energy is higher then only keep the change with some small probability. (0.92 is an approximate placeholder for exp(

Code Snippets

#Importing required librarys
try:
    import matplotlib.pyplot as plt
except:
    raise
import networkx as nx
import csv
import random
import math

def prod(iterable):
    p= 1
    for n in iterable:
        p *= n
    return p
def Sum(iterable):
    p= 0
for n in iterable:
        p += n[3]
    return p

Context

StackExchange Code Review Q#3511, answer score: 6

Revisions (0)

No revisions yet.