patternpythonMinor
Creating nodes and edges
Viewed 0 times
creatingnodesandedges
Problem
I'm looking for a way to optimize this piece of code in Python.
The format of
all_users holds all the Twitter users and friends_ids is an array of IDs.g = Graph(directed=True)
all_users = users.find({}, { 'id_str' : 1, 'friends_ids' : 1}) # This is from MongoDb
names = []
edges = []
print 'Processing first 1000 batch'
for idx, user in enumerate(all_users):
if idx % 1000 == 0 and idx != 0:
print 'Processing ' + str(idx) + ' batch'
names.append(str(user['id_str']))
for friend_id in user['friends_ids']:
edges.append((user['id_str'], str(friend_id)))
if not str(friend_id) in names:
names.append(str(friend_id))
print len(names)
print len(edges)all_users has 4k records but each of the user has at least 10k of friends_ids. In order to create a graph I need a list of nodes which I have in names and list of edges edges.The format of
edges would be [(node1, node2), (node2, node3), ...], which means that node1 connects to node2 and node2 connect to node3. I'm not sure why but this script takes almost full 10 min to run. So, I'm looking for a way to optimize this.Solution
Try this minor variation over your script, that will probably have a dramatic effect on performance:
The main modification is making
If you need
after the processing is completed.
names = set()
edges = []
for idx, user in enumerate(all_users):
if idx and not idx % 1000:
print('Processing {} batch',format(idx))
user_id = str(user['id_str'])
names.add(user_id)
for friend_id in user['friends_ids']:
friend_id = str(friend_id)
edges.append((user_id, friend_id))
names.add(friend_id)The main modification is making
names a set, not a list, Your script is checking membership of every friend_id in names, an operation that takes time linear in the size of names. In contrast, membership in a set is done in constant time.If you need
names to be a list, simply do:names = list(names)after the processing is completed.
Code Snippets
names = set()
edges = []
for idx, user in enumerate(all_users):
if idx and not idx % 1000:
print('Processing {} batch',format(idx))
user_id = str(user['id_str'])
names.add(user_id)
for friend_id in user['friends_ids']:
friend_id = str(friend_id)
edges.append((user_id, friend_id))
names.add(friend_id)names = list(names)Context
StackExchange Code Review Q#104275, answer score: 5
Revisions (0)
No revisions yet.