patternpythonMinor
Extracting nodes from a road network
Viewed 0 times
nodesroadextractingfromnetwork
Problem
My code takes 143.023 seconds for extracting nodes from a road network of city like Göteborg in Sweden. Please check it out if I can optimize it.
```
# -- coding: utf-8 --
import arcpy
from Class import loading_layer
import time
def node_extraction(network_input):
time_start=time.time()
nodes_list = []
list_of_existing_nodes = []
for feature in network_input:
coordinates_list = list()
feature_startX = feature["Shape"].firstPoint.X
feature_startY = feature["Shape"].firstPoint.Y
start_list =[]
start_list.append(feature_startX)
start_list.append(feature_startY)
feature_endX = feature["Shape"].lastPoint.X
feature_endY = feature["Shape"].lastPoint.Y
end_list =[]
end_list.append(feature_endX)
end_list.append(feature_endY)
if nodes_list.__len__() is not 0:
if start_list in list_of_existing_nodes:
point_index=list_of_existing_nodes.index(start_list)
nodes_list[point_index][1].append(feature["FID"])
else:
nodes_list.append((feature["Shape"].firstPoint, [feature["FID"]]))
list_of_existing_nodes.append(start_list)
if end_list in list_of_existing_nodes:
point_index=list_of_existing_nodes.index(end_list)
nodes_list[point_index][1].append(feature["FID"])
else:
nodes_list.append((feature["Shape"].lastPoint, [feature["FID"]]))
list_of_existing_nodes.append(end_list)
else:
nodes_list.append((feature["Shape"].firstPoint, [feature["FID"]]))
nodes_list.append((feature["Shape"].lastPoint, [feature["FID"]]))
list_of_existing_nodes.append(start_list)
list_of_existing_nodes.append(end_list)
time_end=time.time()
print time_end-time_start
layer1 = loading_layer("D:/path/to/your/road/data.shp")
layer1.create_new_dict()
node_e
```
# -- coding: utf-8 --
import arcpy
from Class import loading_layer
import time
def node_extraction(network_input):
time_start=time.time()
nodes_list = []
list_of_existing_nodes = []
for feature in network_input:
coordinates_list = list()
feature_startX = feature["Shape"].firstPoint.X
feature_startY = feature["Shape"].firstPoint.Y
start_list =[]
start_list.append(feature_startX)
start_list.append(feature_startY)
feature_endX = feature["Shape"].lastPoint.X
feature_endY = feature["Shape"].lastPoint.Y
end_list =[]
end_list.append(feature_endX)
end_list.append(feature_endY)
if nodes_list.__len__() is not 0:
if start_list in list_of_existing_nodes:
point_index=list_of_existing_nodes.index(start_list)
nodes_list[point_index][1].append(feature["FID"])
else:
nodes_list.append((feature["Shape"].firstPoint, [feature["FID"]]))
list_of_existing_nodes.append(start_list)
if end_list in list_of_existing_nodes:
point_index=list_of_existing_nodes.index(end_list)
nodes_list[point_index][1].append(feature["FID"])
else:
nodes_list.append((feature["Shape"].lastPoint, [feature["FID"]]))
list_of_existing_nodes.append(end_list)
else:
nodes_list.append((feature["Shape"].firstPoint, [feature["FID"]]))
nodes_list.append((feature["Shape"].lastPoint, [feature["FID"]]))
list_of_existing_nodes.append(start_list)
list_of_existing_nodes.append(end_list)
time_end=time.time()
print time_end-time_start
layer1 = loading_layer("D:/path/to/your/road/data.shp")
layer1.create_new_dict()
node_e
Solution
Performance:
This is not an efficient way to update an existing element. The fact that you want to lookup a node based on it's coordinates means that you are using the wrong data structure. A dictionary is explicitly designed to make key-based lookups fast, making this type of manipulation easy. This way you can have the point object as the key and the value would be the data object that might have a
What you have now will cause each look to iterate over the entire list to see if the node exists. Then when it does, iterate again to find the index. As the number of nodes increase, the lookup will take longer and longer. Using a dictionary will take a constant amount of time independent of how many nodes are in the collection.
This is not providing any benefit. If
Other Points:
Don't suffix your variables with
A list is not a great data structure for when each index has specific meaning. If you just need a simple data store object,
The data you are pulling values from has
Creation of the start and stop point can be cleaned up a lot, even if you stick with lists.
should be
The built-in function is much cleaner and doesn't depend in the internal implementation.
Edit:
Since we are dealing with a list, this check is also equivalent to
This is the recommended style by the style guide.
You are performing the same operation for the start and end points. you can create a single function that does the operation instead of repeating the code. When you change to using a dictionary, write the generic function first, then you can call it once for each point.
The naming convention for instance variables is lower_case.
if start_list in list_of_existing_nodes:
point_index=list_of_existing_nodes.index(start_list)
nodes_list[point_index][1].append(feature["FID"])This is not an efficient way to update an existing element. The fact that you want to lookup a node based on it's coordinates means that you are using the wrong data structure. A dictionary is explicitly designed to make key-based lookups fast, making this type of manipulation easy. This way you can have the point object as the key and the value would be the data object that might have a
FID value.What you have now will cause each look to iterate over the entire list to see if the node exists. Then when it does, iterate again to find the index. As the number of nodes increase, the lookup will take longer and longer. Using a dictionary will take a constant amount of time independent of how many nodes are in the collection.
def iterate_features(self):
feature_set = arcpy.SearchCursor(self.Address)
for feature in feature_set:
yield feature
def create_new_dict(self):
iterator = self.iterate_features()
for item in iterator:
# ...This is not providing any benefit. If
SearchCursor creates a list, then the list has already been allocated containing all of the values. This means your generator method is just adding additional iteration overhead when switching back and forth between contexts. If SearchCursor is already a generator (which I suspect is the case based on the name cursor), you are just wrapping it without adding any additional values.Other Points:
coordinates_list is never used, so remove it. Additionally, this is the only case where you use list() instead of [], you should be consistent and use [].Don't suffix your variables with
_list, making the name plural should be sufficient to convey that the value is a collection.A list is not a great data structure for when each index has specific meaning. If you just need a simple data store object,
namedtuple is great for that.The data you are pulling values from has
X and Y attributes. Is there a reason you don't just use that object directly?Creation of the start and stop point can be cleaned up a lot, even if you stick with lists.
shape = feature["Shape"]
start = [shape.firstPoint.X, shape.firstPoint.Y]
end = [shape.lastPoint.X, shape.lastPoint.Y]if nodes_list.__len__() is not 0:should be
if len(nodes_list) == 0:The built-in function is much cleaner and doesn't depend in the internal implementation.
is is an identity comparison and == is an equivalence comparison. The fact that an identity comparison works is an implementation detail and is not guaranteed to always be true.Edit:
Since we are dealing with a list, this check is also equivalent to
if not nodes_list:This is the recommended style by the style guide.
You are performing the same operation for the start and end points. you can create a single function that does the operation instead of repeating the code. When you change to using a dictionary, write the generic function first, then you can call it once for each point.
The naming convention for instance variables is lower_case.
Address will make other people think it is a class.Code Snippets
if start_list in list_of_existing_nodes:
point_index=list_of_existing_nodes.index(start_list)
nodes_list[point_index][1].append(feature["FID"])def iterate_features(self):
feature_set = arcpy.SearchCursor(self.Address)
for feature in feature_set:
yield feature
def create_new_dict(self):
iterator = self.iterate_features()
for item in iterator:
# ...shape = feature["Shape"]
start = [shape.firstPoint.X, shape.firstPoint.Y]
end = [shape.lastPoint.X, shape.lastPoint.Y]if nodes_list.__len__() is not 0:if len(nodes_list) == 0:Context
StackExchange Code Review Q#55701, answer score: 4
Revisions (0)
No revisions yet.