patternpythonModerate
Rainfall - August 2016 Community Challenge (Hooray for graphs!)
Viewed 0 times
rainfallaugustchallengecommunitygraphsforhooray2016
Problem
A solution for the August 2016 Community Challenge (Rainfall).
I've intentionally left out error checking on the file formatting.
This seems like a very obvious graph problem to me - we want to partition some graph based off of relative elevations. I've used networkx 1.9.1 to operate on the graphs, and matplotlib to make it pretty(ish).
Haven't tested this on a large sample, but I suspect that it will go pretty slowly.
The general idea is to build up a graph of all of the plots and their neighbors, while also building up a list of plots sorted by elevation (which is fairly efficient due to the binary search insertion) and then starting from the highest elevation and moving down, removing edges that get disqualified. I then do a little extra at the end to set the basin property on each node to the correct value, but that isn't strictly necessary (I'd get the correct answer either way, and the coloring would only require a minor change to work).
I wrote this and ran it on Python 2, but at a first glance it should work on Python 3 without modification (famous last words...).
```
from collections import namedtuple
import bisect
import functools
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
@functools.total_ordering
class PlotData(namedtuple('PlotData', 'plot elevation')):
def __lt__(self, other):
return self.elevation < other.elevation
def __eq__(self, other):
return self.elevation == other.elevation
def build_topography(filename):
with open(filename) as elevation_data:
lines = iter(elevation_data)
next(lines)
topography = nx.Graph()
sorted_plots = []
for row_index, row in enumerate(lines):
for plot_index, elevation in enumerate(map(int, row.split())):
plot = (row_index,
I've intentionally left out error checking on the file formatting.
This seems like a very obvious graph problem to me - we want to partition some graph based off of relative elevations. I've used networkx 1.9.1 to operate on the graphs, and matplotlib to make it pretty(ish).
Haven't tested this on a large sample, but I suspect that it will go pretty slowly.
t.txt refers to a file like this - the name was for brevity's sake, and would be a different one were I making a production application.3
1 5 2
2 4 7
3 6 9The general idea is to build up a graph of all of the plots and their neighbors, while also building up a list of plots sorted by elevation (which is fairly efficient due to the binary search insertion) and then starting from the highest elevation and moving down, removing edges that get disqualified. I then do a little extra at the end to set the basin property on each node to the correct value, but that isn't strictly necessary (I'd get the correct answer either way, and the coloring would only require a minor change to work).
I wrote this and ran it on Python 2, but at a first glance it should work on Python 3 without modification (famous last words...).
```
from collections import namedtuple
import bisect
import functools
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
@functools.total_ordering
class PlotData(namedtuple('PlotData', 'plot elevation')):
def __lt__(self, other):
return self.elevation < other.elevation
def __eq__(self, other):
return self.elevation == other.elevation
def build_topography(filename):
with open(filename) as elevation_data:
lines = iter(elevation_data)
next(lines)
topography = nx.Graph()
sorted_plots = []
for row_index, row in enumerate(lines):
for plot_index, elevation in enumerate(map(int, row.split())):
plot = (row_index,
Solution
NetworkX is nice, but you really aren't using a lot of its features, and the features you are using end up not being that pivotal; your implementation won't be changed (much) if you remove NetworkX and use some homegrown features.
Also - your current method of interspersing drawing with doing calculations is kind of annoying, and it makes it hard to tell what the slow parts of your calculation are - that should be separated out into its own method. You can do that better as well - while the general problem of drawing a graph is hard, drawing a 2D matrix is not - NetworkX doesn't know we have a 2D matrix, but we do.
I've rewritten the display functionality. You'll notice that it is now an instance method - while rewriting it to not use NetworkX I encapsulated everything in a class like a good little object oriented programmer, and I've also referenced a few data members that I wasn't using previously. I'll talk about that a bit more in a second.
We've kept a lot of the logic the same, but in particular we've fixed the positioning of points so that they're in the correct location, we've colored everything (and the edges!) more appropriately, and we've fixed the label positioning (and also removed the pointless basin). If we wanted to we could add something special at the sinks, but this should be sufficient for now. There is still a bit of ickiness here - we could easily break this up into about three or four helper methods, but for demonstrative purposes this should be fine. Running this gives us the follower, much, much prettier graph.
Now lets look at how we can actually reimplement all of this without NetworkX. It starts pretty straightforward by creating a
You'll notice that the vast majority of this is the exact same as it was before; we've just removed the NetworkX kruft. I also took out the bisect sorted list - I'm not convinced that it was actually more performant, and it impaired the flow of the code for me. If after running benchmarks it turns out to be more performant then by all means put it back in, but until then use the cleaner, more idiomatic
The most substantial change is probably tha
Also - your current method of interspersing drawing with doing calculations is kind of annoying, and it makes it hard to tell what the slow parts of your calculation are - that should be separated out into its own method. You can do that better as well - while the general problem of drawing a graph is hard, drawing a 2D matrix is not - NetworkX doesn't know we have a 2D matrix, but we do.
I've rewritten the display functionality. You'll notice that it is now an instance method - while rewriting it to not use NetworkX I encapsulated everything in a class like a good little object oriented programmer, and I've also referenced a few data members that I wasn't using previously. I'll talk about that a bit more in a second.
def display(self):
colors = plt.cm.rainbow(np.linspace(0, 1, len(self.sinks)))
color_dict = {}
labels = {}
for index, sink in enumerate(self.sinks):
color_dict[sink] = colors[index]
labels[sink] = self.cells[sink]['elevation']
for parent in self.cells[sink]['parents']:
color_dict[parent] = colors[index]
labels[parent] = self.cells[parent]['elevation']
segments = []
for key, values in self.neighbors.iteritems():
for value in values:
segments.append((key, value))
fig=plt.figure()
ax=fig.add_subplot(111)
plt.xlim(-.5, self.size - .5)
plt.ylim(-.5, self.size - .5)
for s in segments:
first_point = s[0][1], self.size - s[0][0] - 1
second_point = s[1][1], self.size - s[1][0] - 1
plt.plot(*zip(first_point, second_point), marker='o', color=color_dict[s[0]])
plt.annotate(
labels[s[0]],
xy=first_point, xytext=(-5, 5),
textcoords='offset points', ha='right', va='bottom')
plt.show()We've kept a lot of the logic the same, but in particular we've fixed the positioning of points so that they're in the correct location, we've colored everything (and the edges!) more appropriately, and we've fixed the label positioning (and also removed the pointless basin). If we wanted to we could add something special at the sinks, but this should be sufficient for now. There is still a bit of ickiness here - we could easily break this up into about three or four helper methods, but for demonstrative purposes this should be fine. Running this gives us the follower, much, much prettier graph.
Now lets look at how we can actually reimplement all of this without NetworkX. It starts pretty straightforward by creating a
Topology class (as an aside, make sure you use the domain appropriate terminology if possible. I've swapped nodes/plots for cells, edges for neighbors, etc. Also you were using nodes/plots interchangeably, which was confusing). I've made the constructor take data instead of a file, but provided a convenient classmethod to construct it from a file if desired.import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict
class Topography(object):
@classmethod
def from_file(cls, filename):
with open(filename) as f:
data = iter(f)
size = int(next(data))
data = (map(int, row.strip().split()) for row in data)
return Topography(data, size)
def __init__(self, elevation_data, size):
self.size = size
self.cells = {}
self.neighbors = defaultdict(list)
self.sinks = []
for row_index, row in enumerate(elevation_data):
for cell_index, elevation in enumerate(row):
cell = (row_index, cell_index)
self.cells[cell] = {'elevation': elevation, 'parents': []}
self._add_neighbor_above(cell)
self._add_neighbor_left(cell)
self._clean_topography()
def _add_neighbor_above(self, cell):
above_cell = (cell[0] - 1, cell[1])
self._add_neighbor(cell, above_cell)
def _add_neighbor_left(self, cell):
left_cell = (cell[0], cell[1] - 1)
self._add_neighbor(cell, above_cell)
def _add_neighbor(self, cell, neighbor):
if neighbor in self.cells
self.neighbors[neighbor].append(cell)
self.neighbors[cell].append(neighbor)You'll notice that the vast majority of this is the exact same as it was before; we've just removed the NetworkX kruft. I also took out the bisect sorted list - I'm not convinced that it was actually more performant, and it impaired the flow of the code for me. If after running benchmarks it turns out to be more performant then by all means put it back in, but until then use the cleaner, more idiomatic
sorted (seen in self._clean_topography).The most substantial change is probably tha
Code Snippets
def display(self):
colors = plt.cm.rainbow(np.linspace(0, 1, len(self.sinks)))
color_dict = {}
labels = {}
for index, sink in enumerate(self.sinks):
color_dict[sink] = colors[index]
labels[sink] = self.cells[sink]['elevation']
for parent in self.cells[sink]['parents']:
color_dict[parent] = colors[index]
labels[parent] = self.cells[parent]['elevation']
segments = []
for key, values in self.neighbors.iteritems():
for value in values:
segments.append((key, value))
fig=plt.figure()
ax=fig.add_subplot(111)
plt.xlim(-.5, self.size - .5)
plt.ylim(-.5, self.size - .5)
for s in segments:
first_point = s[0][1], self.size - s[0][0] - 1
second_point = s[1][1], self.size - s[1][0] - 1
plt.plot(*zip(first_point, second_point), marker='o', color=color_dict[s[0]])
plt.annotate(
labels[s[0]],
xy=first_point, xytext=(-5, 5),
textcoords='offset points', ha='right', va='bottom')
plt.show()import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict
class Topography(object):
@classmethod
def from_file(cls, filename):
with open(filename) as f:
data = iter(f)
size = int(next(data))
data = (map(int, row.strip().split()) for row in data)
return Topography(data, size)
def __init__(self, elevation_data, size):
self.size = size
self.cells = {}
self.neighbors = defaultdict(list)
self.sinks = []
for row_index, row in enumerate(elevation_data):
for cell_index, elevation in enumerate(row):
cell = (row_index, cell_index)
self.cells[cell] = {'elevation': elevation, 'parents': []}
self._add_neighbor_above(cell)
self._add_neighbor_left(cell)
self._clean_topography()
def _add_neighbor_above(self, cell):
above_cell = (cell[0] - 1, cell[1])
self._add_neighbor(cell, above_cell)
def _add_neighbor_left(self, cell):
left_cell = (cell[0], cell[1] - 1)
self._add_neighbor(cell, above_cell)
def _add_neighbor(self, cell, neighbor):
if neighbor in self.cells
self.neighbors[neighbor].append(cell)
self.neighbors[cell].append(neighbor)def _clean_topography(self):
cells = sorted(self.cells, key=lambda x: self.cells[x]['elevation'], reverse=True)
for cell in cells:
min_elevation = self.cells[cell]['elevation']
basin = None
for neighbor in self.neighbors[cell]:
if self.cells[neighbor]['elevation'] < min_elevation:
min_elevation = self.cells[neighbor]['elevation']
basin = neighbor
if basin:
self.cells[basin]['parents'].append(cell)
self.cells[basin]['parents'].extend(self.cells[cell]['parents'])
for neighbor in self.neighbors[cell]:
if neighbor not in self.cells[cell]['parents'] and neighbor != basin:
self.remove_neighbor(cell, neighbor)
else:
self.sinks.append(cell)
def remove_neighbor(self, cell, neighbor):
self.neighbors[cell].remove(neighbor)
self.neighbors[neighbor].remove(cell)def get_basin_sizes(self):
return [
len(self.cells[sink]['parents']) + 1
for sink in self.sinks
]
if __name__ == '__main__':
with open("rainfall_data_simple.txt", 'w') as f:
f.write("""3
1 5 2
2 4 7
3 6 9""")
topo = Topography.from_file("rainfall_data_simple.txt")
basin_sizes = topo.get_basin_sizes()
basin_sizes.sort(reverse=True)
for basin in basin_sizes:
print basin,
topo.display()Context
StackExchange Code Review Q#137979, answer score: 10
Revisions (0)
No revisions yet.