patternpythonMinor
When it rains, it pours - August 2016 Community Challenge
Viewed 0 times
augustchallengecommunityrainspourswhen2016
Problem
- Introduction
This code is my attempt at solving the August 2016 Community Challenge. Coming from a city where it rains cats and dogs on a daily basis this challenge was right up my alley =)
- Algorithm
I used the solution 200_success ♦ outlined in his answer here:
- Each
Cellkeeps track of whichBasinit belongs to; eachCellis initially assume to be in its ownBasin. EachBasinhas
a
sink, or lowest Cell, which acts as a "representative element"of the
Basin, as well as a member count. Topography keeps trackof all
Basins.- For each
Basin, find lowest of the sink's neighbours. If the lowest is not already a member of thisBasin, transfer its cells
into the lowest neighbour's
Basin, and notify Topography that thehigher basin no longer exists.
- Repeat step 2 until no further action is necessary.
- Have
Topographyenumerate theBasins and their counts.
- Input and output
Example 1
Input: rainfall-example-1.txt
Output:
Height Farmland:
[[1 5 2]
[2 4 7]
[3 6 9]]
Basins:
(A) A (B)
A A B
A A A
Letter Size Sink
A 7 (0, 0)
B 2 (0, 2)Example 2
Input: rainfall-example-2.txt
Output:
Height Farmland:
[[1 0 2 5 8]
[2 3 4 7 9]
[3 5 7 8 9]
[1 2 5 4 3]
[3 3 5 2 1]]
Basins:
A (A) A A A
A A A A A
B B A C C
(B) B B C C
B B C C (C)
Letter Size Sink
A 11 (0, 1)
B 7 (3, 0)
C 7 (4, 4)Example 3
Input: rainfall-example-3.txt
Output:
Height Farmland:
[[0 2 1 3]
[2 1 0 4]
[3 3 3 3]
[5 5 2 1]]
Basins:
(B) B A A
B A (A) A
B A A C
B C C (C)
Letter Size Sink
A 7 (1, 2)
B 5 (0, 0)
C 4 (3, 3)Example 4
Input: rainfall-example-4.txt
Output:
```
Height Farmland:
[[ 4 23 25 21 29 16 23 29 12 28]
[ 0 12 26 0 19 23 9 13 11 29]
[26 24 18 21 22 4 29 1 5 28]
[13 15 18
Solution
The classes feel empty because you are basically just using fancy dictionaries and a bunch of functions. Don't get me wrong - that's basically all a class is, but if you're going to commit to using classes then you really do want to wrap up functionality inside of them. Pretty much all of your functions should become instance methods of one of your classes.
I'm going to overview your code as written, from the entry point to the end.
The biggest problem that jumps out at me is that your
There are three things I'd do to solve it.
Now that we're actually generating valid test files, let's look at the rest of your code.
It smells funny to me that your
This classmethod will end up basically being
Next I'm going to pull
I'll start with
I made it a static method because it doesn't require any intrinsic attribute of the class or instance. I still give it the
Next I'll dive into
The goal is to be able to iterate over all of the cells in an arbitrary order without having to do it way too many times. To do this I think we want to combine the "create basins and cells" stage with the "sort into the correct basins" stage. This should probably look something like this (pseudocode, I haven't completely figured out the actual code yet).
This is still pretty rough, but I think we can speed things up by only making one pass through the data. We can also load the data into memory in stages, and if we store the results of these calculations on the other cells then I think we'll avoid having to do too much extra work per-cell (that's another thing you could do with your computation - a lot of your calculations are repeated and could be stored on the cells). I hope to come back t
I'm going to overview your code as written, from the entry point to the end.
The biggest problem that jumps out at me is that your
create_test_file is- Poorly named - a better name is
create_test_matrix
- Broken. It doesn't guarantee that every cell will either be a sink or have a unique least neighbor.
There are three things I'd do to solve it.
- Write a method called
create_test_matrixthat makes the matrix
- Change
create_test_fileto actually write a valid test file
- Make the matrix creation an iterative process - generate new values left-to-right, top-to-bottom (or whatever orientation you prefer). Make sure that they don't violate the constraint of the already created values, and move on.
Now that we're actually generating valid test files, let's look at the rest of your code.
It smells funny to me that your
Topography class takes a filename. I'd expect a Topography to take data, and provide a classmethod that loads it from a file. This also means that with testing you don't need to go through the intermediate file. As an aside, your implementation is buggy (and your test files are invalid). They should have the size of the matrix as the first line, and your test files don't have that (and your code doesn't handle it).This classmethod will end up basically being
create_height_map so I'll pull the contents of that function out and put it right in there. While doing that, I'll also use a context manager to safely open and close the file. I'll also rename the file object variable so it doesn't mask the builtin file. I'll also use a list comprehension instead of map, as it's more Pythonic and easier to read. I forget if np.matrix can take a generator object, but if so then I'd use a generator comprehension instead of a list comprehension.@classmethod
def from_file(cls, filename):
with open(filename, 'r') as rainfall_data:
data_iter = iter(rainfall_data)
# skip the shape
next(data_iter)
rainfall_matrix = np.matrix(
[[int(data) for data in line.split()]
for line in data_iter]
)
return Topography(rainfall_matrix)Next I'm going to pull
create_basins_and_cells into Topography, and create_matrix_map as well.I'll start with
create_matrix_map. This can just be a dictionary comprehension. I'm also going to exchange the lists for tuples, as you should generally prefer immutability.@staticmethod
def create_matrix_map(cls, shape):
length, width = shape
return {(i, j): cls((i, j), shape) for i in xrange(length) for j in xrange(width)}I made it a static method because it doesn't require any intrinsic attribute of the class or instance. I still give it the
cls first argument instead of class_type because that's more understandable for me - YMMV.Next I'll dive into
create_basins_and_cells. I'm not a huge fan of your algorithm - it seems that you should be able to iterate over the cells in a more intelligent manner. When I solved this myself I sorted the cells by descending height, and that lent itself to a very clean algorithm. I've also run into performance issues with my solution, however, so I'm going to propose something new here (in pseudocode because I haven't quite worked out the code yet).The goal is to be able to iterate over all of the cells in an arbitrary order without having to do it way too many times. To do this I think we want to combine the "create basins and cells" stage with the "sort into the correct basins" stage. This should probably look something like this (pseudocode, I haven't completely figured out the actual code yet).
for row in topography
for cell in row
for each neighbor of cell
if the neighbor feeds into this cell and has already been loaded
add it to this cell's basin
if the cell is a sink
add it to an array of sinks
else
add this cell to the lowest neighbor (if it already has been loaded)
sizes = []
for sink in sinks
count the number of connected components to that sink, append it to sizes
print sizes sorted in descending order, space separated
This is still pretty rough, but I think we can speed things up by only making one pass through the data. We can also load the data into memory in stages, and if we store the results of these calculations on the other cells then I think we'll avoid having to do too much extra work per-cell (that's another thing you could do with your computation - a lot of your calculations are repeated and could be stored on the cells). I hope to come back t
Code Snippets
@classmethod
def from_file(cls, filename):
with open(filename, 'r') as rainfall_data:
data_iter = iter(rainfall_data)
# skip the shape
next(data_iter)
rainfall_matrix = np.matrix(
[[int(data) for data in line.split()]
for line in data_iter]
)
return Topography(rainfall_matrix)@staticmethod
def create_matrix_map(cls, shape):
length, width = shape
return {(i, j): cls((i, j), shape) for i in xrange(length) for j in xrange(width)}Context
StackExchange Code Review Q#137837, answer score: 5
Revisions (0)
No revisions yet.