HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

When it rains, it pours - August 2016 Community Challenge

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
augustchallengecommunityrainspourswhen2016

Problem


  1. Introduction



This code is my attempt at solving the August 2016 Community Challenge. Coming from a city where it rains cats and dogs on a daily basis this challenge was right up my alley =)

  1. Algorithm



I used the solution 200_success ♦ outlined in his answer here:



  • Each Cell keeps track of which Basin it belongs to; each Cell is initially assume to be in its own Basin. Each Basin has


a sink, or lowest Cell, which acts as a "representative element"
of the Basin, as well as a member count. Topography keeps track
of all Basins.

  • For each Basin, find lowest of the sink's neighbours. If the lowest is not already a member of this Basin, transfer its cells


into the lowest neighbour's Basin, and notify Topography that the
higher basin no longer exists.

  • Repeat step 2 until no further action is necessary.



  • Have Topography enumerate the Basins and their counts.




  1. Input and output



Example 1

Input: rainfall-example-1.txt

Output:

Height Farmland:
[[1 5 2]
 [2 4 7]
 [3 6 9]]

Basins:
 (A)  A  (B) 
  A   A   B  
  A   A   A  

Letter  Size  Sink
A       7     (0, 0) 
B       2     (0, 2)


Example 2

Input: rainfall-example-2.txt

Output:

Height Farmland:
[[1 0 2 5 8]
 [2 3 4 7 9]
 [3 5 7 8 9]
 [1 2 5 4 3]
 [3 3 5 2 1]]

Basins:
  A   (A)  A  A   A  
  A    A   A  A   A  
  B    B   A  C   C  
 (B)   B   B  C   C  
  B    B   C  C  (C) 

Letter  Size  Sink
A       11    (0, 1) 
B        7    (3, 0) 
C        7    (4, 4)


Example 3

Input: rainfall-example-3.txt

Output:

Height Farmland:
[[0 2 1 3]
 [2 1 0 4]
 [3 3 3 3]
 [5 5 2 1]]

Basins:
 (B)  B   A    A  
  B   A  (A)   A  
  B   A   A    C  
  B   C   C   (C) 

Letter  Size  Sink
A       7     (1, 2) 
B       5     (0, 0) 
C       4     (3, 3)


Example 4

Input: rainfall-example-4.txt

Output:

```
Height Farmland:
[[ 4 23 25 21 29 16 23 29 12 28]
[ 0 12 26 0 19 23 9 13 11 29]
[26 24 18 21 22 4 29 1 5 28]
[13 15 18

Solution

The classes feel empty because you are basically just using fancy dictionaries and a bunch of functions. Don't get me wrong - that's basically all a class is, but if you're going to commit to using classes then you really do want to wrap up functionality inside of them. Pretty much all of your functions should become instance methods of one of your classes.

I'm going to overview your code as written, from the entry point to the end.

The biggest problem that jumps out at me is that your create_test_file is

  • Poorly named - a better name is create_test_matrix



  • Broken. It doesn't guarantee that every cell will either be a sink or have a unique least neighbor.



There are three things I'd do to solve it.

  • Write a method called create_test_matrix that makes the matrix



  • Change create_test_file to actually write a valid test file



  • Make the matrix creation an iterative process - generate new values left-to-right, top-to-bottom (or whatever orientation you prefer). Make sure that they don't violate the constraint of the already created values, and move on.



Now that we're actually generating valid test files, let's look at the rest of your code.

It smells funny to me that your Topography class takes a filename. I'd expect a Topography to take data, and provide a classmethod that loads it from a file. This also means that with testing you don't need to go through the intermediate file. As an aside, your implementation is buggy (and your test files are invalid). They should have the size of the matrix as the first line, and your test files don't have that (and your code doesn't handle it).

This classmethod will end up basically being create_height_map so I'll pull the contents of that function out and put it right in there. While doing that, I'll also use a context manager to safely open and close the file. I'll also rename the file object variable so it doesn't mask the builtin file. I'll also use a list comprehension instead of map, as it's more Pythonic and easier to read. I forget if np.matrix can take a generator object, but if so then I'd use a generator comprehension instead of a list comprehension.

@classmethod
def from_file(cls, filename):
    with open(filename, 'r') as rainfall_data:
        data_iter = iter(rainfall_data)
        # skip the shape
        next(data_iter)
        rainfall_matrix = np.matrix(
            [[int(data) for data in line.split()]
             for line in data_iter]
        )
    return Topography(rainfall_matrix)


Next I'm going to pull create_basins_and_cells into Topography, and create_matrix_map as well.

I'll start with create_matrix_map. This can just be a dictionary comprehension. I'm also going to exchange the lists for tuples, as you should generally prefer immutability.

@staticmethod
def create_matrix_map(cls, shape):
    length, width = shape
    return {(i, j): cls((i, j), shape) for i in xrange(length) for j in xrange(width)}


I made it a static method because it doesn't require any intrinsic attribute of the class or instance. I still give it the cls first argument instead of class_type because that's more understandable for me - YMMV.

Next I'll dive into create_basins_and_cells. I'm not a huge fan of your algorithm - it seems that you should be able to iterate over the cells in a more intelligent manner. When I solved this myself I sorted the cells by descending height, and that lent itself to a very clean algorithm. I've also run into performance issues with my solution, however, so I'm going to propose something new here (in pseudocode because I haven't quite worked out the code yet).

The goal is to be able to iterate over all of the cells in an arbitrary order without having to do it way too many times. To do this I think we want to combine the "create basins and cells" stage with the "sort into the correct basins" stage. This should probably look something like this (pseudocode, I haven't completely figured out the actual code yet).

for row in topography
for cell in row
for each neighbor of cell
if the neighbor feeds into this cell and has already been loaded
add it to this cell's basin
if the cell is a sink
add it to an array of sinks
else
add this cell to the lowest neighbor (if it already has been loaded)

sizes = []
for sink in sinks
count the number of connected components to that sink, append it to sizes

print sizes sorted in descending order, space separated


This is still pretty rough, but I think we can speed things up by only making one pass through the data. We can also load the data into memory in stages, and if we store the results of these calculations on the other cells then I think we'll avoid having to do too much extra work per-cell (that's another thing you could do with your computation - a lot of your calculations are repeated and could be stored on the cells). I hope to come back t

Code Snippets

@classmethod
def from_file(cls, filename):
    with open(filename, 'r') as rainfall_data:
        data_iter = iter(rainfall_data)
        # skip the shape
        next(data_iter)
        rainfall_matrix = np.matrix(
            [[int(data) for data in line.split()]
             for line in data_iter]
        )
    return Topography(rainfall_matrix)
@staticmethod
def create_matrix_map(cls, shape):
    length, width = shape
    return {(i, j): cls((i, j), shape) for i in xrange(length) for j in xrange(width)}

Context

StackExchange Code Review Q#137837, answer score: 5

Revisions (0)

No revisions yet.