HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Querying houses similar to a given house

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
housesqueryinghousesimilargiven

Problem

I was given this task as an interview coding challenge and was wondering If the code is well structured and follows Python guidelines. I chose to sort the houses based on a similarity metric and then return the first N. I'd love to know if there are better ways to handle this.

I decided to use pandas to store and query data. I wrote a small Model class to encapsulate all the integration with pandas so that it can later be switched out with calls to a relational database if needed.

__author__ = 'Franklyn'

from pandas import Series, DataFrame
import pandas as pd

class Model(Series):
    """Base Class of Models, Used to represent the structure of data and define behaviours.

    This base class handles tasks that are common to all models such as serialization and saving objects in a collection

    Attributes:
        Objects: Store of model objects created from their respective class.
    """
    def __init__(self, data, fields):
        Series.__init__(self, data, index=fields)

    @classmethod
    def read_serialized_object(cls, path):
        try:
            cls.Objects = pd.read_pickle(path)
        except Exception as e:
            print "Could not read serialized objects: {0}".format(e.message)
        return cls

    @classmethod
    def write_serialized_object(cls, path):
        try:
            cls.Objects.to_pickle(path)
        except Exception as e:
            print "Could not write serialized objects: {0}".format(e.message)

    @classmethod
    def sort_by(cls, sort_callback, limit=10):
        """Sorts the collection using the callback function and return limit number of elements"""
        return cls.Objects.ix[cls.Objects.apply(lambda x: sort_callback(x), axis=1).argsort()[:limit]]

    def save(self):
        if self.__class__.Objects is None:
            self.__class__.Objects = DataFrame(columns=self.index)
        self.__class__.Objects = self.__class__.Objects.append(self, ignore_index=True)


I feel like the use of `__class__

Solution

was wondering If the code is well structured and follows python guidelines.

I can help with this. However I cannot with the other concerns. I have not done enough static class programming to make a sound review.

PEP8

  • Lines should be limited to 79 characters. The exception to this is comments and docstrings at 72.



-
It would seem that some people really dislike the use of name mangling, __.

Method Names and Instance Variables


Note: there is some controversy about the use of __names (see below).

Designing for inheritance


If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores.


...


Note 3: Not everyone likes name mangling. Try to balance the need to avoid accidental name clashes with potential use by advanced callers.

-
Operators should have a space either side of them: 2 + 2. The exception to this is to show precedence: 2*2 + 2.

House.DWELLING_COEFFICIENT*int(self.dwelling_type == house_to.dwelling_type)


-
Non-class names should be snake_case, not CamelCase. This avoids confusion about whether it is a class or not (is Objects a class, for example).

Listing = namedtuple(...)


-
Python has very strict indentation rules, even for vertical alignment:


Aligned with opening delimiter.

Listing should be defined like this:

Listing = namedtuple('Listing',
                     [ #...
                      'list_date', 'list_price', 'close_date', 'close_price'])


-
It is normal to not have blank lines unless PEP8 tells you to. You have too many in some cases, constant declarations in House, and too few around your classes.

-

Surround top-level function and class definitions with two blank lines.

-

Use blank lines in functions, sparingly, to indicate logical sections.

PEP257

-
Your one-line docstring is good, but it needs to be all on one line.


The closing quotes are on the same line as the opening quotes. This looks better for one-liners.

-
Most of your multi-line docstrings don't have a summary line, followed by a blank line.

-
:param house_to: is not the way PEP257 says to document arguments.


Positional arguments:

house_to -- Series object of second house to compare to

Overall, your code is nice, apart from the character limit. Your docstrings need work, and you need more of them. But it is very good.

Non-style

Using super is better than calling a class' __init__:

super(Model, self).__init__(data, index=fields)



I feel like the use of __class__ here to assign and initialize a static member is a bit grubby and would love to get feedback on how to handle initialization of static members that will be inherited.

I have little experience here, and so I am just speculating. But it seems that Objects is a variable, not a class. And you want it to be predefined to DataFrame(columns=self.index), but only to a specific class.

For a simple case where you don't want to reference self, this SO answer should help.

However, as this is more complicated, we can look at the docs for what __class__ is. After a quick read, it would seem that you would want to use type(self) rather than self.__class__.


type(x) is typically the same as x.__class__ (although this is not guaranteed – a new-style class instance is permitted to override the value returned for x.__class__).

def save(self):
    if type(self).Objects is None:
        type(self).Objects = DataFrame(columns=self.index)
    type(self).Objects = type(self).Objects.append(self, ignore_index=True)

Code Snippets

House.DWELLING_COEFFICIENT*int(self.dwelling_type == house_to.dwelling_type)
Listing = namedtuple(...)
Listing = namedtuple('Listing',
                     [ #...
                      'list_date', 'list_price', 'close_date', 'close_price'])
super(Model, self).__init__(data, index=fields)
def save(self):
    if type(self).Objects is None:
        type(self).Objects = DataFrame(columns=self.index)
    type(self).Objects = type(self).Objects.append(self, ignore_index=True)

Context

StackExchange Code Review Q#96642, answer score: 2

Revisions (0)

No revisions yet.