HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Identifying surface events happening at specific time intervals

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
happeningsurfaceeventstimeintervalsspecificidentifying

Problem

Here is some code I wrote to surface Mint.com transactions that occur at monthly intervals, in order to identify subscriptions I may be paying for without realizing it.

I'd like to have some friends try this out as well, but Pandas is not a very common package for non-Python users, and it's requirement is keeping a lot of people from trying this out.

I'm looking for any advice to improve the code, or better recognize the 'monthly interval' piece (without Pandas would be even better).

```
import sys
from datetime import datetime

import pandas as pd

MINT_DATE_FORMAT = "%m/%d/%Y"
MINT_MERCH_COLUMN = "Description"
MINT_DATE_COLUMN = "Date"
MINT_AMT_COLUMN = "Amount"

def parse_date(datestring, format):
return datetime.strptime(datestring, format)

def calculate_date_diff(list):
diffs = []
for i in range(0,len(list)-1):
diffs.append(diff_month(parse_date(list[i], MINT_DATE_FORMAT),
parse_date(list[i+1], MINT_DATE_FORMAT)))
return diffs

def diff_month(d1, d2):
return(d1.year - d2.year)*12 + d1.month - d2.month

def unique_merchants(df, merch_column = MINT_MERCH_COLUMN):
return df[merch_column].unique().tolist()

def merchant_price_txn_list(df, merchant_name, txn_amount,
merch_column = MINT_MERCH_COLUMN, date_column = MINT_DATE_COLUMN,
amt_column = MINT_AMT_COLUMN):
return df[(df[merch_column] == merchant_name) & (df[amt_column] == txn_amount)][date_column].tolist()

def unique_pricepoints(df, merchant_name,
merch_column = MINT_MERCH_COLUMN, amt_column = MINT_AMT_COLUMN):
return df[(df[merch_column] == merchant_name)][amt_column].unique().tolist()

def main(csv):
foo = pd.read_csv(csv)
print("reading csv")
print("Read rows:", len(foo.index))
for merch in unique_merchants(foo):
for pricepoint in unique_pricepoints(foo, merch):
unique_mo_diffs = set(calculate_date_diff(
merchant_price_txn_list(foo, merch, pricepoint)))
if len(

Solution

With the caveat that I am not familiar with either the Mint.com format or Pandas, I can make a few suggestions:

-
The Python convention is to indent by 4 spaces, and wrap lines at 79 characters. It’s not a hard requirement, but it would be less jarring for me to read your code. And use one space after commas, it’s easier to read.

-
There aren’t any comments or docstrings to help me understand your code, which will hinder anybody trying to review or understand it.

-
It’s not clear to me why you define the parse_date() function. It has the same argument signature as datetime.strptime(), but now somebody familiar with the datetime module will find it harder to read your code. It would be better to replace all calls to this function with datetime.strptime().

-
In the calculate_diff_list function:

  • Don’t use list as a parameter. Overriding builtins is a bad idea, and can lead to subtle bugs.



  • You don’t say if you’re using Python 3 or Python 2.x; if the latter, use xrange() instead. It’s faster and more memory efficient.



-
In the diff_month function, it would be good to provide a docstring explaining exactly what you mean by “difference in months”, and how this function should be used. You should also specify the ordering, because it’s not necessarily obvious.

For example, consider 31 Dec 2014 and 1 Jan 2015. Are they a month apart or not? Your function returns 1 and -1 (!) depending on the ordering.

-
In unique_merchants(), you could make a couple of tweaks:

  • Default keyword arguments don’t have spaces around the equals sign.



  • If you could drop the pandas requirement, this could be reduced to set(df[merch_column]). You only need an iterable, not a list. This goes for the other functions along here (which also need some docstrings and comments, really).



-
As far as I can tell, the only bit of functionality that relies on pandas is foo = pd.read_csv(csv). (foo and csv are terrible variable names.) Have a look and see if the builtin csv module can be used here, then you might be able to simplify some more of the code.

Context

StackExchange Code Review Q#78807, answer score: 4

Revisions (0)

No revisions yet.