Find the peak stock price for each company from CSV data

Submitted by: @import:stackexchange-codereview

Problem

During the hiring process, a company gave me this test:


Q) Consider Share prices for a N number of companies given for each month since year 1990 in a CSV file. Format of the file is as below with first line as header.

Year,Month,Company A, Company B,Company C, .............Company N
1990, Jan, 10, 15, 20, , ..........,50
1990, Feb, 10, 15, 20, , ..........,50
.
.
.
.
2013, Sep, 50, 10, 15............500



a) List for each Company year and month in which the share price was highest.


b) Submit a unit test with sample data to support your solution.

They wanted me to not use any third party libraries so I did this:

import csv

csv_file = open('demo.csv', "rb")
reader = csv.reader(csv_file)
master_dic = {}
final_list= []
rownum = 0
for row in reader:
    if rownum == 0:
        header = row
    else:
        for i in range(2,len(header)):
            """

            # Here it will create a dictionary which till have a structure like this
            {'Company': {'1990': {'Mar': 18.0, 'Feb': 19.0, 'Aug': 19.0},
            '1991': {'Mar': 10.0, 'Feb': 21.0, 'Aug': 21.0, 'Sep': 23.0, 'May': 26.0}}}

            """
            if header[i] in master_dic:
                if row[0] in master_dic[header[i]]:
                    master_dic[header[i]][row[0]][row[1]] =float(row[i])
                else:
                    master_dic[header[i]][row[0]] ={}
                    master_dic[header[i]][row[0]][row[1]] =float(row[i])
            else:
                master_dic[header[i]] = {}
                master_dic[header[i]][row[0]] = {}
                master_dic[header[i]][row[0]][row[1]] =float(row[i])

    rownum += 1

# Here we will Iterate over the master_dic dictionary and find out the highest price
# of the shares in every company in all the months of all the years.

for company,items in master_dic.iteritems():
    for year,items_1 in items.iteritems():
        maxima = 0
        maxima_month = ''
        temp_list = []
        for months,sh

Solution

Meeting the specifications

The specifications said to accept a CSV file, so your test data should be comma-delimited, not space-delimited. In practice, it's not a big deal, but if you're answering an interview question, don't deviate from the instructions unless you can justify it with a good reason.

Your output looks like this:

[['Company D', '1991', 'Jan, July', 26.0], ['Company D', '1990', 'May', 26.0], ['Company D', '1993', 'June', 26.0], ['Company D', '1992', 'Mar', 23.0], ['Company A', '1991', 'Oct', 24.0], ['Company A', '1990', 'June, Dec', 18.0], ['Company A', '1993', 'June', 23.0], ['Company A', '1992', 'April', 23.0], ['Company B', '1991', 'Sep', 22.0], ['Company B', '1990', 'April, Oct', 22.0], ['Company B', '1993', 'Sep', 42.0], ['Company B', '1992', 'July', 42.0], ['Company C', '1991', 'Oct', 42.0], ['Company C', '1990', 'Mar, Sep', 23.0], ['Company C', '1993', 'April, Oct', 26.0], ['Company C', '1992', 'Feb, Aug', 26.0]]


I would expect that it should look something like this:

Company A: 1991 Oct (24)
Company B: 1992 July (42)
Company C: 1991 Oct (42)
Company D: 1990 May (26)


So, you haven't actually solved the problem laid out for you. Based on the not-quite-correctly-formatted input that you chose for your test case, and the totally incorrect output, I think that would be a strong case for rejection.

Impressions of the code

You used the csv module instead of trying to parse CSV yourself. That's good.

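Ever since Python 2.5, it is almost always better to open your files using a with block, so that they will be closed for you automatically. In Python 3, you should open the file in text mode, not binary mode, and the csv documentation recommends passing newline='' as well. (Binary mode was the convention for the csv module in Python 2 only.)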
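
As a minimal sketch of that advice, assuming Python 3: the helper name read_rows and the throwaway demo.csv written to a temporary directory are illustrative only, not part of the original question.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return every row of a CSV file as a list of strings."""
    # Text mode with newline='' lets the csv module handle line endings itself.
    # The with block closes the file even if reading raises an exception.
    with open(path, newline='') as f:
        return list(csv.reader(f))


# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    with open(path, 'w', newline='') as f:
        f.write("year,month,Company A\n1990,Jan,10\n")
    rows = read_rows(path)

print(rows)
```

Wrapping the read in a function also gives the chunk a name and a clear input and output, which ties in with the point about functions below.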

You haven't defined any functions or classes. Your code is just a bunch of free-floating instructions. Functions and classes help organize your code and your way of thinking. They force you to name each chunk of code according to its purpose, and to define what the inputs and outputs are for each chunk of code. You could use that kind of discipline.

This looks complicated:

master_dic[header[i]][row[0]][row[1]] =float(row[i])


Interviewers don't like to deal with complicated answers any more than you do, so they are unlikely to ask questions that require complicated code to solve. Don't give your interviewer a headache. You'll have to find a way to express yourself such that the interviewer will want to read your code. (If I, as an interviewer, saw a candidate produce headache-inducing code, I might ask, "Can you make it prettier?" I wouldn't spend too much time puzzling it out — it's just not worth it.)

You've implemented the solution in two passes. It should be possible to do it in one. I would accept a two-pass solution if it were done to maintain an elegant abstraction, but I don't think you can claim that justification.
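
To illustrate what a single pass could look like, here is a rough sketch that uses only plain dicts and tuples. It is a separate illustration, not the sample solution; the function name peak_per_company, the lowercase year/month header, and the sample rows are all assumptions made for the example.

```python
import csv
import io


def peak_per_company(f):
    """Return {company: (year, month, price)} after one pass over f."""
    reader = csv.reader(f)
    header = next(reader)
    companies = header[2:]              # columns after year and month
    peaks = {}
    for row in reader:
        year, month = row[0], row[1]
        for company, cell in zip(companies, row[2:]):
            price = float(cell)
            # Strict > means the earliest row wins when prices tie.
            if company not in peaks or price > peaks[company][2]:
                peaks[company] = (year, month, price)
    return peaks


# Hypothetical sample data, for illustration only.
sample = io.StringIO(
    "year,month,Company A,Company B\n"
    "1990,Jan,10,15\n"
    "1990,Feb,20,15\n"
    "1991,Jan,5,30\n"
)
for company, (year, month, price) in sorted(peak_per_company(sample).items()):
    print("%s: %s %s (%.f)" % (company, year, month, price))
```

Only the running maximum per company is kept, so no nested company/year/month dictionary is needed.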

In your first pass, you keep track of rownum. Why? You could just use csvreader.line_num. Better yet, just fetch the first row using next(reader), then you can do for row in reader without ever having to think about encountering the header row ever again. Better still, use a DictReader, which interprets the first row as fieldnames.
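
The two alternatives can be sketched side by side as follows. The in-memory DATA string and its lowercase year/month header are assumptions for the sake of a self-contained example.

```python
import csv
import io

# Hypothetical sample data, for illustration only.
DATA = (
    "year,month,Company A,Company B\n"
    "1990,Jan,10,15\n"
    "1990,Feb,12,14\n"
)

# Option 1: consume the header once, then iterate over data rows only.
reader = csv.reader(io.StringIO(DATA))
header = next(reader)
data_rows = list(reader)

# Option 2: DictReader treats the first row as field names,
# so each row can be indexed by column name.
dict_rows = list(csv.DictReader(io.StringIO(DATA)))

print(header)
print(data_rows[0])
print(dict_rows[0]['Company A'])
```

Either way, no row counter is needed to tell the header apart from the data.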

In the second pass, you used some questionable variable names. It's not clear what items_1 is supposed to contain. Also, you have a variable named temp_list. I believe that any variable with "temp" in its name is likely to be a sign of muddled thinking.

Sample solution

Here's what I came up with.

import csv
from collections import namedtuple

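class Peak (namedtuple('Peak', ['year', 'month', 'price'])):
    def __lt__(self, other):
        return self.price is None or self.price < other.price

    def __eq__(self, other):
        return self.price == other.price

    def __gt__(self, other):
        return other.price is None or self.price > other.price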

def max_stock_prices(f):
    csv_reader = csv.DictReader(f)
    companies = csv_reader.fieldnames[2:]   # Discard year, month columns
    peaks = dict((c, Peak(None, None, None)) for c in companies)

    for row in csv_reader:
        year, month = row['year'], row['month']
        current = dict((c, Peak(year, month, float(row[c]))) for c in companies)
        peaks = dict((c, max(peaks[c], current[c])) for c in companies)

    return peaks

with open('demo.csv') as f:
    peaks = max_stock_prices(f)
for company in sorted(peaks):
    print("%s: %s %s (%.f)" % (company, peaks[company].year, peaks[company].month, peaks[company].price))


Notable points:

  • Use DictReader and namedtuple to avoid the master_dic[header[i]][row[0]][row[1]] =float(row[i]) headache mentioned above.

  • Define the <, == and > comparison operators to let max(peakobj1, peakobj2) work.

  • The goal of all that preparation work is to beautify max_stock_prices().

  • The amount of free-floating code is just four lines at the end, which is OK since you can see at a glance what it does.

Caveat: If the same maximum is attained on several rows, the month chosen as the peak is arbitrary.
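
The task also asked for a unit test with sample data. A minimal sketch using the standard unittest module might look like this. It repeats Peak and max_stock_prices so that it is self-contained, and the sample rows, the lowercase year/month header, and the test names are assumptions. The second test pins down what happens on a tie in this particular code: max() returns the first maximal item it encounters, so the earliest row is kept.

```python
import csv
import io
import unittest
from collections import namedtuple


class Peak(namedtuple('Peak', ['year', 'month', 'price'])):
    # A Peak whose price is None means "no data yet" and loses to any price.
    def __lt__(self, other):
        return self.price is None or self.price < other.price

    def __eq__(self, other):
        return self.price == other.price

    def __gt__(self, other):
        return other.price is None or self.price > other.price


def max_stock_prices(f):
    csv_reader = csv.DictReader(f)
    companies = csv_reader.fieldnames[2:]   # Discard year, month columns
    peaks = dict((c, Peak(None, None, None)) for c in companies)
    for row in csv_reader:
        year, month = row['year'], row['month']
        current = dict((c, Peak(year, month, float(row[c]))) for c in companies)
        peaks = dict((c, max(peaks[c], current[c])) for c in companies)
    return peaks


# Hypothetical sample data, for illustration only.
SAMPLE = (
    "year,month,Company A,Company B\n"
    "1990,Jan,10,15\n"
    "1990,Feb,20,15\n"
    "1991,Jan,5,30\n"
)


class MaxStockPricesTest(unittest.TestCase):
    # Peak.__eq__ compares prices only, so compare plain tuples instead
    # to check the year and month as well.
    def test_peak_per_company(self):
        peaks = max_stock_prices(io.StringIO(SAMPLE))
        self.assertEqual(tuple(peaks['Company A']), ('1990', 'Feb', 20.0))
        self.assertEqual(tuple(peaks['Company B']), ('1991', 'Jan', 30.0))

    def test_tie_keeps_earliest_row(self):
        tied = "year,month,Company A\n1990,Jan,10\n1990,Feb,10\n"
        peaks = max_stock_prices(io.StringIO(tied))
        self.assertEqual(tuple(peaks['Company A']), ('1990', 'Jan', 10.0))
```

Saved in a module, the tests can be run with python -m unittest. Passing a file-like object to max_stock_prices is what makes it possible to test with io.StringIO instead of a real file.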

Context

StackExchange Code Review Q#56531, answer score: 13
