
Working with pandas dataframes for stock backtesting exercise

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to apply a long set of conditions and operations to a pandas dataframe (see the dataframe below, with columns VTI, upper, lower, etc.). I tried to use apply, but had a lot of trouble doing so. My current solution (which works perfectly) relies on a for loop that iterates through the dataframe, but I suspect this is an inefficient way to run my simulation. I'd appreciate help with the design of my code.



VTI    upper  lower  sell   buy    AU     BU     BL     date        Tok  order
44.58  NaN    NaN    False  False  False  False  False  2001-06-15  5    0
44.29  NaN    NaN    False  False  False  False  False  2001-06-18  5    1
44.42  NaN    NaN    False  False  False  False  False  2001-06-19  5    2
44.88  NaN    NaN    False  False  False  False  False  2001-06-20  5    3
45.24  NaN    NaN    False  False  False  False  False  2001-06-21  5    4


If I want to run a set of conditions and for loops like the ones below, and call the get-row-data function only when a row meets those conditions, how would I do so?

My intuition says to use .apply(), but I'm not clear how to do it in this scenario. With all the ifs and for loops combined, it adds up to a lot of lines of code. The code below actually outputs an entirely new dataframe. I'm wondering whether there are more efficient or better ways to think about the design of this simulation/stock backtesting process.
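One way to follow the `.apply()` intuition is to filter first and apply second, so the row function only ever sees qualifying rows. The sketch below assumes a hypothetical `get_row_data` that returns a `pd.Series` (the real function is not shown, and it returns a list); `DataFrame.apply(func, axis=1)` passes each row as a Series and, when `func` returns a Series, expands the results into columns. The `sell` and `Tok` values and `maxtokens` are made up for illustration. Note that `apply` with `axis=1` still calls a Python function once per row, so it tidies the code more than it speeds it up.

```python
import pandas as pd

# VTI prices come from the question's table; sell/Tok values are hypothetical.
df = pd.DataFrame({
    "VTI": [44.58, 44.29, 44.42, 44.88, 45.24],
    "sell": [False, True, False, True, False],
    "Tok": [5, 5, 5, 10, 5],
})
maxtokens = 10  # hypothetical limit


def get_row_data(row):
    # Hypothetical stand-in for the question's get-row-data function.
    return pd.Series({"price": row["VTI"], "tokens": row["Tok"]})


# Keep only rows that meet both conditions, then run the function on those rows.
mask = df["sell"] & (df["Tok"] < maxtokens)
results = df[mask].apply(get_row_data, axis=1)
print(results)
```

Only the row at index 1 passes both conditions here: the row at index 3 has a sell signal but is already at `maxtokens`.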

The get-row-data function simply pulls key data from the dataframe, computes some values based on globals (such as how much capital I already have and how many stocks I already hold), and returns a list. I append all these lists into a dataframe that I call the portfolio.
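Because the capital and share counts carried from one row to the next depend on earlier trades, this part of a backtest is inherently sequential, so a loop is reasonable here. Below is a sketch of a cheaper and easier-to-test version of that loop, assuming a hypothetical `get_row_data` that takes its state as an argument instead of reading globals and returns a dict. It iterates with `itertuples` (faster than `iterrows`), collects the records in a plain Python list, and builds the portfolio DataFrame once at the end rather than growing it row by row (`DataFrame.append` was removed in pandas 2.0). The buy/sell rule and the starting capital are made up for illustration.

```python
import pandas as pd

# Dates and prices from the question's table; buy/sell signals are hypothetical.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2001-06-15", "2001-06-18", "2001-06-19"]),
    "VTI": [44.58, 44.29, 44.42],
    "buy": [True, False, False],
    "sell": [False, False, True],
})


def get_row_data(row, state):
    """Hypothetical stand-in: builds one record from the row plus explicit
    state (capital and shares held) instead of globals."""
    if row.buy and state["shares"] == 0:
        state["shares"] = int(state["capital"] // row.VTI)  # whole shares only
        state["capital"] -= state["shares"] * row.VTI
    elif row.sell and state["shares"] > 0:
        state["capital"] += state["shares"] * row.VTI
        state["shares"] = 0
    return {"date": row.date, "price": row.VTI,
            "capital": state["capital"], "shares": state["shares"]}


state = {"capital": 1000.0, "shares": 0}  # hypothetical starting capital
records = [get_row_data(row, state) for row in prices.itertuples(index=False)]
portfolio = pd.DataFrame(records)  # build the DataFrame once, at the end
print(portfolio)
```

Passing `state` explicitly makes each step reproducible in isolation, which is useful when checking a backtest against hand calculations.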

Here is a snippet of the for-loop code I've already written:

## This is only for the sell portion of the algorithm
if val['sell'] and tokens == maxtokens:
    print('nothing to sell')

if val['sell'] and tokens < maxtokens and tokens >= sellbuybuffer:
    status = 'sold'

#This
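When the token count for every row is already known (for example, stored in the `Tok` column, as the answer below assumes), both branches of this loop can be evaluated for all rows at once. The sketch below uses `numpy.select`, which picks, per row, the choice belonging to the first true condition. It assumes the second branch fires when `sellbuybuffer <= Tok < maxtokens`; the thresholds, signals and token counts are hypothetical.

```python
import numpy as np
import pandas as pd

maxtokens = 5      # hypothetical
sellbuybuffer = 2  # hypothetical

df = pd.DataFrame({
    "VTI": [44.58, 44.29, 44.42, 44.88],
    "sell": [True, True, False, True],  # hypothetical signals
    "Tok": [5, 3, 4, 1],                # hypothetical token counts
})

conditions = [
    # first branch: sell signal but already at the token limit
    df["sell"] & (df["Tok"] == maxtokens),
    # second branch (assumed): sell signal and sellbuybuffer <= Tok < maxtokens
    df["sell"] & (df["Tok"] >= sellbuybuffer) & (df["Tok"] < maxtokens),
]
df["status"] = np.select(conditions, ["nothing to sell", "sold"], default="")
print(df)
```

If `tokens` instead changes inside the loop as trades happen, each row depends on the previous one and this shortcut no longer applies; the state has to be carried through a loop.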

Solution

Pandas lets you filter dataframes efficiently with boolean masks (boolean indexing), which are evaluated on whole columns at once.

Instead of using a for loop and conditional branching, use the following syntax:

df = portfolio[(portfolio['sell'] == True) & (portfolio['Tok'] < maxtokens)]
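A small sketch of what that expression does, with hypothetical signal and token values: each comparison produces a boolean Series, `&` combines them element-wise (the parentheses are needed because `&` binds more tightly than `==` and `<`), and indexing with the result keeps only the matching rows. To act on those rows inside the original frame rather than in a filtered copy, the same mask can be passed to `.loc`.

```python
import pandas as pd

maxtokens = 5  # hypothetical value
portfolio = pd.DataFrame({
    "VTI": [44.58, 44.29, 44.42],
    "sell": [True, True, False],  # hypothetical signals
    "Tok": [5, 3, 2],             # hypothetical token counts
})

# For a bool column, portfolio["sell"] on its own works the same as == True.
mask = portfolio["sell"] & (portfolio["Tok"] < maxtokens)

df = portfolio[mask]                    # new DataFrame with only matching rows
portfolio.loc[mask, "status"] = "sold"  # or update the matching rows in place
print(df)
print(portfolio)
```

Rows that do not match the mask get `NaN` in the newly created `status` column.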


To sort a dataframe, you can also simply write:

portfolio = portfolio.sort_values('VTI', ascending=False)
sold_positions = portfolio[portfolio['BL'] == True].sort_values('upperlower', ascending=True)
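`DataFrame.sort` is not available in current pandas; `sort_values` replaces it and likewise returns a new, sorted DataFrame. A quick check using the rows of the question's table shows that the index labels move with their rows, which is handy for tracing a sorted row back to its date.

```python
import pandas as pd

# Dates and prices taken from the question's table.
portfolio = pd.DataFrame({
    "date": pd.to_datetime(["2001-06-15", "2001-06-18", "2001-06-19",
                            "2001-06-20", "2001-06-21"]),
    "VTI": [44.58, 44.29, 44.42, 44.88, 45.24],
})

by_price = portfolio.sort_values("VTI", ascending=False)
print(by_price["VTI"].tolist())  # [45.24, 44.88, 44.58, 44.42, 44.29]
print(by_price.index.tolist())   # [4, 3, 0, 2, 1]
```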


Context

StackExchange Code Review Q#43517, answer score: 3
