pattern · csharp · Minor

Calculating exponential moving average across millions of rows

Submitted by: @import:stackexchange-codereview

Problem

I am calculating the moving average for 5,534,446 rows in one table. I am using C# as my language and MySQL as the database. Below is the code I use to gather the data and calculate an exponential moving average for different sets of days. The program works correctly, but it takes forever because it calculates the values one at a time. Since I am new to programming, I figure this could be three times more efficient. What can I do to improve this program's speed?

```
private void CalculateMovingAverage()
{
    l.CreateRunningEntry(3, "CalculateMovingAverage", "Beginning to calculate exponential moving averages for companies.");
    conn.ConnectionString = r.getMysqlConnection();

    // This is the list of period lengths we want to calculate exponential averages for
    decimal[] movingDays = { 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200 };

    // Go through each moving-average period so that we calculate for each one
    foreach (decimal avgDayLimit in movingDays)
    {
        // Calculate the smoothing constant for this moving-average period
        decimal multiplier = CalcEMAMultiplier(avgDayLimit);

        // Query for all of the companies with the required number of records in the historical_data table
        List<string> symbolList = CompaniesWithHistoricalDays(conn, avgDayLimit);

        // For each symbol, make sure we have a simple moving average to start out with
        foreach (string companySymbol in symbolList)
        {
            // Check if there are any existing records in the moving average table
            bool maExists = DoesMovingAverageExist(conn, avgDayLimit, companySymbol);

            // We will need to calculate the SMA if there are no records in the moving average table
            if (!maExists)
                InsertSMA(conn, avgDayLimit, companySymbol);

            // ... (the per-row EMA calculation for each symbol continues here in the original code)
        }
    }
}
```

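For context, the helpers not shown above presumably follow the standard EMA definition; a minimal sketch, assuming CalcEMAMultiplier returns the conventional smoothing factor 2 / (N + 1):

```
// Sketch only: the original CalcEMAMultiplier and per-row update are not shown
// in the question; this assumes the conventional EMA definition.
private static decimal CalcEMAMultiplier(decimal periodDays)
{
    // Standard smoothing constant: 2 / (N + 1)
    return 2m / (periodDays + 1m);
}

private static decimal NextEMA(decimal close, decimal previousEMA, decimal multiplier)
{
    // EMA_today = (Close - EMA_yesterday) * k + EMA_yesterday
    return (close - previousEMA) * multiplier + previousEMA;
}
```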
Solution

You are making many, many round trips to the database. Stock data like this isn't very large: typically no more than about 252 entries per stock symbol per year (the number of trading days, i.e. 365 minus weekends and holidays). I think you would be better off bulk reading all of the applicable data, processing it all in memory, and then doing bulk updates back to the database.
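A sketch of that shape, assuming a historical_data table with symbol, trade_date, and close_price columns (the real schema isn't shown in the question) and the MySql.Data client: read everything in one query, then compute each EMA series entirely in memory.

```
// Sketch only: table and column names (historical_data, symbol, trade_date,
// close_price) are assumptions; adapt them to the real schema.
using System;
using System.Collections.Generic;
using MySql.Data.MySqlClient;

class EmaBulkSketch
{
    static Dictionary<string, List<decimal>> ReadAllCloses(string connectionString)
    {
        var closesBySymbol = new Dictionary<string, List<decimal>>();
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            // One round trip: pull every close, ordered so each symbol's prices arrive oldest-first
            using (var cmd = new MySqlCommand(
                "SELECT symbol, close_price FROM historical_data ORDER BY symbol, trade_date", conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    string symbol = reader.GetString(0);
                    if (!closesBySymbol.TryGetValue(symbol, out var closes))
                        closesBySymbol[symbol] = closes = new List<decimal>();
                    closes.Add(reader.GetDecimal(1));
                }
            }
        }
        return closesBySymbol;
    }

    static List<decimal> ComputeEmaSeries(List<decimal> closes, int periodDays)
    {
        // Seed with a simple moving average over the first N closes, then apply the EMA recurrence
        var result = new List<decimal>();
        if (closes.Count < periodDays) return result;

        decimal sum = 0m;
        for (int i = 0; i < periodDays; i++) sum += closes[i];
        decimal ema = sum / periodDays;
        result.Add(ema);

        decimal k = 2m / (periodDays + 1m);
        for (int i = periodDays; i < closes.Count; i++)
        {
            ema = (closes[i] - ema) * k + ema;
            result.Add(ema);
        }
        return result;
    }
}
```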

If you have concerns about reading all of that information at once, you can always break it into chunks by iterating over symbols instead: read all of the data for the first 10, 50, or 100 symbols, process and update them, then do the same for the next N symbols, and so on.
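One way to chunk, continuing the sketch above (same usings and assumed schema; the batch size and helper name are illustrative): fetch the symbol list once, then read each batch with a parameterized IN clause.

```
// Sketch only: belongs in the same class as the sketch above.
static void ProcessInChunks(string connectionString, List<string> allSymbols, int chunkSize)
{
    for (int start = 0; start < allSymbols.Count; start += chunkSize)
    {
        var chunk = allSymbols.GetRange(start, Math.Min(chunkSize, allSymbols.Count - start));

        // Build a parameterized IN (...) list for this chunk of symbols
        var placeholders = new List<string>();
        using (var conn = new MySqlConnection(connectionString))
        using (var cmd = new MySqlCommand())
        {
            for (int i = 0; i < chunk.Count; i++)
            {
                placeholders.Add("@s" + i);
                cmd.Parameters.AddWithValue("@s" + i, chunk[i]);
            }
            cmd.Connection = conn;
            cmd.CommandText =
                "SELECT symbol, close_price FROM historical_data " +
                "WHERE symbol IN (" + string.Join(", ", placeholders) + ") " +
                "ORDER BY symbol, trade_date";

            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // ... accumulate closes per symbol and compute EMAs as in the sketch above
            }
        }
        // ... bulk-write this chunk's results before moving on to the next chunk
    }
}
```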

In any case, even though you keep the connection open between queries, you should cut back the number of trips to the database as much as possible (read everything you can, even if it might not be needed), as the overhead of sending the query, executing it, and reading back the results over the network is where your bottleneck is.
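The same applies on the write side: instead of one INSERT per computed value, a multi-row INSERT sends hundreds of rows in a single round trip. A sketch, again with an assumed moving_average table and column names:

```
// Sketch only: the moving_average table and its columns are assumptions.
static void BulkInsertEma(MySqlConnection conn, string symbol, int periodDays, List<decimal> emaValues)
{
    if (emaValues.Count == 0) return;

    var values = new List<string>();
    using (var cmd = new MySqlCommand())
    {
        for (int i = 0; i < emaValues.Count; i++)
        {
            values.Add("(@sym, @days, @v" + i + ")");
            cmd.Parameters.AddWithValue("@v" + i, emaValues[i]);
        }
        cmd.Parameters.AddWithValue("@sym", symbol);
        cmd.Parameters.AddWithValue("@days", periodDays);

        cmd.Connection = conn;
        // One statement, many rows: far fewer round trips than inserting one value at a time
        cmd.CommandText =
            "INSERT INTO moving_average (symbol, period_days, ema_value) VALUES " +
            string.Join(", ", values);
        cmd.ExecuteNonQuery();
    }
}
```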

Context

StackExchange Code Review Q#120874, answer score: 3

Revisions (0)

No revisions yet.