HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Find value from other dataset based on 6 variables that must be equal in both datasets

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
mustequalothervaluebothdatasetsdatasetthatbasedfind

Problem

I have two datasets that are both within a worksheet, call them Data and IBES. The code checks whether the 6 variables are the same in each dataset and then writes the value from a specific column to the other dataset. To find this value the code runs through 288503 lines which is dramatically slow.

How can I speed up this code?

Public Function GetRightValue()

Dim i As Integer
Dim j As Long

Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False

For i = 2 To 1511           'Loop over all values from total dataset
        For j = 2 To 288503      'Loop over all values from IBES file
            If Worksheets("Data").Cells(i, 3) = Worksheets("IBES").Cells(j, 1) Then
                If Worksheets("Data").Cells(i, 7) = Worksheets("IBES").Cells(j, 6) Then
                    If Worksheets("Data").Cells(i, 10) = Worksheets("IBES").Cells(j, 9) Then
                        If Worksheets("Data").Cells(i, 13) = Worksheets("IBES").Cells(j, 11) Then
                            If Worksheets("Data").Cells(i, 8) = Worksheets("IBES").Cells(j, 7) Then
                                If Worksheets("Data").Cells(i, 14).Text = Worksheets("IBES").Cells(j, 13).Text Then
                                    Worksheets("Data").Cells(i, 12) = Worksheets("IBES").Cells(j, 10).Text
                                    Worksheets("Data").Cells(i, 18) = Worksheets("IBES").Cells(j, 16).Text
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next j
Next i

Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.EnableEvents = True
ActiveSheet.DisplayPageBreaks = True

End Function

Solution

Speed

When working with data at any kind of scale, the golden rule is Do not interact directly with the worksheet. Accessing worksheet objects has huge computational overhead, and you are accessing it every time you check a cell's contents.

Dirk's answer demonstrates how to put a range of values into an Array. Operating on an Array is typically 10-100x faster than operating on a worksheet. When you're finished, just set (Range) = (Array) to print your data back to your sheet.

Magic Variables

Not the good kind of magic either. A magic variable is any hard-coded value. E.G.

For i = 2 To 1511           'Loop over all values from total dataset
    For j = 2 To 288503       'Loop over all values from IBES file


2 to 1511, 2 to 288503, Magic Variables. Where have those numbers come from? How do you know they're still up to date? I gather they're the final row(s) in your data series. Why not determine them dynamically?

Dim finalRow as long

finalRow = ws.Cells(Rows.Count, colNum).End(xlUp).Row

For i = 2 to finalRow '/ +1 to avoid Headers

    ...


Same goes for your column numbers. They should be named variables along the lines of condition1ColNum, condition2ColNum etc. Additionally, if you know what their headers are, I would start by iterating over your header row, and dynamically determining which columns they're actually in.

The key to all this is: How easy is it to break your Macro? If the answer is just "Add an extra row" or "Add/move a column" then it's not very good.

Also, you see this: Worksheets("Data").Cells(i, 3). You have that sheetName hardcoded 8 times. What if someone renames the sheet? Do you really want to go through and re-type it every time? Instead, use the VBa object model and create a proper Worksheet object:

Dim wsData as Worksheet, wsIbes as Worksheet

Set wsData = sheets("Data")
Set wsIbes = sheets("IBES")


Then you can use it like so:

wsData.Cells(i, 3)


And should the name change, you only have to change it in one place.

Even better, have you ever heard of CodeNames? In VBA, every worksheet object has a "Name" which is displayed in Excel and the user can rename at will unless it's protected. Each sheet also has a "CodeName", which can only be seen / modified in the VBE Window. Additionally, a sheet's CodeName actsas a Worksheet variable. E.G. If I have a sheet called "Data" which has CodeName "wsData" then I can simply write

wsData.Cells(i, 3)


Without having to declare anything. And now, if the user renames your sheet to "Something Something Data", your code won't break, because the codeName will remain unchanged.

For your main loop, you shouldn't be using nested If statements. It's messy and leads to Arrow Code. I would do it something like so:

For i = 2 To wsDataFinalRow 'Loop over all values from total dataset
    For j = 2 To wsIbesFinalRow 'Loop over all values from IBES file
        boolean1 = (Worksheets("Data").Cells(i, condition1DataColNum) = Worksheets("IBES").Cells(j, condition1IbesColNum))
        boolean2 = (Worksheets("Data").Cells(i, condition2DataColNum) = Worksheets("IBES").Cells(j, condition2IbesColNum))
        boolean3 = ...
        ...
        passedTest = boolean1 And boolean2 And boolean3 And ...

        If passedTest then
            ...
        End If

    next j
next i


And now it's much clearer what's going on, and it's much easier to add/move/re-arrange your test conditions at will, and you can add more actions based on a subset of conditions being true without having to write out a whole extra If ... If ... If ... If ... block.

Code Snippets

For i = 2 To 1511           'Loop over all values from total dataset
    For j = 2 To 288503       'Loop over all values from IBES file
Dim finalRow as long

finalRow = ws.Cells(Rows.Count, colNum).End(xlUp).Row

For i = 2 to finalRow '/ +1 to avoid Headers

    ...
Dim wsData as Worksheet, wsIbes as Worksheet

Set wsData = sheets("Data")
Set wsIbes = sheets("IBES")
wsData.Cells(i, 3)
wsData.Cells(i, 3)

Context

StackExchange Code Review Q#113741, answer score: 8

Revisions (0)

No revisions yet.