patternMinor
Find value from other dataset based on 6 variables that must be equal in both datasets
Viewed 0 times
mustequalothervaluebothdatasetsdatasetthatbasedfind
Problem
I have two datasets that are both within a worksheet, call them Data and IBES. The code checks whether the 6 variables are the same in each dataset and then writes the value from a specific column to the other dataset. To find this value the code runs through 288503 lines which is dramatically slow.
How can I speed up this code?
How can I speed up this code?
Public Function GetRightValue()
Dim i As Integer
Dim j As Long
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False
For i = 2 To 1511 'Loop over all values from total dataset
For j = 2 To 288503 'Loop over all values from IBES file
If Worksheets("Data").Cells(i, 3) = Worksheets("IBES").Cells(j, 1) Then
If Worksheets("Data").Cells(i, 7) = Worksheets("IBES").Cells(j, 6) Then
If Worksheets("Data").Cells(i, 10) = Worksheets("IBES").Cells(j, 9) Then
If Worksheets("Data").Cells(i, 13) = Worksheets("IBES").Cells(j, 11) Then
If Worksheets("Data").Cells(i, 8) = Worksheets("IBES").Cells(j, 7) Then
If Worksheets("Data").Cells(i, 14).Text = Worksheets("IBES").Cells(j, 13).Text Then
Worksheets("Data").Cells(i, 12) = Worksheets("IBES").Cells(j, 10).Text
Worksheets("Data").Cells(i, 18) = Worksheets("IBES").Cells(j, 16).Text
End If
End If
End If
End If
End If
End If
Next j
Next i
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.EnableEvents = True
ActiveSheet.DisplayPageBreaks = True
End FunctionSolution
Speed
When working with data at any kind of scale, the golden rule is Do not interact directly with the worksheet. Accessing worksheet objects has huge computational overhead, and you are accessing it every time you check a cell's contents.
Dirk's answer demonstrates how to put a range of values into an Array. Operating on an Array is typically 10-100x faster than operating on a worksheet. When you're finished, just set (Range) = (Array) to print your data back to your sheet.
Magic Variables
Not the good kind of magic either. A magic variable is any hard-coded value. E.G.
Same goes for your column numbers. They should be named variables along the lines of
The key to all this is: How easy is it to break your Macro? If the answer is just "Add an extra row" or "Add/move a column" then it's not very good.
Also, you see this:
Then you can use it like so:
And should the name change, you only have to change it in one place.
Even better, have you ever heard of CodeNames? In VBA, every worksheet object has a "Name" which is displayed in Excel and the user can rename at will unless it's protected. Each sheet also has a "CodeName", which can only be seen / modified in the VBE Window. Additionally, a sheet's CodeName actsas a Worksheet variable. E.G. If I have a sheet called "Data" which has CodeName "wsData" then I can simply write
Without having to declare anything. And now, if the user renames your sheet to "Something Something Data", your code won't break, because the codeName will remain unchanged.
For your main loop, you shouldn't be using nested
And now it's much clearer what's going on, and it's much easier to add/move/re-arrange your test conditions at will, and you can add more actions based on a subset of conditions being true without having to write out a whole extra
When working with data at any kind of scale, the golden rule is Do not interact directly with the worksheet. Accessing worksheet objects has huge computational overhead, and you are accessing it every time you check a cell's contents.
Dirk's answer demonstrates how to put a range of values into an Array. Operating on an Array is typically 10-100x faster than operating on a worksheet. When you're finished, just set (Range) = (Array) to print your data back to your sheet.
Magic Variables
Not the good kind of magic either. A magic variable is any hard-coded value. E.G.
For i = 2 To 1511 'Loop over all values from total dataset
For j = 2 To 288503 'Loop over all values from IBES file2 to 1511, 2 to 288503, Magic Variables. Where have those numbers come from? How do you know they're still up to date? I gather they're the final row(s) in your data series. Why not determine them dynamically?Dim finalRow as long
finalRow = ws.Cells(Rows.Count, colNum).End(xlUp).Row
For i = 2 to finalRow '/ +1 to avoid Headers
...Same goes for your column numbers. They should be named variables along the lines of
condition1ColNum, condition2ColNum etc. Additionally, if you know what their headers are, I would start by iterating over your header row, and dynamically determining which columns they're actually in.The key to all this is: How easy is it to break your Macro? If the answer is just "Add an extra row" or "Add/move a column" then it's not very good.
Also, you see this:
Worksheets("Data").Cells(i, 3). You have that sheetName hardcoded 8 times. What if someone renames the sheet? Do you really want to go through and re-type it every time? Instead, use the VBa object model and create a proper Worksheet object:Dim wsData as Worksheet, wsIbes as Worksheet
Set wsData = sheets("Data")
Set wsIbes = sheets("IBES")Then you can use it like so:
wsData.Cells(i, 3)And should the name change, you only have to change it in one place.
Even better, have you ever heard of CodeNames? In VBA, every worksheet object has a "Name" which is displayed in Excel and the user can rename at will unless it's protected. Each sheet also has a "CodeName", which can only be seen / modified in the VBE Window. Additionally, a sheet's CodeName actsas a Worksheet variable. E.G. If I have a sheet called "Data" which has CodeName "wsData" then I can simply write
wsData.Cells(i, 3)Without having to declare anything. And now, if the user renames your sheet to "Something Something Data", your code won't break, because the codeName will remain unchanged.
For your main loop, you shouldn't be using nested
If statements. It's messy and leads to Arrow Code. I would do it something like so:For i = 2 To wsDataFinalRow 'Loop over all values from total dataset
For j = 2 To wsIbesFinalRow 'Loop over all values from IBES file
boolean1 = (Worksheets("Data").Cells(i, condition1DataColNum) = Worksheets("IBES").Cells(j, condition1IbesColNum))
boolean2 = (Worksheets("Data").Cells(i, condition2DataColNum) = Worksheets("IBES").Cells(j, condition2IbesColNum))
boolean3 = ...
...
passedTest = boolean1 And boolean2 And boolean3 And ...
If passedTest then
...
End If
next j
next iAnd now it's much clearer what's going on, and it's much easier to add/move/re-arrange your test conditions at will, and you can add more actions based on a subset of conditions being true without having to write out a whole extra
If ... If ... If ... If ... block.Code Snippets
For i = 2 To 1511 'Loop over all values from total dataset
For j = 2 To 288503 'Loop over all values from IBES fileDim finalRow as long
finalRow = ws.Cells(Rows.Count, colNum).End(xlUp).Row
For i = 2 to finalRow '/ +1 to avoid Headers
...Dim wsData as Worksheet, wsIbes as Worksheet
Set wsData = sheets("Data")
Set wsIbes = sheets("IBES")wsData.Cells(i, 3)wsData.Cells(i, 3)Context
StackExchange Code Review Q#113741, answer score: 8
Revisions (0)
No revisions yet.