patternMinor
Separating data by a particular cell
Viewed 0 times
particularcellseparatingdata
Problem
I am trying to improve some code I did not write, mostly to make it easier for other people to understand (I found it to be really weirdly written). I attempted to rewrite it, and it does basically the same thing, using roughly the same processes.
However, the old one takes about 2 minutes to run with some data I recorded. The other takes well over an hour to work on the same data. What on earth is going on?
The data format they work on is:
I should add: I would use dictionaries, as I think they would be fastest, except the number of "TextX" is basically random. Sometimes only Text1 appears. Sometimes it goes all the way to Text20, or more.
Old code:
```
sub DivideSheet()
Application.ScreenUpdating = False
Dim name As String
Dim point_name As String
Dim SheetCount, sheetNumber As Integer
Dim RowCount, RowStart As Long
Dim Exist, RowEmpty As Boolean
Sheets("RAW DATA").Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlFixedWidth, _
FieldInfo:=Array(Array(0, 1), Array(12, 1), Array(13, 1), Array(22, 1), Array(23, 1), Array(27, 1), _
Array(31, 1), Array(36, 1)), TrailingMinusNumbers:=True
Exist = False
RowCount = 1
RowStart = 1
SheetCount = 1
point_name = Worksheets(1).Cells(1, 3)
For RowCount = 1 To 1048576 'loop through all the rows in the sheet
If CStr(Sheets(1).Cells(RowCount, 3)) = "" Then 'If the cell isnt a new transmition
'do nothing
ElseIf CStr(Sheets(1).Cells(RowCount, 3)) = point_name Then 'If the new transmition is from the same node
'Do nothing
Else 'If its a new node
For SheetCount = 1 To Sheets.Count 'loop through sheets
If Worksheets(SheetCount).name = point_name Then 'If the sheet name matches point_name
Exist
However, the old one takes about 2 minutes to run with some data I recorded. The other takes well over an hour to work on the same data. What on earth is going on?
The data format they work on is:
I should add: I would use dictionaries, as I think they would be fastest, except the number of "TextX" is basically random. Sometimes only Text1 appears. Sometimes it goes all the way to Text20, or more.
Old code:
```
sub DivideSheet()
Application.ScreenUpdating = False
Dim name As String
Dim point_name As String
Dim SheetCount, sheetNumber As Integer
Dim RowCount, RowStart As Long
Dim Exist, RowEmpty As Boolean
Sheets("RAW DATA").Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlFixedWidth, _
FieldInfo:=Array(Array(0, 1), Array(12, 1), Array(13, 1), Array(22, 1), Array(23, 1), Array(27, 1), _
Array(31, 1), Array(36, 1)), TrailingMinusNumbers:=True
Exist = False
RowCount = 1
RowStart = 1
SheetCount = 1
point_name = Worksheets(1).Cells(1, 3)
For RowCount = 1 To 1048576 'loop through all the rows in the sheet
If CStr(Sheets(1).Cells(RowCount, 3)) = "" Then 'If the cell isnt a new transmition
'do nothing
ElseIf CStr(Sheets(1).Cells(RowCount, 3)) = point_name Then 'If the new transmition is from the same node
'Do nothing
Else 'If its a new node
For SheetCount = 1 To Sheets.Count 'loop through sheets
If Worksheets(SheetCount).name = point_name Then 'If the sheet name matches point_name
Exist
Solution
There are a couple places in your code where you've "de-optimized". I'm not sure which of these adds up to a hour of run-time, but these are the most egregious:
Removing variable declarations makes them implicitly
Note that the original code had this wrong too - lines like...
...only type the last variable as
You also removed the strong typing from
You have it as
You created implicit casts when you extracted the
...you coerce it into a
...before passing it (as a
You forgot
The original code cached the result of the call to
Your code calls it 4 times every time you add a sheet. You actually don't even need it in the loop at all, because if you don't end up with
Getting back to
VBA doesn't short circuit the calls, so first you get the value of
```
Dim rowCount As Long
rowCount = wb1.Sheets(1).UsedRange.Rows.Count
Do While j <= rowCount
Dim cellValue As Variant
cellValu
Removing variable declarations makes them implicitly
Variant. The following are undeclared in DivideSheet(): wb1, i, a, and wsh. This is a huge performance penalty, because every time you use them, the underlying values need to be coerced out of the Variant. You also can't store strongly typed objects in a Variant, so that means every single one of your calls to wb1 and wsh is late-bound instead of early-bound (that's why you don't get any Intellisense when you type wb1. or wsh.. That's a ton of needless overhead, especially when you're in a loop. If you can early-bind, do it. Put Option Explicit at the top of the module, and then declare everything. Put the strong typing back in.Note that the original code had this wrong too - lines like...
Dim RowCount, RowStart As Long...only type the last variable as
Long - the rest are Variant.You also removed the strong typing from
CopyLine. In the old code, it's declared as:Public Sub CopyLine(sheetNumber As Integer, RowStart As Long, rowNumber As Long)You have it as
Public Sub CopyLine(a, i, Point), which means that all of the parameters are implicitly Variant. Granted, you're basically using Variant for everything, but again, you just added at least 3 extra unboxings for every call. Not to mention the extra work that Excel needs to do when you pass them as indexers to Cells. Put the strong typing back in.You created implicit casts when you extracted the
sheetexists function. You have it declared as returning a Long, but you declare a as Integer and then return that with sheetexists = a. Then when you call it here...a = sheetexists(Point)...you coerce it into a
Variant (a is undeclared in DivideSheet), then on the next line you treat is as a Boolean...If a Then...before passing it (as a
Variant) to CopyLine, where it gets used as a Variant of subtype Integer again: Sheets(a). (Whew!). Put the... wait for it... strong typing back in.You forgot
wb1 when you extracted the sheetexists function. You have Set wb1 = ThisWorkbook at the top of DivideSheet...and the repeat that call at the top of both sheetexists and CopyLine. If you want to make the 2 functions a bit more reusable, give them a Workbook parameter and pass that in. If you don't care, skip the variable declaration all together and just use ThisWorkbook explicitly. Note that this isn't the same as ActiveWorkbook - it's a hard reference to the ThisWorkbook class in the project the code is in.The original code cached the result of the call to
Sheets.Count. This code only calls Sheets.Count once when it set up the loop:For SheetCount = 1 To Sheets.Count 'loop through sheets
If Worksheets(SheetCount).name = point_name Then 'If the sheet name matches point_name
Exist = True 'set flag to true
sheetNumber = SheetCount 'Record Sheet number
Exit For 'Exit the for loop
End If
Next SheetCount
If Exist = False Then 'If the Node didnt have a sheet
sheetNumber = SheetCount
Worksheets.Add after:=Worksheets(SheetCount - 1) 'Create a sheet
Worksheets(Sheets.Count).name = point_name 'Name it for the RTU
End IfYour code calls it 4 times every time you add a sheet. You actually don't even need it in the loop at all, because if you don't end up with
Sheets.Count + 1 worksheets after you call Sheets.Add, then there is something very seriously wrong (this is a problem in the original code too). Cache values that you need to reuse.Getting back to
CopyLine, your code isn't equivalent at all. The original function is actually pretty efficient at what it does - it calculates what it needs to copy, then does that in a single operation. Your code doesn't perform a calculation at all - it counts row by row with this code (line continuations added for clarity):While (wb1.Sheets(1).Cells(j, 3) = "" Or _
wb1.Sheets(1).Cells(j, 3) = Point) And _
j <= wb1.Sheets(1).UsedRange.Rows.Count
j = j + 1
WendVBA doesn't short circuit the calls, so first you get the value of
Cells(j, 3) (implicitly) twice, compare it to two different things, and (then this is the huge one) get the exact same value for wb1.Sheets(1).UsedRange.Rows.Count every time through the loop. Again, cache values that you need to reuse. This is much easier if you use a Do loop (there isn't an Exit While statement):```
Dim rowCount As Long
rowCount = wb1.Sheets(1).UsedRange.Rows.Count
Do While j <= rowCount
Dim cellValue As Variant
cellValu
Code Snippets
Dim RowCount, RowStart As LongPublic Sub CopyLine(sheetNumber As Integer, RowStart As Long, rowNumber As Long)a = sheetexists(Point)For SheetCount = 1 To Sheets.Count 'loop through sheets
If Worksheets(SheetCount).name = point_name Then 'If the sheet name matches point_name
Exist = True 'set flag to true
sheetNumber = SheetCount 'Record Sheet number
Exit For 'Exit the for loop
End If
Next SheetCount
If Exist = False Then 'If the Node didnt have a sheet
sheetNumber = SheetCount
Worksheets.Add after:=Worksheets(SheetCount - 1) 'Create a sheet
Worksheets(Sheets.Count).name = point_name 'Name it for the RTU
End IfWhile (wb1.Sheets(1).Cells(j, 3) = "" Or _
wb1.Sheets(1).Cells(j, 3) = Point) And _
j <= wb1.Sheets(1).UsedRange.Rows.Count
j = j + 1
WendContext
StackExchange Code Review Q#154103, answer score: 4
Revisions (0)
No revisions yet.