HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Separating data by a particular cell

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
particularcellseparatingdata

Problem

I am trying to improve some code I did not write, mostly to make it easier for other people to understand (I found it to be really weirdly written). I attempted to rewrite it, and it does basically the same thing, using roughly the same processes.

However, the old one takes about 2 minutes to run with some data I recorded. The other takes well over an hour to work on the same data. What on earth is going on?

The data format they work on is:

I should add: I would use dictionaries, as I think they would be fastest, except the number of "TextX" is basically random. Sometimes only Text1 appears. Sometimes it goes all the way to Text20, or more.

Old code:

```
sub DivideSheet()
Application.ScreenUpdating = False

Dim name As String
Dim point_name As String
Dim SheetCount, sheetNumber As Integer
Dim RowCount, RowStart As Long
Dim Exist, RowEmpty As Boolean

Sheets("RAW DATA").Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlFixedWidth, _
FieldInfo:=Array(Array(0, 1), Array(12, 1), Array(13, 1), Array(22, 1), Array(23, 1), Array(27, 1), _
Array(31, 1), Array(36, 1)), TrailingMinusNumbers:=True

Exist = False
RowCount = 1
RowStart = 1
SheetCount = 1
point_name = Worksheets(1).Cells(1, 3)

For RowCount = 1 To 1048576 'loop through all the rows in the sheet

If CStr(Sheets(1).Cells(RowCount, 3)) = "" Then 'If the cell isnt a new transmition
'do nothing
ElseIf CStr(Sheets(1).Cells(RowCount, 3)) = point_name Then 'If the new transmition is from the same node
'Do nothing
Else 'If its a new node

For SheetCount = 1 To Sheets.Count 'loop through sheets

If Worksheets(SheetCount).name = point_name Then 'If the sheet name matches point_name
Exist

Solution

There are a couple places in your code where you've "de-optimized". I'm not sure which of these adds up to a hour of run-time, but these are the most egregious:

Removing variable declarations makes them implicitly Variant. The following are undeclared in DivideSheet(): wb1, i, a, and wsh. This is a huge performance penalty, because every time you use them, the underlying values need to be coerced out of the Variant. You also can't store strongly typed objects in a Variant, so that means every single one of your calls to wb1 and wsh is late-bound instead of early-bound (that's why you don't get any Intellisense when you type wb1. or wsh.. That's a ton of needless overhead, especially when you're in a loop. If you can early-bind, do it. Put Option Explicit at the top of the module, and then declare everything. Put the strong typing back in.

Note that the original code had this wrong too - lines like...

Dim RowCount, RowStart As Long


...only type the last variable as Long - the rest are Variant.

You also removed the strong typing from CopyLine. In the old code, it's declared as:

Public Sub CopyLine(sheetNumber As Integer, RowStart As Long, rowNumber As Long)


You have it as Public Sub CopyLine(a, i, Point), which means that all of the parameters are implicitly Variant. Granted, you're basically using Variant for everything, but again, you just added at least 3 extra unboxings for every call. Not to mention the extra work that Excel needs to do when you pass them as indexers to Cells. Put the strong typing back in.

You created implicit casts when you extracted the sheetexists function. You have it declared as returning a Long, but you declare a as Integer and then return that with sheetexists = a. Then when you call it here...

a = sheetexists(Point)


...you coerce it into a Variant (a is undeclared in DivideSheet), then on the next line you treat is as a Boolean...

If a Then


...before passing it (as a Variant) to CopyLine, where it gets used as a Variant of subtype Integer again: Sheets(a). (Whew!). Put the... wait for it... strong typing back in.

You forgot wb1 when you extracted the sheetexists function. You have Set wb1 = ThisWorkbook at the top of DivideSheet...and the repeat that call at the top of both sheetexists and CopyLine. If you want to make the 2 functions a bit more reusable, give them a Workbook parameter and pass that in. If you don't care, skip the variable declaration all together and just use ThisWorkbook explicitly. Note that this isn't the same as ActiveWorkbook - it's a hard reference to the ThisWorkbook class in the project the code is in.

The original code cached the result of the call to Sheets.Count. This code only calls Sheets.Count once when it set up the loop:

For SheetCount = 1 To Sheets.Count                      'loop through sheets

        If Worksheets(SheetCount).name = point_name Then      'If the sheet name matches point_name
            Exist = True                                    'set flag to true
            sheetNumber = SheetCount                        'Record Sheet number
            Exit For                                        'Exit the for loop
        End If

    Next SheetCount

    If Exist = False Then                                   'If the Node didnt have a sheet
        sheetNumber = SheetCount
        Worksheets.Add after:=Worksheets(SheetCount - 1)    'Create a sheet
        Worksheets(Sheets.Count).name = point_name            'Name it for the RTU
    End If


Your code calls it 4 times every time you add a sheet. You actually don't even need it in the loop at all, because if you don't end up with Sheets.Count + 1 worksheets after you call Sheets.Add, then there is something very seriously wrong (this is a problem in the original code too). Cache values that you need to reuse.

Getting back to CopyLine, your code isn't equivalent at all. The original function is actually pretty efficient at what it does - it calculates what it needs to copy, then does that in a single operation. Your code doesn't perform a calculation at all - it counts row by row with this code (line continuations added for clarity):

While (wb1.Sheets(1).Cells(j, 3) = "" Or _
       wb1.Sheets(1).Cells(j, 3) = Point) And _
       j <= wb1.Sheets(1).UsedRange.Rows.Count
    j = j + 1
Wend


VBA doesn't short circuit the calls, so first you get the value of Cells(j, 3) (implicitly) twice, compare it to two different things, and (then this is the huge one) get the exact same value for wb1.Sheets(1).UsedRange.Rows.Count every time through the loop. Again, cache values that you need to reuse. This is much easier if you use a Do loop (there isn't an Exit While statement):

```
Dim rowCount As Long
rowCount = wb1.Sheets(1).UsedRange.Rows.Count
Do While j <= rowCount
Dim cellValue As Variant
cellValu

Code Snippets

Dim RowCount, RowStart As Long
Public Sub CopyLine(sheetNumber As Integer, RowStart As Long, rowNumber As Long)
a = sheetexists(Point)
For SheetCount = 1 To Sheets.Count                      'loop through sheets

        If Worksheets(SheetCount).name = point_name Then      'If the sheet name matches point_name
            Exist = True                                    'set flag to true
            sheetNumber = SheetCount                        'Record Sheet number
            Exit For                                        'Exit the for loop
        End If

    Next SheetCount

    If Exist = False Then                                   'If the Node didnt have a sheet
        sheetNumber = SheetCount
        Worksheets.Add after:=Worksheets(SheetCount - 1)    'Create a sheet
        Worksheets(Sheets.Count).name = point_name            'Name it for the RTU
    End If
While (wb1.Sheets(1).Cells(j, 3) = "" Or _
       wb1.Sheets(1).Cells(j, 3) = Point) And _
       j <= wb1.Sheets(1).UsedRange.Rows.Count
    j = j + 1
Wend

Context

StackExchange Code Review Q#154103, answer score: 4

Revisions (0)

No revisions yet.