HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Comparing two lists under 6 columns with 32000 rows in Excel

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rowscomparingcolumnsexcelwithtwounder32000lists

Problem

I need to import 2 .s19 files to Excel and then compare across 6 Columns and store the output to a txt file. I have my data from column A to F. My current program doesn't store the data using a dictionary or hashtable. It takes too long to complete as there are around 31k lines in the list 1 and 32k lines in list 2. I would like to use a dictionary to store the data but I am unsure how to do it.

My data is in this format:

Source Data                                    Dump Data

            A                B             C               D                E            F
[1] file format(4char) address(6char) data(66char) file format(4char) address(6char) data(66char)


1) If the first line (A2) or (D2) starts with "S011" or "S804" skip and read next line.

2) If the address in Column B matches the address in Column E, check if the adjacent data linked to it is also a match ie. if B10 = E10, check if C10 = F10. If there is a mismatch in data output to txt file in this format:

Source Data
Address
A10+B10+C10
Dump Data
Address
D10+E10+F10
Result: NOK


3) If column B address is not found in Column E ,

Output :
Source Data
Address
A10+B10+C10
Result: NOK


4) If column E address is not found in Column B , check if the string of data in column F are all 'F's excluding the last 2 characters.
if All F's then output OK
if Not All F's then output NOK

Output :
Dump Data
Address
D10+E10+F10
Result: NOK


My current code:

```
Sub Compare()

Worksheets(1).Select
Dim orig_folderpath As String, orig_file As String, orig_filename As String
Dim dump_folderpath As String
Dim OutputPath As String, OutputFilename As String
Dim VarEntry As String
Dim count As Integer
Dim mainWB As Workbook
Dim Result As String
Dim startadd As String, endadd As String, Dec_startadd As String, lgth As Integer

Set mainWB = ActiveWorkbook

dump_folderpath = Range("C2").Value

Application.DisplayAlerts = False
For i = mainWB.Sheets.count To 3 Step -1
Sheets(i).Delete
Next

Solution

Breaking your Code in Subs and Functions: as a first step in trying to optimize, I would recommend splitting this big sub into smaller ones. For instance, this piece of code could be a sub on its own, with a clear purpose and responsibility:

Sub DeleteSheets()
    Application.DisplayAlerts = False
    For i = mainWB.Sheets.count To 3 Step -1
        Sheets(i).Delete
    Next
    Application.DisplayAlerts = True
End Sub


You already split your code with comments and blocks of variable declarations. That should give you a head start on where to extract some code to a new method. This will also help you spotting common pieces of code that you can reuse (which has a lot of benefits). For instance, at the end of the sub you have several constructions that look like this:

For l = 1 To 16
    line1 = line1 & Right(Cells(tcell.Row + 2, tcell.Column + l), 2)
Next


Precedeed with the linex variable declarations and followed by:

line1 = Replace(line1, "X", "?")
....
sMatch1 = data1 Like line1
....
Debug.Print data1 & "  " & line1


If you apply what I tell you above, you could easily refactor all those into just one loop (you probably don't need to have 11 variables, because on each loop iteration you are doing everything you need to do to the line variable), and then your code would be easier to read. In the process of doing this, you will also force yourself to understand better the patterns in your code, and then will be able to optimize them more easily.

Variable Names: also, give meaningful names to your variables. lr, si, sf, etc. are not meaningful on their own, and then you need to use comments to explain what each of these variables are. This makes reading your code more difficult for the person reviewing it (someone like me, or even yourself in the future as well), and requires additional elements (in this case, comments), which would be unnecessary if you used proper variable names (that when read, are self-explanatory).

Beware of Macro Recording Generated Code: as it tends to be too verbose, and to include things you don't really require. This is just a hunch, and I might be wrong, but code that looks like this, in several places of your sub, makes me think that you recorded some steps of the process (which is OK), but didn't clean it afterwards. In a performance-sensitive application, this is not good, as it can lead to unnecessary steps being executed over and over:

With ActiveSheet.QueryTables.Add(Connection:= _
    "TEXT;" & dump_folderpath & "\" & dump_file_(k), Destination:=Range("$D$3"))
    .Name = dump_file_(k)
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .TextFilePromptOnRefresh = False
    .TextFilePlatform = 437
    .TextFileStartRow = 1
    .TextFileParseType = xlDelimited
    .TextFileTextQualifier = xlTextQualifierDoubleQuote
    .TextFileConsecutiveDelimiter = True
    .TextFileTabDelimiter = False
    .TextFileSemicolonDelimiter = False
    .TextFileCommaDelimiter = False
    .TextFileSpaceDelimiter = True
    .TextFileColumnDataTypes = Array(1)
    .TextFileTrailingMinusNumbers = True
    .Refresh BackgroundQuery:=False
End With


So I would suggest you take a look at that code and start removing lines that seem unnecessary, testing your application every step of the way, until you get an streamlined version of the code. (This also applies to code taken from other sources and used in your own applications).

Cache your Heavily-Used Referenced Values: some cells contain values that you need to use in several places, or in loops. For instance targetcell.Row, targetcell.Column, tcell.Row and tcell.Column are heavily used and make your program calculate those reference over and over. Instead of this, create a targetCellRow variable, assign targetcell.Row to it, and use targetCellRow whenever you need to reference the target cell row. Notice that some of these values are used in multiple loops! Again, when performance counts, this things start to pile up.

Cherry on the Cake: finally, here is some code I posted in another thread which you can add to optimize almost any Excel VBA code that is taking too long.

After you apply these changes, then you can see if you still require to use a dictionary, array or any other construct to improve performance even better. The good thing is that at that point, your code will be in a more proper state to spot and apply required optimizations.

Code Snippets

Sub DeleteSheets()
    Application.DisplayAlerts = False
    For i = mainWB.Sheets.count To 3 Step -1
        Sheets(i).Delete
    Next
    Application.DisplayAlerts = True
End Sub
For l = 1 To 16
    line1 = line1 & Right(Cells(tcell.Row + 2, tcell.Column + l), 2)
Next
line1 = Replace(line1, "X", "?")
....
sMatch1 = data1 Like line1
....
Debug.Print data1 & "  " & line1
With ActiveSheet.QueryTables.Add(Connection:= _
    "TEXT;" & dump_folderpath & "\" & dump_file_(k), Destination:=Range("$D$3"))
    .Name = dump_file_(k)
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .TextFilePromptOnRefresh = False
    .TextFilePlatform = 437
    .TextFileStartRow = 1
    .TextFileParseType = xlDelimited
    .TextFileTextQualifier = xlTextQualifierDoubleQuote
    .TextFileConsecutiveDelimiter = True
    .TextFileTabDelimiter = False
    .TextFileSemicolonDelimiter = False
    .TextFileCommaDelimiter = False
    .TextFileSpaceDelimiter = True
    .TextFileColumnDataTypes = Array(1)
    .TextFileTrailingMinusNumbers = True
    .Refresh BackgroundQuery:=False
End With

Context

StackExchange Code Review Q#140820, answer score: 2

Revisions (0)

No revisions yet.