patternMinor
Comparing two lists under 6 columns with 32000 rows in Excel
Viewed 0 times
rowscomparingcolumnsexcelwithtwounder32000lists
Problem
I need to import 2 .s19 files to Excel and then compare across 6 Columns and store the output to a txt file. I have my data from column A to F. My current program doesn't store the data using a dictionary or hashtable. It takes too long to complete as there are around 31k lines in the list 1 and 32k lines in list 2. I would like to use a dictionary to store the data but I am unsure how to do it.
My data is in this format:
1) If the first line (A2) or (D2) starts with "S011" or "S804" skip and read next line.
2) If the address in Column B matches the address in Column E, check if the adjacent data linked to it is also a match ie. if B10 = E10, check if C10 = F10. If there is a mismatch in data output to txt file in this format:
3) If column B address is not found in Column E ,
4) If column E address is not found in Column B , check if the string of data in column F are all 'F's excluding the last 2 characters.
if All F's then output OK
if Not All F's then output NOK
My current code:
```
Sub Compare()
Worksheets(1).Select
Dim orig_folderpath As String, orig_file As String, orig_filename As String
Dim dump_folderpath As String
Dim OutputPath As String, OutputFilename As String
Dim VarEntry As String
Dim count As Integer
Dim mainWB As Workbook
Dim Result As String
Dim startadd As String, endadd As String, Dec_startadd As String, lgth As Integer
Set mainWB = ActiveWorkbook
dump_folderpath = Range("C2").Value
Application.DisplayAlerts = False
For i = mainWB.Sheets.count To 3 Step -1
Sheets(i).Delete
Next
My data is in this format:
Source Data Dump Data
A B C D E F
[1] file format(4char) address(6char) data(66char) file format(4char) address(6char) data(66char)1) If the first line (A2) or (D2) starts with "S011" or "S804" skip and read next line.
2) If the address in Column B matches the address in Column E, check if the adjacent data linked to it is also a match ie. if B10 = E10, check if C10 = F10. If there is a mismatch in data output to txt file in this format:
Source Data
Address
A10+B10+C10
Dump Data
Address
D10+E10+F10
Result: NOK3) If column B address is not found in Column E ,
Output :
Source Data
Address
A10+B10+C10
Result: NOK4) If column E address is not found in Column B , check if the string of data in column F are all 'F's excluding the last 2 characters.
if All F's then output OK
if Not All F's then output NOK
Output :
Dump Data
Address
D10+E10+F10
Result: NOKMy current code:
```
Sub Compare()
Worksheets(1).Select
Dim orig_folderpath As String, orig_file As String, orig_filename As String
Dim dump_folderpath As String
Dim OutputPath As String, OutputFilename As String
Dim VarEntry As String
Dim count As Integer
Dim mainWB As Workbook
Dim Result As String
Dim startadd As String, endadd As String, Dec_startadd As String, lgth As Integer
Set mainWB = ActiveWorkbook
dump_folderpath = Range("C2").Value
Application.DisplayAlerts = False
For i = mainWB.Sheets.count To 3 Step -1
Sheets(i).Delete
Next
Solution
Breaking your Code in Subs and Functions: as a first step in trying to optimize, I would recommend splitting this big sub into smaller ones. For instance, this piece of code could be a sub on its own, with a clear purpose and responsibility:
You already split your code with comments and blocks of variable declarations. That should give you a head start on where to extract some code to a new method. This will also help you spotting common pieces of code that you can reuse (which has a lot of benefits). For instance, at the end of the
Precedeed with the
If you apply what I tell you above, you could easily refactor all those into just one loop (you probably don't need to have 11 variables, because on each loop iteration you are doing everything you need to do to the
Variable Names: also, give meaningful names to your variables.
Beware of Macro Recording Generated Code: as it tends to be too verbose, and to include things you don't really require. This is just a hunch, and I might be wrong, but code that looks like this, in several places of your sub, makes me think that you recorded some steps of the process (which is OK), but didn't clean it afterwards. In a performance-sensitive application, this is not good, as it can lead to unnecessary steps being executed over and over:
So I would suggest you take a look at that code and start removing lines that seem unnecessary, testing your application every step of the way, until you get an streamlined version of the code. (This also applies to code taken from other sources and used in your own applications).
Cache your Heavily-Used Referenced Values: some cells contain values that you need to use in several places, or in loops. For instance
Cherry on the Cake: finally, here is some code I posted in another thread which you can add to optimize almost any Excel VBA code that is taking too long.
After you apply these changes, then you can see if you still require to use a dictionary, array or any other construct to improve performance even better. The good thing is that at that point, your code will be in a more proper state to spot and apply required optimizations.
Sub DeleteSheets()
Application.DisplayAlerts = False
For i = mainWB.Sheets.count To 3 Step -1
Sheets(i).Delete
Next
Application.DisplayAlerts = True
End SubYou already split your code with comments and blocks of variable declarations. That should give you a head start on where to extract some code to a new method. This will also help you spotting common pieces of code that you can reuse (which has a lot of benefits). For instance, at the end of the
sub you have several constructions that look like this:For l = 1 To 16
line1 = line1 & Right(Cells(tcell.Row + 2, tcell.Column + l), 2)
NextPrecedeed with the
linex variable declarations and followed by:line1 = Replace(line1, "X", "?")
....
sMatch1 = data1 Like line1
....
Debug.Print data1 & " " & line1If you apply what I tell you above, you could easily refactor all those into just one loop (you probably don't need to have 11 variables, because on each loop iteration you are doing everything you need to do to the
line variable), and then your code would be easier to read. In the process of doing this, you will also force yourself to understand better the patterns in your code, and then will be able to optimize them more easily. Variable Names: also, give meaningful names to your variables.
lr, si, sf, etc. are not meaningful on their own, and then you need to use comments to explain what each of these variables are. This makes reading your code more difficult for the person reviewing it (someone like me, or even yourself in the future as well), and requires additional elements (in this case, comments), which would be unnecessary if you used proper variable names (that when read, are self-explanatory). Beware of Macro Recording Generated Code: as it tends to be too verbose, and to include things you don't really require. This is just a hunch, and I might be wrong, but code that looks like this, in several places of your sub, makes me think that you recorded some steps of the process (which is OK), but didn't clean it afterwards. In a performance-sensitive application, this is not good, as it can lead to unnecessary steps being executed over and over:
With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;" & dump_folderpath & "\" & dump_file_(k), Destination:=Range("$D$3"))
.Name = dump_file_(k)
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = True
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = True
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End WithSo I would suggest you take a look at that code and start removing lines that seem unnecessary, testing your application every step of the way, until you get an streamlined version of the code. (This also applies to code taken from other sources and used in your own applications).
Cache your Heavily-Used Referenced Values: some cells contain values that you need to use in several places, or in loops. For instance
targetcell.Row, targetcell.Column, tcell.Row and tcell.Column are heavily used and make your program calculate those reference over and over. Instead of this, create a targetCellRow variable, assign targetcell.Row to it, and use targetCellRow whenever you need to reference the target cell row. Notice that some of these values are used in multiple loops! Again, when performance counts, this things start to pile up. Cherry on the Cake: finally, here is some code I posted in another thread which you can add to optimize almost any Excel VBA code that is taking too long.
After you apply these changes, then you can see if you still require to use a dictionary, array or any other construct to improve performance even better. The good thing is that at that point, your code will be in a more proper state to spot and apply required optimizations.
Code Snippets
Sub DeleteSheets()
Application.DisplayAlerts = False
For i = mainWB.Sheets.count To 3 Step -1
Sheets(i).Delete
Next
Application.DisplayAlerts = True
End SubFor l = 1 To 16
line1 = line1 & Right(Cells(tcell.Row + 2, tcell.Column + l), 2)
Nextline1 = Replace(line1, "X", "?")
....
sMatch1 = data1 Like line1
....
Debug.Print data1 & " " & line1With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;" & dump_folderpath & "\" & dump_file_(k), Destination:=Range("$D$3"))
.Name = dump_file_(k)
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = True
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = True
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End WithContext
StackExchange Code Review Q#140820, answer score: 2
Revisions (0)
No revisions yet.