patternMinor
Extracting data from CSV file on the internet
Viewed 0 times
filethecsvinternetextractingfromdata
Problem
This is my code I made that extracts data from a .csv file stored on the web. My sheet called 'New Data' stores this data for 2 days. I store 2 days worth of data so that when I run it next time, it pulls the data from the web, compares to the data I pulled yesterday, and then places any of the data that doesn't match (wasn't there yesterday but is now there today) into a new column.
I essentially only actually need the new data, but am stuck storing 2 day's worth of about 4000 lines of data each. My file is extremely large and takes way too long to run (this has alot to do with my poor vba code). If someone could please help me cleanup my code (i.e. remove any unneccessary lines I know I have in there) I would be very greatful!
```
Option Explicit
Sub FindNewIssues()
'for efficiency
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
'first checks to see if the data is already up-to-date
'continues with further commands only if data NOT up-to-date
If Worksheets("New Data").Range("A10") <> Worksheets("New Data").Range("B2") Then
'copy the old content to a new location
'this will allow us to compare the data from last night (new data) to two days ago (old data)
Sheets("New Data").Range("A9:I10000").Cut Destination:=Sheets("New Data").Range("K9")
'then import the new data from external database
With Worksheets("New Data").QueryTables.Add(Connection:= _
"TEXT;https://(the URL that I can't show).csv", Destination:=Range( _
"$A$9"))
.Name = "Closes"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = True
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFilePar
I essentially only actually need the new data, but am stuck storing 2 day's worth of about 4000 lines of data each. My file is extremely large and takes way too long to run (this has alot to do with my poor vba code). If someone could please help me cleanup my code (i.e. remove any unneccessary lines I know I have in there) I would be very greatful!
```
Option Explicit
Sub FindNewIssues()
'for efficiency
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
'first checks to see if the data is already up-to-date
'continues with further commands only if data NOT up-to-date
If Worksheets("New Data").Range("A10") <> Worksheets("New Data").Range("B2") Then
'copy the old content to a new location
'this will allow us to compare the data from last night (new data) to two days ago (old data)
Sheets("New Data").Range("A9:I10000").Cut Destination:=Sheets("New Data").Range("K9")
'then import the new data from external database
With Worksheets("New Data").QueryTables.Add(Connection:= _
"TEXT;https://(the URL that I can't show).csv", Destination:=Range( _
"$A$9"))
.Name = "Closes"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = True
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFilePar
Solution
I actually had to go and check to see how this line would behave. It does in fact work, but it's terribly confusing perhaps more so for a seasoned VB6 dev.
It looks like it's comparing
Which brings me to a micro-optimization. Instead of accessing the
It won't buy you much in the way of runtime, but it does make the code cleaner and easier to maintain.
Are you absolutely 100% positive that the data will never exceed 10k rows? I wouldn't bet my paycheck on it. Find the last used row instead.
That is an extremely odd way to break lines. I would expect it to look like this.
Or this. Pick your poison.
I like the second option because everything lines up nice without having to fuss about with 5 or 6 tabs.
This looks like it came directly from the macro recorder.
You can either spend some time figuring out how many of these are default values, and remove them, or extract the whole block into a procedure, but these are details we don't care about at the level of abstraction
You populate the data from the web query, calculate, clear the range of cells that have the VLoopkup, then put the same formulas back into the range, and then calculate again. This is a pretty expensive operation and I'm not convinced that you need to do anything more than
This is bad. You probably didn't notice it because the indentation was off, and this whole method is too long, but you're not always setting Calculation back to automatic and ScreenUpdating back on. Just moving it back outside of the
If Worksheets("New Data").Range("A10") <> Worksheets("New Data").Range("B2") ThenIt looks like it's comparing
Range objects, but what it's really doing is comparing their values. I assume this is because Value is Range's default property the language is performing some "nice" cast for us under the hood. It's best to be explicit here so that no future maintainer wastes their time being confused by this.If Worksheets("New Data").Range("A10").Value <> Worksheets("New Data").Range("B2").Value ThenWhich brings me to a micro-optimization. Instead of accessing the
Worksheets collection repeatedly, store the "New Data" sheet in a variable and work with it instead.Dim newData As Worksheet
Set newData = newData
If newData.Range("A10").Value <> newData.Range("B2").Value ThenIt won't buy you much in the way of runtime, but it does make the code cleaner and easier to maintain.
newData.Range("A9:I10000").Cut Destination:=newData.Range("K9")Are you absolutely 100% positive that the data will never exceed 10k rows? I wouldn't bet my paycheck on it. Find the last used row instead.
With newData.QueryTables.Add(Connection:= _
"TEXT;https://(the URL that I can't show).csv", Destination:=Range( _
"$A$9"))That is an extremely odd way to break lines. I would expect it to look like this.
With newData.QueryTables.Add(Connection:= "TEXT;https://(the URL that I can't show).csv", _
Destination:=Range("$A$9"))Or this. Pick your poison.
With newData.QueryTables.Add( _
Connection:="TEXT;https://(the URL that I can't show).csv", _
Destination:=Range("$A$9"))I like the second option because everything lines up nice without having to fuss about with 5 or 6 tabs.
This looks like it came directly from the macro recorder.
With newData.QueryTables.Add( _
Connection:="TEXT;https://(the URL that I can't show).csv", _
Destination:=Range("$A$9"))
.Name = "Closes"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = True
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(3, 2, 2, 2, 2, 1, 1, 1, 1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End WithYou can either spend some time figuring out how many of these are default values, and remove them, or extract the whole block into a procedure, but these are details we don't care about at the level of abstraction
FindNewIssues is at.Application.Calculate
'next need to check to see if there are any new issues
'this is done by looking for errors (non-matches) from a vlookup between the new and old data
Range("T10:W10000").ClearContents
Range("T10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O$10:$S$10000,1,FALSE))=TRUE,'New Data'!E10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O$10:$S$10000,1,FALSE))=TRUE,'New Data'!E10,""""))"
Range("U10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L$10:$S$10000,1,FALSE))=TRUE,'New Data'!B10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L$10:$S$10000,1,FALSE))=TRUE,'New Data'!B10,""""))"
Range("T10:U10").AutoFill Destination:=Range("T10:U10000"), Type:=xlFillDefault
Application.CalculateYou populate the data from the web query, calculate, clear the range of cells that have the VLoopkup, then put the same formulas back into the range, and then calculate again. This is a pretty expensive operation and I'm not convinced that you need to do anything more than
Application.Calculate here. I could be wrong (wouldn't be the first time), but you should try seeing what happens if you just recalculate here. It could save a significant amount of time.End If
End If
Application.CutCopyMode = False
'need to send these new issues to their appropriate sheets
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End IfThis is bad. You probably didn't notice it because the indentation was off, and this whole method is too long, but you're not always setting Calculation back to automatic and ScreenUpdating back on. Just moving it back outside of the
If block isn't good enough either. You need an errorCode Snippets
If Worksheets("New Data").Range("A10") <> Worksheets("New Data").Range("B2") ThenIf Worksheets("New Data").Range("A10").Value <> Worksheets("New Data").Range("B2").Value ThenDim newData As Worksheet
Set newData = newData
If newData.Range("A10").Value <> newData.Range("B2").Value ThennewData.Range("A9:I10000").Cut Destination:=newData.Range("K9")With newData.QueryTables.Add(Connection:= _
"TEXT;https://(the URL that I can't show).csv", Destination:=Range( _
"$A$9"))Context
StackExchange Code Review Q#94673, answer score: 8
Revisions (0)
No revisions yet.