HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Reading contents of a thousand CSV files

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
readingcsvcontentsthousandfiles

Problem

I'm fairly new to VBA and just completed my first script. It's completing the task I want it to do just fine, but it's really slow in doing so. It has to open and read about 1000 CSV files for each CSV filter for specific rows and copy those into tabs in a newly created document. It then has to save and close that new document and open the next CSV.

Are there parts in the script that are not "good-practice" in the sense that they severely slow down execution time?

It takes about 3 seconds per loop iteration, so 50 minutes for all 1,000 files. The machine has also crashed halfway through, although I'm not 100% sure that's because of the script.

```
Sub createLists()

Application.DisplayAlerts = False
Application.ScreenUpdating = False

Dim startDate As Date
Dim endDate As Date
Dim dateLooper As Date
Dim currDate As String

'Set date range for existing files
startDate = #1/1/2012#
endDate = #9/12/2014#

'Array, which contains names for new worksheets as strings
Dim tsN(1 To 12) As String
tsN(1) = "AA11"
tsN(2) = "AA22"
tsN(3) = "AA33"
tsN(4) = "AA44"
tsN(5) = "AA55"
tsN(6) = "AA66"
tsN(7) = "BB11"
tsN(8) = "BB22"
tsN(9) = "BB33"
tsN(10) = "BB44"
tsN(11) = "BB55"
tsN(12) = "BB66"

Dim w1 As Workbook
Dim w2 As Workbook
Dim ws1 As Worksheet
Dim ws2 As Worksheet

Dim localPath As String
localPath = ThisWorkbook.path

'Check for folder "Lists", create if non-existend
Dim fso, folderN
Set fso = CreateObject("Scripting.FileSystemObject")
folderN = localPath & "\Lists\"
If fso.FolderExists(folderN) = False Then MkDir folderN

For dateLooper = startDate To endDate

currDate = Format(dateLooper, "yyyy-mm-dd")

'Open / create workbook objects
Set w1 = Workbooks.Open(Filename:=localPath & "\roh\daten" & currDate & ".CSV", Local:=True)
Set w2 = Workbooks.Add()

'Array, which contains worksheet-objects

Solution

First of all use early binding over late binding to slightly improve the performance of code and to take advantage of intelli-sense.

In VBE select Tools and then References. Scroll down to Microsoft Scripting Runtime and tick the checkbox to attach references to your project. That allows you to change

Dim fso, folderN
Set fso = CreateObject("Scripting.FileSystemObject")


to

Dim fso As FileSystemObject, folderN As String
Set fso = New FileSystemObject


Generally, you want to explicitly dim your variables and avoid using Variant type as it slows down the code by a lot believe or not! Bascially, don't rely on the runtime to workout the correct type for a variable as it takes its time to do so.

There is an .Activate in the loop that creates the worksheets. It's redundant there and just adds overhead -> remove it/comment it out and modify the assignment based on the ActiveSheet to

'Create new file and add/name new worksheets, set references to array ts 
For i = 1 To 12
    With w2.Sheets.Add()
        .Name = tsN(13 - i)
    End With
    If i = 1 Then w2.Worksheets(2).Delete
    Set ts(13 - i) = w2.Sheets(tsN(13 - i))
Next i


The rest seems fine to me due to your current approach for reading the data. Your code seems rather clean, well indented and you have good comments in the right places.

Performance tip:

If performance is crucial to you consider another solution involving for example ADODB library and not opening the .CSV files just to get data out of them. ADODB allows you read the contents of a file into a Recordset object without actually opening the file (think of speed!). Then, you can use a very simple method of the Range class to copy the contents into the spreadsheet - Range.CopyFromRecordset.

You can see examples of how to use ADODB on my blog at http://vba4all.com - ADODB section is at the bottom of the article

Code Snippets

Dim fso, folderN
Set fso = CreateObject("Scripting.FileSystemObject")
Dim fso As FileSystemObject, folderN As String
Set fso = New FileSystemObject
'Create new file and add/name new worksheets, set references to array ts 
For i = 1 To 12
    With w2.Sheets.Add()
        .Name = tsN(13 - i)
    End With
    If i = 1 Then w2.Worksheets(2).Delete
    Set ts(13 - i) = w2.Sheets(tsN(13 - i))
Next i

Context

StackExchange Code Review Q#72141, answer score: 6

Revisions (0)

No revisions yet.