HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Import multiple XML files into Excel

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
excelintoxmlfilesmultipleimport

Problem

I have adapted a VBA code to import all the XML files I have on a folder into Excel. I have tried to run the code with only two files and it takes about 30 seconds. Several minutes to import less than 20 files.

I need to import more than 200,000 files. Can you please help me on how to improve my code?

`Sub ListFiles()
'DECLARE AND SET VARIABLEs
Dim ShellApplication As Object
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Set ShellApplication = CreateObject("Shell.Application").BrowseForFolder(0, "Please choose a folder", 0, OpenAt)
Path = ShellApplication.self.Path
Set ShellApplication = Nothing
[a3] = "XML"
[b3] = "Files"
'DEFAULT PATH FROM HIDDEN SHEET
Call ListMyFiles(Path, True)
End Sub

Sub ListMyFiles(mySourcePath, IncludeSubfolders)
Set MyObject = New Scripting.FileSystemObject
Set mySource = MyObject.GetFolder(mySourcePath)
'With SearchXML
Application.ScreenUpdating = False

'--------------------------------------------------------------------
'FIND XML FILES ONLY, APPLY SEARCH CRIERIA, DISPLAY MATCHES ONLY
For Each myfile In mySource.Files
If Right(myfile.Name, 3) = "XML" Or Right(myfile.Name, 3) = "xml" Then 'IS XML?
LastRow = ActiveSheet.Cells(Rows.Count, 2).End(xlUp).Row

'-------------------------------------------------------------
'IMPORT XML FILE
Application.DisplayAlerts = False
ActiveWorkbook.XmlImport URL:=mySource & "\" & myfile.Name, _
ImportMap:=Nothing, Overwrite:=True, Destination:=Range("$B$" & LastRow + 1)
Cells(LastRow + 1, 1) = myfile.Name

'------------------------------------------------------------
'DELETE MAPS
maps = ActiveWorkbook.XmlMaps.Count
For i = 1 To maps
ActiveWorkbook.XmlMaps(1).Delete
Next i
End If
Next
If IncludeSubfolders Then 'SEARCH SUBFOLDERS FOR SAME CRITERIA
For Each MySubFolder In mySou

Solution

Not knowing what your XMl files are like or what you're trying to extract from them, this is mostly speculative:

Generally, you want to strip away as much unwanted data as early as possible in the process. Rather than trying to import an entire file to the worksheet every time, why not strip your document down to a NodeList, and then either target specific nodes (if you're only after certain data) or iterate over them and extract the data values into an array. Then you can print that Array to a worksheet.

I recently built a Tool to extract data from corporate accounts. It loads an XML doc (typically 100kb), parses the Node Tree, searches for specific nodes using XPath, extracts the displayed text of each and sticks it in an array that's printed to a worksheet every 500 lines or so. It grabs 5 sets of nodes from each document, and parses about 1 Million documents/hour, or nearly 300 documents / second.

Below is an extract showing the basic processing of loading, stripping, searching and retrieving.

Public Sub GetDirectorsFromFile(ByVal strFullFilename As String)

    Dim xDoc As MSXML2.DOMDocument
    Set xDoc = New MSXML2.DOMDocument

    Dim colDirectors As Collection
    Set colDirectors = New Collection

    With xDoc

        If .Load(strFullFilename) Then

            .setProperty "SelectionLanguage", "XPath"
            GetDirectorsFromXml xDoc, colDirectors

        End If

    End With

End Sub

Public Sub GetDirectorsFromXml(ByRef xDoc As MSXML2.DOMDocument, ByRef colDirectors As Collection)

    Dim ixItem As Long
    Dim xPathSearchString As String
    Dim nodes As MSXML2.IXMLDOMNodeList
    Dim nodeText As String

        xPathSearchString = "//*[contains(@name,""NameEntityOfficer"")]"
        Set nodes = xDoc.SelectNodes(xPathSearchString )

        For ixItem = 0 To nodes.Length - 1
            nodeText = nodes.Item(ixItem).text
            colDirectors.Add text
        Next ixItem

End Sub

Code Snippets

Public Sub GetDirectorsFromFile(ByVal strFullFilename As String)

    Dim xDoc As MSXML2.DOMDocument
    Set xDoc = New MSXML2.DOMDocument

    Dim colDirectors As Collection
    Set colDirectors = New Collection

    With xDoc

        If .Load(strFullFilename) Then

            .setProperty "SelectionLanguage", "XPath"
            GetDirectorsFromXml xDoc, colDirectors

        End If

    End With

End Sub

Public Sub GetDirectorsFromXml(ByRef xDoc As MSXML2.DOMDocument, ByRef colDirectors As Collection)

    Dim ixItem As Long
    Dim xPathSearchString As String
    Dim nodes As MSXML2.IXMLDOMNodeList
    Dim nodeText As String

        xPathSearchString = "//*[contains(@name,""NameEntityOfficer"")]"
        Set nodes = xDoc.SelectNodes(xPathSearchString )

        For ixItem = 0 To nodes.Length - 1
            nodeText = nodes.Item(ixItem).text
            colDirectors.Add text
        Next ixItem

End Sub

Context

StackExchange Code Review Q#112996, answer score: 5

Revisions (0)

No revisions yet.