HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Searching a Word Document from Excel

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
searchingexcelworddocumentfrom

Problem

Today I created a script in Excel that migrates data from a specific Word document and copies a portion of it to a cell in Excel (the date to be specific).

The file input is


Last name, First Name, Grade - (Rank) (Completed On YYYY-MM-DD HH:mm:ss)

and each person is on a new "paragraph" (aka line) in Word.

Sub SearchTextFile()
Dim i As Integer
Dim currentCellNumberLname As String
Dim currentCellNumberFname As String
Dim currentCellData As String
Dim firstPlusLastName As String
Dim filePath As String
filePath = "C:\Users\\Desktop\CyberAwareness.docx"

For i = 2 To 4
    currentCellNumberLname = "C" & i
    currentCellNumberFname = "D" & i
    currentCellToAdd = "L" & i
    ActiveSheet.Range(currentCellToAdd).Activate
    ActiveWindow.ScrollRow = ActiveCell.Row
    firstPlusLastName = Range(currentCellNumberLname).Value & ", " & Range(currentCellNumberFname).Value
    Range(currentCellToAdd) = SearchWordDoc(filePath, firstPlusLastName)
Next
End Sub


'Searches word file for name, finds the associated paragraph, and returns the date'
Function SearchWordDoc(strPath, strName)
Set objword = CreateObject("word.application")
Set a = objword.documents.Open(strPath)

For i = 1 To a.Paragraphs.Count
    If InStr(a.Paragraphs(i).Range.Text, strName) <> 0 Then
        SearchWordDoc = Left(Right(a.Paragraphs(i).Range.Text, 22), 11)
    End If
Next i
a.Close
objword.Quit
'    objword.Visible = False'
Set objword = Nothing
End Function


I have about 300 personnel that need information entered into this document and about 8 different columns I need to populate. The script works surprisingly well, however, it is pretty slow and takes ~10 minutes for one column. I'm fine with it since it gets the job done, but I'm curious if there is any way to streamline it further and possibly add error checking (for name mismatches).

Solution

General Notes

First - In addition to replacing the magic numbers, I'd put your file path into a constant at the top of the module for easier maintenance:

Private Const WORD_DOCUMENT_LOCATION As String = "C:\Users\\Desktop\CyberAwareness.docx"


Second - Get a reference to the Worksheet object at the start of the function instead of relying on calls to ActiveSheet. Nothing will make the wheels come off of a long function faster than the user doing something that changes the active Workbook or Worksheet in the middle of your execution.

Dim sheet As Worksheet
Set sheet = ActiveSheet


Third - Using Range addressing C2 or L3 is not only harder to read, it is less efficient as well. Not only do you have to concatenate strings in order to build the addresses, Excel just has to convert them back into numeric indexes. Unless you have a really good reason not to do so, using .Cell(row, column) is almost always better and is much easier to use in loops.

'Instead of this...
sheet.Range("A" & i).Value = foobar
'...do this:
sheet.Cells(i, 1).Value = foobar


Fourth - Get into the habit of declaring the scope of your Subs and Functions to make sure that you are only exposing the ones that you intend to. If you leave these off, they will default to public and start showing up in your function list in the Workbook and be available for auto-complete. If it isn't something you would want to show up in a cell, i.e. =SearchWordDoc("this raises","an error"), declare it as Private.

Finally - Declare your function return types. It isn't clear what the code below returns, and the comment makes it worse - it doesn't return a Date at all, it returns a String.

'Searches word file for name, finds the associated paragraph, and returns the date'
Function SearchWordDoc(strPath, strName)


Taken in conjunction with the item above, the function declaration should really be something like this:

Private Function SearchWordDoc(filepath, name) As String


Performance

As @RubberDuck noted, the biggest performance boost you're going to get is from not repeatedly opening and closing the Word document. Most of the other suggestions are spot on, so I'll limit this answer to the performance aspect of the code.

Based on your existing code, it is apparent that there is only one line per person in your Word document and all of the lines have (or should have) the same structure. What you are essentially doing is performing a lookup on a Word file based on a key. VBA has an object that is designed to perform key lookups that are orders of magnitude faster than Word's .Find method - Scripting.Dictionary. All you need to do is parse the document first as text to build your object. For example (with a reference to Microsoft Scripting Runtime):

Private Function LoadWordRecords(filepath As String) As Scripting.Dictionary

    Dim host As New Word.Application
    Dim doc As Word.Document
    Dim lines() As String

    Set doc = host.Documents.Open(filepath)
    'Take the whole document in one shot, and read to an array of paragraphs...
    lines = Split(doc.Content.text, vbCr)
    '...and dispense with Word.
    doc.Close
    host.Quit

    'Then do whatever you need to do to parse the text into a useful structure:
    Dim output As New Scripting.Dictionary
    Dim items() As String
    Dim i As Integer

    'Parse each line in the Word document to extract a key for the line.
    For i = 0 To UBound(lines)
        'Add whatever you need to validate the paragraph you're parsing.
        'Test the input line to see if it contains a comma.
        If (InStr(1, lines(i), ",") > 0) Then
            'Split into an array from the comma delimited string.
            items = Split(lines(i), ",")
            'Add the resulting item to the Dictionary with the key defined
            'as the first 2 elements and the value as the remainder of the line.
            Call output.Add(Trim$(items(0)) & ", " & Trim$(items(1)), items(2))
        End If
        'Or whatever is convenient.
    Next i

    Set LoadWordRecords = output

End Function


Then, all you have to do in the calling code is grab the Dictionary from your parsing function and grab the lines as needed based on the keys:

```
Public Sub SearchTextFile()

Dim sheet As Worksheet
Set sheet = ActiveSheet

Dim i As Integer
Dim currentCellNumberLname As String
Dim currentCellNumberFname As String
Dim currentCellData As String
Dim firstPlusLastName As String

'Load your lookup object:
Dim entries As Scripting.Dictionary
Set entries = LoadWordRecords(WORD_DOCUMENT_LOCATION)

For i = 2 To 4
currentCellNumberLname = "C" & i
currentCellNumberFname = "D" & i
currentCellToAdd = "L" & i
ActiveSheet.Range(currentCellToAdd).Activate
ActiveWindow.ScrollRow = ActiveCell.row
firstPlusLastName = sheet.Range(currentCellNumberLname).Value & ", " & sheet.Range(currentCel

Code Snippets

Private Const WORD_DOCUMENT_LOCATION As String = "C:\Users\<user>\Desktop\CyberAwareness.docx"
Dim sheet As Worksheet
Set sheet = ActiveSheet
'Instead of this...
sheet.Range("A" & i).Value = foobar
'...do this:
sheet.Cells(i, 1).Value = foobar
'Searches word file for name, finds the associated paragraph, and returns the date'
Function SearchWordDoc(strPath, strName)
Private Function SearchWordDoc(filepath, name) As String

Context

StackExchange Code Review Q#62668, answer score: 6

Revisions (0)

No revisions yet.