patternMinor
Searching a Word Document from Excel
Viewed 0 times
searchingexcelworddocumentfrom
Problem
Today I created a script in Excel that migrates data from a specific Word document and copies a portion of it to a cell in Excel (the date to be specific).
The file input is
Last name, First Name, Grade - (Rank) (Completed On YYYY-MM-DD HH:mm:ss)
and each person is on a new "paragraph" (aka line) in Word.
I have about 300 personnel that need information entered into this document and about 8 different columns I need to populate. The script works surprisingly well, however, it is pretty slow and takes ~10 minutes for one column. I'm fine with it since it gets the job done, but I'm curious if there is any way to streamline it further and possibly add error checking (for name mismatches).
The file input is
Last name, First Name, Grade - (Rank) (Completed On YYYY-MM-DD HH:mm:ss)
and each person is on a new "paragraph" (aka line) in Word.
Sub SearchTextFile()
Dim i As Integer
Dim currentCellNumberLname As String
Dim currentCellNumberFname As String
Dim currentCellData As String
Dim firstPlusLastName As String
Dim filePath As String
filePath = "C:\Users\\Desktop\CyberAwareness.docx"
For i = 2 To 4
currentCellNumberLname = "C" & i
currentCellNumberFname = "D" & i
currentCellToAdd = "L" & i
ActiveSheet.Range(currentCellToAdd).Activate
ActiveWindow.ScrollRow = ActiveCell.Row
firstPlusLastName = Range(currentCellNumberLname).Value & ", " & Range(currentCellNumberFname).Value
Range(currentCellToAdd) = SearchWordDoc(filePath, firstPlusLastName)
Next
End Sub'Searches word file for name, finds the associated paragraph, and returns the date'
Function SearchWordDoc(strPath, strName)
Set objword = CreateObject("word.application")
Set a = objword.documents.Open(strPath)
For i = 1 To a.Paragraphs.Count
If InStr(a.Paragraphs(i).Range.Text, strName) <> 0 Then
SearchWordDoc = Left(Right(a.Paragraphs(i).Range.Text, 22), 11)
End If
Next i
a.Close
objword.Quit
' objword.Visible = False'
Set objword = Nothing
End FunctionI have about 300 personnel that need information entered into this document and about 8 different columns I need to populate. The script works surprisingly well, however, it is pretty slow and takes ~10 minutes for one column. I'm fine with it since it gets the job done, but I'm curious if there is any way to streamline it further and possibly add error checking (for name mismatches).
Solution
General Notes
First - In addition to replacing the magic numbers, I'd put your file path into a constant at the top of the module for easier maintenance:
Second - Get a reference to the Worksheet object at the start of the function instead of relying on calls to
Third - Using Range addressing
Fourth - Get into the habit of declaring the scope of your Subs and Functions to make sure that you are only exposing the ones that you intend to. If you leave these off, they will default to public and start showing up in your function list in the Workbook and be available for auto-complete. If it isn't something you would want to show up in a cell, i.e.
Finally - Declare your function return types. It isn't clear what the code below returns, and the comment makes it worse - it doesn't return a
Taken in conjunction with the item above, the function declaration should really be something like this:
Performance
As @RubberDuck noted, the biggest performance boost you're going to get is from not repeatedly opening and closing the Word document. Most of the other suggestions are spot on, so I'll limit this answer to the performance aspect of the code.
Based on your existing code, it is apparent that there is only one line per person in your Word document and all of the lines have (or should have) the same structure. What you are essentially doing is performing a lookup on a Word file based on a key. VBA has an object that is designed to perform key lookups that are orders of magnitude faster than Word's
Then, all you have to do in the calling code is grab the Dictionary from your parsing function and grab the lines as needed based on the keys:
```
Public Sub SearchTextFile()
Dim sheet As Worksheet
Set sheet = ActiveSheet
Dim i As Integer
Dim currentCellNumberLname As String
Dim currentCellNumberFname As String
Dim currentCellData As String
Dim firstPlusLastName As String
'Load your lookup object:
Dim entries As Scripting.Dictionary
Set entries = LoadWordRecords(WORD_DOCUMENT_LOCATION)
For i = 2 To 4
currentCellNumberLname = "C" & i
currentCellNumberFname = "D" & i
currentCellToAdd = "L" & i
ActiveSheet.Range(currentCellToAdd).Activate
ActiveWindow.ScrollRow = ActiveCell.row
firstPlusLastName = sheet.Range(currentCellNumberLname).Value & ", " & sheet.Range(currentCel
First - In addition to replacing the magic numbers, I'd put your file path into a constant at the top of the module for easier maintenance:
Private Const WORD_DOCUMENT_LOCATION As String = "C:\Users\\Desktop\CyberAwareness.docx"Second - Get a reference to the Worksheet object at the start of the function instead of relying on calls to
ActiveSheet. Nothing will make the wheels come off of a long function faster than the user doing something that changes the active Workbook or Worksheet in the middle of your execution.Dim sheet As Worksheet
Set sheet = ActiveSheetThird - Using Range addressing
C2 or L3 is not only harder to read, it is less efficient as well. Not only do you have to concatenate strings in order to build the addresses, Excel just has to convert them back into numeric indexes. Unless you have a really good reason not to do so, using .Cell(row, column) is almost always better and is much easier to use in loops.'Instead of this...
sheet.Range("A" & i).Value = foobar
'...do this:
sheet.Cells(i, 1).Value = foobarFourth - Get into the habit of declaring the scope of your Subs and Functions to make sure that you are only exposing the ones that you intend to. If you leave these off, they will default to public and start showing up in your function list in the Workbook and be available for auto-complete. If it isn't something you would want to show up in a cell, i.e.
=SearchWordDoc("this raises","an error"), declare it as Private.Finally - Declare your function return types. It isn't clear what the code below returns, and the comment makes it worse - it doesn't return a
Date at all, it returns a String.'Searches word file for name, finds the associated paragraph, and returns the date'
Function SearchWordDoc(strPath, strName)Taken in conjunction with the item above, the function declaration should really be something like this:
Private Function SearchWordDoc(filepath, name) As StringPerformance
As @RubberDuck noted, the biggest performance boost you're going to get is from not repeatedly opening and closing the Word document. Most of the other suggestions are spot on, so I'll limit this answer to the performance aspect of the code.
Based on your existing code, it is apparent that there is only one line per person in your Word document and all of the lines have (or should have) the same structure. What you are essentially doing is performing a lookup on a Word file based on a key. VBA has an object that is designed to perform key lookups that are orders of magnitude faster than Word's
.Find method - Scripting.Dictionary. All you need to do is parse the document first as text to build your object. For example (with a reference to Microsoft Scripting Runtime):Private Function LoadWordRecords(filepath As String) As Scripting.Dictionary
Dim host As New Word.Application
Dim doc As Word.Document
Dim lines() As String
Set doc = host.Documents.Open(filepath)
'Take the whole document in one shot, and read to an array of paragraphs...
lines = Split(doc.Content.text, vbCr)
'...and dispense with Word.
doc.Close
host.Quit
'Then do whatever you need to do to parse the text into a useful structure:
Dim output As New Scripting.Dictionary
Dim items() As String
Dim i As Integer
'Parse each line in the Word document to extract a key for the line.
For i = 0 To UBound(lines)
'Add whatever you need to validate the paragraph you're parsing.
'Test the input line to see if it contains a comma.
If (InStr(1, lines(i), ",") > 0) Then
'Split into an array from the comma delimited string.
items = Split(lines(i), ",")
'Add the resulting item to the Dictionary with the key defined
'as the first 2 elements and the value as the remainder of the line.
Call output.Add(Trim$(items(0)) & ", " & Trim$(items(1)), items(2))
End If
'Or whatever is convenient.
Next i
Set LoadWordRecords = output
End FunctionThen, all you have to do in the calling code is grab the Dictionary from your parsing function and grab the lines as needed based on the keys:
```
Public Sub SearchTextFile()
Dim sheet As Worksheet
Set sheet = ActiveSheet
Dim i As Integer
Dim currentCellNumberLname As String
Dim currentCellNumberFname As String
Dim currentCellData As String
Dim firstPlusLastName As String
'Load your lookup object:
Dim entries As Scripting.Dictionary
Set entries = LoadWordRecords(WORD_DOCUMENT_LOCATION)
For i = 2 To 4
currentCellNumberLname = "C" & i
currentCellNumberFname = "D" & i
currentCellToAdd = "L" & i
ActiveSheet.Range(currentCellToAdd).Activate
ActiveWindow.ScrollRow = ActiveCell.row
firstPlusLastName = sheet.Range(currentCellNumberLname).Value & ", " & sheet.Range(currentCel
Code Snippets
Private Const WORD_DOCUMENT_LOCATION As String = "C:\Users\<user>\Desktop\CyberAwareness.docx"Dim sheet As Worksheet
Set sheet = ActiveSheet'Instead of this...
sheet.Range("A" & i).Value = foobar
'...do this:
sheet.Cells(i, 1).Value = foobar'Searches word file for name, finds the associated paragraph, and returns the date'
Function SearchWordDoc(strPath, strName)Private Function SearchWordDoc(filepath, name) As StringContext
StackExchange Code Review Q#62668, answer score: 6
Revisions (0)
No revisions yet.