HiveBrain v1.2.0
Get Started
← Back to all entries
patternshellMinor

Powershell script for manipulating Excel files

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
scriptexcelmanipulatingpowershellfilesfor

Problem

I have a directory of .xls workbooks with the following naming convention:

001.WIP Monthly Report
002.WIP Joes Custom Report
...
129.PUR Supplier Spend


The number of worksheets in each workbook varies but each worksheet is formatted the same way.

Row 5 holds column headers and rows 6 and beyond hold data:

A B C D
4| | | | ...
5| Org | Project | Task | ...
6| 023 | XYZ | 01304 | ...
7| 010 | ABC | 26453 | ...
8| ... | ... | ... | ...


My goal was to write a script that loops through every workbook in the directory, then loops through their respective worksheets and documents the column headers that they contain.

A sample output of this script would be something like:

A B C D E F
1| | | Org | Project | Task | ...
2| 001 | Sheet1 | X | | X | ...
3| 001 | Sheet2 | X | X | | ...
4| 002 | Sheet1 | | | X | ...
5| ... | ... | ... | ... | ... | ...
6| 129 | Sheet8 | X | X | X | ..


where column A is the first 3 digits of the workbook name, B is the worksheet name, and C and beyond contain the column headers and an X documenting whether or not that worksheet contains that column header. Also, if a worksheet started with "SQL" or "---" I wanted it to be ignored.

Here is the script:

`#a function I found online for practicing good hygiene
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
# -----------------------------------------------------

#open up a new instance of excel
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False

#open up a blank workbook that already exists on the desktop
#I refer to this as the $master workbook
$master = $xl.Workbooks.Open("c:\Users\me\desktop\master.xlsx")
$mws = $m

Solution

I know this question is a bit old, but I ran into it while looking for other Excel/PowerShell help and had recently found one potential fix. A two line change may actually speed up the entire process significantly.

Change from:

#loop through the worksheets in the current workbook
for ($i = 1; $i -le $wb.Sheets.count; $i++)


To:

#loop through the worksheets in the current workbook
$sheetCount = $wb.Sheets.Count
for ($i = 1; $i -le $sheetCount; $i++)


The reason lies in how the COM object works internally, which I can't fully explain myself. This StackOverflow question briefly reviews why not to use 2 dots when referencing COM objects. Second, this blog post identifies this specific change as improving performance from 14 minutes to 10s. I haven't validated myself, but I would expect some improvement. Hope that helps... even if it is over a year late.

Code Snippets

#loop through the worksheets in the current workbook
for ($i = 1; $i -le $wb.Sheets.count; $i++)
#loop through the worksheets in the current workbook
$sheetCount = $wb.Sheets.Count
for ($i = 1; $i -le $sheetCount; $i++)

Context

StackExchange Code Review Q#9209, answer score: 8

Revisions (0)

No revisions yet.