patternshellMinor
Powershell script for manipulating Excel files
Viewed 0 times
scriptexcelmanipulatingpowershellfilesfor
Problem
I have a directory of .xls workbooks with the following naming convention:
The number of worksheets in each workbook varies but each worksheet is formatted the same way.
Row 5 holds column headers and rows 6 and beyond hold data:
My goal was to write a script that loops through every workbook in the directory, then loops through their respective worksheets and documents the column headers that they contain.
A sample output of this script would be something like:
where column A is the first 3 digits of the workbook name, B is the worksheet name, and C and beyond contain the column headers and an X documenting whether or not that worksheet contains that column header. Also, if a worksheet started with "SQL" or "---" I wanted it to be ignored.
Here is the script:
`#a function I found online for practicing good hygiene
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
# -----------------------------------------------------
#open up a new instance of excel
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False
#open up a blank workbook that already exists on the desktop
#I refer to this as the $master workbook
$master = $xl.Workbooks.Open("c:\Users\me\desktop\master.xlsx")
$mws = $m
001.WIP Monthly Report
002.WIP Joes Custom Report
...
129.PUR Supplier Spend
The number of worksheets in each workbook varies but each worksheet is formatted the same way.
Row 5 holds column headers and rows 6 and beyond hold data:
A B C D
4| | | | ...
5| Org | Project | Task | ...
6| 023 | XYZ | 01304 | ...
7| 010 | ABC | 26453 | ...
8| ... | ... | ... | ...
My goal was to write a script that loops through every workbook in the directory, then loops through their respective worksheets and documents the column headers that they contain.
A sample output of this script would be something like:
A B C D E F
1| | | Org | Project | Task | ...
2| 001 | Sheet1 | X | | X | ...
3| 001 | Sheet2 | X | X | | ...
4| 002 | Sheet1 | | | X | ...
5| ... | ... | ... | ... | ... | ...
6| 129 | Sheet8 | X | X | X | ..
where column A is the first 3 digits of the workbook name, B is the worksheet name, and C and beyond contain the column headers and an X documenting whether or not that worksheet contains that column header. Also, if a worksheet started with "SQL" or "---" I wanted it to be ignored.
Here is the script:
`#a function I found online for practicing good hygiene
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
# -----------------------------------------------------
#open up a new instance of excel
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False
#open up a blank workbook that already exists on the desktop
#I refer to this as the $master workbook
$master = $xl.Workbooks.Open("c:\Users\me\desktop\master.xlsx")
$mws = $m
Solution
I know this question is a bit old, but I ran into it while looking for other Excel/PowerShell help and had recently found one potential fix. A two line change may actually speed up the entire process significantly.
Change from:
To:
The reason lies in how the COM object works internally, which I can't fully explain myself. This StackOverflow question briefly reviews why not to use 2 dots when referencing COM objects. Second, this blog post identifies this specific change as improving performance from 14 minutes to 10s. I haven't validated myself, but I would expect some improvement. Hope that helps... even if it is over a year late.
Change from:
#loop through the worksheets in the current workbook
for ($i = 1; $i -le $wb.Sheets.count; $i++)To:
#loop through the worksheets in the current workbook
$sheetCount = $wb.Sheets.Count
for ($i = 1; $i -le $sheetCount; $i++)The reason lies in how the COM object works internally, which I can't fully explain myself. This StackOverflow question briefly reviews why not to use 2 dots when referencing COM objects. Second, this blog post identifies this specific change as improving performance from 14 minutes to 10s. I haven't validated myself, but I would expect some improvement. Hope that helps... even if it is over a year late.
Code Snippets
#loop through the worksheets in the current workbook
for ($i = 1; $i -le $wb.Sheets.count; $i++)#loop through the worksheets in the current workbook
$sheetCount = $wb.Sheets.Count
for ($i = 1; $i -le $sheetCount; $i++)Context
StackExchange Code Review Q#9209, answer score: 8
Revisions (0)
No revisions yet.