HiveBrain v1.2.0
Get Started
← Back to all entries
patternshellMinor

Import Excel, unpivot data, output tab-delimited for SQL Import

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
exceldelimitedsqltabunpivotoutputfordataimport

Problem

I have a client GL file (Excel) that needs to be regularly imported to SQL Server (as tab-delimited txt).

Sample (simplified):

PayCode PayDesc Dept1 Dept2 Dept3
========================================
REG Regular 00001 00001 00002
VAC Vacation 00011 00012 00013
HOL Holiday 00021 00022 00022


However, the number of columns (departments) may change over time, and SQL Server doesn't support importing files with an uncertain column count. So rather than recode the SQL import each time a change occurs, I decided to "unpivot" the data prior to import, so the resulting table structure will always be static:

PayCode PayDesc DeptCode GLCode
====================================
REG Regular Dept1 00001
REG Regular Dept2 00001
REG Regular Dept3 00002
VAC Vacation Dept1 00011
VAC Vacation Dept2 00012
VAC Vacation Dept3 00013
HOL Holiday Dept1 00021
HOL Holiday Dept2 00022
HOL Holiday Dept3 00022


I've got the script functioning, but being a PS lightweight, I thought I would solicit for enhancement ideas. I feel like I'm missing good pipeline opportunities due to inexperience.

Code updated using feedback from briantist

$srcDir = "C:"
$srcTab = "Sheet1"
$srcFile = Join-Path -Path $srcDir -ChildPath "ClientGL.xlsx"
$outFile = Join-Path -Path $srcDir -ChildPath "ClientGL.txt"
$delim = "
t"

## Export from Excel to tab-delimited .txt
Write-Host "Launch Excel"
$Excel = New-Object -comobject Excel.Application
$Excel.Visible = $False
$Excel.displayalerts = $False
Write-Host "Open $srcFile"
$Workbook = $Excel.Workbooks.Open($srcFile)
$Workbook.Worksheets.Item($srcTab).Select()
Write-Host "Export $outFile as Tab-Delimited TXT"
$Workbook.SaveAs("$outFile",-4158)
Write-Host "Quit Excel"
$Excel.Quit()
If(Get-Process -name excel){Stop-Process -name excel}

## Import data and headers
Write-Host "Import $outFile data"
$inData = Import-

Solution

Path Joining

$srcDir = "C:"
$srcFile = "$SrcDir\ClientGL.xlsx"
$srcTab = "Sheet1"
$outFile = "$SrcDir\ClientGL.txt"


Although it may seem unnecessary in this case, you should know that Join-Path exists.

$srcDir = "C:"
$srcFile = $SrcDir | Join-Path -ChildPath "ClientGL.xlsx"
$srcTab = "Sheet1"
$outFile = Join-Path $SrcDir "ClientGL.txt"


The nice thing about using this (again, unnecessary here) is that it takes care of leading and trailing path separators, so the following:

Join-Path "C:\" "\Windows"
Join-Path "C:" "Windows"


Will both produce C:\Windows. That makes it great for user input.

Write-Host vs Write-Verbose

This can be debated, but some (myself included) take a cautious approach with Write-Host. The reason is that it forces the output to the screen, instead of letting the caller decide whether they want status messages.

Ultimately, you take this on a case-by-case, and sometimes Write-Host is the right tool, and maybe it is for you in this case.

Using Write-Verbose instead, you would normally not see the output, unless you call your script with -Verbose, and then you will see it.

To add support for this, you need to add a parameter block to your script, even if you aren't going to take any parameters.

Top of script:

[CmdletBinding()]
param()


That will automatically add support for -Verbose.

Aliases

If(ps excel){kill -name excel}


and

(Get-Content $outFile) | % {$_ -replace '"', ""} | Out-File -FilePath $outFile -Force


ps is an alias for Get-Process.
kill is an alias for Stop-Process.
% is an alias for ForEach-Object.

I love PowerShell's aliases, but when it comes to writing a script that's meant to be re-used often or might ever be read by other people, I am of the school that one should use the full cmdlet/function name, and use full parameter names.

I use aliases when I'm using PowerShell as a shell and typing commands to run immediately, or to quickly prototype something.

For me it's a matter of clarify and readability, but this is of course personal preference (I have met at least one person who feels differently and uses aliases everywhere).

Reading the first line of a file

$inHeaders = (Get-Content $outFile | Select-Object -First 1).Split("t")


If you're using PowerShell 2.0, the above line will read every line of the file, and even though
Select-Object will only give you back the first 1, it has no way of stopping the pipeline. If your file is large, this could take a very long time.

If you're using PowerShell 3+, then the above will actually only read the first line because
Select-Object will stop the pipeline.

Note that
Get-Content has a -TotalCount parameter though, so you can use:

$inHeaders = (Get-Content $outFile -TotalCount 1).Split("t")


and it will read just the first line.

.Split() vs -Split

$inHeaders = (Get-Content $outFile | Select-Object -First 1).Split("t")


Here you're using the
.Split() method of a [String]. PowerShell also has a -Split operator. The operator has a lot of nifty features that from what I've seen no one ever uses, because they don't know they exist or (in my case) they always forget about them when needed.

But the main difference that you probably will remember is that the operator takes a regular expression, and the method takes a string.

There's nothing wrong with either, just wanted you to be aware of the differences.

Accepting Parameters

$srcDir = "C:"
$srcFile = "$SrcDir\ClientGL.xlsx"
$srcTab = "Sheet1"
$outFile = "$SrcDir\ClientGL.txt"


Consider making these parameters that can be passed to the script. You could also make them have defaults that way.

Top of script:

[CmdletBinding()
param(
[Parameter(
Mandatory=$false
)]
[String]
$Path = "C:" ,

[Parameter(
Mandatory=$false
)]
[String]
$srcTab = "Sheet1"
)

$srcFile = "$Path\ClientGL.xlsx"
$outFile = "$Path\ClientGL.txt"


Now you can call the script like so:

.\MyScript.ps1 -Path "C:\Some\Other\Path\" -srcTab "Expenses"


Now you can see why using
Join-Path would save you some headache because you're accepting user input!

Just to break down this
param() block a bit:

[Parameter()] can take multiple comma separated options; I've shown you Mandatory. Mandatory means mandatory to be supplied by the caller not to have a value, so when you use a default value like we are here, you don't want to make the parameter mandatory. Parameters are by default optional, so this was unnecessary; I was just demonstrating it.

The entire
[Parameter()] attribute is also optional, and could be left out.

[String] is specifying the data type for the parameter. PowerShell will try to coerce input values into this type if possible. Data type is also optional.

$Path: this is the name of the parameter. It is the only part that's mandatory. I used $Path instead of $srcDir` because PowerShell has conventions for certain names to represent certain

Context

StackExchange Code Review Q#92674, answer score: 3

Revisions (0)

No revisions yet.