patternshellMinor
Delete lines from a CSV file that contain fields listed in a text file
Viewed 0 times
filedeletecontainfieldscsvtextthatlistedfromlines
Problem
I wrote a powershell script to compare words from a text-file with a csv-column. If the word in the column matches, the line is deleted.
My code works great, but it is very slow, due to saving and copying the csv file after every line. Is there a way to improve it?
$reader = [System.IO.File]::OpenText($fc_file.Text)
try {
for() {
$line = $reader.ReadLine()
if ($line -eq $null) { break }
if ($line -eq "") { break }
# process the line
$fc_suchfeld = $fc_ComboBox.Text
$tempstorage = $scriptPath + "\temp\temp.csv"
Import-Csv $tempfile -Delimiter $delimeter -Encoding $char | where {$_.$fc_suchfeld -notmatch [regex]::Escape($line)} | Export-Csv $tempstorage -Delimiter $delimeter -Encoding $char -notypeinfo
Remove-Item $tempfile
Rename-Item $tempstorage $tempfile_ext
}
}
finally {
$reader.Close()
}My code works great, but it is very slow, due to saving and copying the csv file after every line. Is there a way to improve it?
Solution
[Side note: It's helpful to present complete, working code when asking questions, together with any input files. That makes it easier for us. We can use the same names for files and so on. It makes things easier and less confusing. In the following, I'm just going to show you the basic code with all the irrelevant details stripped out.]
There's no need to walk through the words file line-by-line. We can just load it once into an array at the beginning:
And then we can look in that array when we are processing the CSV records. That's pretty simple:
Notes:
There's no need to walk through the words file line-by-line. We can just load it once into an array at the beginning:
$words = Get-Content WordsFile.txtAnd then we can look in that array when we are processing the CSV records. That's pretty simple:
Import-Csv CsvFileIn.txt |
where TheField -notin $words |
Export-Csv CsvFileOut.txtNotes:
where TheField -notin $wordsis short forwhere {$_.TheField -notin $words}. That syntax was introduced in PowerShell 3.0 I think.
- The
-notinoperator is case-insensitive. If you want it to be case-sensitive, then use-cnotininstead. (That is the same with all string comparisons in PowerShell: they are all case-insensitive.)
- I'm assuming the words file is not huge. If it's huge, then the
-notinoperator may be too slow because it searches the array record-by-record (it does a "linear" lookup, as we say in the business). If it were huge, it would be better to use a .NETHashSetinstead of an array, but if not, there's no need for the extra complexity.
Code Snippets
$words = Get-Content WordsFile.txtImport-Csv CsvFileIn.txt |
where TheField -notin $words |
Export-Csv CsvFileOut.txtContext
StackExchange Code Review Q#157839, answer score: 3
Revisions (0)
No revisions yet.