HiveBrain v1.2.0
Get Started
← Back to all entries
patternshellMinor

Delete lines from a CSV file that contain fields listed in a text file

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filedeletecontainfieldscsvtextthatlistedfromlines

Problem

I wrote a powershell script to compare words from a text-file with a csv-column. If the word in the column matches, the line is deleted.

$reader = [System.IO.File]::OpenText($fc_file.Text)
try {
    for() {
        $line = $reader.ReadLine()
        if ($line -eq $null) { break }
        if ($line -eq "") { break }
        # process the line
        $fc_suchfeld = $fc_ComboBox.Text
        $tempstorage = $scriptPath + "\temp\temp.csv"
        Import-Csv $tempfile -Delimiter $delimeter -Encoding $char | where {$_.$fc_suchfeld -notmatch [regex]::Escape($line)} | Export-Csv $tempstorage -Delimiter $delimeter -Encoding $char  -notypeinfo
        Remove-Item $tempfile 
        Rename-Item $tempstorage $tempfile_ext           
    }
}
finally {
    $reader.Close()
}


My code works great, but it is very slow, due to saving and copying the csv file after every line. Is there a way to improve it?

Solution

[Side note: It's helpful to present complete, working code when asking questions, together with any input files. That makes it easier for us. We can use the same names for files and so on. It makes things easier and less confusing. In the following, I'm just going to show you the basic code with all the irrelevant details stripped out.]

There's no need to walk through the words file line-by-line. We can just load it once into an array at the beginning:

$words = Get-Content WordsFile.txt


And then we can look in that array when we are processing the CSV records. That's pretty simple:

Import-Csv CsvFileIn.txt | 
    where TheField -notin $words |
    Export-Csv CsvFileOut.txt


Notes:

  • where TheField -notin $words is short for where {$_.TheField -notin $words}. That syntax was introduced in PowerShell 3.0 I think.



  • The -notin operator is case-insensitive. If you want it to be case-sensitive, then use -cnotin instead. (That is the same with all string comparisons in PowerShell: they are all case-insensitive.)



  • I'm assuming the words file is not huge. If it's huge, then the -notin operator may be too slow because it searches the array record-by-record (it does a "linear" lookup, as we say in the business). If it were huge, it would be better to use a .NET HashSet instead of an array, but if not, there's no need for the extra complexity.

Code Snippets

$words = Get-Content WordsFile.txt
Import-Csv CsvFileIn.txt | 
    where TheField -notin $words |
    Export-Csv CsvFileOut.txt

Context

StackExchange Code Review Q#157839, answer score: 3

Revisions (0)

No revisions yet.