MD5 hash comparison for two folders

Submitted by: @import:stackexchange-codereview

Problem

Using this page as a starting point for comparing MD5 hash values of files in two different folders, I've put together something that outputs either "Copied" if both files' MD5 hashes match or "Check" if they do not.

I want to find out if there's a better method for accomplishing this.

#Requires -version 4.0
Param
  (
    [parameter(Mandatory=$true,Position=1)][string]$source,
    [parameter(Mandatory=$true,Position=2)][string]$dest
  )
# Store results in new variables: the [string] type on $source/$dest would coerce an object array into one string
$sourceHashes = @(gci $source -File -Recurse | select FullName,@{Label='Hash'; Expression={(Get-FileHash -Algorithm MD5 $psitem.fullname).hash}})
$destHashes = @(gci $dest -File -Recurse | select FullName,@{Label='Hash'; Expression={(Get-FileHash -Algorithm MD5 $psitem.fullname).hash}})
#compare $sourceHashes $destHashes -Property Hash -PassThru
# Compare the files pairwise by their position in each listing
for ($i=0;$i -lt $destHashes.Count;$i++) {
IF($sourceHashes[$i].Hash -eq $destHashes[$i].Hash){
 "Copied $($sourceHashes[$i].FullName)"}
Else{
 "Check $($sourceHashes[$i].FullName)"}
}

Solution

Better is subjective, but I will try to improve the code you have here to make it more robust and scalable. I will cover the areas of improvement one at a time and bring them together at the end. A lot of this is error hardening and removing potential flaws in the assumptions the code was making.

Pitfalls with non validated paths passed

Hopefully this is not a nitpick, but you are not doing any validation on your paths, so the script can error out unexpectedly if a string that is not a path is passed. Looking at about_Functions_Advanced_Parameters, one option we can use is ValidateScript. The message returned on failure is arguably vague, but it is a start. Consider one such parameter with this validation added:

[parameter(Mandatory=$true,Position=2)]
[ValidateScript({Test-Path -PathType Container $_ })]
[string]$Destination


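Now, if a bad path is passed, the script errors out before it starts processing. In this case we are not really preventing anything serious, but it is a good habit to get into. With the final parameter name DifferenceFolder (see Parameter names below), passing "bagels" produces: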


Cannot validate argument on parameter 'DifferenceFolder'. The "Test-Path -PathType Container $_ " validation script
for the argument with value "bagels" did not return a result of True.
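A quick way to watch ValidateScript reject a bad value is a throwaway function carrying the same attribute. The function name Test-FolderParameter is hypothetical and used only for this demonstration; the sketch also assumes no folder named bagels exists in the current directory.

```powershell
# Hypothetical demo function: only the ValidateScript attribute matters here
function Test-FolderParameter {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, Position = 0)]
        [ValidateScript({ Test-Path -PathType Container $_ })]
        [string]$DifferenceFolder
    )
    # Only reached when the validation script returned $true
    "Validated: $DifferenceFolder"
}

# A real folder passes validation
$accepted = Test-FolderParameter -DifferenceFolder $PWD.Path

# A non-path string is rejected during parameter binding, before the body runs
$rejected = $false
try {
    Test-FolderParameter -DifferenceFolder 'bagels'
} catch [System.Management.Automation.ParameterBindingException] {
    $rejected = $true
}
```

The rejection surfaces as a ParameterBindingException, so a caller can catch it with an ordinary try/catch.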

Parameter names

Your parameter names are inconsistent: $source and $dest. The latter is a short form and the former is not. These names are also commonly associated with copying, which this script does not do. I am going to side with the original author of your code snippet and use more meaningful names: $ReferenceFolder and $DifferenceFolder. Don't worry though. If you really want your own names, I propose a compromise: a parameter alias. That way you can use either name and the script works just the same.

[alias("Destination")]
[string]$DifferenceFolder,


Also noteworthy: PowerShell lets you abbreviate a parameter name on the command line as long as the abbreviation is unambiguous. So you can still use -Dest, as there is only one thing it could mean.
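A minimal sketch of the alias in action, using a hypothetical function named Get-DifferenceFolder that simply echoes the bound value. It binds the same parameter by its full name, by its alias, and by an unambiguous abbreviation of the name:

```powershell
# Hypothetical demo function: returns whatever was bound to -DifferenceFolder
function Get-DifferenceFolder {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [Alias("Destination")]
        [string]$DifferenceFolder
    )
    $DifferenceFolder
}

$byName  = Get-DifferenceFolder -DifferenceFolder 'C:\Backup'  # full parameter name
$byAlias = Get-DifferenceFolder -Destination 'C:\Backup'       # the alias
$byShort = Get-DifferenceFolder -Diff 'C:\Backup'              # unambiguous prefix of the name
```

All three calls return C:\Backup; the path is not validated in this sketch, so it need not exist.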

Using command aliases in production scripts

Many would consider command aliases such as gci and select a faux pas, as they can obfuscate your code and leave people (especially those not familiar with them) unsure of what is happening. Testing or code golf? Sure, use aliases, since they make the code more terse. Production? Use the full cmdlet names to prevent confusion later, whether someone else or you yourself are reading the code down the road.
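When you inherit code full of aliases, Get-Alias shows what each one expands to. A small sketch resolving the two aliases used in the question:

```powershell
# Resolve each alias from the question to the cmdlet it stands for
$resolved = foreach ($name in 'gci', 'select') {
    $alias = Get-Alias -Name $name
    "{0} -> {1}" -f $alias.Name, $alias.Definition
}
$resolved
```

This prints gci -> Get-ChildItem and select -> Select-Object, which are the names to use in production code.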

Using Select after Get-ChildItem

The code creates a new object array holding each file's full name and its hash by way of a calculated property. Contrary to what your linked post says (and, it seems, the documentation), Get-FileHash does accept pipeline input. Also, the cmdlet already returns the properties Path (which contains the full path), Hash and Algorithm, so there is no need to recreate what is already there. Removing the calculated property also saved some processing time in my testing.

$referenceFolderHashes = Get-ChildItem $ReferenceFolder -File -Recurse | Get-FileHash -Algorithm $Algorithm | Select-Object Path,Hash
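To see the pipeline binding for yourself, here is a self-contained sketch that builds a throwaway folder with two small files (hypothetical demo data written without a byte-order mark) and hashes them with no calculated property. MD5 is hard-coded here in place of the $Algorithm parameter:

```powershell
# Throwaway folder with two small files (demo data only)
$ReferenceFolder = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $ReferenceFolder | Out-Null
[System.IO.File]::WriteAllText((Join-Path $ReferenceFolder 'a.txt'), 'abc')
[System.IO.File]::WriteAllText((Join-Path $ReferenceFolder 'b.txt'), 'xyz')

# Get-FileHash binds each FileInfo coming down the pipeline and already
# returns Algorithm, Hash and Path, so no calculated property is needed
$referenceFolderHashes = Get-ChildItem $ReferenceFolder -File -Recurse |
    Get-FileHash -Algorithm MD5 |
    Select-Object Path, Hash

$referenceFolderHashes | Format-Table -AutoSize

# Clean up the demo folder
Remove-Item $ReferenceFolder -Recurse -Force
```

The hash for a.txt is the well-known MD5 of "abc", 900150983CD24FB0D6963F7D28E17F72.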


Consider not using MD5

As CodesInChaos points out, you should consider not using MD5. MD5 is known to be weak against collisions, so two different files are more likely to produce the same hash than with a stronger algorithm. SHA256 would be an obvious candidate, although even then a collision is only very unlikely, not impossible. I have added a parameter called $Algorithm to handle this. It is not mandatory and defaults to "MD5" for your convenience. [ValidateSet] ensures that you can only choose algorithms supported by Get-FileHash.

[parameter(Mandatory=$false)]
[ValidateSet("SHA1", "SHA256", "SHA384", "SHA512", "MACTripleDES", "MD5", "RIPEMD160")] 
$Algorithm="MD5"
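A minimal sketch showing the default and the ValidateSet together. Get-FolderHash is a hypothetical name used only for this example, and the demo folder is throwaway data:

```powershell
# Hypothetical helper: hash every file under a folder with a selectable algorithm
function Get-FolderHash {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, Position = 0)]
        [ValidateScript({ Test-Path -PathType Container $_ })]
        [string]$Folder,

        [Parameter(Mandatory = $false)]
        [ValidateSet("SHA1", "SHA256", "SHA384", "SHA512", "MACTripleDES", "MD5", "RIPEMD160")]
        [string]$Algorithm = "MD5"
    )
    Get-ChildItem $Folder -File -Recurse | Get-FileHash -Algorithm $Algorithm | Select-Object Path, Hash
}

# Demo: one file containing the bytes "abc" in a throwaway folder
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $demo | Out-Null
[System.IO.File]::WriteAllText((Join-Path $demo 'a.txt'), 'abc')

$md5    = Get-FolderHash $demo                     # default algorithm: MD5
$sha256 = Get-FolderHash $demo -Algorithm SHA256   # stronger algorithm

# An algorithm outside the set fails parameter binding before any hashing happens
$badAlgorithmRejected = $false
try {
    Get-FolderHash $PWD.Path -Algorithm CRC32
} catch [System.Management.Automation.ParameterBindingException] {
    $badAlgorithmRejected = $true
}

Remove-Item $demo -Recurse -Force
```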


Your output

I thought it odd that you would want to see all the files that copied. It should be enough to see only the failures and assume everything else was fine. The part I had an issue with was the logic. If the folders you are comparing do not contain thousands of files, Compare-Object should work perfectly fine, so we can use it to compare the hashes of all files and output only the ones that do not match.

The real issue I had was that your code depended on both directories having the same number of files in the same order. If either directory was missing a file or had an extra one, your logic would fail, since you were comparing element positions rather than contents. This was another reason I opted for Compare-Object.
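A small in-memory sketch makes the positional problem concrete. The hash values below are made-up placeholders, not real digests, and the difference listing is assumed to be missing a.txt:

```powershell
# Made-up hash listings (placeholder values): the difference folder lacks a.txt
$reference = @(
    [pscustomobject]@{ Path = 'ref\a.txt'; Hash = 'AAA' }
    [pscustomobject]@{ Path = 'ref\b.txt'; Hash = 'BBB' }
    [pscustomobject]@{ Path = 'ref\c.txt'; Hash = 'CCC' }
)
$difference = @(
    [pscustomobject]@{ Path = 'dif\b.txt'; Hash = 'BBB' }
    [pscustomobject]@{ Path = 'dif\c.txt'; Hash = 'CCC' }
)

# Positional comparison, as in the question: every pair is shifted by one,
# so b.txt is wrongly flagged and c.txt is never compared at all
$positional = for ($i = 0; $i -lt $difference.Count; $i++) {
    if ($reference[$i].Hash -eq $difference[$i].Hash) { "Copied $($reference[$i].Path)" }
    else { "Check $($reference[$i].Path)" }
}

# Content comparison: only the hash present on one side is reported
$byContent = Compare-Object $reference $difference -Property Hash -PassThru
```

Here $positional holds "Check ref\a.txt" and "Check ref\b.txt", while $byContent holds only ref\a.txt with a SideIndicator of <=.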

If you do want to see the matches as well, I created a [switch] that lets you toggle that on and off. You can think of it as a boolean. Compare-Object has an -IncludeEqual parameter, and we can pass it the value of our switch.

[parameter(Mandatory=$false)]
[switch]$ShowMatches

.....

Compare-Object $referenceFolderHashes $differenceFolderHashes -Property Hash -PassThru -IncludeEqual:$ShowMatches.IsPresent
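Pulling the pieces above together, a minimal sketch of the whole comparison might look like this. It is written as a function (the name Compare-FolderHash is hypothetical) so it can be dot-sourced and tried; the Source alias is an addition mirroring the Destination alias, and it assumes each folder contains at least one file.

```powershell
function Compare-FolderHash {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, Position = 0)]
        [Alias("Source")]   # hypothetical alias so the question's -source still works
        [ValidateScript({ Test-Path -PathType Container $_ })]
        [string]$ReferenceFolder,

        [Parameter(Mandatory = $true, Position = 1)]
        [Alias("Destination")]
        [ValidateScript({ Test-Path -PathType Container $_ })]
        [string]$DifferenceFolder,

        [Parameter(Mandatory = $false)]
        [ValidateSet("SHA1", "SHA256", "SHA384", "SHA512", "MACTripleDES", "MD5", "RIPEMD160")]
        [string]$Algorithm = "MD5",

        [Parameter(Mandatory = $false)]
        [switch]$ShowMatches
    )

    # Assumes each folder holds at least one file; Compare-Object rejects $null input
    $referenceFolderHashes = Get-ChildItem $ReferenceFolder -File -Recurse |
        Get-FileHash -Algorithm $Algorithm | Select-Object Path, Hash
    $differenceFolderHashes = Get-ChildItem $DifferenceFolder -File -Recurse |
        Get-FileHash -Algorithm $Algorithm | Select-Object Path, Hash

    # SideIndicator: '<=' only in reference, '=>' only in difference, '==' in both (only with -ShowMatches)
    Compare-Object $referenceFolderHashes $differenceFolderHashes -Property Hash -PassThru -IncludeEqual:$ShowMatches.IsPresent
}

# Demo data in two throwaway folders: a.txt matches, b.txt and c.txt exist on one side only
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $root | Out-Null
$ref = (New-Item -ItemType Directory -Path (Join-Path $root 'ref')).FullName
$dif = (New-Item -ItemType Directory -Path (Join-Path $root 'dif')).FullName
[System.IO.File]::WriteAllText((Join-Path $ref 'a.txt'), 'same')
[System.IO.File]::WriteAllText((Join-Path $dif 'a.txt'), 'same')
[System.IO.File]::WriteAllText((Join-Path $ref 'b.txt'), 'only in reference')
[System.IO.File]::WriteAllText((Join-Path $dif 'c.txt'), 'only in difference')

$mismatches = @(Compare-FolderHash $ref $dif)
$everything = @(Compare-FolderHash -Source $ref -Destination $dif -ShowMatches)

Remove-Item $root -Recurse -Force
```

Without -ShowMatches only b.txt (<=) and c.txt (=>) are reported; with it, the matching a.txt also appears with a SideIndicator of ==. Because the comparison is on Hash alone, identical content at different paths still counts as a match.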


To continue about the output

So now we have the results of


Context

StackExchange Code Review Q#87033, answer score: 7
