HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Find all differences between 2 strings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
alldifferencesbetweenfindstrings

Problem

I pulled this out of my code bucket and dusted it off earlier today in response to a post over on SO that made me cringe. This was originally written to highlight changes in Excel cells in real time via Worksheet_Change, so it is designed with an eye toward raw speed to avoid blocking the UI. Benchmarks on current hardware are running around 3 seconds to compare 2 1kb strings.

I cleaned it up a bit to modernize the coding style, but I'm mainly looking for input on the algorithm used and suggestions for making the code a bit more understandable for somebody who isn't familiar with it.

It basically works on byte arrays and tracks the current working position in each array with a set of index pointers for the "start" and "end" of each substring that it's working with. The algorithm is similar to a binary search. The entry point function finds the longest matching substring in the two byte arrays, excludes it, logs all the differences to the output, and then recursively calls itself on the slices of the arrays to the right and left of the match:

```
'Returns a comma delimited string containing the positions of differences in the passed byte arrays. Recursive.
'Arrays are not modified, index parameters specify where the pointers are in the arrays on each subsequent call.
Private Function FindDifferences(ByRef first() As Byte, ByRef other() As Byte, Optional ByVal firstStartIndex As Long = -1, _
Optional ByVal firstEndIndex As Long, Optional ByVal otherStartIndex As Long, _
Optional ByVal otherEndIndex As Long) As String

If firstStartIndex = -1 Then
'Find matching substrings and set index markers.
SkipSubstringMatches first, other, firstStartIndex, firstEndIndex, otherStartIndex, otherEndIndex
If firstEndIndex = -1 And otherEndIndex > 0 Then
'All matches in first.
Exit Function
ElseIf otherEndIndex = -1 And firstEndIndex > 0 Then

Solution

Nothing much to complain about with this code, so here's a short list of opportunities that may or may not fall under the realm of personal preference:

Dual-branch conditional assignments like this, where there's only a single, non-side-effecting instruction in each branch:

If topFirst >= topOther Then
    lower = topOther
Else
    lower = topFirst
End If


Can be written as a one-liner assignment with the IIf function:

lower = IIf(topFirst >= topOther, topOther, topFirst)


Single-liner conditional blocks that basically act as guard clauses:

If firstIndex = firstEndIndex Then
    Exit Do
End If


Can be written with the conditional statement syntax:

If firstIndex = firstEndIndex Then Exit Do


Note that doing that removes a nesting level here:

If firstEndIndex = -1 Or otherEndIndex = -1 Then
    Exit Sub
Else
    Do Until first(topFirst) <> other(topOther)
        topFirst = topFirst - 1
        topOther = topOther - 1
        If topFirst < baseFirst Or topOther < baseFirst Then
            Exit Do
        End If
    Loop
    firstEndIndex = topFirst
    otherEndIndex = topOther
End If


Becomes:

If firstEndIndex = -1 Or otherEndIndex = -1 Then Exit Sub
Do Until first(topFirst) <> other(topOther)
    topFirst = topFirst - 1
    topOther = topOther - 1
    If topFirst < baseFirst Or topOther < baseFirst Then
        Exit Do
    End If
Loop
firstEndIndex = topFirst
otherEndIndex = topOther


The declarations-as-close-as-possible-to-first-use style in StringDiffs wrapper function is not consistent with the rest of the code:

Public Function StringDiffs(ByVal first As String, ByVal other As String) As String
    Dim firstChars() As Byte
    Dim otherChars() As Byte
    firstChars = StrConv(first, vbFromUnicode)
    otherChars = StrConv(other, vbFromUnicode)
    StringDiffs = FindDifferences(firstChars, otherChars)
End Function


Would have been:

Public Function StringDiffs(ByVal first As String, ByVal other As String) As String
    Dim firstChars() As Byte
    firstChars = StrConv(first, vbFromUnicode)

    Dim otherChars() As Byte
    otherChars = StrConv(other, vbFromUnicode)

    StringDiffs = FindDifferences(firstChars, otherChars)    
End Function


This bit looks a little packed, some vertical whitespace wouldn't hurt:

If matchLength <> 0 Then
    differences = FindDifferences(first, other, firstStartIndex, firstMatch - 1, otherStartIndex, otherMatch - 1)
    If Len(differences) <> 0 Then returnValue = returnValue & "," & differences
    differences = FindDifferences(first, other, firstMatch + matchLength, firstEndIndex, otherMatch + matchLength, otherEndIndex)
    If Len(differences) <> 0 Then returnValue = returnValue & "," & differences
Else


But otherwise the whole code looks great.

Code Snippets

If topFirst >= topOther Then
    lower = topOther
Else
    lower = topFirst
End If
lower = IIf(topFirst >= topOther, topOther, topFirst)
If firstIndex = firstEndIndex Then
    Exit Do
End If
If firstIndex = firstEndIndex Then Exit Do
If firstEndIndex = -1 Or otherEndIndex = -1 Then
    Exit Sub
Else
    Do Until first(topFirst) <> other(topOther)
        topFirst = topFirst - 1
        topOther = topOther - 1
        If topFirst < baseFirst Or topOther < baseFirst Then
            Exit Do
        End If
    Loop
    firstEndIndex = topFirst
    otherEndIndex = topOther
End If

Context

StackExchange Code Review Q#156670, answer score: 3

Revisions (0)

No revisions yet.