HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Searching string for character not in array

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
searchingarraycharacterfornotstring

Problem

I have a string which I want to check for characters other than those in this list {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"} however this check gets run several thousand times a second.

I am therefore looking for the most efficient way of performing this check, what I have tried so far:

If Not System.Text.RegularExpressions.Regex.IsMatch(Input, "[^0-9\.i]") Then


This is equivalent to the code below which I have also tried

Imports System
Imports System.Linq

Public Module Module1

    Public Sub Main()
        Console.WriteLine(IsValidString())
    End Sub

    Private Function IsValidString() As Integer
        'Dim Input As String = "Hello World" 'This is Invalid
        'Dim Input As String = "13.4i+4" 'This is Invalid
        Dim Input As String = "13.4i" 'This is valid
        If Function()
            Dim IsValid As Boolean = True
            For Each Character In Input
                If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
                    IsValid = False
                End If
            Next

            Return IsValid
        End Function() Then
            Return -1
        Else
            'Do some other stuff
            Return 1 '
        End If
    End Function
End Module


I am looking for optimal performance possibly at the cost of readability as I am willing to write pages on how the code works and what it does if only I can speed up the execution of my app!

Solution

First we'll clean this up a bit, then we'll look at performance.

There is absolutely zero reason to be using an anonymous function. This is useful functionality that you will eventually want to call from somewhere else. As such, it deserves to be extracted into a proper method of it's own.

Private Function IsValidString() As Integer
    'Dim Input As String = "Hello World" 'This is Invalid
    'Dim Input As String = "13.4i+4" 'This is Invalid
    Dim Input As String = "13.4i" 'This is valid
    If IsValidString(Input) Then
        Return -1
    Else
        'Do some other stuff
        Return 1 '
    End If
End Function

Function IsValidString(Input As String) As Boolean
    Dim IsValid As Boolean = True
    For Each Character In Input
        If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
            IsValid = False
        End If
    Next

    Return IsValid
End Function


Next, let's forget entirely about the other IsValidString() method for a second and focus on the code we want to focus on. Watch your casing. It's terribly confusing to make local variables PascalCased just like your classes & methods. Local variables should be camelCased.

Public Sub Main()
    Console.WriteLine(IsValidString("Hello World"))
    Console.WriteLine(IsValidString("13.4i+4"))
    Console.WriteLine(IsValidString("13.4i"))
End Sub

Function IsValidString(input As String) As Boolean
    Dim isValid As Boolean = True
    For Each character In input
        If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
            isValid = False
        End If
    Next

    Return isValid
End Function


We could go one step farther and Linq-ifiy it a bit into some very readable code.

Function IsValidString(input As String) As Boolean
    
    Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}

    Return input.All(Function(c) validChars.Contains(c))
    
End Function


But does it perform? Well, actually, yes it does. I wrote some benchmarking code and it turns out that the Linq version out performs yours by quite a bit. I was kind of surprised by this, but here are the results.

Sample Size: 100
Original; Mostly Invalid: 10ms
Linq; MostlyInvalid: 0ms
Original; Mostly Valid: 0ms
Linq; Mostly Valid: 0ms

Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 1ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 1ms

Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Original; Mostly Valid: 25ms
Linq; Mostly Valid: 23ms

Sample Size: 100000
Original; Mostly Invalid: 235ms
Linq; MostlyInvalid: 41ms
Original; Mostly Valid: 231ms
Linq; Mostly Valid: 255ms

Sample Size: 1000000
Original; Mostly Invalid: 2408ms
Linq; MostlyInvalid: 587ms
Original; Mostly Valid: 2347ms
Linq; Mostly Valid: 2083ms

Press any key to continue . . .


As you can see, the Linq version nearly always outperforms your method on mostly invalid input. However, on mostly valid input, it performs roughly the same; it's sometimes better, sometimes worse. I suspect this is because the Linq version bails out of the check earlier on invalid input.

Here's the final version of the benchmarking code. It exceed's .NetFiddle's memory limits, so you'll need to paste it into a C# console program to see the results of the higher sample sizes.

If you're either uncomfortable with Linq, or really looking to just push the performance to the absolute best it can be, then we can just modify your original algorithm to return early when we know that the answer is false.

Function IsValidString(input As String) As Boolean
    For Each character In input
        If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
            Return False
        End If
    Next

    Return True
End Function


This outperforms the both version above in every case, mostly valid or mostly invalid and it does it consistently each time I run the updated benchmarks.

Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 0ms
Improved; Mostly Invalid: 0ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 2ms
Improved; Mostly Valid: 1ms

Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Improved; Mostly Invalid: 3ms
Original; Mostly Valid: 22ms
Linq; Mostly Valid: 20ms
Improved; Mostly Valid: 17ms

Sample Size: 100000
Original; Mostly Invalid: 242ms
Linq; MostlyInvalid: 43ms
Improved; Mostly Invalid: 33ms
Original; Mostly Valid: 235ms
Linq; Mostly Valid: 238ms
Improved; Mostly Valid: 186ms

Sample Size: 1000000
Original; Mostly Invalid: 2479ms
Linq; MostlyInvalid: 609ms
Improved; Mostly Invalid: 336ms
Original; Mostly Valid: 2358ms
Linq; Mostly Valid: 2059ms
Improved; Mostly Valid: 1704ms


BUT........
Correctness

Is 0323.233.12.i a valid string? Both your and my method says it is. Your regex also suffers from the same problem.

Code Snippets

Private Function IsValidString() As Integer
    'Dim Input As String = "Hello World" 'This is Invalid
    'Dim Input As String = "13.4i+4" 'This is Invalid
    Dim Input As String = "13.4i" 'This is valid
    If IsValidString(Input) Then
        Return -1
    Else
        'Do some other stuff
        Return 1 '
    End If
End Function

Function IsValidString(Input As String) As Boolean
    Dim IsValid As Boolean = True
    For Each Character In Input
        If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
            IsValid = False
        End If
    Next

    Return IsValid
End Function
Public Sub Main()
    Console.WriteLine(IsValidString("Hello World"))
    Console.WriteLine(IsValidString("13.4i+4"))
    Console.WriteLine(IsValidString("13.4i"))
End Sub

Function IsValidString(input As String) As Boolean
    Dim isValid As Boolean = True
    For Each character In input
        If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
            isValid = False
        End If
    Next

    Return isValid
End Function
Function IsValidString(input As String) As Boolean
    
    Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}

    Return input.All(Function(c) validChars.Contains(c))
    
End Function
Sample Size: 100
Original; Mostly Invalid: 10ms
Linq; MostlyInvalid: 0ms
Original; Mostly Valid: 0ms
Linq; Mostly Valid: 0ms

Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 1ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 1ms

Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Original; Mostly Valid: 25ms
Linq; Mostly Valid: 23ms

Sample Size: 100000
Original; Mostly Invalid: 235ms
Linq; MostlyInvalid: 41ms
Original; Mostly Valid: 231ms
Linq; Mostly Valid: 255ms

Sample Size: 1000000
Original; Mostly Invalid: 2408ms
Linq; MostlyInvalid: 587ms
Original; Mostly Valid: 2347ms
Linq; Mostly Valid: 2083ms

Press any key to continue . . .
Function IsValidString(input As String) As Boolean
    For Each character In input
        If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
            Return False
        End If
    Next

    Return True
End Function

Context

StackExchange Code Review Q#105743, answer score: 6

Revisions (0)

No revisions yet.