patternMinor
Searching string for character not in array
Viewed 0 times
searchingarraycharacterfornotstring
Problem
I have a string which I want to check for characters other than those in this list {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"} however this check gets run several thousand times a second.
I am therefore looking for the most efficient way of performing this check, what I have tried so far:
This is equivalent to the code below which I have also tried
I am looking for optimal performance possibly at the cost of readability as I am willing to write pages on how the code works and what it does if only I can speed up the execution of my app!
I am therefore looking for the most efficient way of performing this check, what I have tried so far:
If Not System.Text.RegularExpressions.Regex.IsMatch(Input, "[^0-9\.i]") ThenThis is equivalent to the code below which I have also tried
Imports System
Imports System.Linq
Public Module Module1
Public Sub Main()
Console.WriteLine(IsValidString())
End Sub
Private Function IsValidString() As Integer
'Dim Input As String = "Hello World" 'This is Invalid
'Dim Input As String = "13.4i+4" 'This is Invalid
Dim Input As String = "13.4i" 'This is valid
If Function()
Dim IsValid As Boolean = True
For Each Character In Input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
IsValid = False
End If
Next
Return IsValid
End Function() Then
Return -1
Else
'Do some other stuff
Return 1 '
End If
End Function
End ModuleI am looking for optimal performance possibly at the cost of readability as I am willing to write pages on how the code works and what it does if only I can speed up the execution of my app!
Solution
First we'll clean this up a bit, then we'll look at performance.
There is absolutely zero reason to be using an anonymous function. This is useful functionality that you will eventually want to call from somewhere else. As such, it deserves to be extracted into a proper method of it's own.
Next, let's forget entirely about the other
We could go one step farther and Linq-ifiy it a bit into some very readable code.
But does it perform? Well, actually, yes it does. I wrote some benchmarking code and it turns out that the Linq version out performs yours by quite a bit. I was kind of surprised by this, but here are the results.
As you can see, the Linq version nearly always outperforms your method on mostly invalid input. However, on mostly valid input, it performs roughly the same; it's sometimes better, sometimes worse. I suspect this is because the Linq version bails out of the check earlier on invalid input.
Here's the final version of the benchmarking code. It exceed's .NetFiddle's memory limits, so you'll need to paste it into a C# console program to see the results of the higher sample sizes.
If you're either uncomfortable with Linq, or really looking to just push the performance to the absolute best it can be, then we can just modify your original algorithm to return early when we know that the answer is false.
This outperforms the both version above in every case, mostly valid or mostly invalid and it does it consistently each time I run the updated benchmarks.
BUT........
Correctness
Is
There is absolutely zero reason to be using an anonymous function. This is useful functionality that you will eventually want to call from somewhere else. As such, it deserves to be extracted into a proper method of it's own.
Private Function IsValidString() As Integer
'Dim Input As String = "Hello World" 'This is Invalid
'Dim Input As String = "13.4i+4" 'This is Invalid
Dim Input As String = "13.4i" 'This is valid
If IsValidString(Input) Then
Return -1
Else
'Do some other stuff
Return 1 '
End If
End Function
Function IsValidString(Input As String) As Boolean
Dim IsValid As Boolean = True
For Each Character In Input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
IsValid = False
End If
Next
Return IsValid
End FunctionNext, let's forget entirely about the other
IsValidString() method for a second and focus on the code we want to focus on. Watch your casing. It's terribly confusing to make local variables PascalCased just like your classes & methods. Local variables should be camelCased.Public Sub Main()
Console.WriteLine(IsValidString("Hello World"))
Console.WriteLine(IsValidString("13.4i+4"))
Console.WriteLine(IsValidString("13.4i"))
End Sub
Function IsValidString(input As String) As Boolean
Dim isValid As Boolean = True
For Each character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
isValid = False
End If
Next
Return isValid
End FunctionWe could go one step farther and Linq-ifiy it a bit into some very readable code.
Function IsValidString(input As String) As Boolean
Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}
Return input.All(Function(c) validChars.Contains(c))
End FunctionBut does it perform? Well, actually, yes it does. I wrote some benchmarking code and it turns out that the Linq version out performs yours by quite a bit. I was kind of surprised by this, but here are the results.
Sample Size: 100
Original; Mostly Invalid: 10ms
Linq; MostlyInvalid: 0ms
Original; Mostly Valid: 0ms
Linq; Mostly Valid: 0ms
Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 1ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 1ms
Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Original; Mostly Valid: 25ms
Linq; Mostly Valid: 23ms
Sample Size: 100000
Original; Mostly Invalid: 235ms
Linq; MostlyInvalid: 41ms
Original; Mostly Valid: 231ms
Linq; Mostly Valid: 255ms
Sample Size: 1000000
Original; Mostly Invalid: 2408ms
Linq; MostlyInvalid: 587ms
Original; Mostly Valid: 2347ms
Linq; Mostly Valid: 2083ms
Press any key to continue . . .As you can see, the Linq version nearly always outperforms your method on mostly invalid input. However, on mostly valid input, it performs roughly the same; it's sometimes better, sometimes worse. I suspect this is because the Linq version bails out of the check earlier on invalid input.
Here's the final version of the benchmarking code. It exceed's .NetFiddle's memory limits, so you'll need to paste it into a C# console program to see the results of the higher sample sizes.
If you're either uncomfortable with Linq, or really looking to just push the performance to the absolute best it can be, then we can just modify your original algorithm to return early when we know that the answer is false.
Function IsValidString(input As String) As Boolean
For Each character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
Return False
End If
Next
Return True
End FunctionThis outperforms the both version above in every case, mostly valid or mostly invalid and it does it consistently each time I run the updated benchmarks.
Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 0ms
Improved; Mostly Invalid: 0ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 2ms
Improved; Mostly Valid: 1ms
Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Improved; Mostly Invalid: 3ms
Original; Mostly Valid: 22ms
Linq; Mostly Valid: 20ms
Improved; Mostly Valid: 17ms
Sample Size: 100000
Original; Mostly Invalid: 242ms
Linq; MostlyInvalid: 43ms
Improved; Mostly Invalid: 33ms
Original; Mostly Valid: 235ms
Linq; Mostly Valid: 238ms
Improved; Mostly Valid: 186ms
Sample Size: 1000000
Original; Mostly Invalid: 2479ms
Linq; MostlyInvalid: 609ms
Improved; Mostly Invalid: 336ms
Original; Mostly Valid: 2358ms
Linq; Mostly Valid: 2059ms
Improved; Mostly Valid: 1704msBUT........
Correctness
Is
0323.233.12.i a valid string? Both your and my method says it is. Your regex also suffers from the same problem.Code Snippets
Private Function IsValidString() As Integer
'Dim Input As String = "Hello World" 'This is Invalid
'Dim Input As String = "13.4i+4" 'This is Invalid
Dim Input As String = "13.4i" 'This is valid
If IsValidString(Input) Then
Return -1
Else
'Do some other stuff
Return 1 '
End If
End Function
Function IsValidString(Input As String) As Boolean
Dim IsValid As Boolean = True
For Each Character In Input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
IsValid = False
End If
Next
Return IsValid
End FunctionPublic Sub Main()
Console.WriteLine(IsValidString("Hello World"))
Console.WriteLine(IsValidString("13.4i+4"))
Console.WriteLine(IsValidString("13.4i"))
End Sub
Function IsValidString(input As String) As Boolean
Dim isValid As Boolean = True
For Each character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
isValid = False
End If
Next
Return isValid
End FunctionFunction IsValidString(input As String) As Boolean
Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}
Return input.All(Function(c) validChars.Contains(c))
End FunctionSample Size: 100
Original; Mostly Invalid: 10ms
Linq; MostlyInvalid: 0ms
Original; Mostly Valid: 0ms
Linq; Mostly Valid: 0ms
Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 1ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 1ms
Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Original; Mostly Valid: 25ms
Linq; Mostly Valid: 23ms
Sample Size: 100000
Original; Mostly Invalid: 235ms
Linq; MostlyInvalid: 41ms
Original; Mostly Valid: 231ms
Linq; Mostly Valid: 255ms
Sample Size: 1000000
Original; Mostly Invalid: 2408ms
Linq; MostlyInvalid: 587ms
Original; Mostly Valid: 2347ms
Linq; Mostly Valid: 2083ms
Press any key to continue . . .Function IsValidString(input As String) As Boolean
For Each character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
Return False
End If
Next
Return True
End FunctionContext
StackExchange Code Review Q#105743, answer score: 6
Revisions (0)
No revisions yet.