HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavascriptMajor

Faster JavaScript fuzzy string matching function?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
javascriptfunctionfasterstringmatchingfuzzy

Problem

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};


Example:

fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false


It's very slow, though. How can I optimize that function?

Solution

function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};


-
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:

pattern = pattern.split("").join(".*")


-
The regex itself can be optimised: . initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.

pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });


-
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.

var cache = _.memoize(function(pattern){
   return new RegExp(pattern.split("").reduce(function(a,b){
     return a+'[^'+b+']*'+b;
   })
 })
 function fuzzy_match(str,pattern){
   return cache(pattern).test(str)
 };


-
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:

var fuzzy_match = (function(){
   var cache = _.memoize(function(str){
     return new RegExp("^"+str.replace(/./g, function(x){
       return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
     })+"$");
   });
   return function(str, pattern){
     return cache(str).test(pattern)
   });
 })();


Concerning the last regex:

Given some pattern "ace", the regex you build (/a.c.e/) tests if the string contains the characters of the pattern in the string, in the correct order.

If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.

Code Snippets

function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};
pattern = pattern.split("").join(".*")
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
var cache = _.memoize(function(pattern){
   return new RegExp(pattern.split("").reduce(function(a,b){
     return a+'[^'+b+']*'+b;
   })
 })
 function fuzzy_match(str,pattern){
   return cache(pattern).test(str)
 };
var fuzzy_match = (function(){
   var cache = _.memoize(function(str){
     return new RegExp("^"+str.replace(/./g, function(x){
       return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
     })+"$");
   });
   return function(str, pattern){
     return cache(str).test(pattern)
   });
 })();

Context

StackExchange Code Review Q#23899, answer score: 39

Revisions (0)

No revisions yet.